By Om Malik
Pete Warden is the kind of fellow you would run into at the old eTech conference or at Foo Camp—an old-school engineer who is equally at home hacking hardware and writing code in the latest languages. It was at an eTech conference that I first met Warden. I can’t remember the context, but we have been in touch ever since. He would come to my events (when I hosted them), and I would email him to confer on some story ideas I was contemplating.
We often swapped emails about the rise of cheap-and-cheerful chips such as the Arduino. We marveled at the potential of the Internet of Things. He eventually sold his first startup to Google and went to work there. We lost touch. His writing about technical issues, however, has been a big influence on how I think about connected devices and their utility. He is a man who knows the Internet of Things and its foibles quite well.
After a seven-and-a-half-year stint at Google, where he worked on TensorFlow, he left to start a new company, cleverly named Useful Sensors. This is Warden’s fourth startup. Manjunath Kudlur, another TensorFlow alumnus, is his co-founder. Useful has created a new speech-to-text model called Moonshine, which is five times faster than Whisper, the current darling of voice-to-text models.
They recently launched Torre, an instant language translator that runs entirely locally, for speed and privacy. This makes it very useful in scenarios that are decidedly low-tech. For instance, a Spanish speaker can talk to an English-speaking doctor in their own language, and the model translates it into English, which makes for a more accurate conversation. By comparison, something like Google Translate has to do all of that in the cloud, which introduces a delay. Warden thinks about these unusual scenarios because he spent a lot of time at Google and understands the importance of “voice” as an interface to devices.
I recently caught up with Pete to talk about Useful (and Moonshine). We discussed the aging Siri, Alexa, and Hey Google. During our conversation, he talked about the reasons why IoT failed, and how AI and small language models (like his Moonshine) could actually create a world of intelligent devices.
Honestly, he had me at “toaster” and a perfect grilled sandwich, but you should read on to learn how Pete is thinking about the connected future. He knows what he’s talking about.
Om Malik: What is this thing you’re showing me?
Pete Warden: This is our product. It’s doing simultaneous English and Spanish translations. We’re piloting it with nonprofits because we want to help people, especially those who don’t have English as their first language, interact in settings like parent-teacher meetings. Often, kids translate for their parents, which is also common in doctor’s offices, local government, and with paramedics. Existing tools like Google Translate are outdated—they’re like typing on a keyboard where you press a key, and two seconds later, something shows up. It’s not conversational.
Om: Why hasn’t Google improved this?
PW: Google Translate isn’t a priority. It doesn’t tie directly into their revenue streams, like search, so it gets minimal resources compared to their core products. It’s a nice-to-have feature for them.
Om: Don’t they have the imagination to see this as a wedge into the age of AI, where tools like Gemini could become more useful for non-English speakers?
PW: I hope they realize that. So far, they haven’t. It seems like the company is no longer run by nerds but by MBAs, and a lot of decision-making is reactionary, driven by short-term financial goals. For example, Google pioneered transformers but didn’t resource them properly or release open-source models. When OpenAI gained traction, Google pulled back even further instead of leaning in, like Meta did with open source. It feels like a lack of leadership.
Om: So, your product is running locally, right?
PW: Yes, entirely locally. Right now, it’s running on a tablet because it’s a nice form factor, but it can run on phones or any device.
Om: How does it achieve such low latency?
PW: It’s because it runs locally. We’ve developed our own speech-to-text models from scratch, which are as accurate as OpenAI’s Whisper but about five times faster. It’s called Moonshine.
The broader vision is to make voice a standard way to interact with devices. Voice interfaces should be private and fast. Nobody worries about connecting a keyboard to the internet, yet current voice interfaces often rely on the cloud. This adds unnecessary latency and raises privacy concerns. Voice should be as natural as talking to someone in real life, but systems like Alexa, Siri, and Google Assistant haven’t delivered that experience.
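(A side note for the technically curious: Moonshine is published as a set of open models, so it is easy to get a feel for what “entirely locally” means in practice. The sketch below is a minimal, illustrative way to run a Moonshine checkpoint through Hugging Face’s speech-recognition pipeline on your own machine; the hub ID and the audio file name are assumptions for this example, not details of Useful’s product.)

```python
# Minimal local transcription sketch (illustrative only).
# Assumes the open Moonshine checkpoints on the Hugging Face hub and the
# transformers + ffmpeg audio stack; no audio leaves the machine.
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="UsefulSensors/moonshine-tiny",  # assumed hub ID; "-base" is the larger variant
)

result = asr("clinic_visit.wav")  # hypothetical 16 kHz mono recording
print(result["text"])
```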
Om: Why do you think that is?
PW: Companies like Google and Amazon built voice tools to serve their business models. At Google, everything revolved around driving search queries to the cloud. I worked on Google Assistant and saw that eliminating the wake word or enabling local processing didn’t align with their metrics. At Amazon, Alexa was tied to shopping and Prime subscriptions. User needs were secondary.
Om: What about devices like microwaves or lamps? Can your models work there?
PW: That’s the goal. We’re designing models small enough to run on microcontrollers, aiming for under 10MB. Imagine every device with a switch or button having a voice interface. The models wouldn’t necessarily perform translation but would handle basic commands.
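(To put that 10MB figure in perspective, here is some back-of-envelope arithmetic, my numbers rather than Pete’s, on how many model parameters fit in such a budget at different weight precisions. A real microcontroller build also has to leave room for activations, the runtime, and the application itself.)

```python
# Back-of-envelope: how many weights fit in a ~10 MB budget?
# Illustrative arithmetic only; ignores activations, runtime, and app code.
BUDGET_BYTES = 10 * 1024 * 1024  # ~10 MB of flash

for bits in (32, 16, 8, 4):  # float32, float16, int8, int4 weights
    params = BUDGET_BYTES * 8 // bits
    print(f"{bits:>2}-bit weights: ~{params / 1e6:.1f}M parameters")
```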
Om: Your product currently does translation. How does that work?
PW: Right now, it’s focused on text translation because voice playback can be tricky in conversations. For example, if it’s reading back translations in real-time, it might interrupt the speakers. Instead, we’ve designed it to work like subtitles—two devices back-to-back on a desk showing translations in text. It’s approachable and intuitive. A Babelfish-like device is technically feasible but needs better real-world integration.
Om: What about microcontrollers? Why worry about that when we’re moving toward powerful, low-cost chips with constant connectivity?
PW: Connectivity has energy costs that haven’t decreased much, unlike computing. Even Bluetooth Low Energy (BLE) consumes significant power for continuous connections, which drains batteries. Doing things locally avoids this. Additionally, connectivity often requires user authentication—Wi-Fi passwords, SIM cards—which creates barriers.
Om: But we’re on the cusp of energy harvesting and more capable, low-power chips. Doesn’t that open new possibilities?
PW: Energy harvesting is still limited to microwatts, while most chips consume watts. That’s why I’m focusing on ultra-low-power embedded chips. They’ll be the first to benefit from energy harvesting.
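(The gap Pete describes is easy to see with rough numbers. The figures below are illustrative assumptions, not measurements of any particular product, but they show why a harvested budget of microwatts rules out both an app-class processor and a chatty radio, while an ultra-low-power microcontroller is at least in the right neighborhood.)

```python
# Rough power-budget arithmetic behind "microwatts vs. watts".
# All figures are illustrative assumptions, not measured values.
HARVESTED_W = 100e-6  # ~100 µW from a small indoor energy harvester

LOADS_W = {
    "Maintained BLE connection": 1e-3,      # ~1 mW average
    "Phone/tablet-class processor": 1.0,    # ~1 W
    "Ultra-low-power MCU inference": 0.5e-3,  # ~0.5 mW
}

for name, watts in LOADS_W.items():
    print(f"{name:<30} ~{watts / HARVESTED_W:,.0f}x the harvested budget")
```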
Om: Do you think small models everywhere will redefine AI?
PW: Yes. Small models will power devices that better understand us and our needs. For example, lights that learn your habits or TVs that pause automatically when you leave the room. These models don’t need to be massive—they just need to be good at specific tasks.
Om: This ties into your critique of IoT. Why do you think IoT failed?
PW: The problem was that IoT products were driven by business goals like recurring revenue, not user needs. Companies focused on adding internet connectivity to devices without offering meaningful functionality. For example, a connected dishwasher doesn’t solve any real user pain points—it’s just connected for the sake of it.
Om: Let’s go back to the Internet of Things (IoT). You’ve mentioned that the Internet itself is the reason the IoT revolution didn’t happen. Why do you think that was the case? Do you think all these companies, whether they were making switches or power strips, should have been putting the Internet in those devices? What went wrong in your mind? It seems everybody wanted to recreate that scene from The Big Bang Theory—turning the lamp on via the Internet.
PW: Yes, exactly.
Om: But in your mind, what went wrong in the thinking around this? What’s the big lesson for AI that we can take from IoT’s failures?
PW: For me, the number one issue is that the executives allocating resources for IoT products were primarily focused on generating recurring revenue. That was their starting point. They looked at tech companies, saw how much money they were making, how highly valued they were, and how they were getting people to pay $10 a month for services. These executives thought, “Okay, we want that too.”
At the same time, technologists saw the clear adoption of the Internet in PCs and then phones and believed they could replicate that by connecting other devices to the Internet. But at no point were product designers, user experience professionals, or proxies for actual users part of the conversation.
As a result, resources were allocated, projects were started, and we ended up with poorly thought-out products. One of my favorite examples is a connected dishwasher. I talked to an engineer who worked on one, and he said that after six months of development, the team still couldn’t come up with a single compelling reason why a user would want their dishwasher connected to the Internet. There was no functionality that actually improved users’ lives or made any sense.
So, it was a very top-down approach. You had executives saying, “We want subscription revenue,” and technologists saying, “We saw this work with PCs and phones, so it’s bound to work here.” This combination steamrolled anyone from the user experience side who could have said, “Wait, this doesn’t solve a real problem.”
Om: Was there any IoT product that got it right, in your opinion?
PW: One example that people don’t think about much is TVs. They’re all ubiquitously connected now, and users just expect their TVs to be Internet devices. That’s a success story—though it comes with caveats.
Om: Caveats like terrible interfaces and razor-thin margins?
PW: Exactly. Plus, they’re a privacy nightmare and a security nightmare. I’ll say something you might not, given your position with your new company—I don’t expect companies like TCL, Belkin, or others in the consumer electronics space to prioritize user privacy, software efficacy, or security. They’re operating on such low margins that they simply don’t care.
Even with Wi-Fi router companies, we see the same issues. Whether it’s Eero or others, the standards for user experience and security are inconsistent. I personally hate that I have to use a Google router, but at least I know they’re putting effort into security. Google has the money and reputation to uphold, whereas many other consumer electronics companies are under-resourced.
Om: And they barely have any software engineers or user experience designers.
PW: Exactly. That’s why so much of it is terrible.
PW: You know, devices will likely get more features, like improved voice input, and they’ll benefit from advancements in this area. But I really believe AI can make a massive difference in what you described as the invisible interface.
This idea revolves around all the devices in our lives understanding us better and acting on that understanding to help us. It’s still unclear exactly what that will look like, but it will be fundamentally different. It won’t be like traditional computing—it’ll feel much more like interacting with another person. In fact, it might not even involve talking. Instead, these devices will simply do what you’d expect or hope they’d do, as if they were intelligent.
Om: How does this change traditional computing? Right now, we have devices with kernels, layers on top, DLLs in Windows, or service calls in iOS and Android. Then there’s the UX layer on top. What happens to those service calls and DLLs in this new AI-driven environment? What becomes more relevant?
PW: In computing, we’ve been piling new things on top of old systems. For example, the way computers currently communicate with each other is through rigid, structured protocols like Wi-Fi or Ethernet.
What’s exciting about an AI-driven future is that it’s not just about devices understanding us; it’s also about devices understanding each other.
For instance, if a smoke alarm goes off, a toaster that knows it’s burning something could pop up the toast and notify the smoke alarm, “Don’t worry, it’s not a house fire.” That could prevent the alarm from making an ear-splitting noise that sends everyone into a panic.
Or let’s say you’re in a room and you look at a lamp and say, “On.” There’s a speaker next to the lamp, and it might think you’re talking to it instead. In this case, the speaker and lamp would need to negotiate and figure out who you’re addressing.
Om: That would require a level of communication between devices we don’t have right now.
PW: Exactly. Devices will need to figure out the context of what’s happening around them using AI, even if they’re from completely different companies. Current protocols, like Matter, might capture some of these interactions, but AI will add a layer of contextual understanding. For example, devices will recognize that there are people in the kitchen, the toaster is active, and this isn’t an emergency.
Another example is when multiple devices, like cameras or sensors, detect something unusual, such as someone breaking into a house. The devices could work together to determine that something anomalous is happening and respond accordingly.
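(To make the idea concrete, here is a purely hypothetical sketch of the kind of local context message Pete’s toaster example implies. None of this is an existing protocol, it is not Matter, and the field names are invented; it is only meant to show how little data “don’t worry, it’s just toast” actually requires.)

```python
# Hypothetical sketch of a local device-to-device context message.
# Invented field names; not Matter or any existing protocol.
import json
import time

toaster_notice = {
    "device": "kitchen-toaster",            # invented local device ID
    "event": "smoke_expected",              # "I'm about to make some smoke"
    "confidence": 0.92,
    "context": {"people_present": True, "appliance_active": True},
    "suggested_action": "suppress_full_alarm",
    "timestamp": time.time(),
}

# In practice this would be broadcast on the local network and never leave
# the house; printing it stands in for that broadcast here.
print(json.dumps(toaster_notice, indent=2))
```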
Om: It’s fascinating how this shifts the role of devices.
PW: Yes, and much of this will rely on input from AI sensors or augmented reality systems that describe the world around the devices.
Om: Yesterday, I saw some skepticism about Apple’s approach to integrating screens, Apple Intelligence, and Siri. But I was thinking, maybe this is just Apple’s way of entering homes and selling more sensors and devices.
PW: That’s a valid perspective. It’s a strategy to create a foothold for more sensors and intelligent systems in people’s homes.
Om: Thanks, Pete, it was great to catch up with you and get a glimpse into the future as you see it.
Editor’s Note: No newsletter will be published next week as we will be taking a break for the Thanksgiving Weekend.
On the connectivity challenge front, https://www.helium.com seems to have made significant inroads on low-bandwidth networking (which IoT-ish devices could/should use), but I can't tell how many device manufacturers are actually using it.
While I agree that current IoT connectivity frameworks (cloud accounts and server interactions) are overkill, I do actually like knowing when my home appliances do certain things, and being able to put them in vacation mode when I'm away from the house and forgot to do so while there.
You guys are so right about IoT. I love the internet as much as the next guy (probably more), but connectivity on a recently acquired dishwasher and microwave is mind-numbingly worthless. Also, the requirement to manage network-connected devices through a third-party server in the cloud (and an account and possibly an app) is infuriating and unnecessary, while introducing added security risk through the server and the open port. Besides being a kludge, it's a horrible business model most of the time. Having the option to remotely manage a router is great, but requiring remote management through the manufacturer's server is a deal breaker that serves neither the user nor the manufacturer.
I can go on complaining about how horrible IoT has been, but you have done a good job of that, so I have to at least add to the chorus.
And I think Pete has hit the nail on the head with his assessment of Google's troubles (they sound like Intel after their last founder retired) and the terrible product decisions made by all those IoT product companies. I look forward to seeing Moonshine in real products.