By Om Malik
The ink had not even dried on the whopping $500 billion AI Project Stargate when a small meteor hit the planet “AI.” DeepSeek, a company associated with High-Flyer, an $8 billion Chinese hedge fund, released an open-source AI reasoning model, DeepSeek R1. It was immediately seen not only as an attack on OpenAI, the US’s AI giant, but also as an attack on American innovation writ large. The launch of R1 put renewed focus on DeepSeek’s news from December, when it claimed to have achieved OpenAI-like capabilities with its V3 model for a mere $6 million.
And just like that, DeepSeek changed the AI narrative from one requiring big, bad data centers, billions of dollars in investments, and access to hundreds of thousands of the latest and greatest Nvidia chips to one requiring a fraction of that investment. It raised questions about the economics of cutting-edge foundational models. And most certainly, it made many rethink their preconceived ideas of what the trajectory of “AI” was.
Specifically, DeepSeek said that it had trained its AI model for a mere $6 million, or 3 to 5 percent of what American AI companies spent to do the same thing. Never mind that this figure understates DeepSeek’s total spend. The DeepSeek-V3 model was developed on a cluster of 2,048 Nvidia H800 GPUs, and with each H800 costing about $30,000, the hardware alone totals just over $61 million. The $6 million, which covers only the final training run and excludes all other expenses, still had the intended impact.
How could DeepSeek be almost competitive with OpenAI and Anthropic? After all, the large players, including OpenAI, Anthropic, and Google, had spent billions on their infrastructure and models. The hyperventilation of Silicon Valley’s chattering classes was remarkable to watch. The stock market gasped too: nearly a trillion dollars in stock market valuation (mostly in Nvidia stock) vanished. Even the smartest AI leaders suddenly seemed to be talking gibberish.
I think the hysteria is hugely overblown. And I'm not alone. If you read the original paper, which I suspect many (especially in the investment community) have not, two things are clear: DeepSeek has done something clever that will help lower the cost of the AI revolution for everyone, and they've shared how they've done it. They took already available ideas and applied a "manufacturing mentality" to make everything better, faster, cheaper, and more widely available. That's good, not bad.
Sure, if a Chinese company had developed a new proprietary AI chip and used that in a proprietary way to cut AI development costs by a factor of 10, that would be reason to be concerned. But that doesn't seem like what's going on.
Indeed, one of the biggest worries among AI startups for the past year has been the price and availability of GPU processing. If this innovation lowers the costs and increases the availability of GPU processing, that's good for everyone. Amazon Web Services and Microsoft, in fact, almost immediately announced DeepSeek AI offerings.
“Initial versions of any software product are crude. They are built for ease of development, not performance or economy of processing,” said Beau Vrolyk, a longtime infrastructure hardware and software investor and executive at Warburg Pincus, Silicon Graphics, Xerox, and ARETE. “Once the fundamental algorithms are better understood, astonishing efficiency can be gained by rewriting the code. What is just starting to happen is the optimization phase, where the various bits of code are rewritten to run either faster or using fewer resources.”
Pete Warden, whose company is focused on small, highly efficient AI models, pointed out that, at least in the investment community, how much money you spent on GPUs was seen as a moat. Hopefully, after DeepSeek, “the rules of software engineering will start asserting themselves,” Warden said. “My hope is that it actually drives people to think more about efficiency across the board.”
What’s clever about what DeepSeek has done is that they’ve figured out a way to squeeze more performance out of Nvidia’s chips by going a level below the standard software libraries and tinkering with how the chips compute and communicate. In short, this is better engineering, and it has allowed them to overcome the constraints imposed on them by US chip controls. In doing so, they have shown the world a new approach to building AI models much more cheaply.
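One published example of that lower-level engineering is FP8 mixed-precision training, which DeepSeek describes in its V3 technical report. Here is a back-of-envelope sketch in Python, mine rather than DeepSeek’s, of why lower precision pays off; only the 671-billion parameter count comes from the paper.

```python
# Rough memory cost of storing model weights at different precisions.
# Real savings depend on which tensors stay in which format; treat this
# as a sketch, not DeepSeek's actual memory budget.
params = 671e9  # DeepSeek-V3's total parameter count

for fmt, bytes_per_param in [("fp32", 4), ("bf16", 2), ("fp8", 1)]:
    print(f"{fmt}: {params * bytes_per_param / 1e9:,.0f} GB just for the weights")
```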
With its DeepSeek R1 model (which competes with OpenAI’s o1, for example), the company has gone a step further, using reinforcement learning to teach the model to reason step by step before answering. It is as if, instead of blurting out the first answer that comes to mind, you work through the problem on scratch paper, checking each step as you go.
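DeepSeek’s R1 paper describes training this behavior with reinforcement learning, rewarding the model when its final answer checks out. The toy below is a drastically simplified sketch of that idea in Python: a plain REINFORCE update on a four-armed bandit, not DeepSeek’s actual GRPO algorithm running over a full language model, and the “correct” arm is invented for illustration.

```python
import numpy as np

# Toy sketch of outcome-based reinforcement: sample a "reasoning strategy,"
# reward it only if it yields the right answer, and shift probability mass
# toward whatever earned the reward.
rng = np.random.default_rng(1)
logits = np.zeros(4)
CORRECT = 2  # pretend strategy 2 produces verifiably correct answers

for _ in range(500):
    probs = np.exp(logits) / np.exp(logits).sum()
    action = rng.choice(4, p=probs)
    reward = 1.0 if action == CORRECT else 0.0
    grad = -probs                      # REINFORCE: d log p(action) / d logits
    grad[action] += 1.0
    logits += 0.1 * reward * grad      # update only when the answer was right

probs = np.exp(logits) / np.exp(logits).sum()
print(np.round(probs, 2))              # mass concentrates on the correct strategy
```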
DeepSeek started by questioning the conventional wisdom of big AI models. Colloquially speaking, the large language models used by current AI leaders work like one giant brain that tries to do everything at once. That is powerful, but it takes a lot of compute power and energy, and it is expensive. The Chinese engineering team instead used an approach called Mixture of Experts (MoE): rather than putting the whole mega-brain to work on every request, the model picks a few relevant specialists for the job at hand.
For example, if you ask the model a math question, you get a small army of math tutors. Geography questions get geography experts. DeepSeek’s brain (671 billion parameters) is close in size to the big models such as those from OpenAI and Anthropic, but only a fraction of it (37 billion parameters) is activated at a time. As a result, it can answer queries faster and more cheaply, and thus at scale. That is why DeepSeek could get away with using only 2,048 Nvidia chips that are not on the bleeding edge.
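To make the routing idea concrete, here is a minimal sketch of an MoE layer in Python. The sizes are toy numbers rather than DeepSeek’s (the V3 paper describes 256 routed experts with 8 active per token, plus a shared expert); the point is that only the chosen experts’ weights do any work for a given token.

```python
import numpy as np

rng = np.random.default_rng(0)
D, N_EXPERTS, TOP_K = 64, 8, 2   # toy sizes, not DeepSeek-V3's

# Each expert is just a small weight matrix; the router scores experts per token.
experts = [rng.standard_normal((D, D)) / np.sqrt(D) for _ in range(N_EXPERTS)]
router = rng.standard_normal((D, N_EXPERTS)) / np.sqrt(D)

def moe_layer(x):
    """Send the token to its top-k experts; the other experts stay idle."""
    scores = x @ router                   # affinity of this token to each expert
    top = np.argsort(scores)[-TOP_K:]     # pick the k best-matching experts
    w = np.exp(scores[top])
    w /= w.sum()                          # softmax over just the chosen experts
    return sum(wi * (x @ experts[i]) for wi, i in zip(w, top))

out = moe_layer(rng.standard_normal(D))
print(out.shape)   # (64,) -- computed while touching only 2 of 8 experts' weights
```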
While it is not even remotely close to perfect, DeepSeek R1 is a big step toward modular and adaptive AI. This is a way to bring efficiency and speed to AI, especially when it comes to making it available in real time for real applications. For instance, ask R1 a tricky multi-step math problem and, instead of committing to the first answer it generates, it works through the intermediate steps, notices its own mistakes, and revises before answering.
Historically, chip costs decrease by a factor of two every year, and algorithmic improvements have accelerated from 2x to about 4x per year. Compounded, that is roughly an 8x reduction in cost per year. So if DeepSeek trained its model for $6 million, that is comparable to a $48 million training run a year ago.
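The compounding is simple enough to check with a couple of lines; the factor-of-two hardware halving is the assumption stated above.

```python
# Compounding yearly gains: cheaper chips times better algorithms.
hw_gain = 2      # chip cost roughly halves each year (assumption from the text)
algo_gain = 4    # top of the 2x-4x algorithmic range cited above
yearly_reduction = hw_gain * algo_gain          # 8x per year

cost_today = 6_000_000                          # DeepSeek's claimed training run
print(f"${cost_today * yearly_reduction:,}")    # $48,000,000 a year ago
```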
“Algorithmic efficiency gains are available to anyone who seriously looks for them,” said Sean Gourley, founder of Primer, an AI company based in San Francisco. Gourley, who has spent over a decade and a half in the weeds of natural language processing, machine learning, and algorithms, pointed out that algorithmic efficiency improvements and reductions in compute costs have been ongoing.
If anything, DeepSeek “confirms that algorithmic improvements continue to outpace expectations,” and “we’re still nowhere near saturation in AI performance,” Gourley added. DeepSeek’s approach means that now anyone can start thinking about innovative architectural design and efficient training strategies; it’s possible to develop high-performing AI models without relying on the most advanced and expensive hardware.
DeepSeek’s novel approach to its competition reminds me of two earlier instances when an upstart decided to upend the established order by doing things differently: not only optimizing hardware but also rethinking the software and letting go of legacy approaches.
In 1998, during the first Internet boom, Cisco Systems announced its high-capacity router, the GSR 12000. The demand was through the roof. That same year, a tiny upstart, Juniper Networks, launched its own high-capacity router, which didn’t have the impressive numbers of Cisco’s GSR router.
Juniper’s M40 router achieved 40 Gbps of throughput versus Cisco’s promised but undelivered 60 Gbps. While Cisco was trying to move upstream with its enterprise routers, Juniper focused purely on Internet traffic. It began with special chips optimized for routing, wrote its own modular operating system, and made a router that did a few things very well. Crucially, Juniper separated the routing of data packets (the control plane) from packet forwarding (the data plane), which made the system very efficient. It used distributed processing and many other hardware- and software-level improvements that allowed the company to outgun Cisco and grow very fast.
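To see why that split matters, here is a toy Python sketch, with hypothetical names and nothing like Juniper’s actual software: the expensive route computation runs occasionally on the slow path, while the per-packet fast path does nothing but table lookups.

```python
# Control plane: the slow path. Runs routing logic occasionally and
# produces a simple forwarding table for the fast path to use.
def build_forwarding_table(advertised_routes):
    return {prefix: next_hop for prefix, next_hop in advertised_routes}

# Data plane: the fast path. Per packet, it only does a longest-prefix
# lookup in the precomputed table; real routers do this step in hardware.
def forward(dst_ip, table):
    for prefix in sorted(table, key=len, reverse=True):
        if dst_ip.startswith(prefix):
            return table[prefix]
    return "drop"

table = build_forwarding_table([("10.", "if0"), ("10.1.", "if1")])
print(forward("10.1.42.7", table))   # "if1": the more specific route wins
print(forward("10.2.0.1", table))    # "if0"
```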
If you squint a little, DeepSeek has done something similar: it built a custom, more efficient training architecture and figured out how to better utilize memory during inference. That has allowed the company to use less compute and to make better use of its parameters. While Juniper's disruption was hardware-centric, DeepSeek's is primarily in software architecture and training methodology. Still, the two show how, by focusing on architectural innovation, companies can carve out their own path and even leapfrog others.
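One concrete piece of that memory work, described in DeepSeek’s papers as multi-head latent attention, is compressing the key-value cache that dominates memory at inference time. Below is a rough Python sketch of the saving; the sizes are illustrative, not DeepSeek’s actual configuration.

```python
# Per-token key-value cache: full attention vs. a compressed latent cache.
# All sizes here are illustrative, not DeepSeek's actual configuration.
n_layers, n_heads, head_dim, bytes_per = 60, 128, 128, 2   # bf16 activations

# Standard attention caches full keys and values for every head and layer.
kv_per_token = 2 * n_layers * n_heads * head_dim * bytes_per

# Latent attention instead caches one small vector per layer and
# reconstructs per-head keys and values from it on the fly.
latent_dim = 512
latent_per_token = n_layers * latent_dim * bytes_per

print(f"standard: {kv_per_token / 1e6:.1f} MB/token")
print(f"latent:   {latent_per_token / 1e6:.2f} MB/token")
```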
A few years after Juniper’s launch, Google took a fresh approach to building out its infrastructure. Instead of buying expensive super servers from companies like HP, Sun, and IBM, the company started with cheap-and-cheerful servers, removed unnecessary components, and added custom power supplies for efficiency. It used commodity parts but custom-engineered the integration, built its own network fabric, and used software-defined networking and distributed-systems techniques to find the cheapest way to serve search results to millions of users. Similarly, DeepSeek has rethought fundamental architectures to achieve better efficiency, rather than just incrementally improving existing approaches.
In the end, Google’s approach to infrastructure led to what we now view as modern cloud computing and shaped how companies approach infrastructure scaling, just as Juniper helped create the modern high-performance networking industry. Juniper’s idea of separating the control and data planes, which jumpstarted software-defined networking, is now commonplace in every network.
Our petabyte world doesn’t function without that one original rethink of software. Similarly, what Google did two decades ago has helped us scale the internet to what it has become today. Google’s approach to infrastructure has resulted in new data center designs and helped influence and create the Hadoop, Kubernetes, and NoSQL movements.
These two examples remind us that the real impact of a new approach takes time to manifest. DeepSeek’s optimizations are a good reminder that we are still in the early phase of the AI journey, and despite all the hoopla and hype, a lot of work needs to be done, and as a result, many opportunities to innovate remain.
While DeepSeek’s savings on training might look like the big deal today, it is the savings on inference, which happens at runtime, that will be the key. Inference will drive major demand as the whole planet moves over the coming years to take advantage of this new power.
“In the long run, inference is going to take a huge amount of compute and cost, and humanity will need boatloads of it to improve productivity and solve problems,” Bill Gross, CEO of ProRata, told CrazyStupidTech in an email. “Just like we drove the cost of electricity down with massive efficiency gains from James Watt and more going forward, and that was hugely valuable, these efficiency gains that DeepSeek has shown will trigger the whole market to learn to do more of the same.”
Gross is right.
AI is shifting from being pure technology to productization. It is time to stop with the AGI speculation and instead focus on practical applications. Many leading AI companies continue to push the AGI narrative to keep investment capital flowing into their coffers, largely because their business models remain focused on selling API access. The real transformation will be in how AI is put to work. Just as the internet itself evolved from the post-dot-com-bust era into search, work, e-commerce, and social, AI too will have meaningful impacts in areas that need help, such as materials science, semiconductors, and medicine.
The truth emerging from DeepSeek's approach is that you don't need the absolute cutting-edge, most expensive AI models for most real-world applications—you need efficient, well-engineered products that can solve complex problems. That means OpenAI and Anthropic will have to quickly figure out how to become product-first companies and stop focusing solely on an AGI future.
Additional reporting by Fred Vogelstein.
The analogies are misplaced because Juniper didn't open source their router software, and Google didn't open source their search engine software.
DeepSeek R1 is available under the permissive MIT licence, which allows anyone to use, modify, and commercialise the model without restrictions. Millions of developers across the world will tinker with and adapt R1 to create applications. This is the big kahuna.
There's nothing new here, but the U.S. AI industry had a conniption. Dividing tasks into functions is simple parallel processing. That's what DeepSeek did. It parsed the LLM task into pieces, and built expertise around each separate piece. Then it gave what it did away. I think the giving away, the open source bit, is what really freaked people out.
It's amazing in retrospect how quickly the innovation around OpenAI became a land grab, and how quickly open source shut all that down, through DeepSeek. Someone give Eric Raymond a cigar.
You're right, the DeepSeek hype is overblown. SemiAnalysis has the best technical analysis: https://semianalysis.com/2025/01/31/deepseek-debates/#