How DeepSeek, a small Chinese artificial intelligence startup, shocked Silicon Valley

A small Chinese artificial intelligence lab shocked the world this week by revealing the technical recipe for its cutting-edge model, turning its reclusive leader into a national hero and defying U.S. attempts to thwart China’s high-tech ambitions.

DeepSeek, founded by hedge fund manager Liang Wenfeng, released its R1 model on Monday, explaining in a detailed technical paper how to build, on a shoestring budget, a large language model that can learn and improve itself without human supervision.

U.S. companies such as OpenAI and Google DeepMind pioneered the development of reasoning models, a relatively new field of artificial intelligence research that aims to match human cognitive capabilities. In December, San Francisco-based OpenAI released the full version of its o1 model but kept its methodology secret.

DeepSeek’s R1 release sparked a heated debate in Silicon Valley over whether better-resourced U.S. artificial intelligence companies, including Meta and Anthropic, can defend their technical edge.

Meanwhile, Liang has become a source of national pride at home. This week, he was the only AI leader chosen to attend a publicized meeting of entrepreneurs with Li Qiang, China’s premier and the country’s second-highest-ranking official. The entrepreneurs were told to “concentrate on breakthroughs in key core technologies.”

In 2021, while running his quantitative trading fund High-Flyer, Liang began buying thousands of Nvidia graphics processing units for an artificial intelligence side project. At the time, industry insiders saw this as the eccentric move of a billionaire looking for a new hobby.

“When we first met him, he was a very nerdy guy with a bad haircut talking about building a 10,000-chip cluster to train his own models. We didn’t take him seriously,” said one of Liang’s business partners.

“He couldn’t articulate his vision other than to say: I want to build this, and it will be a game changer. We thought this was only possible for giants like ByteDance and Alibaba,” the person added.

Liang’s status as an outsider in the field of artificial intelligence turned out to be an unexpected source of strength. At High-Flyer, he built his fortune by using artificial intelligence and algorithms to identify patterns that could move stock prices. His team became expert at using Nvidia chips to profit from stock trading. In 2023, he launched DeepSeek, announcing his intention to develop human-level artificial intelligence.

“Liang has built a great infrastructure team that really understands how chips work,” said the founder of a rival LLM company. “He brought his best people over from the hedge fund to DeepSeek.”

After Washington banned Nvidia from exporting its most powerful chips to China, local AI companies were forced to find creative ways to maximize the computing power of the limited number of chips already in the country, a problem Liang’s team already knew how to solve.

“The engineers at DeepSeek know how to unlock the potential of these GPUs, even if they are not state-of-the-art,” said an artificial intelligence researcher close to the company.

Industry insiders say DeepSeek’s singular focus on research makes it a dangerous competitor, because it is willing to share its breakthroughs rather than guard them for commercial gain. DeepSeek has not raised money from outside investors or taken significant steps to monetize its models.

“DeepSeek operates similarly to the early DeepMind,” said an artificial intelligence investor in Beijing. “It’s purely focused on research and engineering.”

Liang is personally involved in DeepSeek’s research, and he uses profits from his hedge fund to pay top salaries for the best artificial intelligence talent. Along with TikTok owner ByteDance, DeepSeek is known for paying the highest salaries to Chinese artificial intelligence engineers, and it has offices in Hangzhou and Beijing.

“DeepSeek’s offices feel like a college campus for serious researchers,” the business partner said. “The team believed in Liang’s vision: to show the world that Chinese people can be creative and create something from scratch.”

DeepSeek and High-Flyer did not respond to requests for comment.

Liang has described DeepSeek as a uniquely “local” company, staffed by PhDs from top Chinese universities such as Peking University, Tsinghua University and Beihang University rather than by experts from U.S. institutions.

Last year, he told domestic media that his core team “has no people who have returned from overseas. They are all locals… We must cultivate top talent ourselves.” DeepSeek’s status as a purely Chinese LLM company has won it praise at home.

DeepSeek claims it used just 2,048 Nvidia H800 chips and $5.6 million to train a model with 671 billion parameters, a fraction of what OpenAI and Google have spent to train comparably sized models.

Ritwik Gupta, an artificial intelligence policy researcher at the University of California, Berkeley, said that DeepSeek’s recently released model shows that “there is no moat for artificial intelligence capabilities.”

“The first person to train a model has to spend a lot of resources to get there,” he said. “But latecomers can get there cheaper and faster.”

Gupta added that China has a larger pool than the United States of systems engineers who understand how to make the most of computing resources to train and run models more cheaply.

Industry insiders said that although DeepSeek has achieved impressive results with limited resources, it remains an open question whether it can stay competitive as the industry develops.

Its main backer, High-Flyer, posted lagging returns in 2024, which one person close to Liang attributed to the founder’s focus on DeepSeek.

Its U.S. rivals aren’t sitting still. They are building giant “clusters” of Nvidia’s next-generation Blackwell chips, and their computing power may once again create a performance gap with Chinese competitors.

This week, OpenAI said it is creating a joint venture called Stargate with Japan’s SoftBank, which plans to spend at least $100 billion on artificial intelligence infrastructure in the United States. Elon Musk’s xAI is massively expanding its Colossus supercomputer to more than 1 million GPUs to help train its Grok AI models.

“DeepSeek has one of the largest advanced computing clusters in China,” Liang’s business partner said. “They have enough capacity for now, but not for much longer.”

Additional reporting by Ding Wenjie in Beijing