Chinese Firm Trains Massive AI Model for Just $5.5 Million

Chinese AI startup DeepSeek has released what appears to be one of the most powerful open-source language models to date, trained at a cost of just $5.5 million using Nvidia’s export-restricted H800 GPUs.

The 671-billion-parameter DeepSeek V3, released this week under a permissive commercial license, outperformed both open- and closed-source models in the company’s internal benchmarks, beating Meta’s Llama 3.1 and OpenAI’s GPT-4 on coding tasks.

The model was trained on 14.8 trillion tokens of data over two months. At 1.6 times the size of Meta’s Llama 3.1, DeepSeek V3 requires substantial computing power to run at reasonable speeds.

Andrej Karpathy, former OpenAI and Tesla executive, comments: “For reference, this level of capability is supposed to require clusters of closer to 16K GPUs, the ones being brought up today are more around 100K GPUs. E.g. Llama 3 405B used 30.8M GPU-hours, while DeepSeek-V3 looks to be a stronger model at only 2.8M GPU-hours (~11X less compute). If the model also passes vibe checks (e.g. LLM arena rankings are ongoing, my few quick tests went well so far) it will be a highly impressive display of research and engineering under resource constraints.

“Does this mean you don’t need large GPU clusters for frontier LLMs? No, but you have to ensure that you’re not wasteful with what you have, and this looks like a nice demonstration that there’s still a lot to get through with both data and algorithms.”
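For a rough sense of how the figures above fit together, here is a back-of-envelope sketch in Python. The GPU-hour and parameter counts are the ones quoted in this story; the roughly $2-per-H800-GPU-hour rental price is an assumption used only for illustration, not a number from the article.

```python
# Back-of-envelope check of the figures quoted above.

llama3_gpu_hours = 30.8e6    # Llama 3 405B training compute, per Karpathy
deepseek_gpu_hours = 2.8e6   # DeepSeek-V3 training compute, per Karpathy
price_per_gpu_hour = 2.0     # ASSUMED H800 rental price in USD (illustrative only)

compute_ratio = llama3_gpu_hours / deepseek_gpu_hours
estimated_cost_musd = deepseek_gpu_hours * price_per_gpu_hour / 1e6

print(f"Compute ratio: ~{compute_ratio:.0f}x less GPU time")     # ~11x
print(f"Estimated training cost: ~${estimated_cost_musd:.1f}M")  # ~$5.6M

# Parameter-count comparison cited earlier in the story:
deepseek_params = 671e9   # DeepSeek V3
llama_params = 405e9      # Llama 3.1 405B
print(f"Size ratio: ~{deepseek_params / llama_params:.2f}x")     # ~1.66x
```

Run as-is, this reproduces the “~11X less compute” claim exactly and lands within rounding distance of both the $5.5 million training cost and the “1.6 times the size of Llama 3.1” comparison.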

Read more of this story at Slashdot.
