How did DeepSeek build its A.I. with less money?
The Chinese start-up used several technological tricks, including a method called “mixture of experts” to significantly reduce the cost of building the technology.

A.I. companies typically train their chatbots using supercomputers packed with 16,000 specialized chips or more. But DeepSeek said it needed only about 2,000.
DeepSeek's engineers said they needed only about $6 million in raw computing power, roughly one-tenth of what Meta spent building its latest A.I. technology.
What exactly did DeepSeek do?
How are A.I. technologies built?
Companies like the Silicon Valley chipmaker Nvidia originally designed these specialized chips, called graphics processing units, or GPUs, to render graphics for computer video games. But GPUs also had a knack for running the math that powered neural networks.
As companies packed more GPUs into their computer data centers, their A.I. systems could analyze more data.
But the best GPUs cost around $40,000, and they need huge amounts of electricity. Sending the data between chips can use more electrical power than running the chips themselves.
How was DeepSeek able to reduce costs?
It did many things. Most notably, it embraced a method called “mixture of experts.”
A.I. companies have typically trained their chatbots as one giant neural network that learns from all of the data at once, spread across thousands of chips. If one chip was learning how to write a poem and another was learning how to write a computer program, they still needed to talk to each other, just in case there was some overlap between poetry and programming.
With the mixture of experts method, researchers tried to solve this problem by splitting the system into many neural networks: one for poetry, one for computer programming, one for biology, one for physics and so on. There might be 100 of these smaller “expert” systems. Each expert could concentrate on its particular field.
Many companies have struggled with this method, but DeepSeek was able to do it well. Its trick was to pair those smaller “expert” systems with a “generalist” system.
The experts still needed to trade some information with one another, and the generalist — which had a decent but not detailed understanding of each subject — could help coordinate interactions between the experts.
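To make the idea concrete, here is a minimal sketch of a mixture-of-experts layer written in PyTorch. It illustrates the general technique, not DeepSeek's actual code: the sizes, the two-experts-per-input routing and the names (MixtureOfExperts, generalist, router) are all assumptions made for the example.

```python
# A minimal sketch of a mixture-of-experts layer, for illustration only.
# This is not DeepSeek's code; the sizes and routing choices are made up.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixtureOfExperts(nn.Module):
    def __init__(self, dim=512, num_experts=100, top_k=2):
        super().__init__()
        self.top_k = top_k
        # Many small "expert" networks, each free to specialize in one kind of data.
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, 2 * dim), nn.ReLU(), nn.Linear(2 * dim, dim))
             for _ in range(num_experts)]
        )
        # One shared "generalist" network that sees every piece of data.
        self.generalist = nn.Sequential(
            nn.Linear(dim, 2 * dim), nn.ReLU(), nn.Linear(2 * dim, dim)
        )
        # A router that scores how relevant each expert is to each input.
        self.router = nn.Linear(dim, num_experts)

    def forward(self, x):  # x has shape (number of inputs, dim)
        scores = F.softmax(self.router(x), dim=-1)
        weights, picked = scores.topk(self.top_k, dim=-1)  # choose a few experts per input
        out = self.generalist(x)             # the generalist always contributes
        extra = torch.zeros_like(out)
        for slot in range(self.top_k):
            for e in picked[:, slot].unique().tolist():
                mask = picked[:, slot] == e  # inputs routed to expert e in this slot
                extra[mask] += weights[mask, slot].unsqueeze(-1) * self.experts[e](x[mask])
        return out + extra
```

In a system like this, only a few experts do any work for a given piece of data, so most of the network sits idle during each calculation. That is where the savings in computing power come from.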
Is there math involved in this?
Remember your math teacher explaining the concept of pi. Pi, also denoted as π, is a number that never ends: 3.14159265358979 …
You can use π to do useful calculations, like determining the circumference of a circle. When you do those calculations, you shorten π to just a couple of decimal places: 3.14. If you use this simpler number, you get a pretty good estimate of a circle’s circumference.
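The trade-off fits in a couple of lines of Python. The circle's radius here is a made-up example value.

```python
# Rounding pi barely changes the answer. The radius is a made-up example value.
import math

radius = 10
print(2 * math.pi * radius)  # about 62.832, using pi at full precision
print(2 * 3.14 * radius)     # about 62.8, using the shortened pi; off by roughly 0.05 percent
```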
DeepSeek did something similar — but on a much larger scale — in training its A.I. technology.
Typically, chips multiply numbers that fit into 16 bits of memory. But DeepSeek squeezed each number into only 8 bits of memory — half the space. In essence, it lopped several decimals from each number.
This meant that each calculation was less accurate. But that didn’t matter. The calculations were accurate enough to produce a really powerful neural network.
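As a rough illustration, the snippet below squeezes a few made-up numbers into 8 bits using simple integer quantization, one common way to do it; DeepSeek's exact number format may differ. The point is that the 8-bit versions stay close to the originals, just with the fine detail lopped off.

```python
# Squeezing numbers into 8 bits: a rough illustration using integer quantization.
# The weights are made up, and DeepSeek's exact 8-bit format may differ.
import numpy as np

weights = np.array([0.137912, -0.482271, 0.905534, -0.268809], dtype=np.float32)

scale = np.abs(weights).max() / 127.0                    # map the range onto -127..127
quantized = np.round(weights / scale).astype(np.int8)    # each value now fits in 8 bits
restored = quantized.astype(np.float32) * scale          # what the network actually "sees"

print(quantized)  # small whole numbers, e.g. [ 19 -68 127 -38]
print(restored)   # close to the originals, with several decimals lopped off
```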
That’s it?
Well, they added another trick.
After squeezing each number into 8 bits of memory, DeepSeek took a different route when multiplying those numbers together. When determining the answer to each multiplication problem — making a key calculation that would help decide how the neural network would operate — it stretched the answer across 32 bits of memory. In other words, it kept many more decimals. It made the answer more precise.
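Here is a sketch of that second trick, reusing the made-up numbers from the example above: the inputs stay in 8 bits, but the results of the multiplications are added up in 32 bits so the final answer keeps its precision. The sketch uses 8-bit whole numbers rather than whatever exact format DeepSeek used, just to keep the idea visible.

```python
# Multiply 8-bit numbers, but keep the running total in 32 bits of memory.
# The values are made up; the exact number format DeepSeek used may differ.
import numpy as np

a = np.array([19, -68, 127, -38], dtype=np.int8)  # squeezed 8-bit inputs
b = np.array([52,  11, -93,  70], dtype=np.int8)

# Widening to 32 bits before adding avoids the overflow and rounding that an
# 8-bit running total would suffer, so the key calculation stays precise.
total = np.dot(a.astype(np.int32), b.astype(np.int32))
print(total)  # -14231, far outside the range an 8-bit number could hold
```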