# How did DeepSeek build its A.I. with less money?

*2025-02-13*

> The Chinese start-up used several technological tricks, including a method called "mixture of experts," to significantly reduce the cost of building the technology.

![](https://static01.nyt.com/images/2025/02/11/multimedia/00deepseek-howto-cfpb/00deepseek-howto-cfpb-jumbo.jpg?quality=75&auto=webp)

A.I. companies typically train their chatbots using supercomputers packed with 16,000 specialized chips or more. But DeepSeek said it needed only about 2,000.

DeepSeek's engineers needed only about $6 million worth of raw computing power, roughly one-tenth of what Meta spent building its latest A.I. technology.

What exactly did DeepSeek do?

### How are A.I. technologies built?

Companies like the Silicon Valley chipmaker Nvidia originally designed these specialized chips, called graphics processing units (GPUs), to render graphics for computer video games. But GPUs also turned out to have a knack for running the math that powers neural networks.

As companies packed more GPUs into their data centers, their A.I. systems could analyze more data.

But the best GPUs cost around $40,000, and they need huge amounts of electricity.
Sending the data between chips can use more electrical power than running the chips themselves.

### How was DeepSeek able to reduce costs?

It did many things. Most notably, it embraced a method called "**mixture of experts**."

During training, the chips had to share what they learned. If one chip was learning how to write a poem and another was learning how to write a computer program, they still needed to talk to each other, just in case there was some overlap between poetry and programming.

With the mixture of experts method, researchers tried to solve this problem by splitting the system into many smaller neural networks: one for poetry, one for computer programming, one for biology, one for physics, and so on. There might be 100 of these smaller "expert" systems. Each expert could concentrate on its particular field.

Many companies have struggled with this method, but DeepSeek made it work. Its trick was to pair those smaller "expert" systems with a "generalist" system.

The experts still needed to trade some information with one another, and the generalist, which had a decent but not detailed understanding of every subject, could help coordinate the interactions between the experts.

### Is there math involved in this?

Remember your math teacher explaining the concept of pi. Pi, also written as π, is a number that never ends: 3.14159265358979…

You can use π to do useful calculations, like finding the circumference of a circle. When you do those calculations, you shorten π to just a few decimals: 3.14. If you use this simpler number, you still get a pretty good estimate of a circle's circumference.

DeepSeek did something similar, but on a much larger scale, in training its A.I. technology.
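The circle calculation makes the trade-off easy to see in a few lines of Python (a toy illustration of the precision-versus-accuracy idea, not DeepSeek's code):

```python
import math

def circumference(radius: float, pi: float) -> float:
    """Circumference of a circle, using whatever value of pi is passed in."""
    return 2 * pi * radius

# Full-precision pi versus the "lopped" two-decimal shortcut.
exact = circumference(10, math.pi)   # uses 3.141592653589793...
rough = circumference(10, 3.14)      # uses just 3.14

error = abs(exact - rough) / exact   # relative error of the shortcut
print(f"exact={exact:.6f}  rough={rough:.6f}  relative error={error:.5%}")
```

The shortcut answer is off by only about 0.05 percent, which is the spirit of DeepSeek's approach: accept a tiny loss of accuracy in exchange for much cheaper arithmetic.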
Typically, chips multiply numbers that fit into 16 bits of memory. But DeepSeek squeezed each number into only 8 bits of memory, half the space. In essence, it lopped several decimals off each number.

This meant that each calculation was less accurate. But that didn't matter: the calculations were still accurate enough to produce a really powerful neural network.

### That's it?

Well, they added one more trick.

After squeezing each number into 8 bits of memory, DeepSeek took a different route when multiplying those numbers together. When working out the answer to each multiplication problem, a key calculation that helps decide how the neural network will behave, it stretched the answer across 32 bits of memory. In other words, it kept many more decimals, making the answer more precise.
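That pattern, multiply low-precision numbers but keep the running answer in higher precision, can be sketched in plain Python. This is an illustrative toy using scaled 8-bit integers; DeepSeek reportedly trained with 8-bit floating-point numbers on GPUs, and the `scale` value here is invented for the example:

```python
def quantize(x: float, scale: float) -> int:
    """Squeeze a float into the 8-bit integer range (-127..127),
    lopping off the decimals that don't fit."""
    q = round(x / scale)
    return max(-127, min(127, q))

def dot_int8(xs: list, ys: list, scale: float) -> float:
    """Multiply 8-bit-quantized numbers pairwise, but keep the running
    sum in ordinary wide precision (DeepSeek's 32-bit trick), then
    rescale at the end."""
    acc = 0  # the accumulator never loses decimals mid-calculation
    for x, y in zip(xs, ys):
        acc += quantize(x, scale) * quantize(y, scale)
    return acc * scale * scale

xs = [0.123, -0.534, 0.912, 0.071]
ys = [0.448, 0.181, -0.257, 0.664]

exact = sum(x * y for x, y in zip(xs, ys))
approx = dot_int8(xs, ys, scale=0.01)
print(f"exact={exact:.4f}  approx={approx:.4f}")
```

The quantized result lands close to the exact one: each input loses a little precision, but accumulating in full precision keeps the rounding errors from compounding as millions of these products are summed.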
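Finally, the mixture-of-experts idea from earlier can be made concrete with a minimal sketch of the expert-plus-generalist shape in plain Python. The expert names and the keyword-based routing are invented for illustration; in a real system both the experts and the "generalist" router are learned neural networks:

```python
# A toy mixture of experts: several narrow "experts" plus one "generalist"
# that decides which experts a request should go to. Only the chosen
# experts do any work, which is where the compute savings come from.

EXPERTS = {
    "poetry": lambda prompt: f"[poetry expert] drafting verse about: {prompt}",
    "programming": lambda prompt: f"[programming expert] writing code for: {prompt}",
    "biology": lambda prompt: f"[biology expert] explaining: {prompt}",
}

def generalist_route(prompt: str) -> list:
    """The 'generalist': a shallow understanding of every field, just
    enough to pick the relevant experts (here, naive keyword matching)."""
    keywords = {
        "poetry": ("poem", "verse", "rhyme"),
        "programming": ("code", "program", "function"),
        "biology": ("cell", "gene", "protein"),
    }
    chosen = [name for name, words in keywords.items()
              if any(w in prompt.lower() for w in words)]
    return chosen or ["poetry"]  # fall back to a default expert

def answer(prompt: str) -> list:
    """Activate only the chosen experts; the rest stay idle."""
    return [EXPERTS[name](prompt) for name in generalist_route(prompt)]

print(answer("write a poem about a function"))
```

A prompt that touches two fields, like a poem about a function, activates both the poetry and programming experts, mirroring the article's point that the generalist coordinates whatever overlap exists between fields.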