{"id":1905,"date":"2025-01-28T21:08:14","date_gmt":"2025-01-28T12:08:14","guid":{"rendered":"https:\/\/skanto.co.kr\/?p=1905"},"modified":"2025-01-29T15:19:17","modified_gmt":"2025-01-29T06:19:17","slug":"deepseek","status":"publish","type":"post","link":"https:\/\/skanto.co.kr\/?p=1905","title":{"rendered":"DeepSeek"},"content":{"rendered":"\n<p>DeepSeek\u2019s breakthrough on cost challenges the \u201cbigger is better\u201d narrative that has driven the A.I. arms race in recent years by showing that relatively small models, when trained properly, can match or exceed the performance of much bigger models.<\/p>\n\n\n\n<p>That, in turn, means that A.I. companies may be able to achieve very powerful capabilities with far less investment than previously thought. And it suggests that we may soon see a flood of investment into smaller A.I. start-ups, and much more competition for the giants of Silicon Valley. (Which, because of the enormous cost of training their models, have mostly been competing with each other until now.)<\/p>\n\n\n\n<p>There are other, more technical reasons that everyone in Silicon Valley is paying attention to DeepSeek. In the research paper, the company reveals some details about how R1 was actually built, which include some cutting-edge techniques in model distillation. (Basically, compressing large models into smaller ones, making them cheaper to run without losing much in the way of performance.)<\/p>\n\n\n\n<p>DeepSeek also included details suggesting that it had not been as hard as previously thought to convert a \u201cvanilla\u201d A.I. language model into a more sophisticated reasoning model, by applying a technique known as <strong>reinforcement learning<\/strong> on top of it.<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>The most surprising part of DeepSeek-R1 is that it only takes ~800k samples of \u2018good\u2019 RL reasoning to convert other models into RL-reasoners. 
Now that DeepSeek-R1 is available, people will be able to refine samples out of it to convert any other model into an RL reasoner.<\/p>\n<\/blockquote>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>This is a big deal because it says that if you want to control AI systems you need to control not only the basic resources (e.g., compute, electricity), but also the platforms the systems are being served on (e.g., proprietary websites) so that you don\u2019t leak the really valuable stuff &#8211; samples including chains of thought from reasoning models.<\/p>\n<\/blockquote>\n\n\n\n<p>If you\u2019re Meta &#8211; the only U.S. tech giant that releases its models as free open-source software &#8211; what prevents DeepSeek or another start-up from simply taking your models, which you spent billions of dollars on, and distilling them into smaller, cheaper models that they can offer for pennies?<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Aha Moment of DeepSeek-R1-Zero<\/h3>\n\n\n\n<p>This aha moment underscores the power and beauty of reinforcement learning: rather than explicitly teaching the model how to solve a problem, we simply provide it with the right incentives, and it autonomously develops advanced problem-solving strategies. The \u201caha moment\u201d serves as a powerful reminder of the potential of RL to unlock new levels of intelligence in artificial systems, paving the way for more autonomous and adaptive models in the future.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Description without jargon<\/h3>\n\n\n\n<p>R1 surfed the web endlessly (pre-training), then <span style=\"text-decoration: underline;\">read a reasoning manual<\/span> made by humans (SFT &#8211; supervised fine-tuning), and finally did some self-experimentation (RL + TTC &#8211; test-time compute). R1-Zero, in contrast, <span style=\"text-decoration: underline;\">didn\u2019t read any manuals<\/span>. 
They pre-trained R1-Zero on tons of web data and immediately afterward sent it to the RL phase: \u201cNow go figure out how to reason yourself.\u201d<\/p>\n\n\n\n<p>DeepSeek\u2019s approach to R1 and R1-Zero is reminiscent of DeepMind\u2019s path from AlphaGo to AlphaGo Zero in 2016-2017. AlphaGo learned to play Go by knowing the rules and studying millions of human matches; a year later, DeepMind trained AlphaGo Zero with no human data at all, just the rules. And it destroyed AlphaGo. <span style=\"text-decoration: underline;\">Unfortunately, open-ended reasoning has proven harder than Go<\/span>; R1-Zero is slightly worse than R1 and has some issues like poor readability (besides, both still rely heavily on vast amounts of human-created data in their base model &#8211; a far cry from an AI capable of rebuilding human civilization using nothing more than the laws of physics).<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/substackcdn.com\/image\/fetch\/f_auto,q_auto:good,fl_progressive:steep\/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ecef20b-979b-40c3-9fe2-bbcc0114f40c_873x388.png\" alt=\"\"\/><figcaption class=\"wp-element-caption\">what R1 vs R1-Zero looks like<\/figcaption><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\">Better base models + distillation + RL wins<\/h3>\n\n\n\n<p>While distillation strategies are both economical and effective, advancing beyond the boundaries of intelligence may still require more powerful base models and larger-scale reinforcement learning.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"635\" src=\"https:\/\/skanto.co.kr\/wp-content\/uploads\/2025\/01\/https3A2F2Fsubstack-post-media.s3.amazonaws.com2Fpublic2Fimages2Fe17e4d33-5c1e-4907-8f4e-aed05878d1ba_1064x660.png-1024x635.webp\" alt=\"\" class=\"wp-image-1915\" 
srcset=\"https:\/\/skanto.co.kr\/wp-content\/uploads\/2025\/01\/https3A2F2Fsubstack-post-media.s3.amazonaws.com2Fpublic2Fimages2Fe17e4d33-5c1e-4907-8f4e-aed05878d1ba_1064x660.png-1024x635.webp 1024w, https:\/\/skanto.co.kr\/wp-content\/uploads\/2025\/01\/https3A2F2Fsubstack-post-media.s3.amazonaws.com2Fpublic2Fimages2Fe17e4d33-5c1e-4907-8f4e-aed05878d1ba_1064x660.png-300x186.webp 300w, https:\/\/skanto.co.kr\/wp-content\/uploads\/2025\/01\/https3A2F2Fsubstack-post-media.s3.amazonaws.com2Fpublic2Fimages2Fe17e4d33-5c1e-4907-8f4e-aed05878d1ba_1064x660.png-768x476.webp 768w, https:\/\/skanto.co.kr\/wp-content\/uploads\/2025\/01\/https3A2F2Fsubstack-post-media.s3.amazonaws.com2Fpublic2Fimages2Fe17e4d33-5c1e-4907-8f4e-aed05878d1ba_1064x660.png-435x270.webp 435w, https:\/\/skanto.co.kr\/wp-content\/uploads\/2025\/01\/https3A2F2Fsubstack-post-media.s3.amazonaws.com2Fpublic2Fimages2Fe17e4d33-5c1e-4907-8f4e-aed05878d1ba_1064x660.png.webp 1064w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p>2025.1.28<\/p>\n","protected":false},"excerpt":{"rendered":"<p>DeepSeek\u2019s breakthrough on cost challenges the \u201cbigger is better\u201d narrative that has driven the A.I. arms race in recent years by showing that relatively small models, when trained properly, can match or exceed the performance of much bigger models. That, in turn, means that A.I. companies may be able to achieve very powerful capabilities with far less investment than previously thought. And it suggests that we may soon see a flood of investment into smaller A.I. 
start-ups, and much more&#8230;<\/p>\n<p class=\"read-more\"><a class=\"btn btn-default\" href=\"https:\/\/skanto.co.kr\/?p=1905\"> Read More<span class=\"screen-reader-text\">  Read More<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_import_markdown_pro_load_document_selector":0,"_import_markdown_pro_submit_text_textarea":"","footnotes":""},"categories":[14,7],"tags":[48,179],"class_list":["post-1905","post","type-post","status-publish","format-standard","hentry","category-sw-development","category-7","tag-ai","tag-deepseek"],"_links":{"self":[{"href":"https:\/\/skanto.co.kr\/index.php?rest_route=\/wp\/v2\/posts\/1905","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/skanto.co.kr\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/skanto.co.kr\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/skanto.co.kr\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/skanto.co.kr\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=1905"}],"version-history":[{"count":8,"href":"https:\/\/skanto.co.kr\/index.php?rest_route=\/wp\/v2\/posts\/1905\/revisions"}],"predecessor-version":[{"id":1916,"href":"https:\/\/skanto.co.kr\/index.php?rest_route=\/wp\/v2\/posts\/1905\/revisions\/1916"}],"wp:attachment":[{"href":"https:\/\/skanto.co.kr\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=1905"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/skanto.co.kr\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=1905"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/skanto.co.kr\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=1905"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}