The AI landscape has shifted, seemingly overnight. A Chinese AI startup, Deepseek, introduced a reasoning language model named Deepseek R1, and it’s turning heads. This model utilises large-scale reinforcement learning with impressive results.
It’s posing serious competition to more established models in the West, like OpenAI’s models. Deepseek R1 stands out in the current AI landscape.
What’s driving this excitement? It boils down to Deepseek R1’s reasoning capabilities and its markedly different approach to development. In contrast to the often closed-off nature of Western offerings, Deepseek promises cost savings and increased flexibility.
Table Of Contents:
- DeepSeek R1: Changing AI as We Know It
- The Architecture of DeepSeek-R1: How it is Built
- DeepSeek R1’s Broader Impact
- Potential Controversy
- The Upsides
- How to Start Using DeepSeek-R1
- Conclusion
DeepSeek R1: Changing AI as We Know It
Deepseek R1 isn’t merely another AI model; it’s causing significant disruption. This is due to its potent features and a somewhat controversial open-source approach.
The AI startup employs large-scale reinforcement learning on its base model. This isn’t something commonly observed in AI development.
Its language model is viewed by some as a challenge to the massive US-backed models, such as OpenAI’s. The core of the design is that DeepSeek R1 prioritises answering complex questions efficiently.
Reinforcement Learning Put to the Test
The DeepSeek-R1 paper demonstrates a clear dedication to open research. A key element is the application of reinforcement learning (RL) to their base model. This occurs *before* any supervised fine-tuning (SFT), resulting in two distinct versions: DeepSeek-R1-Zero and DeepSeek-R1.
DeepSeek-R1-Zero, developed purely through large-scale RL, excels at tackling reasoning tasks. DeepSeek-R1 refines that recipe further, adding extra training stages and “cold-start data.”
These improvements showcase Deepseek’s iterative approach. This constant improvement to Deepseek models gives hope to enthusiasts.
Distilled Models
DeepSeek’s research indicates that the reasoning patterns of larger language models can be transferred into smaller models. Their team put this theory into practice. DeepSeek made several smaller AI models (building upon Qwen and Llama) available as open source, and their performance results are generating significant discussion.
These distilled models are available in various sizes: 1.5B, 7B, 8B, 14B, 32B, and 70B parameters. Each offers different capabilities to accommodate a range of requirements.
The 7B and 8B sizes are a common choice for developers building their own systems who still want meaningful reasoning ability, while the 1.5B can help where extreme compute constraints apply.
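For a sense of how accessible these distilled checkpoints are, here is a minimal sketch that loads one with the standard Hugging Face transformers library. It assumes the Qwen-based 7B distill repo ID published by DeepSeek on Hugging Face; the prompt and generation settings are placeholder assumptions, so adjust them to your needs.

```python
# Minimal sketch: loading a DeepSeek-R1 distilled model with Hugging Face transformers.
# Assumes the published repo ID "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B"; swap in another size as needed.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

# Per DeepSeek's usage advice: no system prompt, everything goes in the user message.
messages = [{"role": "user", "content": "Why is the sum of two odd numbers always even? Reason step by step."}]
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=512, do_sample=True, temperature=0.6)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

The smaller distills will run on a single consumer GPU; the larger ones need correspondingly more memory or a multi-GPU setup.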
Key Abilities of Deepseek R1
Deepseek R1 concentrates on several key areas:
- Reasoning: It excels in areas such as code, mathematics, and common-sense reasoning.
- Efficiency: The architecture and training methodology produce surprisingly robust results while minimising computational demands.
Comparing DeepSeek-R1 Performance to Other AI Models
This table presents a breakdown of its performance across specific benchmarks:
| Category | Benchmark | DeepSeek-R1 |
|---|---|---|
| English | MMLU (Pass@1) | 90.8 |
| Code | LiveCodeBench (Pass@1-COT) | 65.9 |
| Math | AIME 2024 (Pass@1) | 79.8 |
| Chinese | C-Eval (EM) | 91.8 |
Source: Data based on findings at Hugging Face
The Architecture of DeepSeek-R1: How it is Built
This aspect is quite technical. The main component is the *MoE* (Mixture of Experts) system.
DeepSeek-R1 and DeepSeek-R1-Zero both possess 671 *billion* total parameters. However, only a small fraction of these (around 37 billion) are active at any given moment, enhancing operational efficiency.
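To make the “only a small fraction active” idea concrete, here is a toy sketch of top-k expert routing, the mechanism at the heart of Mixture-of-Experts layers. It is purely illustrative, with made-up dimensions and expert counts, and is not DeepSeek’s actual routing code, which adds further refinements.

```python
# Toy Mixture-of-Experts routing sketch (illustrative only, not DeepSeek-R1's implementation).
# Each token is routed to its top-k experts, so only a fraction of the layer's weights run per token.
import torch
import torch.nn as nn

class ToyMoELayer(nn.Module):
    def __init__(self, d_model=64, n_experts=8, top_k=2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)   # scores every expert for each token
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(), nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        )
        self.top_k = top_k

    def forward(self, x):                              # x: (tokens, d_model)
        weights, chosen = self.router(x).softmax(dim=-1).topk(self.top_k, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.top_k):                 # only the selected experts do any work
            for e, expert in enumerate(self.experts):
                mask = chosen[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

print(ToyMoELayer()(torch.randn(10, 64)).shape)        # torch.Size([10, 64])
```

The same principle, applied at vastly larger scale, is what lets a 671-billion-parameter model activate only around 37 billion parameters for any given token.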
The language model also boasts a large “context length” of 128K tokens, which indicates how much text it can take into account while formulating a response.
Further details on its structure can be found in the DeepSeek-V3 information. DeepSeek-V3 details some of the language mixing strategies used.
Getting Started With Deepseek R1: Usage Advice
If you’re exploring the use of the DeepSeek-R1 series, a few guidelines will help maintain optimal results; a short request sketch putting them together follows the list.
- Keep it Cool: It may sound unusual, but maintaining the model’s “temperature” around 0.6 prevents it from generating repetitive, nonsensical outputs.
- No Extra Prompting: Avoid providing the AI system with a predefined “system prompt.” Instead, incorporate all instructions within your initial query.
- For the Math Fans: When posing a mathematics-related problem, instruct it to reason “step by step”. Request that it present its final answer within a clearly defined box (\boxed{}), which makes the answer easy to extract.
- Check Multiple Times: It’s beneficial to have the AI execute the task multiple times. The averaged scores often provide a more accurate representation of its capabilities.
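Putting those guidelines together, a request through DeepSeek’s OpenAI-compatible API might look like the sketch below. The endpoint and the deepseek-reasoner model name are assumptions based on DeepSeek’s platform documentation, and the prompt is a placeholder; verify the details against the docs before relying on them.

```python
# Sketch of a DeepSeek-R1 request via the OpenAI-compatible API (endpoint and model name
# assumed from DeepSeek's platform docs; check them before use).
from openai import OpenAI

client = OpenAI(api_key="YOUR_DEEPSEEK_API_KEY", base_url="https://api.deepseek.com")

# All instructions go in the user message (no system prompt), temperature kept around 0.6.
prompt = (
    "What is the sum of the first 50 positive integers? "
    "Please reason step by step, and put your final answer within \\boxed{}."
)

response = client.chat.completions.create(
    model="deepseek-reasoner",
    messages=[{"role": "user", "content": prompt}],
    temperature=0.6,
)
print(response.choices[0].message.content)

# For benchmark-style evaluation, repeat the same query several times and average the scores.
```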
License Info: Open-Source Implications
One major detail is that DeepSeek-R1 itself is released under the MIT license, which permits unrestricted reuse, both commercial and experimental.
This could prove critical for a future of AI that develops in the open rather than behind closed doors. Some even speculate the release has encouraged wider use of demanding evaluations such as GPQA Diamond.
But there are some licensing details to be aware of. The distilled models were built on top of pre-existing systems and inherit the licenses of their base models: the Qwen-based versions fall under the Apache 2.0 License, while those built on the Llama framework remain subject to the Llama 3.1 license.
DeepSeek R1’s Broader Impact
The quick release, combined with the open-access model, has generated a ton of interest. DeepSeek has proved that you don’t need tons of money anymore to get competitive results.
Analysts estimated Deepseek R1 cost “under” $6 million to train; even if the true figure differs somewhat, it remains a pittance relative to the billions in investments leading Silicon Valley AI firms have made.
DeepSeek and Other Chinese Models
A growing force of powerful AI models like Deepseek R1 has sprung out of China, alongside efforts from Alibaba and Moonshot AI. This ramps up the ongoing “AI race.”
Some analysts, in particular, point to a geopolitical dimension, given recent moves to restrict the flow of advanced chips to China. Such measures may, paradoxically, have instead encouraged innovation by forcing the development of new AI methods.
The launch of Deepseek R1 even temporarily lowered the stock prices of leading chip makers, including Nvidia. Key contributions were made by individuals including Shirong Ma, Ruoyu Zhang, Runxin Xu, Qihao Zhu, and Peiyi Wang.
Potential Controversy
But alongside the enthusiasm comes controversy over Deepseek R1.
OpenAI has accused DeepSeek of improperly using outputs from its models to build R1.
If true, that would be a breach of OpenAI’s terms of service. On top of that, the low-cost training claims are insufficiently verified; some argue the more likely scenario involves the covert use of export-banned, high-end Nvidia GPUs.
Limitations: Societal Biases
As a Chinese product, DeepSeek faces heavy regulatory scrutiny, and that pressure shapes its output to align with perceived “core socialist values.”
Users experience this effect directly: the model deflects or goes silent when faced with difficult questions on sensitive historical or political issues.
This kind of bias is not unique to DeepSeek, but it is easy to observe here, especially because the open weights allow the model to be examined free of corporate gatekeeping. Further contributors linked to Deepseek R1 include Dejian Yang, Haowei Zhang, and Junxiao Song.
The Upsides
That said, with the core language model made available via Hugging Face and its own API access, the true measure of success will be in broader community uptake.
DeepSeek’s own chatbot app briefly overtook even ChatGPT in app store rankings. Deepseek R1’s capabilities could offer huge long-term benefits for researchers, and possibly for ordinary developers, for whom this level of technology was previously out of reach without deep pockets.
How to Start Using DeepSeek-R1
Deepseek R1 provides developers with several options for getting started:
- Direct Access via API: The company enables direct interaction through its DeepSeek Platform (platform.deepseek.com). The API follows OpenAI-compatible conventions to simplify integration.
- Running It Locally: There are multiple approaches, depending on the specific version utilised. The primary DeepSeek-R1 models (not yet compatible with standard Hugging Face tools) require consulting the supplementary information in the DeepSeek-V3 files. Conversely, the smaller ‘Distill’ dense models are compatible with many tools already widely used by the open-source AI community. Popular frameworks such as vLLM, or the SGLang project, significantly streamline the configuration process on a user’s own computing setup; a minimal local sketch follows below.
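For the local route with one of the distilled checkpoints, vLLM’s offline Python API is a compact starting point. The model ID, prompt, and sampling settings below are illustrative assumptions; larger distills need correspondingly more GPU memory.

```python
# Minimal sketch: running a DeepSeek-R1 distilled model locally with vLLM's offline API.
# Assumes the "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B" checkpoint and a GPU with enough memory.
from vllm import LLM, SamplingParams

llm = LLM(model="deepseek-ai/DeepSeek-R1-Distill-Qwen-7B")
params = SamplingParams(temperature=0.6, max_tokens=1024)

# For best results with these chat-tuned distills, format the prompt with the model's chat
# template (e.g. via its tokenizer) rather than sending raw text as done here.
prompt = "List three everyday uses of prime numbers. Reason step by step."
outputs = llm.generate([prompt], params)
print(outputs[0].outputs[0].text)
```

SGLang offers a similar serving path; either way, the usage advice above (temperature around 0.6, no system prompt) still applies.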
Conclusion
Deepseek R1 stands as a testament to how fast the AI landscape is changing. Whether or not its capabilities prove durable, the model upends conventional thinking: Deepseek is shifting the balance of what is needed to train a strong reasoning model. This is significant.
It looks useful for anyone working to improve language models, opening possibilities that were previously out of reach because of the investment required. DeepSeek-R1 is already demonstrating value to builders of other systems.
This new approach and transparency raise some compelling questions. Will US tech giants retain their dominance through sheer spending, whilst models of equivalent capability are created via other means? Time will tell. But, at this stage, models such as DeepSeek-R1 remain competitive in the developing AI arena, and more widely available than once thought.