DeepSeek R1: the new AI platform challenging the US technology monopoly.


DeepSeek R1 disrupts the AI landscape, challenging the US technology monopoly with frontier-level performance at a fraction of the cost.

RNfinity | 02-02-2025

The launch of DeepSeek R1 has stunned the AI world, delivering the performance of the frontier model OpenAI o1 at a fraction of the price. Benchmark performance is on par across a range of domains, yet the chatbot platform is completely free and its API prices tokens at around one-fiftieth of the OpenAI API's rates. The US markets reacted by shaving a trillion dollars off the stock valuations of leading technology companies; NVIDIA, the primary supplier of high-end AI chips, lost 600 billion dollars, or 18% of its value. Most astonishing is the purported training cost of DeepSeek R1: a paltry 10 million dollars, roughly a tenth of the cost of GPT-4. There is much speculation about the hardware available to the DeepSeek team: whether they were able to import thousands of high-end NVIDIA chips, at a cost of around a billion dollars, despite the US export ban to China, and whether ChatGPT outputs could have been used to train DeepSeek R1.

 

Furthermore, the DeepSeek R1 model is open source, meaning its weights can be downloaded and run on a local machine. However, as would be expected, the hardware requirements are significant.

What are the hardware requirements for running large language models (LLMs) like those developed by DeepSeek?

1. Model Size and VRAM Requirements

The most critical factor is the size of the model, which determines how much VRAM (GPU memory) or system RAM (for CPU-only inference) you need. Here's a rough estimate; the arithmetic behind these figures is sketched after the table:

| Model Size | Approximate VRAM (GPU) | Approximate RAM (CPU-only) |
|---|---|---|
| 7B parameters | 8–16 GB | 16–32 GB |
| 13B parameters | 16–24 GB | 32–64 GB |
| 30B parameters | 24–48 GB | 64–128 GB |
| 65B+ parameters | 48+ GB | 128+ GB |
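These figures follow from simple arithmetic: each parameter stored at 16-bit precision occupies two bytes, so the weights alone of a 7B model need about 13 GB. Here is a minimal sketch of that calculation; the function name is illustrative, not part of any DeepSeek tooling, and it counts only the memory needed to hold the weights themselves.

```python
# Minimal sketch of the arithmetic behind the table above; counts only
# the memory needed to store the weights, not activations or KV cache.

def estimate_weight_memory_gb(params_billions: float, bits_per_param: int = 16) -> float:
    """Approximate gigabytes needed just to hold the model weights."""
    bytes_per_param = bits_per_param / 8
    return params_billions * 1e9 * bytes_per_param / 1024**3

for size in (7, 13, 30, 65):
    fp16 = estimate_weight_memory_gb(size, 16)  # standard half precision
    int4 = estimate_weight_memory_gb(size, 4)   # 4-bit quantized (see section 5)
    print(f"{size}B params: ~{fp16:.0f} GB at FP16, ~{int4:.0f} GB at 4-bit")
```

Real inference needs headroom beyond the weights for activations and the KV cache, which is why the ranges in the table run higher than the raw arithmetic; the same arithmetic also gives a rough estimate of the disk space the weights occupy.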

GPU VRAM:

For efficient inference, you need a GPU with sufficient VRAM. For example:

  • NVIDIA RTX 3090 (24 GB VRAM) can handle up to 13B models.
  • NVIDIA A100 (40–80 GB VRAM) can handle larger models (30B+).

CPU RAM:

If you don't have a GPU, you can run smaller models on a CPU, but it will be much slower. You'll need significantly more RAM.

2. GPU Recommendations

NVIDIA GPUs are preferred because of their CUDA support and compatibility with deep learning frameworks like PyTorch and TensorFlow; a quick check of your own GPU follows the list below.

  • Entry-level: RTX 3060 (12 GB VRAM) for smaller models (7B).
  • Mid-range: RTX 3090 (24 GB VRAM) or RTX 4090 (24 GB VRAM) for medium-sized models (13B–30B).
  • High-end: NVIDIA A100 (40–80 GB VRAM) for large models (65B+).
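If you are unsure which tier your machine falls into, PyTorch (which the recommendations above assume) can report the detected GPU and its VRAM; a minimal sketch:

```python
# Quick check of the available GPU and its VRAM using PyTorch.
import torch

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"GPU: {props.name}, VRAM: {props.total_memory / 1024**3:.1f} GB")
else:
    print("No CUDA GPU detected; inference will fall back to the CPU.")
```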

3. CPU Recommendations

If you're running the model on a CPU:

  • Use a modern multi-core processor (e.g., AMD Ryzen 9 or Intel Core i9).
  • Ensure you have enough RAM (see the table above).
  • Expect significantly slower performance compared to GPU inference.

4. Storage Requirements

Disk Space: Model weights can take up a lot of space. For example:

  • A 7B model might require 10–20 GB of disk space.
  • A 65B model might require 200+ GB of disk space.

Use an SSD for faster loading times.

5. Quantization for Lower Hardware Requirements

If your hardware is limited, you can use quantized versions of the model (e.g., 4-bit or 8-bit quantization). This reduces the VRAM/RAM requirements significantly:

  • A 7B model quantized to 4-bit might only need 4–6 GB VRAM.
  • Tools like llama.cpp or Hugging Face's bitsandbytes can help with quantization; a sketch using the latter follows below.
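As a rough illustration, here is a minimal sketch of 4-bit loading with transformers and bitsandbytes. The model ID is an illustrative assumption, not a recommendation; substitute whichever checkpoint you actually intend to run, and check its size and licence first.

```python
# Minimal sketch of 4-bit loading via Hugging Face transformers and
# bitsandbytes; the model ID below is illustrative only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "deepseek-ai/deepseek-llm-7b-chat"  # illustrative checkpoint

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # store weights in 4 bits
    bnb_4bit_quant_type="nf4",             # NF4 quantization scheme
    bnb_4bit_compute_dtype=torch.float16,  # compute in FP16 for speed
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",  # let accelerate place layers on GPU/CPU
)
```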

6. Example Hardware Setups

  • Budget Setup:
    • GPU: NVIDIA RTX 3060 (12 GB VRAM)
    • RAM: 32 GB
    • Storage: 1 TB SSD
    • Suitable for 7B models.
  • Mid-Range Setup:
    • GPU: NVIDIA RTX 3090 (24 GB VRAM)
    • RAM: 64 GB
    • Storage: 2 TB SSD
    • Suitable for 13B–30B models.

  • High-End Setup:
    • GPU: NVIDIA A100 (40–80 GB VRAM)
    • RAM: 128+ GB
    • Storage: 4 TB SSD
    • Suitable for 65B+ models.

7. Cloud Alternatives

If you don’t have the required hardware, consider using cloud services like:

  • AWS EC2 (e.g., p4d instances with A100 GPUs)
  • Google Cloud (e.g., A100 or T4 GPUs)
  • Lambda Labs or RunPod for GPU rentals.

8. Software Requirements

  • Operating System: Linux (Ubuntu is preferred) or Windows.
  • Frameworks: PyTorch, TensorFlow, or JAX, depending on the model.
  • Libraries: Hugging Face transformers, accelerate, bitsandbytes, etc. (a minimal end-to-end example follows below).
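Putting the software stack together, a minimal end-to-end generation sketch might look like this; the model ID is again an illustrative assumption, and any causal language model on the Hugging Face Hub that fits your hardware works the same way.

```python
# Minimal generation sketch using the transformers pipeline API;
# the model ID is illustrative and should be chosen to fit your hardware.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="deepseek-ai/deepseek-llm-7b-chat",  # illustrative checkpoint
    device_map="auto",                         # GPU if available, else CPU
)

result = generator("Explain quantization in one sentence.", max_new_tokens=64)
print(result[0]["generated_text"])
```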

What does this mean for the future of AI development?

The AI world has changed as a result of DeepSeek R1. It was never going to be unipolar anyway. Competition is good: it will drive down costs and increase the demand for AI. OpenAI has already responded by releasing the o3-mini model, which has set some new benchmarks.

DeepSeek's open-source release may drive further gains in training efficiency, and it makes it more likely that powerful AI will be cheaply available everywhere in the future, perhaps for all but the frontier models.

The US has instigated the 500-billion-dollar Stargate project, which will pool hardware, data, compute time and the finest developers. The goal is clear: to achieve artificial general intelligence (AGI) or artificial super intelligence (ASI), treating this as a once-in-a-lifetime opportunity to beat all rivals to super intelligence, after which further gains in AI performance would come from the AI itself.

The costs of further developments in AI are huge because of the scaling laws, which demand exponential effort for linear gains; they suggest that achieving super intelligence would require 100,000 times as much compute. But DeepSeek may have shifted this paradigm by training a model with far less compute. It is also possible that the cost reduction is attributable to lower labour costs, or to training a less-than-frontier model. Either way, the feat is remarkable and has changed the AI industry.

It is not yet clear whether AGI is possible. Of course, all the leading technology companies believe that it is. Current AI usage, given the huge costs, indicates that companies are playing the long game, prepared to incur financial losses on the way to developing what may soon be available. This loss-leading exercise may itself assist the development of AI through the insight gained from public interactions.

We may not have a clear grasp of what constitutes human intelligence. Impressive though chatbots are, having benefited from a magnitude of training that surpasses what any individual human has received, it is not clear whether they are more than speedy librarians, serving up the curated regurgitation of human ingenuity, always selecting the modal probabilistic route. That may not be the route taken when generating new knowledge. And as we provide ever greater datasets with ever greater compute, where does this really take the AI?

The AI industry would have us believe it is somewhere beyond human intelligence. Only time will tell whether the 500-billion-dollar Stargate project is a once-only opportunity to reach the AI singularity and achieve unrivalled power; it may turn out to be the pinnacle of hubris, a sci-fi Tower of Babel. Perhaps further scaling of large language models does not get us much further than where we are right now, and any frontier model may very well be replicated a few months later at a fraction of the cost. Doubtless AI will flood the labour markets, as we are only beginning to understand its potential applications.