Hardware requirements for running the large language model DeepSeek R1 locally.

AI hardware

The minimum and recommended system requirements for large language models like DeepSeek R1: learn what CPU, RAM, and GPU you need to run a DeepSeek LLM locally.

RNfinity | 10-02-2025

What are the hardware requirements for running large language models (LLMs) like those developed by DeepSeek?

1. Model Size and VRAM Requirements

The most critical factor is the size of the model, which determines how much VRAM (GPU memory) or system RAM (for CPU-only inference) you need. Here's a rough estimate; a quick way to sanity-check these numbers is sketched after the table:

Model Size         Approximate VRAM (GPU)   Approximate RAM (CPU-only)
7B parameters      8–16 GB                  16–32 GB
13B parameters     16–24 GB                 32–64 GB
30B parameters     24–48 GB                 64–128 GB
65B+ parameters    48+ GB                   128+ GB
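
As a rough rule of thumb, a dense model needs about (parameter count × bytes per parameter) of memory, plus headroom for activations and the KV cache: roughly 2 bytes per parameter at FP16, 1 byte at 8-bit, and 0.5 bytes at 4-bit. A minimal Python sketch of that arithmetic (the 20% overhead factor is an assumption, not a measured value):

    def estimate_memory_gb(num_params_b, bytes_per_param, overhead=1.2):
        # num_params_b: parameter count in billions (e.g. 7 for a 7B model)
        # bytes_per_param: 2.0 for FP16, 1.0 for 8-bit, 0.5 for 4-bit
        # overhead: assumed 20% headroom for activations and the KV cache
        return num_params_b * 1e9 * bytes_per_param * overhead / 1024**3

    for size in (7, 13, 30, 65):
        print(f"{size}B @ FP16: ~{estimate_memory_gb(size, 2.0):.0f} GB")
    # FP16 footprints: ~16, ~29, ~67 and ~145 GB; quantization (section 5) cuts these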

GPU VRAM:

For efficient inference, you need a GPU with sufficient VRAM. For example:

  • NVIDIA RTX 3090 (24 GB VRAM) can handle models up to roughly 13B parameters at 16-bit precision.
  • NVIDIA A100 (40–80 GB VRAM) can handle larger models (30B+).

CPU RAM:

If you don't have a GPU, you can run smaller models on a CPU, but inference will be much slower, and you'll need significantly more system RAM (see the table above).

2. GPU Recommendations

NVIDIA GPUs are preferred because of their CUDA support and compatibility with deep learning frameworks like PyTorch and TensorFlow.

  • Entry-level: RTX 3060 (12 GB VRAM) for smaller models (7B).
  • Mid-range: RTX 3090 (24 GB VRAM) or RTX 4090 (24 GB VRAM) for medium-sized models (13B–30B).
  • High-end: NVIDIA A100 (40–80 GB VRAM) for large models (65B+, typically with quantization or multiple GPUs).
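
Before choosing a model size, it's worth checking what your card actually reports. A quick check with PyTorch:

    import torch

    if torch.cuda.is_available():
        props = torch.cuda.get_device_properties(0)
        print(f"GPU: {props.name}, VRAM: {props.total_memory / 1024**3:.1f} GB")
    else:
        print("No CUDA GPU detected; inference would fall back to CPU.")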

3. CPU Recommendations

If you're running the model on a CPU:

  • Use a modern multi-core processor (e.g., AMD Ryzen 9 or Intel Core i9).
  • Ensure you have enough RAM (see the table above).
  • Expect significantly slower performance compared to GPU inference.
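
The same kind of check for a CPU-only machine (psutil is a third-party package, installable with pip install psutil):

    import os
    import psutil  # pip install psutil

    print(f"CPU cores: {os.cpu_count()}")
    print(f"Total RAM: {psutil.virtual_memory().total / 1024**3:.0f} GB")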

4. Storage Requirements

Disk Space: Model weights can take up a lot of space. For example:

  • A 7B model might require 10–20 GB of disk space.
  • A 65B model might require 200+ GB of disk space.

Use an SSD for faster loading times.
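
Before downloading weights, it's easy to confirm the drive has room; Python's standard-library shutil reports free space:

    import shutil

    # "." is a placeholder; point this at the directory where you store models
    free_gb = shutil.disk_usage(".").free / 1024**3
    print(f"Free disk space: {free_gb:.0f} GB")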

5. Quantization for Lower Hardware Requirements

If your hardware is limited, you can use quantized versions of the model (e.g., 4-bit or 8-bit quantization). This reduces the VRAM/RAM requirements significantly:

  • A 7B model quantized to 4-bit might only need 4–6 GB VRAM.
  • Tools like llama.cpp or Hugging Face's bitsandbytes can help with quantization.
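
For example, a 4-bit load through the transformers/bitsandbytes integration looks roughly like this; the model ID is illustrative, so substitute the checkpoint you actually want:

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    model_id = "deepseek-ai/deepseek-llm-7b-chat"  # illustrative; substitute your checkpoint

    # NF4 4-bit quantization with FP16 compute, cutting VRAM use roughly 4x vs FP16
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.float16,
    )

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        quantization_config=bnb_config,
        device_map="auto",  # places layers on GPU/CPU automatically (needs accelerate)
    )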

6. Example Hardware Setups

  • Budget Setup:
    • GPU: NVIDIA RTX 3060 (12 GB VRAM)
    • RAM: 32 GB
    • Storage: 1 TB SSD
    • Suitable for 7B models.
  • Mid-Range Setup:
    • GPU: NVIDIA RTX 3090 (24 GB VRAM)
    • RAM: 64 GB
    • Storage: 2 TB SSD
    • Suitable for 13B–30B models.
  • High-End Setup:
    • GPU: NVIDIA A100 (40–80 GB VRAM)
    • RAM: 128+ GB
    • Storage: 4 TB SSD
    • Suitable for 65B+ models.

7. Cloud Alternatives

If you don’t have the required hardware, consider using cloud services like:

  • AWS EC2 (e.g., p4d instances with A100 GPUs)
  • Google Cloud (e.g., A100 or T4 GPUs)
  • Lambda Labs or RunPod for GPU rentals.

8. Software Requirements

  • Operating System: Linux (Ubuntu is preferred) or Windows.
  • Frameworks: PyTorch, TensorFlow, or JAX, depending on the model.
  • Libraries: Hugging Face transformers, accelerate, bitsandbytes, etc.
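
Putting these pieces together, a minimal load-and-generate script with the stack above might look like the following; the model ID is illustrative, and FP16 weights for a 7B model still need roughly 14 GB of memory:

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "deepseek-ai/deepseek-llm-7b-chat"  # illustrative checkpoint

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.float16,  # halves memory versus FP32
        device_map="auto",          # spreads layers across GPUs/CPU (needs accelerate)
    )

    inputs = tokenizer("Explain VRAM in one sentence.", return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=64)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))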