Tesla Dojo: The $300 Million Supercomputer

Tesla Dojo supercomputer

Tesla Dojo is Tesla’s own supercomputer, powered by D1, a chip Tesla designed in-house, because NVIDIA cannot deliver enough GPUs to meet the needs of Tesla’s FSD program. Tesla will run Dojo and a GPU cluster built from NVIDIA H100 GPUs side by side.

Today, August 28, electric vehicle giant Tesla is set to switch on a highly advanced GPU cluster made up of 10,000 cutting-edge NVIDIA H100 GPUs. The cluster is intended to significantly speed up Tesla’s training of its Full Self-Driving (FSD) technology.

For comparison, the Tesla cluster’s peak performance of 340 FP64 PFLOPS is higher than the 309 FP64 PFLOPS delivered by LUMI, currently the world’s third-fastest supercomputer.
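The 340 PFLOPS figure follows from simple multiplication. NVIDIA’s datasheet lists the H100 SXM at about 34 TFLOPS of standard (non-tensor) FP64 throughput, and 10,000 × 34 TFLOPS = 340 PFLOPS. The back-of-envelope sketch below assumes that per-GPU figure; the helper name `cluster_peak_pflops` is hypothetical and only for illustration. Keep in mind that the result is a theoretical peak, whereas LUMI’s 309 PFLOPS is its measured HPL (Rmax) result on the TOP500 list, so the two numbers are not strictly like-for-like.

```python
# Back-of-envelope peak FP64 throughput for a GPU cluster.
# Assumption: ~34 TFLOPS standard FP64 per H100 SXM (NVIDIA datasheet figure).
# This is a theoretical peak; sustained throughput (e.g. HPL Rmax) is lower.

H100_FP64_TFLOPS = 34.0   # assumed per-GPU peak, non-tensor FP64
NUM_GPUS = 10_000         # cluster size reported in the article
LUMI_RMAX_PFLOPS = 309.0  # LUMI figure quoted in the article


def cluster_peak_pflops(num_gpus: int, per_gpu_tflops: float) -> float:
    """Hypothetical helper: aggregate peak in PFLOPS (1 PFLOPS = 1,000 TFLOPS)."""
    return num_gpus * per_gpu_tflops / 1_000


peak = cluster_peak_pflops(NUM_GPUS, H100_FP64_TFLOPS)
print(f"Tesla cluster peak: {peak:.0f} PFLOPS FP64")
print(f"Ratio vs LUMI: {peak / LUMI_RMAX_PFLOPS:.2f}x")
```

Swapping in the H100 SXM’s FP64 Tensor Core figure (about 67 TFLOPS per GPU) would roughly double the result, which is why it matters which peak figure a headline number is based on.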


“Frankly… if they (NVIDIA) could deliver us enough GPUs, we might not need Dojo.”

– Elon Musk


How Is The H100 GPU Different From The Previous A100 GPU?

Released in late 2022, NVIDIA’s H100 came with a reported price tag of $40,000.
NVIDIA has claimed it is up to 30 times faster than the A100 for AI inference on large language models.
The H100 is also reported to be up to 9 times faster than the A100 for AI training and at least 5 times faster for high-performance computing.

A concise breakdown of the difference between the two GPUs is presented below.

NVIDIA A100:

Having made its debut three years ago, in 2020, the NVIDIA A100 GPU was a game-changer, delivering up to a 20-fold increase in performance over its predecessor. Designed primarily for high-performance computing and artificial intelligence (AI) workloads, the A100 offers the following specifications:

  • 6,912 CUDA cores
  • 432 tensor cores
  • 40 GB (HBM2) or 80 GB (HBM2e) of high-bandwidth memory

NVIDIA H100:

Geared toward demanding AI training workloads, such as the large-scale video training behind FSD, the H100 is distinguished by the following attributes:

  • 18,432 CUDA cores on the full GH100 chip (16,896 enabled on the H100 SXM5)
  • 528 tensor cores on the H100 SXM5
  • 132 streaming multiprocessors (SMs) on the H100 SXM5
  • Higher power consumption than the A100

With these H100 GPUs in place, Tesla can run FSD training substantially faster than it could on previous-generation hardware.


Why Is Tesla Building Its Own Supercomputer?

Tesla is building its own supercomputer primarily because NVIDIA has persistently been unable to supply enough GPUs to meet demand.
As a result, Tesla has committed more than $1 billion to Dojo, which runs on D1 chips that Tesla designed and optimized in-house.

In its Q2 2023 earnings report, Tesla emphasized “four main technology pillars” needed to “solve vehicle autonomy at scale: extremely large real-world dataset, neural net training, vehicle hardware and vehicle software.”

Tesla is developing all four pillars in-house, and Dojo will contribute to the “neural net training” pillar.
Dojo will serve a dual purpose: it will not only train the neural networks that drive Tesla’s cars but also process all the data those vehicles generate.


Will Tesla Dojo Replace the H100 Cluster?

Tesla’s Dojo supercomputer is not being built to replace the H100 cluster that Tesla is switching on today. In fact, Tesla is launching the NVIDIA H100 GPU cluster at the same time as it begins operating Dojo.

Deploying both systems at once will put Tesla well ahead of its competitors in data-processing capability.

With Tesla’s Dojo D1 Chip, Is It Competing Against NVIDIA And AMD?

Tesla doesn’t plan to jump into semiconductor manufacturing and compete against companies like NVIDIA and AMD.
Tesla decided to develop its own supercomputer because NVIDIA has been unable to keep up with ever-rising demand for its GPUs from AI-focused companies such as OpenAI and Tesla.
In fact, Tesla CEO Elon Musk said, “Frankly… if they (NVIDIA) could deliver us enough GPUs, we might not need Dojo.”
