The Enormous Compute Power Needed for Mainstream Text-to-Video Generation

The crypto industry is abuzz over text-to-video generation after the unveiling of OpenAI’s Sora demo. The surge in interest has also lifted AI tokens, whose combined market cap has reached $25 billion, according to CoinGecko data.

But making text-to-video generation mainstream will take an enormous amount of compute power. By one estimate, it would require more GPUs than tech giants Microsoft, Meta, and Google currently use combined.

Research by Factorial Funds estimates that 720,000 high-end Nvidia H100 GPUs would be needed to support the creator community of platforms like TikTok and YouTube. Training Sora alone would require around 10,500 powerful GPUs running for one month, and during inference each GPU can generate only about 5 minutes of video per hour.
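
As a rough illustration of what these figures imply, the sketch below multiplies the GPU count by the per-GPU inference rate quoted above. It assumes every GPU runs inference around the clock at that rate, which the research as quoted here does not state, so the result is a ceiling on output rather than a demand estimate. The function and variable names are illustrative only.

```python
# Figures quoted in the article (Factorial Funds estimates).
GPU_COUNT = 720_000
MINUTES_OF_VIDEO_PER_GPU_HOUR = 5

# Assumption (not from the source): every GPU runs inference 24 hours a day.
HOURS_PER_DAY = 24


def daily_video_minutes(gpus: int, minutes_per_gpu_hour: float,
                        hours: float = HOURS_PER_DAY) -> float:
    """Total minutes of video the whole fleet could generate in one day."""
    return gpus * minutes_per_gpu_hour * hours


total_minutes = daily_video_minutes(GPU_COUNT, MINUTES_OF_VIDEO_PER_GPU_HOUR)
print(f"{total_minutes:,.0f} minutes of video per day")
print(f"{total_minutes / 60:,.0f} hours of video per day")
```

Under that assumption the fleet tops out at 86.4 million minutes, or 1.44 million hours, of generated video per day.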

This far exceeds the compute power needed for other AI applications such as GPT-4 or still image generation. Furthermore, as more individuals and companies adopt AI models like Sora for video generation, the compute power required for inference will surpass that needed for initial model training.

To put things into perspective, Nvidia shipped 550,000 H100 GPUs in 2023, and the twelve largest customers collectively possess 650,000 of these cards. The two largest customers, Meta and Microsoft, account for 300,000 GPUs between them. Acquiring all 720,000 GPUs would cost approximately $21.6 billion, which is almost the entire current market cap of AI tokens.
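
The comparison can be checked with simple arithmetic. The sketch below uses only figures quoted in this article; the per-GPU price is not stated in the text and is derived here by dividing the total cost by the GPU count, so treat it as an implied value rather than a quoted one. Variable names are illustrative.

```python
# Figures quoted in the article.
GPUS_NEEDED = 720_000
TOTAL_COST_USD = 21_600_000_000
H100_SHIPPED_2023 = 550_000
AI_TOKEN_MARKET_CAP_USD = 25_000_000_000

# Implied price per GPU (derived, not quoted in the article).
implied_unit_price = TOTAL_COST_USD / GPUS_NEEDED

# GPUs needed relative to Nvidia's 2023 H100 shipments.
shipment_ratio = GPUS_NEEDED / H100_SHIPPED_2023

# Hardware cost as a share of the AI token market cap.
cost_share = TOTAL_COST_USD / AI_TOKEN_MARKET_CAP_USD

print(f"Implied price per GPU: ${implied_unit_price:,.0f}")
print(f"GPUs needed vs. 2023 shipments: {shipment_ratio:.2f}x")
print(f"Cost as share of AI token market cap: {cost_share:.1%}")
```

On these numbers the implied price is $30,000 per GPU, the requirement is about 1.3 times Nvidia's 2023 H100 shipments, and the hardware bill is roughly 86% of the AI token market cap.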

While Nvidia dominates the AI chip market, AMD also produces competitive products and has seen significant stock market growth. There are also other avenues for outsourcing GPU compute power, such as the distributed GPU computing offered by Render (RNDR) and Akash Network (AKT). However, these networks rely predominantly on less powerful retail-grade gaming GPUs.

Despite its promise, text-to-video generation is unlikely to become mainstream in the near future because of these monumental hardware requirements. The quest for more compute power continues as the industry works to make AI-generated video a reality.
