Google DeepMind releases DiffusionGemma, a model that runs local AI 4x faster

Google DeepMind has released DiffusionGemma, a new artificial intelligence model in the open-source Gemma 4 family. Unlike autoregressive language models, which generate text one token at a time, DiffusionGemma produces a complete block of text in parallel. It does this through a diffusion process similar to the one used in image generation: a field of placeholder tokens is iteratively refined until the final text emerges.
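The refinement loop described above can be pictured with a toy sketch. This is not DiffusionGemma's actual algorithm or API: the `toy_denoise` function, the `MASK` symbol, and the rule of copying tokens from a known target are all hypothetical stand-ins for a real model's predictions. It only illustrates the idea that a whole block starts as placeholders and is filled in over a few parallel passes, rather than one token per pass.

```python
import random

MASK = "_"  # hypothetical placeholder symbol, not a real model token


def toy_denoise(target, steps=4, seed=0):
    """Toy illustration of block-wise iterative refinement.

    Starts from a block of placeholders and, on each pass, resolves a
    share of the remaining positions at once. Copying from `target` is
    a stand-in for what a trained model would predict.
    """
    rng = random.Random(seed)
    block = [MASK] * len(target)
    history = [list(block)]
    for step in range(steps):
        masked = [i for i, tok in enumerate(block) if tok == MASK]
        if not masked:
            break
        remaining_steps = steps - step
        # Ceiling division so every position is resolved by the last pass.
        k = -(-len(masked) // remaining_steps)
        for i in rng.sample(masked, k):
            block[i] = target[i]
        history.append(list(block))
    return history


if __name__ == "__main__":
    words = "the cat sat on the mat".split()
    for state in toy_denoise(words, steps=3):
        print(" ".join(state))
```

In this sketch, six tokens are resolved in three passes, whereas a token-by-token generator would need six. That difference in the number of passes is the intuition behind the parallel approach, though real throughput depends on much more than pass count.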

This approach makes DiffusionGemma faster and more efficient on local hardware such as an Nvidia graphics card. In testing, the model generated around 700 tokens per second on an Nvidia RTX 5090 graphics card and over 1,000 tokens per second on an Nvidia H100 AI accelerator, roughly four times the speed of comparable autoregressive models.
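To put those figures in perspective, here is a back-of-the-envelope calculation. It assumes the approximate fourfold speedup applies to the RTX 5090 figure of about 700 tokens per second, which the summary does not state explicitly. The implied baseline and the 1,000-token response length are illustrative assumptions, not measured results.

```python
def seconds_for(tokens, tokens_per_second):
    """Time to generate `tokens` at a steady rate of `tokens_per_second`."""
    return tokens / tokens_per_second


# Figure reported in the summary (approximate).
diffusion_tps = 700.0
# Assumption: the ~4x speedup applies to this same hardware.
speedup = 4.0
# Implied autoregressive baseline, derived rather than measured.
baseline_tps = diffusion_tps / speedup

response_tokens = 1000  # hypothetical response length

if __name__ == "__main__":
    print(f"implied baseline: {baseline_tps:.0f} tokens/s")
    print(f"diffusion: {seconds_for(response_tokens, diffusion_tps):.2f} s")
    print(f"baseline:  {seconds_for(response_tokens, baseline_tps):.2f} s")
```

Under these assumptions, a 1,000-token response would take about 1.4 seconds instead of about 5.7 seconds.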

The release matters because speed and efficiency largely determine where AI models can practically be adopted, from natural language processing to content generation. Running models locally and efficiently may open up new possibilities for developers and companies seeking to use artificial intelligence, and further research in this area may shape how AI models are developed and used in the future.

Read the original article on Ars Technica AI

This summary is an informational synthesis produced by dataqbs.com. All rights to the original content belong to its author and the cited media outlet. We act solely as curators of technology news and claim no authorship.