Google’s DiffusionGemma AI Hits 1,000 Tokens Per Second—And It’s Free – Decrypt
In brief: Google released DiffusionGemma, a free open-weight model that generates entire 256-token blocks simultaneously via text diffusion, hitting over 1,000 tokens per second on an NVIDIA H100, four times faster than standard autoregressive models. The custom drafter module DiffusionGemma needs for local inference doesn’t exist in any public runtime yet: not in mlx-lm, not in LM...