Google Found a Way to Make Local AI Up to 3x Faster—No New Hardware Required – Decrypt
In brief
Google released Multi-Token Prediction (MTP) drafters for Gemma 4, delivering up to a 3x speedup at inference without any degradation in output quality.
The technique, called speculative decoding, uses a lightweight “drafter” model to predict several tokens at once, which the main model then verifies in parallel, bypassing the one-token-at-a-time bottleneck.
MTP drafters are available...
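To make the draft-then-verify idea concrete, here is a minimal sketch of greedy speculative decoding in Python. It is not Google's implementation and does not touch Gemma 4 or its MTP drafters: `target_model` and `draft_model` are hypothetical toy functions standing in for the main model and the drafter, and the acceptance rule is the simple greedy one (keep drafted tokens until the first disagreement). Production systems typically use a probabilistic acceptance rule so that sampled output is also preserved.

```python
from typing import Callable, List, Tuple

# A "model" here is just a function from a token context to its greedy next token.
Model = Callable[[List[int]], int]

VOCAB = 50  # hypothetical toy vocabulary size


def target_model(context: List[int]) -> int:
    """Toy stand-in for the large main model (assumption, not a real LLM)."""
    return (sum(context) * 31 + 7) % VOCAB


def draft_model(context: List[int]) -> int:
    """Toy stand-in for the drafter: matches the target except when the
    context length is a multiple of 4, where it guesses wrong."""
    guess = target_model(context)
    return (guess + 1) % VOCAB if len(context) % 4 == 0 else guess


def greedy_decode(model: Model, prompt: List[int], n_new: int) -> List[int]:
    """Baseline: one model call per generated token."""
    tokens = list(prompt)
    for _ in range(n_new):
        tokens.append(model(tokens))
    return tokens


def speculative_decode(
    target: Model, drafter: Model, prompt: List[int], n_new: int, k: int = 4
) -> Tuple[List[int], int]:
    """Return (tokens, number of target verification passes)."""
    tokens = list(prompt)
    goal = len(prompt) + n_new
    target_passes = 0
    while len(tokens) < goal:
        # 1. The drafter cheaply proposes k tokens, one after another.
        draft: List[int] = []
        for _ in range(k):
            draft.append(drafter(tokens + draft))

        # 2. The target checks every drafted position. A real system does this
        #    in one batched forward pass; the loop below only simulates that.
        target_passes += 1
        accepted: List[int] = []
        for i in range(k):
            expected = target(tokens + draft[:i])
            accepted.append(expected)  # the target's own token is always kept
            if expected != draft[i]:
                break  # first disagreement: discard the rest of the draft
        else:
            # Every drafted token matched, so the same pass yields a bonus token.
            accepted.append(target(tokens + draft))
        tokens.extend(accepted)
    return tokens[:goal], target_passes


if __name__ == "__main__":
    prompt = [1, 2, 3]
    fast, passes = speculative_decode(target_model, draft_model, prompt, n_new=20)
    slow = greedy_decode(target_model, prompt, 20)
    print("identical output:", fast == slow)
    print("target passes:", passes, "vs", 20, "for plain decoding")
```

Because every kept token is the one the target model would have chosen itself, the output matches plain greedy decoding exactly; the gain comes from needing fewer passes of the expensive model. How large that gain is depends on how often the drafter agrees with the target, which is why a reported speedup is an "up to" figure.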