China’s Xiaomi MiMo Is Now 15X Faster Than ChatGPT and Claude – Decrypt
In brief: Xiaomi and inference partner TileRT have broken 1,000 tokens per second on a 1-trillion-parameter model, a first at that scale, using a standard 8-GPU commodity node rather than custom chips. The speed comes from FP4 quantization on the model’s expert layers and DFlash speculative decoding, which proposes a full block of tokens in one pass...
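To make the block-proposal idea concrete, here is a minimal Python sketch of the generic draft-and-verify loop behind speculative decoding. It is not DFlash or TileRT code: the names `speculative_step` and `toy_target` are hypothetical, the "target model" is a toy function, and verification uses plain greedy matching. Production systems check the whole drafted block in a single batched forward pass; the sequential calls here are only for readability.

```python
from typing import Callable, List

Token = int
NextToken = Callable[[List[Token]], Token]


def speculative_step(
    context: List[Token],
    draft_block: List[Token],
    target_next: NextToken,
) -> List[Token]:
    """Verify a drafted block against the target model (greedy variant).

    Returns the tokens actually emitted this step: the longest prefix of
    the draft that the target agrees with, plus one token chosen by the
    target itself (a correction on mismatch, a bonus token otherwise).
    """
    accepted: List[Token] = []
    for tok in draft_block:
        expected = target_next(context + accepted)
        if tok != expected:
            # First disagreement: keep the target's token, drop the rest.
            accepted.append(expected)
            return accepted
        accepted.append(tok)
    # Every drafted token matched, so the target adds one more token.
    accepted.append(target_next(context + accepted))
    return accepted


def toy_target(tokens: List[Token]) -> Token:
    """Hypothetical stand-in for a large model: counts upward modulo 10."""
    return (tokens[-1] + 1) % 10


# The draft gets two tokens right, then guesses wrong on the third.
partial = speculative_step([1, 2, 3], [4, 5, 9, 7], toy_target)
# The draft gets all four tokens right.
full = speculative_step([1, 2, 3], [4, 5, 6, 7], toy_target)
print(partial, full)
```

The throughput gain comes from the accepted prefix: each verification step can emit several tokens instead of one, so a draft that is usually right cuts the number of expensive target-model passes per generated token.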