Tag
speculative decoding
2 articles

Tools & Apps/May 9
Gemma 4 assistant models get faster draft tokens
Gemma 4 E2B and E4B assistant models use centroid masking to cut lm_head work by about 45x with little quality loss.

Research/May 5
SpecKV tunes speculative decoding on the fly
SpecKV adapts speculative decoding’s per-step token budget using draft-model signals, beating a fixed gamma across compression settings.