Tag

long-context inference

1 articles

5 KV cache takeaways for llama.cpp users

5 takeaways from TurboQuant: under-3-bit KV cache compression, memory savings, and the tradeoffs llama.cpp users should watch.