[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"article-turboquant-vllm-comparison-fp8-kv-cache-en":3,"tags-turboquant-vllm-comparison-fp8-kv-cache-en":35,"related-lang-turboquant-vllm-comparison-fp8-kv-cache-en":46,"related-posts-turboquant-vllm-comparison-fp8-kv-cache-en":50,"series-research-670a7f69-911f-41e8-a18b-7d3491253a19":87},{"id":4,"title":5,"content":6,"summary":7,"source":8,"source_url":9,"author":10,"image_url":11,"keywords":12,"language":18,"translated_content":10,"views":19,"is_premium":20,"created_at":21,"updated_at":21,"cover_image":11,"published_at":22,"rewrite_status":23,"rewrite_error":10,"rewritten_from_id":24,"slug":25,"category":26,"related_article_id":27,"status":28,"google_indexed_at":10,"x_posted_at":10,"tweet_text":10,"title_rewritten_at":10,"title_original":10,"key_takeaways":29,"topic_cluster_id":33,"embedding":34,"is_canonical_seed":20},"670a7f69-911f-41e8-a18b-7d3491253a19","TurboQuant vs FP8: vLLM’s first broad test","\u003Cp data-speakable=\"summary\">vLLM found FP8 KV-cache quantization beats \u003Ca href=\"\u002Ftag\u002Fturboquant\">TurboQuant\u003C\u002Fa> on speed, while TurboQuant’s strongest variants hurt accuracy.\u003C\u002Fp>\u003Cp>\u003Ca href=\"https:\u002F\u002Fvllm.ai\u002Fblog\u002Fturboquant\" target=\"_blank\" rel=\"noopener\">vLLM\u003C\u002Fa> published a broad \u003Ca href=\"\u002Ftag\u002Fbenchmark\">benchmark\u003C\u002Fa> of \u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2401.08671\" target=\"_blank\" rel=\"noopener\">TurboQuant\u003C\u002Fa> on May 11, 2026, and the headline is simple: the memory savings look good, but \u003Ca href=\"https:\u002F\u002Fvllm.ai\u002Fblog\u002Ffp8-kv-cache\" target=\"_blank\" rel=\"noopener\">FP8 KV-cache\u003C\u002Fa> still wins for most production serving setups. 
The team tested four TurboQuant variants across four models, five benchmarks, and both dense and \u003Ca href=\"\u002Ftag\u002Fmoe\">MoE\u003C\u002Fa> architectures, then compared them with BF16 and FP8 baselines.\u003C\u002Fp>\u003Cp>The most useful part of the post is that it moves the discussion away from small-model demos and into workloads that resemble actual serving traffic. That matters because KV-cache quantization only becomes interesting when context grows, requests pile up, and memory pressure starts to shape the whole inference stack.\u003C\u002Fp>\u003Ctable>\u003Cthead>\u003Ctr>\u003Cth>Method\u003C\u002Fth>\u003Cth>KV-cache capacity\u003C\u002Fth>\u003Cth>Latency impact\u003C\u002Fth>\u003Cth>Throughput impact\u003C\u002Fth>\u003Cth>Accuracy signal\u003C\u002Fth>\u003C\u002Ftr>\u003C\u002Fthead>\u003Ctbody>\u003Ctr>\u003Ctd>FP8\u003C\u002Ftd>\u003Ctd>2x\u003C\u002Ftd>\u003Ctd>Negligible\u003C\u002Ftd>\u003Ctd>Near BF16\u003C\u002Ftd>\u003Ctd>Matches baseline\u003C\u002Ftd>\u003C\u002Ftr>\u003Ctr>\u003Ctd>TurboQuant k8v4\u003C\u002Ftd>\u003Ctd>2.4x\u003C\u002Ftd>\u003Ctd>10% to 68% slower\u003C\u002Ftd>\u003Ctd>80% to 75% of BF16\u003C\u002Ftd>\u003Ctd>Near baseline\u003C\u002Ftd>\u003C\u002Ftr>\u003Ctr>\u003Ctd>TurboQuant 4bit-nc\u003C\u002Ftd>\u003Ctd>2.3x to 3.7x\u003C\u002Ftd>\u003Ctd>Measurable slowdown\u003C\u002Ftd>\u003Ctd>About 75% of BF16\u003C\u002Ftd>\u003Ctd>Moderate drop\u003C\u002Ftd>\u003C\u002Ftr>\u003Ctr>\u003Ctd>TurboQuant k3v4-nc \u002F 3bit-nc\u003C\u002Ftd>\u003Ctd>Higher than FP8\u003C\u002Ftd>\u003Ctd>Largest slowdown\u003C\u002Ftd>\u003Ctd>66% to 73% of BF16\u003C\u002Ftd>\u003Ctd>Clear drop\u003C\u002Ftd>\u003C\u002Ftr>\u003C\u002Ftbody>\u003C\u002Ftable>\u003Ch2>Why TurboQuant got attention\u003C\u002Fh2>\u003Cp>TurboQuant compresses the KV-cache to 3 to 4 bits, then dequantizes back to BF16 before attention computation. 
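\u003C\u002Fp>\u003Cp>A toy sketch of that round trip may help: the snippet below implements a generic per-group symmetric 4-bit scheme, not TurboQuant’s actual algorithm, and the group size of 64 is an arbitrary illustrative choice. The point is that low-bit codes must be expanded back to a wide dtype before attention can run, which is where the extra work comes from.\u003C\u002Fp>

```python
import numpy as np

def quantize_4bit(x, group=64):
    # Per-group symmetric 4-bit quantization: keep signed int4 codes plus
    # one float scale per group. Storage drops roughly 4x versus 16-bit.
    x = x.reshape(-1, group)
    scale = np.abs(x).max(axis=1, keepdims=True) / 7.0  # signed int4 range -7..7
    scale[scale == 0] = 1.0                             # guard all-zero groups
    codes = np.clip(np.round(x / scale), -7, 7).astype(np.int8)
    return codes, scale

def dequantize(codes, scale):
    # The extra inference-time work low-bit KV schemes pay: expand the
    # codes back to a wide dtype before the attention matmul.
    return (codes * scale).reshape(-1).astype(np.float32)

kv = np.random.randn(4096).astype(np.float32)  # stand-in for cached keys/values
codes, scale = quantize_4bit(kv)
recovered = dequantize(codes, scale)
max_err = float(np.abs(kv - recovered).max())
```

\u003Cp>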
That is very different from FP8, which stores the KV-cache in FP8 and also runs attention with FP8 Tensor Core operations. In plain English, TurboQuant saves memory aggressively, but it pays for that savings by doing extra work during inference.\u003C\u002Fp>\n\u003Cfigure class=\"my-6\">\u003Cimg src=\"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1778839858405-b5ao.png\" alt=\"TurboQuant vs FP8: vLLM’s first broad test\" class=\"rounded-xl w-full\" loading=\"lazy\" \u002F>\u003C\u002Ffigure>\n\u003Cp>That tradeoff is why the vLLM team framed this as a study, not a product pitch. A method can look excellent on a slide deck and still lose badly once you measure latency, throughput, and accuracy on long prompts or hard reasoning tasks.\u003C\u002Fp>\u003Cul>\u003Cli>TurboQuant variants tested: k8v4, 4bit-nc, k3v4-nc, 3bit-nc\u003C\u002Fli>\u003Cli>Baselines: BF16 and FP8\u003C\u002Fli>\u003Cli>Models: \u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002FMiniMaxAI\u002FMiniMax-M2.7\" target=\"_blank\" rel=\"noopener\">MiniMax-M2.7\u003C\u002Fa>, \u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002Fmeta-llama\u002FLlama-3.3-70B-Instruct\" target=\"_blank\" rel=\"noopener\">Llama-3.3-70B-Instruct\u003C\u002Fa>, \u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002FQwen\u002FQwen3-30B-A3B-Instruct-2507\" target=\"_blank\" rel=\"noopener\">Qwen3-30B-A3B-Instruct-2507\u003C\u002Fa>, \u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002FQwen\u002FQwen3-30B-A3B-Thinking-2507\" target=\"_blank\" rel=\"noopener\">Qwen3-30B-A3B-Thinking-2507\u003C\u002Fa>\u003C\u002Fli>\u003Cli>Benchmarks: \u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fopenai\u002Fmrcr\" target=\"_blank\" rel=\"noopener\">openai\u002Fmrcr\u003C\u002Fa>, AIME25, GPQA:Diamond, MATH500, LiveCodeBench-v6\u003C\u002Fli>\u003C\u002Ful>\u003Ch2>What the accuracy numbers actually show\u003C\u002Fh2>\u003Cp>The accuracy story is 
mixed, but the pattern is consistent. FP8 and TurboQuant k8v4 stay close to the unquantized baseline on the retrieval and reasoning workloads. TurboQuant 4bit-nc loses some accuracy, but it remains in the range where a deployment team might still consider it if memory pressure is severe.\u003C\u002Fp>\u003Cp>The aggressive variants are where things fall apart. On long-context retrieval, the post says Llama-3.3-70B-Instruct at 128k context saw even BF16 dip to 98% average accuracy recovery, while 4bit-nc reached 96% recovery and k3v4-nc plus 3bit-nc dropped by about 20 points. On reasoning, the same pattern held, with AIME25 and LiveCodeBench-v6 taking the biggest hit.\u003C\u002Fp>\u003Cblockquote>\"FP8 via --kv-cache-dtype fp8 remains the best default for KV-cache quantization.\" — vLLM blog, May 11, 2026\u003C\u002Fblockquote>\u003Cp>That quote is blunt, and it is also the most practical conclusion in the post. If your goal is to keep accuracy intact while reducing memory use, FP8 is the safe default. If your goal is to squeeze every last byte out of the cache, TurboQuant starts to look like a niche tool rather than a general recommendation.\u003C\u002Fp>\u003Cul>\u003Cli>Long-context retrieval was tested up to each model’s maximum supported length\u003C\u002Fli>\u003Cli>Accuracy was reported as average pass@1 across 5 repetitions\u003C\u002Fli>\u003Cli>TurboQuant k3v4-nc and 3bit-nc showed about 20-point drops on the hardest long-context cases\u003C\u002Fli>\u003Cli>On MiniMax-M2.7, aggressive TurboQuant variants dropped accuracy by up to about 8 points on AIME25 and LiveCodeBench-v6\u003C\u002Fli>\u003C\u002Ful>\u003Ch2>Speed is where TurboQuant loses the argument\u003C\u002Fh2>\u003Cp>The speed results are even less flattering. vLLM measured latency with 1,024 input tokens and 256 output tokens, then swept batch sizes of 1, 8, 32, and 64. FP8 had negligible overhead across the board. 
TurboQuant did not.\u003C\u002Fp>\n\u003Cfigure class=\"my-6\">\u003Cimg src=\"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1778839858764-jlxl.png\" alt=\"TurboQuant vs FP8: vLLM’s first broad test\" class=\"rounded-xl w-full\" loading=\"lazy\" \u002F>\u003C\u002Ffigure>\n\u003Cp>On Qwen3-30B-A3B-Instruct-2507, TurboQuant overhead ranged from roughly 10% to 60%. On Llama-3.3-70B-Instruct, it ranged from about 10% to 68%. The larger model also showed a worrying trend: overhead increased with batch size, which is the opposite of what serving teams want when traffic grows.\u003C\u002Fp>\u003Cp>Throughput told the same story. FP8 matched BF16 throughput on both models, while TurboQuant variants came in below baseline. For Qwen3-30B, throughput ranged from 80% of BF16 with k8v4 to 73% with 3bit-nc. For Llama-3.3-70B, the range was 75% down to 66%.\u003C\u002Fp>\u003Cp>That is the key takeaway for operators: lower KV-cache storage cost does not automatically translate into faster serving. Once you add dequantization cost back into the path, the math changes.\u003C\u002Fp>\u003Cul>\u003Cli>Latency tests used 10 warmup iterations and 30 measured iterations\u003C\u002Fli>\u003Cli>Throughput tests used 200 prompts across 256\u002F256, 1024\u002F512, and 4096\u002F256 token pairs\u003C\u002Fli>\u003Cli>vLLM version used: 0.20.2, commit \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fvllm\u002Fcommit\u002F6ec9bbec3\" target=\"_blank\" rel=\"noopener\">6ec9bbec3\u003C\u002Fa>\u003C\u002Fli>\u003Cli>FP8 matched BF16 on latency and throughput, while TurboQuant variants consistently fell behind\u003C\u002Fli>\u003C\u002Ful>\u003Ch2>What this means for real deployments\u003C\u002Fh2>\u003Cp>The most practical reading of the study is that TurboQuant is a memory tool first and a performance tool second. 
If you are serving a model where KV-cache memory is the bottleneck and you can tolerate slower inference, TurboQuant 4bit-nc may be worth a pilot. If you care about latency, throughput, and accuracy at the same time, FP8 is the cleaner answer.\u003C\u002Fp>\u003Cp>There is also a hardware angle here. FP8 works well because it maps to native Tensor Core behavior on modern \u003Ca href=\"\u002Ftag\u002Fnvidia\">NVIDIA\u003C\u002Fa> GPUs, while TurboQuant has to unpack low-bit storage before attention runs. That extra unpacking step is exactly where the lost time goes.\u003C\u002Fp>\u003Cp>vLLM’s recommendation is more useful than a generic benchmark chart because it tells teams where to spend effort next. For most production systems, the next experiment should be FP8 first, then a narrow TurboQuant test only if memory pressure remains the real blocker. If you are planning a deployment on \u003Ca href=\"https:\u002F\u002Fwww.nvidia.com\u002Fen-us\u002Fdata-center\u002Fh100\u002F\" target=\"_blank\" rel=\"noopener\">H100\u003C\u002Fa> hardware, the question is not whether TurboQuant saves memory. It does. The real question is whether those savings are worth slower serving and weaker accuracy on the workloads that matter most.\u003C\u002Fp>\u003Cp>For related reading, see OraCore’s coverage of \u003Ca href=\"\u002Fnews\u002Ffp8-kv-cache-vllm\">FP8 KV-cache in vLLM\u003C\u002Fa> and \u003Ca href=\"\u002Fnews\u002Fkv-cache-optimization-guide\">KV-cache optimization strategies\u003C\u002Fa>.\u003C\u002Fp>\u003Ch2>Bottom line\u003C\u002Fh2>\u003Cp>vLLM’s first broad study says TurboQuant is useful only when memory pressure is severe enough to justify slower inference, and even then FP8 remains the default most teams should try first. 
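\u003C\u002Fp>\u003Cp>Acting on that default is a one-flag change in vLLM, using the same flag the post quotes. A minimal launch sketch (the model name is only an example; substitute your own checkpoint):\u003C\u002Fp>

```shell
# Store the KV-cache in FP8, the setting the vLLM post recommends as the default.
vllm serve meta-llama/Llama-3.3-70B-Instruct --kv-cache-dtype fp8
```

\u003Cp>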
The next question for the community is whether future low-bit KV-cache methods can keep TurboQuant’s memory gains while removing the dequantization tax that hurts real serving workloads.\u003C\u002Fp>","vLLM found FP8 KV-cache quantization beats TurboQuant on speed, while TurboQuant’s strongest variants hurt accuracy.","vllm.ai","https:\u002F\u002Fvllm.ai\u002Fblog\u002Fturboquant",null,"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1778839858405-b5ao.png",[13,14,15,16,17],"TurboQuant","vLLM","KV-cache quantization","FP8","inference performance","en",0,false,"2026-05-15T10:10:37.219158+00:00","2026-05-15T10:10:37.198+00:00","done","38fd5461-9e9d-4059-b58c-6be8d8d6c89a","turboquant-vllm-comparison-fp8-kv-cache-en","research","381fb6c6-6da7-4444-831f-8c5eed8d685c","published",[30,31,32],"FP8 remains the best default for KV-cache quantization in vLLM.","TurboQuant’s stronger compression variants hurt both accuracy and serving speed.","TurboQuant 4bit-nc may help only when memory pressure matters more than 
latency.","3103988e-c4fe-45e3-98ab-846500c9d507","[-0.03132212,0.0003347188,-0.008275041,-0.093619466,-0.024271118,-0.023152264,0.0019086633,0.0052102515,-0.0013399707,0.0005440771,0.0027394616,-0.008597033,0.030170882,-0.018104685,0.11925398,0.029534295,-0.0025312018,0.0031189583,5.3769752e-05,-0.014527918,-0.000675662,0.018375635,0.013294236,-0.0137088,-0.02343512,-0.0046431725,0.024738664,0.002961558,0.02925034,0.016422572,-0.04918913,-0.012463054,0.008798653,0.003878698,0.003520625,0.017605847,0.010225872,-0.0038511353,-0.0032694754,0.019965276,0.002134311,-0.0067094318,0.010916758,-0.011074447,-0.030873336,0.019315002,0.0031726577,-0.01236144,-0.016040334,0.034037028,-0.016765382,0.0011218042,-0.015656145,-0.16656041,0.0056671514,0.013357773,-0.0032549945,-0.0067835096,0.033599954,-0.014862234,-0.027164208,0.027748665,-0.028789982,-0.02206056,-0.01898265,-0.013604581,0.023932071,-0.022334281,-0.03076145,-0.015083976,-0.0149814775,-0.004414671,-0.0067495904,-0.019409351,-0.0025756443,-0.027331905,0.03986978,0.01864571,0.023835788,0.02602571,0.015452858,-0.011333776,0.00824884,0.0076539847,0.0041482435,0.0047757663,-0.008386199,-0.019628385,0.002466444,-0.009495633,-0.005528226,-0.007228128,-0.0052915057,0.012053577,0.009052217,0.0028558485,-0.026686022,-0.009007196,-0.024023963,-0.0062121986,-0.010478287,-0.014000515,0.009019845,0.027093355,-0.018123439,-0.020494713,0.006154392,0.0035664483,-0.0019182884,0.009314103,0.017905114,-0.003273497,0.0023137755,0.029910138,-0.0059726895,-0.10517857,-0.02169013,-0.0050062397,-0.016068129,-0.017561946,-0.018607456,0.031949375,0.01467265,0.02004676,-0.006756424,-0.012202622,-0.016779657,-0.00798814,-0.01894092,0.031454876,-0.042744108,-0.0069510103,-0.0023606261,0.014736855,0.007496239,0.006401455,0.011354892,-0.0018610358,0.0010425325,-0.015620707,0.0019378738,-0.0014590697,-0.00021666105,-0.0077580223,0.002935244,-0.0047233594,-0.027997674,0.014486249,0.0045884643,0.025960539,0.0010754354,-0.016583107,0.0010833577,-0.025
651157,0.0048504053,-0.022518538,0.012006209,0.0054326477,-0.009854922,-0.0048424653,-0.009184539,-0.015397674,-0.0035707282,-0.009459956,0.027849592,0.0016206714,-0.01892996,-0.004808523,0.0076290388,0.003401673,-0.014520271,-0.008129716,-0.0075744116,-0.008347704,0.009852624,-0.0090918,-0.026724037,-0.00013090372,-0.006230179,0.009913066,0.0173764,0.0064896243,0.005674704,-0.0038636278,-0.0074935276,0.018527588,0.017638167,0.013752949,0.004338969,0.021760609,-0.019562066,0.0132707935,0.039172642,-0.0197675,-0.0073062195,-0.013434277,-0.0063007623,0.013888812,-0.022952955,0.01392417,0.0020815788,-0.020551646,0.026351087,-0.0010546797,-0.03189755,-0.0026196735,0.019353654,-0.03081973,0.014589674,-0.018971095,-0.0013343898,0.022314994,0.005031741,-0.0016254879,-0.014101088,-0.008595953,0.0068263467,0.010371946,-0.0062985406,-0.010906178,0.011516418,-0.010950737,0.036331803,-0.0114972945,0.0014300457,0.0020494768,0.0061025363,-0.005381841,0.002856092,-0.011827602,0.0003285795,0.011496577,-0.0057856035,-0.010919476,0.03788629,0.018921327,-0.015624055,-0.0048213005,0.030980516,-0.008894634,-0.016253393,0.0054452927,0.027423982,-0.00637869,-0.00014799535,-0.01677355,0.02876484,0.0002688453,-0.024187172,0.023710877,-0.00025342134,-0.0068694153,-0.012861576,-0.035746768,0.022050498,0.0017472524,-0.01242163,0.03806531,0.0010237176,0.0090726875,0.020671956,0.0038418523,-0.012534988,-0.027791608,0.005055948,0.023671614,0.012963216,0.0031161811,-0.01878565,0.011657599,-0.019470945,0.009038519,0.03603476,0.00278472,0.002732812,-0.013834612,-0.03525709,0.03848898,0.005888523,0.024040159,0.021082453,0.010481959,0.0033441575,0.010664812,-0.025391592,0.009738418,-0.027994424,-0.015218418,-0.0023157103,-0.01033082,0.012764083,-0.00794271,-0.006208278,0.00058315514,-0.0038761615,-0.03786165,0.00696495,-0.017442433,-0.020197013,0.005786428,-0.008090598,0.004838186,0.0038003928,0.044134572,-0.017739464,0.021222344,-0.006302294,0.017408451,-0.016220035,-0.013161697,0.015262103,-0.020834
617,0.0048884274,-0.015990855,-0.012983,0.008583831,-0.006529688,-0.0031565523,-0.016021343,-0.0239149,-0.004594711,-0.003521555,-0.015874835,0.008892579,0.0056600976,-0.019078841,0.0038331759,0.0015997697,0.015185411,0.0043963003,0.010191384,-0.011884791,0.028166717,-0.020509884,-0.0018989657,-0.009202981,0.000534805,-0.015218008,-0.02103185,-0.047583863,-0.010926906,-0.007502957,0.0031856305,0.0066432236,-0.031554304,-0.0026838097,0.003062205,0.0252322,-0.030323058,-0.026641238,0.0006096888,-0.0077326666,0.0132316565,-0.01802287,-0.028732806,0.03181023,-0.006261143,0.00067515566,0.0021978505,0.010366305,0.0329079,0.01136268,-0.01443018,0.009146478,0.02108722,-0.031455956,0.0121683795,-0.008358898,-0.03862903,-0.012434101,-0.002999331,0.0033605245,0.023983892,-0.0144353425,0.0026838488,0.004662899,-0.002651297,-0.012631677,0.0063864235,0.006720933,-0.0131902965,0.030305577,-0.010744578,-0.009152173,-0.019515686,0.018823728,0.028105922,0.019838845,0.03509239,0.03241471,0.010516133,0.0025500618,-0.006243228,0.02294334,-0.014896921,-0.0034493506,0.008202778,0.010009969,0.0049239546,-0.03985581,0.006447837,-0.008079393,-0.0123615125,-0.010570838,-0.009406145,0.0036351802,-0.009775825,0.011258535,-0.017818335,-0.021488747,-0.011501677,0.008859126,0.012763539,-0.021122795,-0.0010468726,0.00018150336,0.013354455,0.01582859,0.016421618,-0.0217575,0.00018872101,0.048079405,0.019551627,-0.0052000456,-0.0036479393,0.010428297,-0.027724298,-0.029254006,0.0095117185,-0.02410926,0.0046864306,-0.010719459,-0.019701043,0.00646622,-0.03291613,-0.039330997,-0.008962315,0.0010663249,0.0048182877,-0.02666844,-0.036677457,0.010335038,0.003789909,0.016038077,-0.008339633,-0.010830062,0.025328169,-0.015287236,-0.025082532,0.0067414427,0.044741973,0.0005123172,0.023924038,-0.005155834,0.005351221,-0.015706003,0.004775928,0.0031968963,-0.03037373,-0.018006321,-0.0051361294,-0.005860933,0.0032535593,0.017149156,-0.0017045818,-0.010075992,-0.006328924,0.009044266,-0.012875035,-0.014120302,0.
010944435,0.04037812,0.00037362377,0.029253138,0.0011719997,0.008098226,-0.004031706,0.0058556525,-0.0102485325,-0.0017712236,-0.023718562,-0.0042216647,0.0054940446,-0.01728929,-0.02285162,0.015220396,0.0068615847,0.010893701,-0.028706355,-0.0033841298,0.030134063,0.03780513,0.034885556,-0.0060287514,0.0023572366,-0.010864421,-0.016237903,-0.025004761,-0.017991321,0.035989694,-0.0036649886,0.001468541,0.006150923,-0.0017408765,0.0034370017,0.005287056,0.0033242763,-0.020560198,-0.013594184,-0.00933496,-0.022511361,-0.015239434,0.036228627,-0.011221685,-0.0072644637,0.009737967,-0.0070574954,0.012874904,-0.034524873,-0.023866983,-0.03083748,-0.012860191,-0.0032090694,0.03282965,0.0036980775,-0.02529244,0.016360486,-0.017843103,0.040243693,0.019777134,-0.01654467,0.021859327,0.012408458,-0.02237579,0.0012089505,-0.0050653876,0.022720307,0.00980534,-0.017452605,0.025946027,-0.012205103,0.0011752651,-0.013555259,-0.014632873,0.033128098,-0.097623155,0.025793167,0.017954264,-0.025055028,-0.025274582,-0.020837855,-0.012350289,-0.020012593,-0.0072953603,0.02058085,0.021702094,-0.003529325,0.015361487,-0.01870253,-0.04084072,-0.010376619,-0.009464541,-0.01472592,0.02593637,-0.01613659,0.032383397,0.02388441,0.0035670192,0.0063820295,-0.013462986,-0.016344449,0.020886123,0.00019111107,0.014560517,-0.010958434,-0.03496032,-0.017367108,-0.0070942324,0.017430129,-0.0054718223,-0.0053746775,0.019852117,-0.0008777766,0.0013388578,0.010679465,0.012069196,0.011205691,-0.028149912,0.007931107,-0.022110863,0.003372497,-0.031274397,0.025618786,0.0024529304,0.017016713,-0.008580567,-0.024327805,-0.0053601284,-0.02135546,0.0056352112,0.019272946,-0.010002962,0.0023932878,-0.01110873,-0.0010306055,0.0007591355,-0.023802234,0.009918348,0.045994606,-0.022529671,-0.019318186,-0.025872562,0.039520968,-0.0012653044,0.0281457,-0.032993548,-0.0070322314,0.017537313,-0.0073710023,-0.023284994,-0.01793913,-0.00079585507,-0.002357808,0.004938215,-0.010789552,-0.021162791,-0.031609155,-0.08698807,
0.009031902,-0.0013011943,-0.010177529,0.023222037,-0.0029908377,0.0012980552,-0.028056573,0.017715964,-0.013699356,-0.0131163765,0.006761085,-0.011021073,-0.011902241,-0.0023017903,0.0071904184,0.0003461782,-0.007362069,-0.008742668,-0.03540319,-0.01838135,-0.025797023,0.03599706,-0.015216325,-0.035333224,-0.01394962,0.0038525085,-0.002464068,0.003014479,0.0050797495,0.008747538,-0.123697214,-0.006730776,-0.020735037,-0.012714853,0.008160568,-0.004313073,0.00014547993,0.009408816,0.0117402,-0.0018540112,-0.017471727,-0.026393408,-0.0068733306,0.014593958,0.0075809998,0.1437417,-0.011347817,0.024125285,-0.019622155,-0.030545764,-0.01015527,-0.016912837,-0.015189882,0.030518204,0.0022192146,-0.0014023539,0.023743326,-0.009121384,0.009852181,0.02097166,0.042915262,0.008410795,0.0143135935,-0.00056942133,-0.0058016824,-0.003154583,0.0038070795,-0.03812276,0.008365641,0.010673076,0.0020302685,0.033432703,-0.01248263,-0.015584429,-0.002923506,-0.013409905,-0.0067099994,-0.003981756,-0.014927114,0.002157522,-0.007745364,-0.083072536,-0.0034347477,0.0015079693,0.008163086,-0.016424017,0.00057266117,0.013484256,0.040577214,0.004062187,0.0002604938,-0.00795188,0.0089733815,0.0067558414,-0.021876302,-0.0081720855,0.030413091,0.02097559,-0.0049502295,0.0050197854,-0.00645728,0.009109096,0.009008544,-0.029846853,-0.018605946,0.009005186,0.0060929917,-0.0032352812,0.004117312,-0.031543773,-0.004723964,0.020378113,-0.016893357,-0.012562114,0.009536512,-0.003476941,0.0027013253,-0.0077787507,0.021067442,-0.03547061,-0.019250283,0.005040887,0.008610943,-0.0031118419,9.50538e-06,0.01829232,0.01854825,-0.0043306537,0.036313437,0.0061245942,-0.017701594,-0.014867968,-0.016062327,-0.033208355,-0.0007076206,0.006913184,0.0032316877,0.014411764,0.013358569,-0.009718151]",[36,38,40,42,44],{"name":16,"slug":37},"fp8",{"name":14,"slug":39},"vllm",{"name":17,"slug":41},"inference-performance",{"name":13,"slug":43},"turboquant",{"name":15,"slug":45},"kv-cache-quantization",{"id":27,"slug":47,
"title":48,"language":49},"turboquant-vllm-comparison-fp8-kv-cache-zh","TurboQuant 與 FP8 實測結果","zh",[51,57,63,69,75,81],{"id":52,"slug":53,"title":54,"cover_image":55,"image_url":55,"created_at":56,"category":26},"94994abd-e24d-4fd1-b941-942d03d19acf","turboquant-seo-shift-small-sites-en","TurboQuant and the SEO Shift for Small Sites","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1778840455122-jfce.png","2026-05-15T10:20:28.134545+00:00",{"id":58,"slug":59,"title":60,"cover_image":61,"image_url":61,"created_at":62,"category":26},"5aef1c57-961f-49f7-8277-f83f7336799a","llmbda-calculus-agent-safety-rules-en","LLMbda calculus gives agents safety rules","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1778825459914-obkf.png","2026-05-15T06:10:36.242145+00:00",{"id":64,"slug":65,"title":66,"cover_image":67,"image_url":67,"created_at":68,"category":26},"712a0357-f7cd-48f2-adde-c2691da0815f","low-complexity-beamspace-denoiser-mmwave-mimo-en","A simpler beamspace denoiser for mmWave MIMO","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1778814646705-e7mx.png","2026-05-15T03:10:31.764301+00:00",{"id":70,"slug":71,"title":72,"cover_image":73,"image_url":73,"created_at":74,"category":26},"f595f949-6ea1-4b0e-a632-f1832ef26e36","ai-benchmark-wins-cyber-scare-defenders-en","Why AI benchmark wins in cyber should scare defenders","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1778807444539-gz7f.png","2026-05-15T01:10:30.04579+00:00",{"id":76,"slug":77,"title":78,"cover_image":79,"image_url":79,"created_at":80,"category":26},"3ad202d1-9e5f-49c5-8383-02fcf1a23cf2","why-linux-security-needs-patch-wave-mindset-en","Why Linux security needs a patch-wave 
mindset","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1778741441493-ikl6.png","2026-05-14T06:50:25.906256+00:00",{"id":82,"slug":83,"title":84,"cover_image":85,"image_url":85,"created_at":86,"category":26},"50662a29-bae9-4d88-b8d8-3d6a83680646","judge-reliability-harness-stress-tests-llm-judges-en","Judge Reliability Harness Stress-Tests LLM Judges","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1778740862456-3f4y.png","2026-05-14T06:40:33.380748+00:00",[88,93,98,103,108,113,118,123,128,133],{"id":89,"slug":90,"title":91,"created_at":92},"a2715e72-1fe8-41b3-abb1-d0cf1f710189","ai-predictions-2026-big-changes-en","AI Predictions for 2026: Brace for Big Changes","2026-03-26T01:25:07.788356+00:00",{"id":94,"slug":95,"title":96,"created_at":97},"8404bd7b-4c2f-4109-9ec4-baf29d88af2b","ml-papers-of-the-week-github-research-desk-en","ML Papers of the Week Turns GitHub Into a Research Desk","2026-03-27T01:11:39.480259+00:00",{"id":99,"slug":100,"title":101,"created_at":102},"87897a94-8065-4464-a016-1f23e89e17cc","ai-ml-conferences-to-watch-in-2026-en","AI\u002FML Conferences to Watch in 2026","2026-03-27T01:51:54.184108+00:00",{"id":104,"slug":105,"title":106,"created_at":107},"6f1987cf-25f3-47a4-b3e6-db0997695be8","openclaw-agents-manipulated-self-sabotage-en","OpenClaw Agents Can Be Manipulated Into Failure","2026-03-28T03:03:18.899465+00:00",{"id":109,"slug":110,"title":111,"created_at":112},"a53571ad-735a-4178-9f93-cb09b699d99c","vega-driving-language-instructions-en","Vega: Driving with Natural Language Instructions","2026-03-28T14:54:04.698882+00:00",{"id":114,"slug":115,"title":116,"created_at":117},"a34581d6-f36e-46da-88bb-582fb3e7425c","personalizing-autonomous-driving-styles-en","Drive My Way: Personalizing Autonomous Driving 
Styles","2026-03-28T14:54:26.148181+00:00",{"id":119,"slug":120,"title":121,"created_at":122},"2bc1ad7f-26ce-4f02-9885-803b35fd229d","training-knowledge-bases-writeback-rag-en","Training Knowledge Bases with WriteBack-RAG","2026-03-28T14:54:45.643433+00:00",{"id":124,"slug":125,"title":126,"created_at":127},"71adc507-3c54-4605-bbe2-c966acd6187e","packforcing-long-video-generation-en","PackForcing: Efficient Long-Video Generation Method","2026-03-28T14:55:02.646943+00:00",{"id":129,"slug":130,"title":131,"created_at":132},"675942ef-b9ec-4c5f-a997-381250b6eacb","pixelsmile-facial-expression-editing-en","PixelSmile Framework Enhances Facial Expression Editing","2026-03-28T14:55:20.633463+00:00",{"id":134,"slug":135,"title":136,"created_at":137},"6954fa2b-8b66-4839-884b-e46f89fa1bc3","adaptive-block-scaled-data-types-en","IF4: Smarter 4-Bit Quantization That Adapts to Your Data","2026-03-31T06:00:36.65963+00:00"]