[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"article-turboquant-quantization-accuracy-performance-study-en":3,"article-related-turboquant-quantization-accuracy-performance-study-en":30,"series-research-aed5cbda-77cf-4dfe-8606-c8463a64403e":82},{"id":4,"slug":5,"title":6,"content":7,"summary":8,"source":9,"source_url":10,"author":11,"image_url":12,"cover_image":12,"category":13,"language":14,"translated_content":11,"related_article_id":15,"keywords":16,"key_takeaways":22,"views":26,"created_at":27,"published_at":28,"topic_cluster_id":29},"aed5cbda-77cf-4dfe-8606-c8463a64403e","turboquant-quantization-accuracy-performance-study-en","TurboQuant shows how 4-bit beats guesswork","\u003Cp data-speakable=\"summary\">\u003Ca href=\"\u002Ftag\u002Fturboquant\">TurboQuant\u003C\u002Fa>’s study turns quantization trade-offs into a practical deployment playbook.\u003C\u002Fp>\u003Cp>I’ve been tuning models long enough to know the part that always gets hand-waved: quantization. Everyone loves the demo where a model gets smaller and faster, then the first real workload lands and the accuracy drop shows up like a bad smell in a closed room. I’ve seen teams ship 8-bit because it felt safe, then spend a week chasing latency that barely moved. I’ve also seen people jump straight to 4-bit because the memory chart looked beautiful, only to discover the model got weird in exactly the places their users care about.\u003C\u002Fp>\u003Cp>That’s why I paid attention when I landed on \u003Ca href=\"https:\u002F\u002Fdasroot.net\u002Fposts\u002F2026\u002F05\u002Fturboquant-comprehensive-study-quantization-accuracy-performance\u002F\">TurboQuant Comprehensive Study: Quantization Accuracy and Performance\u003C\u002Fa> on dasroot.net. It’s not trying to sell me a miracle. 
It lays out the actual trade-offs between post-training quantization, quantization-aware training, static and dynamic quantization, then ties that to 2026-era framework support in \u003Ca href=\"https:\u002F\u002Fwww.tensorflow.org\u002F\">TensorFlow\u003C\u002Fa>, \u003Ca href=\"https:\u002F\u002Fpytorch.org\u002F\">PyTorch\u003C\u002Fa>, and deployment tools like \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Funslothai\u002Funsloth\">Unsloth\u003C\u002Fa> and \u003Ca href=\"https:\u002F\u002Fgithub.com\u002FTensorRT-LLM\u002FTensorRT-LLM\">TensorRT-LLM\u003C\u002Fa>.\u003C\u002Fp>\u003Ch2>Stop treating quantization like one switch\u003C\u002Fh2>\u003Cblockquote>Quantization in machine learning refers to the process of reducing the precision of model weights and activations, typically from 32-bit floating point (FP32) to lower bit-width representations such as 8-bit integers (INT8) or even 4-bit.\u003C\u002Fblockquote>\u003Cp>What this actually means is that quantization is not a single optimization trick. It’s a family of compromises. You’re deciding how much numeric precision you’re willing to trade for memory savings, lower bandwidth, and faster inference. The article’s framing is useful because it starts from the boring truth: lower precision is cheaper, but it can bend the model in ways that are hard to predict until you benchmark.\u003C\u002Fp>\n\u003Cfigure class=\"my-6\">\u003Cimg src=\"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1779287080437-oo8r.png\" alt=\"TurboQuant shows how 4-bit beats guesswork\" class=\"rounded-xl w-full\" loading=\"lazy\" \u002F>\u003C\u002Ffigure>\n\u003Cp>I’ve run into this exact problem when a team wanted to shrink a transformer for edge deployment. The instinct was to ask, “What’s the smallest bit-width we can use?” That’s the wrong first question. 
The better question is, “Where does this model fail when we squeeze it?” Some models tolerate quantization surprisingly well. Others turn brittle in one task, one language, or one layer pattern. If you don’t know that ahead of time, you’re just gambling with a nicer chart.\u003C\u002Fp>\u003Cp>The study’s value is that it keeps the discussion grounded in deployment reality. Mobile, embedded, and edge devices care about memory and power. Server inference cares about throughput and cost per \u003Ca href=\"\u002Ftag\u002Ftoken\">token\u003C\u002Fa>. The same quantization choice will look smart in one place and stupid in another. That’s normal.\u003C\u002Fp>\u003Cp>How to apply it: before you choose a bit-width, write down the constraint you are actually optimizing for. If it’s RAM, say RAM. If it’s latency, say latency. If it’s accuracy on a narrow benchmark, say that. Then benchmark against that constraint instead of trying to make one quantization format satisfy every stakeholder in the room.\u003C\u002Fp>\u003Cul>\u003Cli>Use FP32 only when you truly need the headroom.\u003C\u002Fli>\u003Cli>Use INT8 when you want a conservative compression step.\u003C\u002Fli>\u003Cli>Use 4-bit only after you know which layers and tasks are sensitive.\u003C\u002Fli>\u003C\u002Ful>\u003Ch2>PTQ is the fast path, QAT is the insurance policy\u003C\u002Fh2>\u003Cblockquote>Post-training quantization involves applying quantization to a pre-trained model without retraining it.\u003C\u002Fblockquote>\u003Cp>What this actually means is that PTQ is the cheap, quick path. You take a trained model and compress it after the fact. That’s attractive because it doesn’t force you to rerun a whole training pipeline. But the study is honest about the downside: you can lose accuracy because the model never learned to live with lower precision.\u003C\u002Fp>\u003Cp>The opposite is quantization-aware training. 
The article describes QAT as integrating quantization into training so the model adapts to the lower-precision world before deployment. That usually preserves accuracy better, but it costs more time and compute. I’ve had projects where QAT was the only sane answer because the model was too sensitive for PTQ. I’ve also had projects where QAT was just overkill. The annoying part is that both camps can be right.\u003C\u002Fp>\u003Cp>This is where teams get sloppy. They start with PTQ because it’s easy, then they blame the model when accuracy drops. Or they jump to QAT because it sounds safer, then burn engineering time on a training loop that wasn’t needed. The study’s practical message is simple: PTQ is the default for speed, QAT is the fallback when accuracy matters enough to pay for it.\u003C\u002Fp>\u003Cp>How to apply it: try PTQ first on a validation set that reflects real usage, not just a clean benchmark. If the drop is acceptable, stop there. If it isn’t, move to QAT and measure whether the recovered accuracy is worth the extra training cost. Don’t decide this by instinct. Decide it by the size of the regression and the cost of the retrain.\u003C\u002Fp>\u003Cul>\u003Cli>PTQ: good for quick deployment and low-risk compression.\u003C\u002Fli>\u003Cli>QAT: good when the model has to stay sharp under quantization.\u003C\u002Fli>\u003Cli>Pick the method based on the cost of accuracy loss, not habit.\u003C\u002Fli>\u003C\u002Ful>\u003Ch2>Static and dynamic quantization are about when you pay the calibration bill\u003C\u002Fh2>\u003Cblockquote>Static quantization determines the range of values for weights and activations during the calibration phase, which is typically done using a representative dataset.\u003C\u002Fblockquote>\u003Cp>What this actually means is that static quantization asks you to do the careful work up front. You calibrate with representative data, lock in ranges, and then run inference with those assumptions. 
It’s more stable and usually more accurate, but you have to do the calibration step correctly or you get nonsense.\u003C\u002Fp>\n\u003Cfigure class=\"my-6\">\u003Cimg src=\"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1779287080196-v13q.png\" alt=\"TurboQuant shows how 4-bit beats guesswork\" class=\"rounded-xl w-full\" loading=\"lazy\" \u002F>\u003C\u002Ffigure>\n\u003Cp>Dynamic quantization shifts some of that work to inference time. The article describes it as adjusting ranges dynamically during inference, which can be more efficient in some scenarios but may cost a bit of accuracy. I like this distinction because it explains why teams keep arguing about “static vs dynamic” as if one is always better. They are solving different problems. Static wants predictability. Dynamic wants flexibility.\u003C\u002Fp>\u003Cp>I’ve seen static quantization fail for a dumb reason: the calibration dataset was too neat. It didn’t include the messy inputs the production system actually saw, so the ranges were wrong in all the interesting places. The model looked fine in a notebook and then started drifting once real traffic hit it. That’s not a quantization bug. That’s a calibration bug.\u003C\u002Fp>\u003Cp>How to apply it: if you choose static quantization, spend real effort on the calibration set. Make it representative of actual inputs, not a curated highlight reel. If you choose dynamic quantization, verify that the runtime overhead doesn’t erase the gains you were chasing. Either way, measure on the hardware you plan to ship.\u003C\u002Fp>\u003Cp>For TensorFlow users, the article points to tools like \u003Ca href=\"https:\u002F\u002Fwww.tensorflow.org\u002Flite\u002Fconvert\">TFLiteConverter\u003C\u002Fa> and the \u003Ca href=\"https:\u002F\u002Fwww.tensorflow.org\u002Fmodel_optimization\">TensorFlow Model Optimization Toolkit\u003C\u002Fa>. 
That matters because quantization is not just a theory problem. It’s a tooling problem too.\u003C\u002Fp>\u003Ch2>8-bit is the boring answer, and boring is often correct\u003C\u002Fh2>\u003Cblockquote>Recent benchmarks from 2026 demonstrate that 8-bit quantization typically retains over 99% of the original model’s accuracy across academic and real-world tasks.\u003C\u002Fblockquote>\u003Cp>What this actually means is that 8-bit is the safe middle ground. It gives you a meaningful reduction in memory and a real inference boost without asking the model to survive a brutal compression step. The study cites 2026 benchmarks showing 8-bit usually keeps over 99% of original accuracy, while 4-bit lands a bit lower but still strong on standard benchmarks like MMLU and ArenaHard.\u003C\u002Fp>\u003Cp>I know “boring” sounds like an insult, but in production boring is what you want when the business depends on the model not embarrassing you. If the application is a cloud chatbot, a search reranker, or an enterprise classifier where a small accuracy drop is expensive, 8-bit is often the cleanest answer. You get a lot of the benefit without forcing the team into a rescue mission.\u003C\u002Fp>\u003Cp>The article also calls out scheme choices like W8A8-INT and W8A8-FP, which is a reminder that “8-bit” is not one thing. Hardware matters. The same quantized model can behave differently on \u003Ca href=\"\u002Ftag\u002Fnvidia\">NVIDIA\u003C\u002Fa>, AMD, or Intel platforms depending on how the backend handles the math. That’s the part people forget when they compare benchmark numbers from different blogs and pretend they’re interchangeable.\u003C\u002Fp>\u003Cp>How to apply it: if you’re deploying server-side and you care about keeping accuracy near baseline, start with 8-bit. Benchmark throughput, memory, and output quality on your real workload. If the savings are enough, stop there. 
Don’t keep squeezing just because 4-bit exists.\u003C\u002Fp>\u003Cul>\u003Cli>Good default for production inference.\u003C\u002Fli>\u003Cli>Usually easier to justify to stakeholders.\u003C\u002Fli>\u003Cli>Less likely to trigger weird quality regressions.\u003C\u002Fli>\u003C\u002Ful>\u003Ch2>4-bit is where the real trade-off starts to bite\u003C\u002Fh2>\u003Cblockquote>4-bit quantization shows slightly lower but still impressive results, often maintaining 96–98% accuracy on standardized benchmarks like MMLU and ArenaHard.\u003C\u002Fblockquote>\u003Cp>What this actually means is that 4-bit is not a free lunch, but it can be a very good deal if memory is your real bottleneck. The study says 4-bit often keeps 96–98% accuracy on standardized benchmarks, while delivering much stronger compression and in some cases better latency. That’s enough to make a difference on edge devices, laptops, and smaller servers where every gigabyte counts.\u003C\u002Fp>\u003Cp>I’ve had the “let’s just use 4-bit” conversation enough times to know the trap. People see the compression ratio and forget that the model’s behavior can get more brittle. Sometimes that brittleness is acceptable. Sometimes it’s deadly. If your workload is conversational and the model needs to stay coherent across long prompts, the difference between “pretty good” and “slightly off” can matter a lot more than the benchmark suggests.\u003C\u002Fp>\u003Cp>The article mentions \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Funslothai\u002Funsloth\">Unsloth Dynamic v2.0\u003C\u002Fa>, which it says became the default quantization method for future GGUF uploads, and the \u003Ca href=\"https:\u002F\u002Fgithub.com\u002FIST-DASLab\u002Fbrevitas\">Brevitas\u003C\u002Fa> ecosystem around Qronos. That’s useful because it shows the field is moving toward smarter 4-bit methods, not just cruder compression. The point isn’t to crush weights harder. 
It’s to preserve behavior with better calibration, layer selection, and error handling.\u003C\u002Fp>\u003Cp>How to apply it: choose 4-bit when the deployment target is constrained and you’ve already accepted that some quality loss may be worth the gain. Use a representative calibration set, inspect failure cases, and compare against 8-bit on the same hardware. If 4-bit is only a tiny bit faster but much worse in output quality, don’t force it.\u003C\u002Fp>\u003Cp>For local and edge deployment, the article’s examples make the point clearly: 4-bit can be the difference between “fits” and “doesn’t fit.” That’s not theory. That’s whether the model runs at all.\u003C\u002Fp>\u003Ch2>2026 tooling is finally making quantization less annoying\u003C\u002Fh2>\u003Cblockquote>TensorFlow 2.17 introduced enhanced support for dynamic quantization and optimized the conversion process for TFLite models, achieving up to 3x speed improvements in inference on Intel Arc B580 Graphics.\u003C\u002Fblockquote>\u003Cp>What this actually means is that the ecosystem is catching up to the need for practical quantization workflows. The article points to TensorFlow 2.17 and PyTorch 2.7 as examples of frameworks making quantization easier and faster to use. That matters because a good idea with bad tooling usually dies in a backlog ticket.\u003C\u002Fp>\u003Cp>I care about this part more than I used to. Not because I enjoy framework releases, but because deployment friction is the hidden tax on model optimization. If quantization requires a weird script, a fragile conversion path, or a backend that only one engineer understands, it won’t get used consistently. Teams will quietly fall back to larger models because they’re easier to reason about.\u003C\u002Fp>\u003Cp>The article also mentions \u003Ca href=\"https:\u002F\u002Fdeveloper.nvidia.com\u002Ftensorrt\">TensorRT\u003C\u002Fa> optimizations, including support for FP8, FP4, INT8, and INT4. 
That’s the kind of backend support that turns quantization from a research topic into an ops decision. When the runtime can actually use the precision format you picked, the whole pipeline becomes less annoying.\u003C\u002Fp>\u003Cp>How to apply it: check the quantization support in your actual stack before you commit to a format. TensorFlow, PyTorch, TFLite, TensorRT, and mobile runtimes all differ in what they support well. If your chosen precision doesn’t map cleanly to your deployment backend, the “optimization” will cost you more than it saves.\u003C\u002Fp>\u003Cul>\u003Cli>Tooling support matters as much as the algorithm.\u003C\u002Fli>\u003Cli>Backend compatibility can make or break the win.\u003C\u002Fli>\u003Cli>Benchmark on the target runtime, not just in training code.\u003C\u002Fli>\u003C\u002Ful>\u003Ch2>The template you can copy\u003C\u002Fh2>\u003Cpre>\u003Ccode># Quantization selection checklist\n\n## Goal\n- Primary constraint: [accuracy | latency | memory | power | cost]\n- Deployment target: [server | mobile | edge | embedded]\n- Acceptable accuracy drop: [e.g. 
&lt;1%, &lt;3%, task-specific]\n- Hardware target: [NVIDIA | AMD | Intel | Apple | ARM]\n\n## Step 1: Start with PTQ\n- Apply post-training quantization to the trained model.\n- Use a validation set that matches production inputs.\n- Measure:\n  - accuracy \u002F task quality\n  - latency\n  - throughput\n  - memory use\n  - model size\n\n## Step 2: Decide if PTQ is good enough\n- Keep PTQ if the accuracy drop is acceptable.\n- Move to QAT if the model regresses too much.\n- Do not pick QAT unless the accuracy gain justifies retraining cost.\n\n## Step 3: Choose bit-width\n- 8-bit if you want the safest production default.\n- 4-bit if memory or edge deployment is the real bottleneck.\n- 3-bit or lower only if you have specialized tooling and strong benchmark evidence.\n\n## Step 4: Pick the quantization mode\n- Static quantization if you can calibrate with representative data.\n- Dynamic quantization if you need runtime flexibility or simpler setup.\n\n## Step 5: Calibrate properly\n- Build a representative calibration set.\n- Include messy, real-world inputs.\n- Avoid using only clean or synthetic samples.\n\n## Step 6: Benchmark on target hardware\n- Run the quantized model on the exact runtime you plan to ship.\n- Compare against the FP32 baseline.\n- Record quality regressions by task, not just aggregate score.\n\n## Decision rule\n- Use 8-bit for server inference when accuracy matters most.\n- Use 4-bit for edge, mobile, or latency-sensitive deployments.\n- Use QAT when PTQ loses too much quality.\n- Re-evaluate whenever the framework, backend, or model architecture changes.\n\n## Practical notes\n- TensorFlow users: check TFLiteConverter and Model Optimization Toolkit support.\n- PyTorch users: validate backend support before assuming parity.\n- TensorRT users: confirm precision support for your GPU generation.\n- If the benchmark looks great but production quality drops, the calibration set is probably the 
problem.\n\u003C\u002Fcode>\u003C\u002Fpre>\u003Cp>If I were turning the article into an internal team checklist, I’d use exactly this flow. It keeps the decision from turning into vibes. It also forces the team to write down the constraint before they start arguing about bit-widths like it’s a moral issue.\u003C\u002Fp>\u003Cp>The main thing TurboQuant gets right is that quantization is not about squeezing every model as hard as possible. It’s about matching the compression method to the deployment problem. That sounds obvious, but I’ve watched enough teams ignore it to know it needs repeating.\u003C\u002Fp>\u003Cp>Source: \u003Ca href=\"https:\u002F\u002Fdasroot.net\u002Fposts\u002F2026\u002F05\u002Fturboquant-comprehensive-study-quantization-accuracy-performance\u002F\">dasroot.net\u003C\u002Fa>. I’m breaking down the article’s ideas and stitching them into a practical workflow; the template above is my editorial rewrite, not a verbatim copy of the post.\u003C\u002Fp>","I break down TurboQuant’s quantization study into a practical playbook for choosing 8-bit, 4-bit, PTQ, or QAT.","dasroot.net","https:\u002F\u002Fdasroot.net\u002Fposts\u002F2026\u002F05\u002Fturboquant-comprehensive-study-quantization-accuracy-performance\u002F",null,"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1779287080437-oo8r.png","research","en","456ad15d-693b-4a13-8896-23d26e57c4de",[17,18,19,20,21],"quantization","INT8","INT4","TensorFlow","PyTorch",[23,24,25],"Start with the deployment constraint, not the bit-width.","PTQ is the fast default; QAT is the accuracy recovery path.","8-bit is the safest production choice, 4-bit is for tighter memory and edge 
targets.",2,"2026-05-20T14:24:12.033527+00:00","2026-05-20T14:24:12.02+00:00","3103988e-c4fe-45e3-98ab-846500c9d507",{"tags":31,"relatedLang":41,"relatedPosts":45},[32,33,35,37,39],{"name":17,"slug":17},{"name":18,"slug":34},"int8",{"name":21,"slug":36},"pytorch",{"name":20,"slug":38},"tensorflow",{"name":19,"slug":40},"int4",{"id":15,"slug":42,"title":43,"language":44},"turboquant-quantization-accuracy-performance-study-zh","TurboQuant 讓 4-bit 不再亂猜","zh",[46,52,58,64,70,76],{"id":47,"slug":48,"title":49,"cover_image":50,"image_url":50,"created_at":51,"category":13},"850449f2-e75b-4dbf-97c0-3590c6cbf097","crdts-keep-replicas-in-sync-without-locks-en","CRDTs keep replicas in sync without locks","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1781011086602-cokl.png","2026-06-09T13:17:35.890527+00:00",{"id":53,"slug":54,"title":55,"cover_image":56,"image_url":56,"created_at":57,"category":13},"7c6b6428-ba8d-4c59-840b-cf96a95139e5","post-deterministic-systems-autonomous-infra-en","Post-Deterministic Systems for Autonomous Infra","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1781010190497-1grq.png","2026-06-09T13:02:33.235795+00:00",{"id":59,"slug":60,"title":61,"cover_image":62,"image_url":62,"created_at":63,"category":13},"53ec2203-e127-4bf8-8b3d-2dce8d156a54","causal-learnability-formal-language-tasks-en","Causal methods for measuring task learnability","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1780987698514-ky8m.png","2026-06-09T06:47:35.103221+00:00",{"id":65,"slug":66,"title":67,"cover_image":68,"image_url":68,"created_at":69,"category":13},"55e7197e-f114-4b6c-b3e2-af1a3cd9dfa4","rl-training-hands-off-control-gradually-en","RL Training That Hands Off Control 
Gradually","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1780986801034-gf8m.png","2026-06-09T06:32:33.516452+00:00",{"id":71,"slug":72,"title":73,"cover_image":74,"image_url":74,"created_at":75,"category":13},"93fc6735-b524-4baf-989f-645c4c47d593","omnigamearena-vlm-game-agent-benchmark-en","OmniGameArena benchmarks VLM game agents better","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1780985895695-ugcj.png","2026-06-09T06:17:32.668876+00:00",{"id":77,"slug":78,"title":79,"cover_image":80,"image_url":80,"created_at":81,"category":13},"9f0c9505-6d75-411c-ba46-2382e8f295a5","turboquant-cuts-kv-cache-memory-6x-google-tests-en","TurboQuant cuts KV cache memory 6x in Google tests","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1780906679116-fqdo.png","2026-06-08T08:17:22.276769+00:00",[83,88,93,98,103,108,113,118,123,128],{"id":84,"slug":85,"title":86,"created_at":87},"a2715e72-1fe8-41b3-abb1-d0cf1f710189","ai-predictions-2026-big-changes-en","AI Predictions for 2026: Brace for Big Changes","2026-03-26T01:25:07.788356+00:00",{"id":89,"slug":90,"title":91,"created_at":92},"8404bd7b-4c2f-4109-9ec4-baf29d88af2b","ml-papers-of-the-week-github-research-desk-en","ML Papers of the Week Turns GitHub Into a Research Desk","2026-03-27T01:11:39.480259+00:00",{"id":94,"slug":95,"title":96,"created_at":97},"87897a94-8065-4464-a016-1f23e89e17cc","ai-ml-conferences-to-watch-in-2026-en","AI\u002FML Conferences to Watch in 2026","2026-03-27T01:51:54.184108+00:00",{"id":99,"slug":100,"title":101,"created_at":102},"6f1987cf-25f3-47a4-b3e6-db0997695be8","openclaw-agents-manipulated-self-sabotage-en","OpenClaw Agents Can Be Manipulated Into 
Failure","2026-03-28T03:03:18.899465+00:00",{"id":104,"slug":105,"title":106,"created_at":107},"a53571ad-735a-4178-9f93-cb09b699d99c","vega-driving-language-instructions-en","Vega: Driving with Natural Language Instructions","2026-03-28T14:54:04.698882+00:00",{"id":109,"slug":110,"title":111,"created_at":112},"a34581d6-f36e-46da-88bb-582fb3e7425c","personalizing-autonomous-driving-styles-en","Drive My Way: Personalizing Autonomous Driving Styles","2026-03-28T14:54:26.148181+00:00",{"id":114,"slug":115,"title":116,"created_at":117},"2bc1ad7f-26ce-4f02-9885-803b35fd229d","training-knowledge-bases-writeback-rag-en","Training Knowledge Bases with WriteBack-RAG","2026-03-28T14:54:45.643433+00:00",{"id":119,"slug":120,"title":121,"created_at":122},"71adc507-3c54-4605-bbe2-c966acd6187e","packforcing-long-video-generation-en","PackForcing: Efficient Long-Video Generation Method","2026-03-28T14:55:02.646943+00:00",{"id":124,"slug":125,"title":126,"created_at":127},"675942ef-b9ec-4c5f-a997-381250b6eacb","pixelsmile-facial-expression-editing-en","PixelSmile Framework Enhances Facial Expression Editing","2026-03-28T14:55:20.633463+00:00",{"id":129,"slug":130,"title":131,"created_at":132},"6954fa2b-8b66-4839-884b-e46f89fa1bc3","adaptive-block-scaled-data-types-en","IF4: Smarter 4-Bit Quantization That Adapts to Your Data","2026-03-31T06:00:36.65963+00:00"]