[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"article-llms-implicit-grammar-representations-en":3,"tags-llms-implicit-grammar-representations-en":34,"related-lang-llms-implicit-grammar-representations-en":44,"related-posts-llms-implicit-grammar-representations-en":48,"series-research-22c43f4e-8be9-4440-bd1b-74a00b60dfa3":85},{"id":4,"title":5,"content":6,"summary":7,"source":8,"source_url":9,"author":10,"image_url":11,"keywords":12,"language":18,"translated_content":10,"views":19,"is_premium":20,"created_at":21,"updated_at":21,"cover_image":11,"published_at":22,"rewrite_status":23,"rewrite_error":10,"rewritten_from_id":24,"slug":25,"category":26,"related_article_id":27,"status":28,"google_indexed_at":29,"x_posted_at":10,"tweet_text":10,"title_rewritten_at":10,"title_original":10,"key_takeaways":30,"topic_cluster_id":10,"embedding":10,"is_canonical_seed":20},"22c43f4e-8be9-4440-bd1b-74a00b60dfa3","Do LLMs Learn Grammar Beyond Likelihood?","\u003Cp data-speakable=\"summary\">Language models appear to encode grammaticality in hidden layers, beyond raw string likelihood.\u003C\u002Fp>\u003Cp>Pretrained language models are good at producing grammatical text, but that does not automatically mean their probability scores cleanly separate grammatical from ungrammatical sentences. This paper asks a narrower, practical question: do LMs carry an internal signal for grammaticality that is distinct from the probability they assign to a string?\u003C\u002Fp>\u003Cp>The answer, based on a simple linear probe, is yes, to a degree. The probe finds a grammaticality signal in hidden representations that generalizes beyond the training setup, including to human judgment benchmarks and even to other languages. 
But the same signal is not a universal replacement for likelihood: on plausibility tasks where both sentences are grammatical, string probability still does better.\u003C\u002Fp>\u003Ch2>What problem this paper is trying to fix\u003C\u002Fh2>\u003Cp>For engineers working with language models, “the model likes this sentence” is often used as a proxy for “the sentence is grammatical.” The paper points out why that shortcut is shaky. Grammaticality and likelihood are not the same thing in human language, and the abstract says LMs can generate well-formed text and handle tightly controlled minimal pairs, yet their raw string probabilities do not sharply separate grammatical from ungrammatical sentences overall.\u003C\u002Fp>\n\u003Cfigure class=\"my-6\">\u003Cimg src=\"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1778135464967-fzem.png\" alt=\"Do LLMs Learn Grammar Beyond Likelihood?\" class=\"rounded-xl w-full\" loading=\"lazy\" \u002F>\u003C\u002Ffigure>\n\u003Cp>That creates a practical gap. If you want to use an LM as a grammar signal—for filtering text, scoring outputs, or building evaluation tools—you need to know whether the model has learned grammar as a distinct internal feature, or whether any apparent success is just a byproduct of \u003Ca href=\"\u002Ftag\u002Ftoken\">token\u003C\u002Fa> likelihood.\u003C\u002Fp>\u003Cp>This paper is trying to separate those two possibilities. Instead of looking only at output probabilities, it looks inside the model and asks whether hidden layers contain a more direct grammaticality representation.\u003C\u002Fp>\u003Ch2>How the method works in plain English\u003C\u002Fh2>\u003Cp>The authors train a linear probe on internal representations from language models. In plain English, a probe is a lightweight classifier that tries to read one property from a model’s hidden state. 
Here, the property is grammaticality.\u003C\u002Fp>\u003Cp>To create training data, they use a naturalistic text corpus and then generate synthetic ungrammatical sentences by applying perturbations. That gives them pairs of grammatical and altered ungrammatical examples without needing to hand-label everything from scratch. The probe learns to distinguish between those two classes using the model’s internal activations.\u003C\u002Fp>\u003Cp>The key idea is that if a simple linear probe can recover grammaticality, then the information is likely present in the hidden layers in a usable form. The authors then test whether this probe generalizes to other settings, including human-curated grammaticality judgment benchmarks and languages beyond English.\u003C\u002Fp>\u003Cp>Importantly, the paper is not claiming that the probe is a full theory of grammar or that it “understands” syntax in a human sense. It is making a narrower representational claim: the model seems to encode a grammaticality distinction that is not reducible to string probability alone.\u003C\u002Fp>\u003Ch2>What the paper actually shows\u003C\u002Fh2>\u003Cp>The strongest result is that the grammaticality probe generalizes to human-curated grammaticality judgment benchmarks and outperforms LM probability-based grammaticality judgments. That is the main practical takeaway: if your goal is to decide whether a sentence is grammatical, a learned probe over hidden states appears more effective than just using the model’s likelihood score.\u003C\u002Fp>\n\u003Cfigure class=\"my-6\">\u003Cimg src=\"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1778135472303-uh0l.png\" alt=\"Do LLMs Learn Grammar Beyond Likelihood?\" class=\"rounded-xl w-full\" loading=\"lazy\" \u002F>\u003C\u002Ffigure>\n\u003Cp>The paper also reports a contrast that matters. 
When the same probe is applied to semantic plausibility benchmarks—cases where both sentences are grammatical but one is more plausible than the other—the probe performs worse than string probability. In other words, the probe seems better at grammaticality than at plausibility.\u003C\u002Fp>\u003Cp>That distinction is useful because it suggests the probe is not just learning a generic “better sentence” score. It is picking up something closer to syntax or form, while likelihood still carries stronger information for plausibility judgments.\u003C\u002Fp>\u003Cp>The authors also report nontrivial cross-lingual generalization. An English-trained probe outperforms string probabilities on grammaticality benchmarks in numerous other languages. That is a meaningful result for multilingual work, because it suggests the learned signal is not limited to one language’s surface patterns.\u003C\u002Fp>\u003Cp>Finally, the paper says probe scores correlate only weakly with string probabilities. That weak correlation is one of the clearest signs that grammaticality and likelihood are not the same axis inside the model. The hidden layers appear to carry some separate grammaticality information, even if it is incomplete.\u003C\u002Fp>\u003Cp>The abstract does not provide benchmark numbers, dataset sizes, or exact model names, so those details are not available from the source text here.\u003C\u002Fp>\u003Ch2>Why developers should care\u003C\u002Fh2>\u003Cp>If you build systems on top of language models, this paper is a reminder not to treat likelihood as a universal proxy for language quality. A model can assign probabilities in a way that reflects many things at once: syntax, token frequency, discourse fit, and semantic plausibility. 
This work suggests those signals can be partially separated.\u003C\u002Fp>\u003Cp>That matters for several practical workflows:\u003C\u002Fp>\u003Cul>\u003Cli>Grammar checking and text filtering, where you may want a signal closer to well-formedness than to generic fluency.\u003C\u002Fli>\u003Cli>Evaluation pipelines, where likelihood-based scoring can blur grammaticality with plausibility.\u003C\u002Fli>\u003Cli>Cross-lingual applications, where a signal learned in English may still transfer better than expected.\u003C\u002Fli>\u003Cli>Interpretability work, where probes can help map what information is actually present in hidden states.\u003C\u002Fli>\u003C\u002Ful>\u003Cp>At the same time, the limitations are just as important. The paper does not show that the probe is a perfect grammar detector, only that it improves on probability-based judgments in the tested settings. It also performs worse on plausibility benchmarks, so it is not a drop-in replacement for likelihood in every task.\u003C\u002Fp>\u003Cp>There is also an open question about robustness. A linear probe can reveal what is linearly accessible in a representation, but that does not tell you how the model computes it, whether the signal is stable across architectures, or how it changes with scale and training data. The abstract also does not tell us how sensitive the result is to the specific perturbations used to create ungrammatical examples.\u003C\u002Fp>\u003Cp>Still, the core message is solid and useful: language models seem to encode grammaticality in their hidden layers, and that signal is only partly reflected in output probabilities. 
For anyone building tools that depend on sentence well-formedness, that is a useful reason to look beyond raw likelihood and toward internal representations.\u003C\u002Fp>\u003Ch2>Bottom line\u003C\u002Fh2>\u003Cp>This paper argues that pretrained language models learn an implicit grammaticality signal that can be read out from hidden layers with a simple linear probe. The signal generalizes better than probability-based scoring on grammaticality tasks, but it does not replace likelihood for plausibility judgments.\u003C\u002Fp>","A probe study finds hidden layers in language models encode grammaticality better than string probability, but not plausibility.","arxiv.org","https:\u002F\u002Farxiv.org\u002Fabs\u002F2605.05197",null,"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1778135464967-fzem.png",[13,14,15,16,17],"language models","grammar","linear probes","hidden representations","cross-lingual generalization","en",2,false,"2026-05-07T06:30:35.804749+00:00","2026-05-07T06:30:35.791+00:00","done","d02e9f8f-f2c0-4251-b671-619e1ec2c8d9","llms-implicit-grammar-representations-en","research","f07807ac-d51e-413e-a08a-42b6045d1e90","published","2026-05-07T09:00:17.923+00:00",[31,32,33],"Hidden layers can encode grammaticality beyond raw likelihood.","A linear probe beats probability-based judgments on grammaticality benchmarks.","The same probe is weaker on plausibility tasks and only partly reflects string probability.",[35,36,38,40,42],{"name":14,"slug":14},{"name":15,"slug":37},"linear-probes",{"name":13,"slug":39},"language-models",{"name":17,"slug":41},"cross-lingual-generalization",{"name":16,"slug":43},"hidden-representations",{"id":27,"slug":45,"title":46,"language":47},"llms-implicit-grammar-representations-zh","LLM 
學到文法了嗎？","zh",[49,55,61,67,73,79],{"id":50,"slug":51,"title":52,"cover_image":53,"image_url":53,"created_at":54,"category":26},"94994abd-e24d-4fd1-b941-942d03d19acf","turboquant-seo-shift-small-sites-en","TurboQuant and the SEO Shift for Small Sites","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1778840455122-jfce.png","2026-05-15T10:20:28.134545+00:00",{"id":56,"slug":57,"title":58,"cover_image":59,"image_url":59,"created_at":60,"category":26},"670a7f69-911f-41e8-a18b-7d3491253a19","turboquant-vllm-comparison-fp8-kv-cache-en","TurboQuant vs FP8: vLLM’s first broad test","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1778839858405-b5ao.png","2026-05-15T10:10:37.219158+00:00",{"id":62,"slug":63,"title":64,"cover_image":65,"image_url":65,"created_at":66,"category":26},"5aef1c57-961f-49f7-8277-f83f7336799a","llmbda-calculus-agent-safety-rules-en","LLMbda calculus gives agents safety rules","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1778825459914-obkf.png","2026-05-15T06:10:36.242145+00:00",{"id":68,"slug":69,"title":70,"cover_image":71,"image_url":71,"created_at":72,"category":26},"712a0357-f7cd-48f2-adde-c2691da0815f","low-complexity-beamspace-denoiser-mmwave-mimo-en","A simpler beamspace denoiser for mmWave MIMO","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1778814646705-e7mx.png","2026-05-15T03:10:31.764301+00:00",{"id":74,"slug":75,"title":76,"cover_image":77,"image_url":77,"created_at":78,"category":26},"f595f949-6ea1-4b0e-a632-f1832ef26e36","ai-benchmark-wins-cyber-scare-defenders-en","Why AI benchmark wins in cyber should scare 
defenders","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1778807444539-gz7f.png","2026-05-15T01:10:30.04579+00:00",{"id":80,"slug":81,"title":82,"cover_image":83,"image_url":83,"created_at":84,"category":26},"3ad202d1-9e5f-49c5-8383-02fcf1a23cf2","why-linux-security-needs-patch-wave-mindset-en","Why Linux security needs a patch-wave mindset","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1778741441493-ikl6.png","2026-05-14T06:50:25.906256+00:00",[86,91,96,101,106,111,116,121,126,131],{"id":87,"slug":88,"title":89,"created_at":90},"a2715e72-1fe8-41b3-abb1-d0cf1f710189","ai-predictions-2026-big-changes-en","AI Predictions for 2026: Brace for Big Changes","2026-03-26T01:25:07.788356+00:00",{"id":92,"slug":93,"title":94,"created_at":95},"8404bd7b-4c2f-4109-9ec4-baf29d88af2b","ml-papers-of-the-week-github-research-desk-en","ML Papers of the Week Turns GitHub Into a Research Desk","2026-03-27T01:11:39.480259+00:00",{"id":97,"slug":98,"title":99,"created_at":100},"87897a94-8065-4464-a016-1f23e89e17cc","ai-ml-conferences-to-watch-in-2026-en","AI\u002FML Conferences to Watch in 2026","2026-03-27T01:51:54.184108+00:00",{"id":102,"slug":103,"title":104,"created_at":105},"6f1987cf-25f3-47a4-b3e6-db0997695be8","openclaw-agents-manipulated-self-sabotage-en","OpenClaw Agents Can Be Manipulated Into Failure","2026-03-28T03:03:18.899465+00:00",{"id":107,"slug":108,"title":109,"created_at":110},"a53571ad-735a-4178-9f93-cb09b699d99c","vega-driving-language-instructions-en","Vega: Driving with Natural Language Instructions","2026-03-28T14:54:04.698882+00:00",{"id":112,"slug":113,"title":114,"created_at":115},"a34581d6-f36e-46da-88bb-582fb3e7425c","personalizing-autonomous-driving-styles-en","Drive My Way: Personalizing Autonomous Driving 
Styles","2026-03-28T14:54:26.148181+00:00",{"id":117,"slug":118,"title":119,"created_at":120},"2bc1ad7f-26ce-4f02-9885-803b35fd229d","training-knowledge-bases-writeback-rag-en","Training Knowledge Bases with WriteBack-RAG","2026-03-28T14:54:45.643433+00:00",{"id":122,"slug":123,"title":124,"created_at":125},"71adc507-3c54-4605-bbe2-c966acd6187e","packforcing-long-video-generation-en","PackForcing: Efficient Long-Video Generation Method","2026-03-28T14:55:02.646943+00:00",{"id":127,"slug":128,"title":129,"created_at":130},"675942ef-b9ec-4c5f-a997-381250b6eacb","pixelsmile-facial-expression-editing-en","PixelSmile Framework Enhances Facial Expression Editing","2026-03-28T14:55:20.633463+00:00",{"id":132,"slug":133,"title":134,"created_at":135},"6954fa2b-8b66-4839-884b-e46f89fa1bc3","adaptive-block-scaled-data-types-en","IF4: Smarter 4-Bit Quantization That Adapts to Your Data","2026-03-31T06:00:36.65963+00:00"]