[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"article-locus-local-ordinance-corpus-us-en":3,"article-related-locus-local-ordinance-corpus-us-en":30,"series-research-d7f11606-750d-42ea-87b8-23a761269509":73},{"id":4,"slug":5,"title":6,"content":7,"summary":8,"source":9,"source_url":10,"author":11,"image_url":12,"cover_image":12,"category":13,"language":14,"translated_content":11,"related_article_id":15,"keywords":16,"key_takeaways":22,"views":26,"created_at":27,"published_at":28,"topic_cluster_id":29},"d7f11606-750d-42ea-87b8-23a761269509","locus-local-ordinance-corpus-us-en","LOCUS opens U.S. local law for legal AI","\u003Cp data-speakable=\"summary\">LOCUS builds a large corpus of U.S. local ordinances to make county and city law machine-readable.\u003C\u002Fp>\u003Cul>\u003Cli>\u003Cstrong>Research org\u003C\u002Fstrong>: Unspecified in arXiv abstract\u003C\u002Fli>\u003Cli>\u003Cstrong>Core data\u003C\u002Fstrong>: 9,239 cities and counties\u003C\u002Fli>\u003Cli>\u003Cstrong>Breakthrough\u003C\u002Fstrong>: OCR-backed corpus plus county-harmonized access layer\u003C\u002Fli>\u003C\u002Ful>\u003Cp>Legal AI systems are only as useful as the text they can actually see. This paper argues that local ordinances are a major missing layer: they shape zoning, housing, business licensing, public health, noise, animal control, and more, but they are scattered across vendor platforms built for browsing, not bulk analysis.\u003C\u002Fp>\u003Cp>For engineers, the practical issue is simple. If your system can search federal law but not the local code that governs day-to-day activity, it is missing a huge part of the legal surface area. 
LOCUS tries to close that gap by turning fragmented municipal and county codes into something researchers and model builders can work with at scale.\u003C\u002Fp>\u003Ch2>What problem this paper is trying to fix\u003C\u002Fh2>\u003Cp>The paper starts from a basic mismatch: American law is not just statutes and regulations at the federal or state level. A lot of the rules that affect real-world decisions live in local ordinances, and those ordinances are hard to collect in machine-readable form.\u003C\u002Fp>\n\u003Cfigure class=\"my-6\">\u003Cimg src=\"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1781764376812-ikxd.png\" alt=\"LOCUS opens U.S. local law for legal AI\" class=\"rounded-xl w-full\" loading=\"lazy\" \u002F>\u003C\u002Ffigure>\n\u003Cp>According to the abstract, existing corpora largely leave out this layer because local codes are fragmented and hosted on platforms designed for human browsing. That makes them awkward for bulk extraction, indexing, and downstream NLP work. In other words, the data exists, but it has been operationally inaccessible.\u003C\u002Fp>\u003Cp>LOCUS, short for the Local Ordinance Corpus for the United States, is meant to change that. The authors frame it as both a raw corpus and a county-harmonized access layer, so the dataset is not just a pile of documents but also a more standardized way to work with them.\u003C\u002Fp>\u003Ch2>How LOCUS works in plain English\u003C\u002Fh2>\u003Cp>The core move is data collection plus normalization. The paper says the raw corpus represents nearly all publicly available municipal and county ordinance codes, and that OCR is used to handle the many document formats that have kept the law from being a public resource.\u003C\u002Fp>\u003Cp>That OCR detail matters. 
Local law is not neatly packaged in one schema, and the source material implies a lot of the work is about converting messy, document-heavy legal text into something that can be searched and analyzed. For developers, this is the unglamorous but essential part of building legal infrastructure: extraction, cleanup, and metadata.\u003C\u002Fp>\u003Cp>The paper also introduces a county-harmonized LOCUS access layer. The abstract says this layer covers the largest 2,309 of 3,144 U.S. counties, which the authors note accounts for a majority of the population. That suggests the dataset is designed to support both broad coverage and more standardized county-level analysis.\u003C\u002Fp>\u003Cp>Another important detail is reproducibility. The authors say they release the corpus with coverage metadata, which should help researchers understand what is included and what is missing. That is especially important in legal datasets, where absent jurisdictions can quietly distort results.\u003C\u002Fp>\u003Ch2>What the paper actually shows\u003C\u002Fh2>\u003Cp>The abstract gives a few concrete coverage numbers, but it does not provide \u003Ca href=\"\u002Ftag\u002Fbenchmark\">benchmark\u003C\u002Fa> scores for retrieval, classification, or downstream legal QA. So if you are looking for model accuracy or task performance, the abstract does not include that.\u003C\u002Fp>\n\u003Cfigure class=\"my-6\">\u003Cimg src=\"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1781764374903-z9y0.png\" alt=\"LOCUS opens U.S. local law for legal AI\" class=\"rounded-xl w-full\" loading=\"lazy\" \u002F>\u003C\u002Ffigure>\n\u003Cp>What it does show is scale. The raw corpus contains codes from 9,239 cities and counties. The harmonized access layer covers 2,309 of 3,144 U.S. counties. 
Those numbers are the main evidence that this is not a narrow proof of concept but a serious attempt to build infrastructure for local-law research.\u003C\u002Fp>\u003Cp>The authors also say they train a collection of ModernBERT-based classifiers and scorers. These are intended to help analyze local law across dimensions such as opacity and paternalism, which the abstract says have not previously been studied at this scale. That is a notable research angle: once the corpus exists, the paper moves from collection to measurement.\u003C\u002Fp>\u003Cp>Still, the abstract stops short of telling us how well those models perform, what the labeling process looks like, or how robust the scores are across jurisdictions. The paper appears to be strongest as a dataset and tooling contribution, with the modeling work serving as an initial analytical layer rather than the headline result.\u003C\u002Fp>\u003Ch2>Why developers should care\u003C\u002Fh2>\u003Cp>If you build legal search, compliance tools, civic tech, or policy analytics systems, local ordinances are not edge cases. They are often the rules that determine whether a business can operate, what kind of building can be approved, or how a neighborhood can be regulated.\u003C\u002Fp>\u003Cp>LOCUS gives the ecosystem something that has been missing: a corpus large enough to support training, evaluation, and analysis on local law rather than just higher-level legal text. That matters for retrieval pipelines, domain adaptation, jurisdiction-aware classification, and any workflow that needs coverage beyond state and federal sources.\u003C\u002Fp>\u003Cp>The county-harmonized layer is especially useful for engineering work because harmonization reduces one of the biggest headaches in legal data: inconsistent jurisdiction structure. 
Even if the raw corpus is messy, a standardized access layer can make experimentation far more practical.\u003C\u002Fp>\u003Ch2>What this does not solve yet\u003C\u002Fh2>\u003Cp>The abstract is clear that LOCUS is a release and an access layer, not a declaration that local law is now fully solved as a data problem. Coverage is broad, but not complete in the harmonized layer, and the source does not claim every ordinance is perfectly structured or equally easy to parse.\u003C\u002Fp>\u003Cp>OCR also introduces its own risks. The paper says OCR is used to handle diverse document formats, but the abstract does not spell out error rates, validation procedures, or how much manual cleanup was required. For legal applications, those details matter because small extraction mistakes can change meaning.\u003C\u002Fp>\u003Cp>There is also an open question around downstream evaluation. The abstract mentions ModernBERT-based classifiers and scorers, but it does not report task metrics. That means practitioners should treat the dataset as enabling infrastructure first and a validated benchmark suite second.\u003C\u002Fp>\u003Ch2>The bottom line\u003C\u002Fh2>\u003Cp>LOCUS is a practical dataset paper with a clear thesis: if legal AI wants to understand American law at the level where many real-world rules actually live, it needs local ordinances in machine-readable form.\u003C\u002Fp>\u003Cp>By assembling nearly all publicly available municipal and county codes, adding OCR-based ingestion, and releasing a county-harmonized access layer with coverage metadata, the authors provide a foundation for new work in legal retrieval, classification, and local-law analysis. The paper does not yet give benchmark numbers in the abstract, but it does give the field something more basic and arguably more important: a way to see the data at all.\u003C\u002Fp>","LOCUS builds a large corpus of U.S. 
local ordinances to make county and city law machine-readable.","arxiv.org","https:\u002F\u002Farxiv.org\u002Fabs\u002F2606.19334",null,"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1781764376812-ikxd.png","research","en","ba82ac15-7751-4d2c-82b0-3cbbf76b8a09",[17,18,19,20,21],"legal AI","local ordinances","OCR","corpus","municipal law",[23,24,25],"Local ordinances are a major missing layer in machine-readable legal corpora.","LOCUS assembles 9,239 city and county codes and adds a county-harmonized access layer.","The paper provides infrastructure and coverage metadata, but the abstract does not report benchmark numbers.",0,"2026-06-18T06:32:30.210741+00:00","2026-06-18T06:32:30.201+00:00","3103988e-c4fe-45e3-98ab-846500c9d507",{"tags":31,"relatedLang":32,"relatedPosts":36},[],{"id":15,"slug":33,"title":34,"language":35},"locus-local-ordinance-corpus-us-zh","LOCUS把美國地方法規變機器可讀","zh",[37,43,49,55,61,67],{"id":38,"slug":39,"title":40,"cover_image":41,"image_url":41,"created_at":42,"category":13},"03e7168c-77a8-40ea-924b-96f86204d88e","turing-rl-user-simulator-rewards-en","Turing-RL trains user simulators by fooling judges","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1781763480946-dpwl.png","2026-06-18T06:17:31.584257+00:00",{"id":44,"slug":45,"title":46,"cover_image":47,"image_url":47,"created_at":48,"category":13},"0e33a353-6482-43dc-a0d7-646b9b1a2a2a","omniagent-active-perception-video-understanding-en","OmniAgent brings active perception to video understanding","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1781762581923-hx7i.png","2026-06-18T06:02:32.210704+00:00",{"id":50,"slug":51,"title":52,"cover_image":53,"image_url":53,"created_at":54,"category":13},"596a6b3f-d7c0-46ef-9a88-1915a6e3f238","arxiv-ai-papers-agents-memory-data-en","ArXiv AI 
papers push agents, memory, and data","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1781685183085-978g.png","2026-06-17T08:32:37.121772+00:00",{"id":56,"slug":57,"title":58,"cover_image":59,"image_url":59,"created_at":60,"category":13},"d910529d-15c0-498a-a930-85e14c6ef748","reprorepo-github-issues-reproducibility-audits-en","ReproRepo scales reproducibility audits with GitHub issues","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1781678880894-uawp.png","2026-06-17T06:47:35.608681+00:00",{"id":62,"slug":63,"title":64,"cover_image":65,"image_url":65,"created_at":66,"category":13},"434fbb0a-e925-43f3-9c3d-a3fbd187acdc","variable-width-transformers-cut-wasted-capacity-en","Variable-Width Transformers cut wasted capacity","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1781677980601-tp4b.png","2026-06-17T06:32:32.993101+00:00",{"id":68,"slug":69,"title":70,"cover_image":71,"image_url":71,"created_at":72,"category":13},"2f8d825d-5520-4fb6-b1dc-a309b0193f3e","veritas-robot-policy-visual-verification-en","VERITAS lets robots verify and improve at runtime","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1781677086468-mhbq.png","2026-06-17T06:17:38.067708+00:00",[74,79,84,89,94,99,104,109,114,119],{"id":75,"slug":76,"title":77,"created_at":78},"a2715e72-1fe8-41b3-abb1-d0cf1f710189","ai-predictions-2026-big-changes-en","AI Predictions for 2026: Brace for Big Changes","2026-03-26T01:25:07.788356+00:00",{"id":80,"slug":81,"title":82,"created_at":83},"8404bd7b-4c2f-4109-9ec4-baf29d88af2b","ml-papers-of-the-week-github-research-desk-en","ML Papers of the Week Turns GitHub Into a Research 
Desk","2026-03-27T01:11:39.480259+00:00",{"id":85,"slug":86,"title":87,"created_at":88},"87897a94-8065-4464-a016-1f23e89e17cc","ai-ml-conferences-to-watch-in-2026-en","AI\u002FML Conferences to Watch in 2026","2026-03-27T01:51:54.184108+00:00",{"id":90,"slug":91,"title":92,"created_at":93},"6f1987cf-25f3-47a4-b3e6-db0997695be8","openclaw-agents-manipulated-self-sabotage-en","OpenClaw Agents Can Be Manipulated Into Failure","2026-03-28T03:03:18.899465+00:00",{"id":95,"slug":96,"title":97,"created_at":98},"a53571ad-735a-4178-9f93-cb09b699d99c","vega-driving-language-instructions-en","Vega: Driving with Natural Language Instructions","2026-03-28T14:54:04.698882+00:00",{"id":100,"slug":101,"title":102,"created_at":103},"a34581d6-f36e-46da-88bb-582fb3e7425c","personalizing-autonomous-driving-styles-en","Drive My Way: Personalizing Autonomous Driving Styles","2026-03-28T14:54:26.148181+00:00",{"id":105,"slug":106,"title":107,"created_at":108},"2bc1ad7f-26ce-4f02-9885-803b35fd229d","training-knowledge-bases-writeback-rag-en","Training Knowledge Bases with WriteBack-RAG","2026-03-28T14:54:45.643433+00:00",{"id":110,"slug":111,"title":112,"created_at":113},"71adc507-3c54-4605-bbe2-c966acd6187e","packforcing-long-video-generation-en","PackForcing: Efficient Long-Video Generation Method","2026-03-28T14:55:02.646943+00:00",{"id":115,"slug":116,"title":117,"created_at":118},"675942ef-b9ec-4c5f-a997-381250b6eacb","pixelsmile-facial-expression-editing-en","PixelSmile Framework Enhances Facial Expression Editing","2026-03-28T14:55:20.633463+00:00",{"id":120,"slug":121,"title":122,"created_at":123},"6954fa2b-8b66-4839-884b-e46f89fa1bc3","adaptive-block-scaled-data-types-en","IF4: Smarter 4-Bit Quantization That Adapts to Your Data","2026-03-31T06:00:36.65963+00:00"]