[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"article-what-large-language-models-are-how-they-work-en":3,"article-related-what-large-language-models-are-how-they-work-en":30,"series-research-4e2d39c9-e078-498b-90ca-988afae7b79f":81},{"id":4,"slug":5,"title":6,"content":7,"summary":8,"source":9,"source_url":10,"author":11,"image_url":12,"cover_image":12,"category":13,"language":14,"translated_content":11,"related_article_id":15,"keywords":16,"key_takeaways":22,"views":26,"created_at":27,"published_at":28,"topic_cluster_id":29},"4e2d39c9-e078-498b-90ca-988afae7b79f","what-large-language-models-are-how-they-work-en","What large language models are, and how they work","\u003Cp data-speakable=\"summary\">Large language models are neural networks trained on huge text datasets to generate and process language.\u003C\u002Fp>\u003Cp>In 2024, the biggest \u003Ca href=\"\u002Ftag\u002Fllms\">LLMs\u003C\u002Fa> were still transformer-based, and \u003Ca href=\"\u002Ftag\u002Fopenai\">OpenAI\u003C\u002Fa>’s \u003Ca href=\"https:\u002F\u002Fopenai.com\u002Findex\u002Fgpt-4\u002F\" target=\"_blank\" rel=\"noopener\">GPT-4\u003C\u002Fa> and \u003Ca href=\"https:\u002F\u002Fopenai.com\u002Findex\u002Fintroducing-gpt-4o\u002F\" target=\"_blank\" rel=\"noopener\">GPT-4o\u003C\u002Fa> kept pushing public expectations for what a chatbot can do. 
The story is bigger than chat: these models now summarize documents, translate text, write code, and power tools that look a lot like software assistants.\u003C\u002Fp>\u003Ctable>\u003Cthead>\u003Ctr>\u003Cth>Fact\u003C\u002Fth>\u003Cth>Value\u003C\u002Fth>\u003Cth>Why it matters\u003C\u002Fth>\u003C\u002Ftr>\u003C\u002Fthead>\u003Ctbody>\u003Ctr>\u003Ctd>Transformer breakthrough\u003C\u002Ftd>\u003Ctd>2017\u003C\u002Ftd>\u003Ctd>Set the architecture used by most top LLMs\u003C\u002Ftd>\u003C\u002Ftr>\u003Ctr>\u003Ctd>GPT-3 release\u003C\u002Ftd>\u003Ctd>2020\u003C\u002Ftd>\u003Ctd>Made large-scale prompting a mainstream workflow\u003C\u002Ftd>\u003C\u002Ftr>\u003Ctr>\u003Ctd>ChatGPT release\u003C\u002Ftd>\u003Ctd>2022\u003C\u002Ftd>\u003Ctd>Turned LLMs into a consumer product\u003C\u002Ftd>\u003C\u002Ftr>\u003Ctr>\u003Ctd>DeepSeek R1\u003C\u002Ftd>\u003Ctd>671 billion parameters\u003C\u002Ftd>\u003Ctd>Showed how open-weight reasoning models can compete on cost\u003C\u002Ftd>\u003C\u002Ftr>\u003C\u002Ftbody>\u003C\u002Ftable>\u003Ch2>From text prediction to useful software\u003C\u002Fh2>\u003Cp>An \u003Ca href=\"\u002Ftag\u002Fllm\">LLM\u003C\u002Fa> is a neural network trained on a vast amount of text so it can predict the next \u003Ca href=\"\u002Ftag\u002Ftoken\">token\u003C\u002Fa>, then use that skill to answer questions, draft prose, or transform one kind of text into another. 
That sounds modest until you see the output quality: once the model has enough data, parameters, and training compute, it starts producing fluent language that feels less like autocomplete and more like a general-purpose text engine.\u003C\u002Fp>\n\u003Cfigure class=\"my-6\">\u003Cimg src=\"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1779341169797-ssad.png\" alt=\"What large language models are, and how they work\" class=\"rounded-xl w-full\" loading=\"lazy\" \u002F>\u003C\u002Ffigure>\n\u003Cp>The Wikipedia article gets this right in one important way: LLMs are foundational to modern chatbots, but they are also more general than chat. They can parse legal language, summarize long reports, generate code, and translate between languages. The catch is reliability. If the training data is biased, stale, or flat-out wrong, the model can echo those problems with impressive confidence.\u003C\u002Fp>\u003Cul>\u003Cli>They are trained on large text corpora, then tuned for instruction following.\u003C\u002Fli>\u003Cli>They use tokens, embeddings, and attention to process language numerically.\u003C\u002Fli>\u003Cli>They can generate, summarize, translate, and classify text.\u003C\u002Fli>\u003Cli>They still hallucinate, especially when asked for facts outside their training distribution.\u003C\u002Fli>\u003C\u002Ful>\u003Ch2>Why transformers took over\u003C\u002Fh2>\u003Cp>The big architectural shift came in 2017 with \u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F1706.03762\" target=\"_blank\" rel=\"noopener\">Attention Is All You Need\u003C\u002Fa>, the paper that introduced the transformer. Before that, language models leaned heavily on recurrent networks and statistical methods such as n-gram models. 
Transformers changed the economics of training because they parallelize well and handle long-range context better than earlier approaches.\u003C\u002Fp>\u003Cp>That matters because so much of the progress comes from scale. Once the model can attend across a large context window, it becomes much better at tasks that depend on relationships between distant words, paragraphs, or code blocks. By 2024, the largest and strongest models were transformer-based, even as research continued into alternatives such as state space models.\u003C\u002Fp>\u003Cblockquote>\u003Cp>“Attention Is All You Need”\u003C\u002Fp>\u003Cfooter>Vaswani et al., 2017\u003C\u002Ffooter>\u003C\u002Fblockquote>\u003Cp>The quote above is the title of the paper that changed the field, and it still captures the core idea better than most summaries do. The model does not read text like a human; it weights pieces of input against each other, then uses those relationships to decide what comes next.\u003C\u002Fp>\u003Ch2>Prompting turned LLMs into tools people could steer\u003C\u002Fh2>\u003Cp>One reason LLMs spread so quickly is that instruction following made them usable without retraining. A non-expert can often get strong results with a few rounds of trial and error: ask for a draft, ask for a rewrite, then ask for a stricter format. 
That simple interaction pattern opened the door for \u003Ca href=\"\u002Fnews\u002Fprompt-engineering-vague-asks-usable-outputs-en\">prompt engineering\u003C\u002Fa>, retrieval-augmented generation, and tool use.\u003C\u002Fp>\n\u003Cfigure class=\"my-6\">\u003Cimg src=\"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1779341183477-4n74.png\" alt=\"What large language models are, and how they work\" class=\"rounded-xl w-full\" loading=\"lazy\" \u002F>\u003C\u002Ffigure>\n\u003Cp>Chain-of-thought prompting, described in a 2022 paper, pushed this further by encouraging models to break problems into steps before answering. OpenAI’s \u003Ca href=\"https:\u002F\u002Fopenai.com\u002Findex\u002Flearning-to-reason-with-llms\u002F\" target=\"_blank\" rel=\"noopener\">o1\u003C\u002Fa> model, released in 2024, took a related path by generating long internal reasoning chains before returning a final answer. That does not make the system magical; it makes the model slower, more deliberate, and often better on multi-step tasks.\u003C\u002Fp>\u003Cul>\u003Cli>\u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002F\" target=\"_blank\" rel=\"noopener\">Hugging Face\u003C\u002Fa> helped open-weight models spread through the community.\u003C\u002Fli>\u003Cli>\u003Ca href=\"https:\u002F\u002Fai.meta.com\u002Fllama\u002F\" target=\"_blank\" rel=\"noopener\">LLaMA\u003C\u002Fa> and \u003Ca href=\"https:\u002F\u002Fmistral.ai\u002F\" target=\"_blank\" rel=\"noopener\">Mistral AI\u003C\u002Fa> pushed open-weight adoption.\u003C\u002Fli>\u003Cli>\u003Ca href=\"https:\u002F\u002Fwww.deepseek.com\u002F\" target=\"_blank\" rel=\"noopener\">DeepSeek\u003C\u002Fa> released R1 in January 2025 as a 671-billion-parameter open-weight model.\u003C\u002Fli>\u003Cli>\u003Ca href=\"https:\u002F\u002Fen.wikipedia.org\u002Fwiki\u002FChain-of-thought_prompting\" target=\"_blank\" rel=\"noopener\">Chain-of-thought prompting\u003C\u002Fa> 
made stepwise reasoning easier to trigger with prompts.\u003C\u002Fli>\u003C\u002Ful>\u003Cp>There is a practical lesson here for teams building with LLMs: prompting is now a product skill, not a parlor trick. If the model can follow instructions well, the interface design matters almost as much as the model choice.\u003C\u002Fp>\u003Ch2>What still breaks, and why that matters\u003C\u002Fh2>\u003Cp>LLMs are powerful, but they are also brittle in ways that matter for real products. Hallucinations remain a problem because the model optimizes for plausible text, not truth. Bias in the training data can skew outputs. Prompt injection can trick an \u003Ca href=\"\u002Ftag\u002Fagent\">agent\u003C\u002Fa> into ignoring the user’s intent. Energy use is also part of the bill, especially when training and serving giant models at scale.\u003C\u002Fp>\u003Cp>Evaluation tries to keep up with these issues through perplexity, benchmarks, adversarial tests, and safety checks. That sounds tidy on paper, but benchmarks often lag behind real-world use. A model can score well on academic tasks and still fail when a user asks it to summarize a messy spreadsheet, follow a long policy, or resist malicious instructions hidden inside retrieved content.\u003C\u002Fp>\u003Cp>That gap is why the field keeps circling back to grounding, retrieval, and tool use. The best systems today do not rely on the model alone. They wrap it in search, validation, and guardrails so the model can draft while other systems check facts or execute actions.\u003C\u002Fp>\u003Cp>For a broader view of how this shift affects product teams, see \u003Ca href=\"\u002Fnews\u002Fai-agents-are-changing-product-design\" target=\"_blank\" rel=\"noopener\">our coverage of AI agents and product design\u003C\u002Fa>.\u003C\u002Fp>\u003Ch2>The real test is usefulness, not hype\u003C\u002Fh2>\u003Cp>LLMs moved from research curiosities to everyday infrastructure because they made language software programmable. 
That is the interesting part: a model trained to predict tokens now sits inside search, coding tools, support bots, and internal knowledge systems. The next step is less about bigger demos and more about better failure handling.\u003C\u002Fp>\u003Cp>My bet is simple: the teams that win will treat LLMs like unreliable but very fast junior assistants. They will verify outputs, constrain actions, and measure error rates instead of chasing bigger parameter counts alone. If you are building with one today, the question is not whether it can write a good answer. The real question is whether your product can tell when the answer is wrong.\u003C\u002Fp>","Large language models turn huge text corpora into systems that generate, summarize, and reason with language.","en.wikipedia.org","https:\u002F\u002Fen.wikipedia.org\u002Fwiki\u002FLarge_language_model",null,"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1779341169797-ssad.png","research","en","d077afc5-6593-4e0f-afbf-b12229d083b6",[17,18,19,20,21],"large language models","transformers","prompt engineering","chain-of-thought","open-weight models",[23,24,25],"Transformers made modern LLMs practical at scale.","Prompting turned LLMs into usable products for non-experts.","Hallucinations, bias, and prompt injection still limit real-world 
reliability.",8,"2026-05-21T05:25:43.849628+00:00","2026-05-21T05:25:43.82+00:00","3103988e-c4fe-45e3-98ab-846500c9d507",{"tags":31,"relatedLang":40,"relatedPosts":44},[32,34,36,37,38],{"name":19,"slug":33},"prompt-engineering",{"name":17,"slug":35},"large-language-models",{"name":18,"slug":18},{"name":20,"slug":20},{"name":21,"slug":39},"open-weight-models",{"id":15,"slug":41,"title":42,"language":43},"what-large-language-models-are-how-they-work-zh","大型語言模型是什麼，怎麼運作","zh",[45,51,57,63,69,75],{"id":46,"slug":47,"title":48,"cover_image":49,"image_url":49,"created_at":50,"category":13},"850449f2-e75b-4dbf-97c0-3590c6cbf097","crdts-keep-replicas-in-sync-without-locks-en","CRDTs keep replicas in sync without locks","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1781011086602-cokl.png","2026-06-09T13:17:35.890527+00:00",{"id":52,"slug":53,"title":54,"cover_image":55,"image_url":55,"created_at":56,"category":13},"7c6b6428-ba8d-4c59-840b-cf96a95139e5","post-deterministic-systems-autonomous-infra-en","Post-Deterministic Systems for Autonomous Infra","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1781010190497-1grq.png","2026-06-09T13:02:33.235795+00:00",{"id":58,"slug":59,"title":60,"cover_image":61,"image_url":61,"created_at":62,"category":13},"53ec2203-e127-4bf8-8b3d-2dce8d156a54","causal-learnability-formal-language-tasks-en","Causal methods for measuring task learnability","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1780987698514-ky8m.png","2026-06-09T06:47:35.103221+00:00",{"id":64,"slug":65,"title":66,"cover_image":67,"image_url":67,"created_at":68,"category":13},"55e7197e-f114-4b6c-b3e2-af1a3cd9dfa4","rl-training-hands-off-control-gradually-en","RL Training That Hands Off Control 
Gradually","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1780986801034-gf8m.png","2026-06-09T06:32:33.516452+00:00",{"id":70,"slug":71,"title":72,"cover_image":73,"image_url":73,"created_at":74,"category":13},"93fc6735-b524-4baf-989f-645c4c47d593","omnigamearena-vlm-game-agent-benchmark-en","OmniGameArena benchmarks VLM game agents better","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1780985895695-ugcj.png","2026-06-09T06:17:32.668876+00:00",{"id":76,"slug":77,"title":78,"cover_image":79,"image_url":79,"created_at":80,"category":13},"9f0c9505-6d75-411c-ba46-2382e8f295a5","turboquant-cuts-kv-cache-memory-6x-google-tests-en","TurboQuant cuts KV cache memory 6x in Google tests","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1780906679116-fqdo.png","2026-06-08T08:17:22.276769+00:00",[82,87,92,97,102,107,112,117,122,127],{"id":83,"slug":84,"title":85,"created_at":86},"a2715e72-1fe8-41b3-abb1-d0cf1f710189","ai-predictions-2026-big-changes-en","AI Predictions for 2026: Brace for Big Changes","2026-03-26T01:25:07.788356+00:00",{"id":88,"slug":89,"title":90,"created_at":91},"8404bd7b-4c2f-4109-9ec4-baf29d88af2b","ml-papers-of-the-week-github-research-desk-en","ML Papers of the Week Turns GitHub Into a Research Desk","2026-03-27T01:11:39.480259+00:00",{"id":93,"slug":94,"title":95,"created_at":96},"87897a94-8065-4464-a016-1f23e89e17cc","ai-ml-conferences-to-watch-in-2026-en","AI\u002FML Conferences to Watch in 2026","2026-03-27T01:51:54.184108+00:00",{"id":98,"slug":99,"title":100,"created_at":101},"6f1987cf-25f3-47a4-b3e6-db0997695be8","openclaw-agents-manipulated-self-sabotage-en","OpenClaw Agents Can Be Manipulated Into 
Failure","2026-03-28T03:03:18.899465+00:00",{"id":103,"slug":104,"title":105,"created_at":106},"a53571ad-735a-4178-9f93-cb09b699d99c","vega-driving-language-instructions-en","Vega: Driving with Natural Language Instructions","2026-03-28T14:54:04.698882+00:00",{"id":108,"slug":109,"title":110,"created_at":111},"a34581d6-f36e-46da-88bb-582fb3e7425c","personalizing-autonomous-driving-styles-en","Drive My Way: Personalizing Autonomous Driving Styles","2026-03-28T14:54:26.148181+00:00",{"id":113,"slug":114,"title":115,"created_at":116},"2bc1ad7f-26ce-4f02-9885-803b35fd229d","training-knowledge-bases-writeback-rag-en","Training Knowledge Bases with WriteBack-RAG","2026-03-28T14:54:45.643433+00:00",{"id":118,"slug":119,"title":120,"created_at":121},"71adc507-3c54-4605-bbe2-c966acd6187e","packforcing-long-video-generation-en","PackForcing: Efficient Long-Video Generation Method","2026-03-28T14:55:02.646943+00:00",{"id":123,"slug":124,"title":125,"created_at":126},"675942ef-b9ec-4c5f-a997-381250b6eacb","pixelsmile-facial-expression-editing-en","PixelSmile Framework Enhances Facial Expression Editing","2026-03-28T14:55:20.633463+00:00",{"id":128,"slug":129,"title":130,"created_at":131},"6954fa2b-8b66-4839-884b-e46f89fa1bc3","adaptive-block-scaled-data-types-en","IF4: Smarter 4-Bit Quantization That Adapts to Your Data","2026-03-31T06:00:36.65963+00:00"]