[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"article-why-distributed-systems-feel-so-weird-en":3,"tags-why-distributed-systems-feel-so-weird-en":30,"related-lang-why-distributed-systems-feel-so-weird-en":40,"related-posts-why-distributed-systems-feel-so-weird-en":44,"series-industry-eed49c58-cac4-4542-b3ac-0412e5ad3835":81},{"id":4,"title":5,"content":6,"summary":7,"source":8,"source_url":9,"author":10,"image_url":11,"keywords":12,"language":18,"translated_content":10,"views":19,"is_premium":20,"created_at":21,"updated_at":21,"cover_image":11,"published_at":22,"rewrite_status":23,"rewrite_error":10,"rewritten_from_id":24,"slug":25,"category":26,"related_article_id":27,"status":28,"google_indexed_at":29,"x_posted_at":10,"tweet_text":10,"title_rewritten_at":10,"title_original":10,"key_takeaways":10,"topic_cluster_id":10,"embedding":10,"is_canonical_seed":20},"eed49c58-cac4-4542-b3ac-0412e5ad3835","Why Distributed Systems Feel So Weird","\u003Cp>Amazon had so few servers in 1999 that engineers named them things like “fishy” and “online-01.” That sounds quaint now, but the hard part was already there: distributed computing was messy then, and it is still messy today. In \u003Ca href=\"https:\u002F\u002Faws.amazon.com\u002Fbuilders-library\u002Fchallenges-with-distributed-systems\u002F\" target=\"_blank\" rel=\"noopener\">AWS’s Builders’ Library article\u003C\u002Fa>, the core warning is simple: once a system spans machines, failures stop behaving like isolated bugs and start behaving like a daily operating condition.\u003C\u002Fp>\u003Cp>The article makes a point that still catches teams off guard. A request that feels like one method call on a single machine turns into a chain of network sends, deliveries, validations, state updates, and replies. Each step can fail on its own. That is why distributed systems feel so strange compared with normal application code.\u003C\u002Fp>\u003Cp>If you have ever debugged a service that worked perfectly in unit tests and then fell apart under real traffic, this is the reason. The network does not share fate with your process, and that changes everything.\u003C\u002Fp>\u003Ch2>Three kinds of distributed systems, three very different pain levels\u003C\u002Fh2>\u003Cp>Amazon breaks distributed systems into three broad groups: offline systems, soft real-time systems, and hard real-time systems. The first group gets the easiest life. Batch jobs, rendering farms, and large analytics clusters can tolerate delays and retries because nobody is waiting on a live response.\u003C\u002Fp>\n\u003Cfigure class=\"my-6\">\u003Cimg src=\"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1775171214027-ewht.png\" alt=\"Why Distributed Systems Feel So Weird\" class=\"rounded-xl w-full\" loading=\"lazy\" \u002F>\u003C\u002Ffigure>\n\u003Cp>Soft real-time systems sit in the middle. They need to keep producing results, but they have breathing room. A search index builder can be offline for minutes or hours in some cases, and EC2 credential propagation has a window measured in hours because old credentials do not expire instantly.\u003C\u002Fp>\u003Cp>Hard real-time systems are where the pain concentrates. These are request\u002Freply services where a user, payment flow, or API caller is waiting right now. That includes front-end web servers, order pipelines, credit card transactions, telephony, and most AWS APIs.\u003C\u002Fp>\u003Cul>\u003Cli>Offline systems: batch processing, big data analysis, movie rendering, protein folding\u003C\u002Fli>\u003Cli>Soft real-time systems: search index builders, impaired-server detectors, EC2 credential updates\u003C\u002Fli>\u003Cli>Hard real-time systems: web requests, order processing, payment authorization, live APIs\u003C\u002Fli>\u003C\u002Ful>\u003Cp>The difference matters because the failure budget changes. Offline systems can retry quietly. Hard real-time systems need to answer fast, and they need to do it while the network, the remote machine, and the client all keep their own failure modes.\u003C\u002Fp>\u003Cp>That is why distributed systems are not just “single-machine programs with more servers.” The operating assumptions change as soon as a request has to cross a fault boundary.\u003C\u002Fp>\u003Ch2>The network turns one call into eight steps\u003C\u002Fh2>\u003Cp>The article’s best mental model is the request\u002Freply cycle. On a single machine, a function call looks simple. Across a network, one round trip breaks into eight steps: post the request, deliver it, validate it, update server state, post the reply, deliver the reply, validate it, and update client state.\u003C\u002Fp>\u003Cp>That list is long for a reason. Each step can fail independently, and each failure creates a different recovery problem. A request may never leave the client. It may reach the server and then get lost before the reply comes back. The reply may arrive but fail validation because of a version mismatch or corrupted data.\u003C\u002Fp>\u003Cp>To make the point concrete, the article imagines a networked Pac-Man game where the board lives on another server. A single line like \u003Ccode>board.find(\"pacman\")\u003C\u002Fcode> becomes a remote operation with transport, serialization, validation, and timeout concerns layered underneath it.\u003C\u002Fp>\u003Cblockquote>“The network enables sending messages from one fault domain to another.” — \u003Ca href=\"https:\u002F\u002Faws.amazon.com\u002Fbuilders-library\u002Fchallenges-with-distributed-systems\u002F\" target=\"_blank\" rel=\"noopener\">Amazon Builders’ Library\u003C\u002Fa>\u003C\u002Fblockquote>\u003Cp>That sentence is the heart of the piece. Once you send a message across a fault boundary, you stop living in the tidy world of in-process calls. You are now depending on two machines, the network between them, and the timing behavior of both sides.\u003C\u002Fp>\u003Cp>It also explains why distributed bugs are so annoying to reproduce. The code may be correct, but the timing may be wrong. Or the timing may be fine, but the machine may fail after receiving a request and before replying. Or the reply may be valid but arrive too late to matter.\u003C\u002Fp>\u003Ch2>Failure is normal, and “unknown” failure is worse\u003C\u002Fh2>\u003Cp>Amazon’s article spends a lot of time on failure because distributed systems force you to assume it. On one machine, if the CPU fries or the kernel panics, the whole box is probably gone. That is fate sharing: one failure takes the whole unit with it, which actually simplifies reasoning.\u003C\u002Fp>\n\u003Cfigure class=\"my-6\">\u003Cimg src=\"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1775171212128-qwcd.png\" alt=\"Why Distributed Systems Feel So Weird\" class=\"rounded-xl w-full\" loading=\"lazy\" \u002F>\u003C\u002Ffigure>\n\u003Cp>Across machines, fate sharing disappears. The client can keep running while the server is dead. The server can receive a request and crash before replying. The network can drop packets while both endpoints look healthy. That means code has to treat every step as a possible failure point.\u003C\u002Fp>\u003Cp>Here is the failure checklist the article walks through:\u003C\u002Fp>\u003Cul>\u003Cli>Request posting can fail if the network is down or the server rejects the connection\u003C\u002Fli>\u003Cli>Request delivery can fail if the server crashes after receiving the message\u003C\u002Fli>\u003Cli>Request validation can fail because of bad data, incompatible versions, or corrupted packets\u003C\u002Fli>\u003Cli>Server state updates can fail because storage or memory operations break\u003C\u002Fli>\u003Cli>Reply posting can fail even after the server has processed the request\u003C\u002Fli>\u003Cli>Reply delivery can fail after the response leaves the server\u003C\u002Fli>\u003Cli>Reply validation can fail on the client because the data is invalid or unexpected\u003C\u002Fli>\u003Cli>Client state updates can fail after the reply is received\u003C\u002Fli>\u003C\u002Ful>\u003Cp>This is why distributed systems teams obsess over timeouts, retries, idempotency, and versioning. Those are not optional extras. They are the tools that keep an API from turning into a pile of ambiguous partial failures.\u003C\u002Fp>\u003Cp>It also explains why “unknown unknowns” matter so much. You can write tests for a dropped packet or a refused connection. It is much harder to test the timing edge case where a server processed a request, died before replying, and then got retried by the client a second later.\u003C\u002Fp>\u003Ch2>How AWS compares the hard cases to the easy ones\u003C\u002Fh2>\u003Cp>The article’s comparison between single-machine code and networked code is useful because it shows how quickly complexity grows. A local call like \u003Ccode>board.find(\"pacman\")\u003C\u002Fcode> has a small set of failure modes. A remote call has to account for transport, serialization, validation, state mutation, and timeout behavior on both sides.\u003C\u002Fp>\u003Cp>That is why distributed systems engineering often feels less like application programming and more like designing for adversarial conditions. You are planning for partial success, duplicate delivery, stale state, and delayed responses all at once.\u003C\u002Fp>\u003Cul>\u003Cli>Single-machine call: one fault domain, one process, one memory space\u003C\u002Fli>\u003Cli>Network call: at least two fault domains, plus the network in between\u003C\u002Fli>\u003Cli>Single-machine failure: usually ends the whole process\u003C\u002Fli>\u003Cli>Network failure: one side may keep running while the other side fails or stalls\u003C\u002Fli>\u003Cli>Offline workloads: can absorb long delays and retries\u003C\u002Fli>\u003Cli>Hard real-time workloads: need quick answers even when parts of the system misbehave\u003C\u002Fli>\u003C\u002Ful>\u003Cp>That comparison is also why a lot of teams underestimate distributed systems at first. The code looks familiar. The deployment model looks familiar. The runtime behavior is anything but familiar once packets, timeouts, retries, and retries of retries enter the picture.\u003C\u002Fp>\u003Cp>One useful way to read this article is as a warning against local thinking. A method call on your laptop and a request between two AWS services may look similar in code, but they live in very different failure environments.\u003C\u002Fp>\u003Ch2>What this means for teams building real services\u003C\u002Fh2>\u003Cp>The strongest takeaway from AWS’s write-up is that distributed systems are defined by their failure modes, not by their diagrams. If your service crosses machines, you should assume the network will lie to you sometimes, the remote side will disappear sometimes, and your own process will sometimes get an answer too late to use it.\u003C\u002Fp>\u003Cp>That is why production systems need explicit handling for retries, deduplication, timeouts, and state reconciliation. It is also why teams that build APIs, payment systems, or control planes spend so much time on boring-looking infrastructure details. The boring parts are where correctness lives.\u003C\u002Fp>\u003Cp>For readers who want the next layer of detail, OraCore’s related coverage on \u003Ca href=\"\u002Fnews\u002Fdistributed-timeouts-and-retries\" target=\"_self\">timeouts and retries in distributed services\u003C\u002Fa> pairs well with this piece. The same goes for \u003Ca href=\"\u002Fnews\u002Fidempotency-in-api-design\" target=\"_self\">idempotency in API design\u003C\u002Fa>, because once a request can fail halfway through, duplicate work becomes a real design problem.\u003C\u002Fp>\u003Cp>My prediction is simple: as systems get more distributed, the teams that win will be the ones that treat failure as a normal input, not an exception. If you are designing a service today, ask one question before anything else: what exactly happens when the request reaches the server, but the reply never makes it back?\u003C\u002Fp>","AWS explains why distributed systems break simple assumptions: every request crosses fault domains, and each step can fail independently.","aws.amazon.com","https:\u002F\u002Faws.amazon.com\u002Fbuilders-library\u002Fchallenges-with-distributed-systems\u002F",null,"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1775171214027-ewht.png",[13,14,15,16,17],"distributed systems","AWS Builders' Library","fault domains","request-reply","network failures","en",0,false,"2026-04-02T23:06:33.123649+00:00","2026-04-02T23:06:33.007+00:00","done","94a2b774-b6d8-4acc-a465-58a518e3458f","why-distributed-systems-feel-so-weird-en","industry","5d130eaa-4020-425f-8560-79a030c9f957","published","2026-04-07T07:41:14.768+00:00",[31,33,35,37,39],{"name":15,"slug":32},"fault-domains",{"name":17,"slug":34},"network-failures",{"name":13,"slug":36},"distributed-systems",{"name":14,"slug":38},"aws-builders-library",{"name":16,"slug":16},{"id":27,"slug":41,"title":42,"language":43},"why-distributed-systems-feel-so-weird-zh","分散式系統為何這麼怪","zh",[45,51,57,63,69,75],{"id":46,"slug":47,"title":48,"cover_image":49,"image_url":49,"created_at":50,"category":26},"cf1863f5-624d-4b5f-bc32-d469c2149866","why-ai-infrastructure-is-now-the-real-moat-en","Why AI infrastructure is now the real moat","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1778875858866-4ikl.png","2026-05-15T20:10:38.090619+00:00",{"id":52,"slug":53,"title":54,"cover_image":55,"image_url":55,"created_at":56,"category":26},"6ff3920d-c8ea-4cf3-8543-9cf9efc3fe36","circles-agent-stack-targets-machine-speed-payments-en","Circle’s Agent Stack targets machine-speed payments","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1778871659638-hur1.png","2026-05-15T19:00:44.756112+00:00",{"id":58,"slug":59,"title":60,"cover_image":61,"image_url":61,"created_at":62,"category":26},"1270e2f4-6f3b-4772-9075-87c54b07a8d1","iren-signs-nvidia-ai-infrastructure-pact-en","IREN signs Nvidia AI infrastructure pact","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1778871059665-3vhi.png","2026-05-15T18:50:38.162691+00:00",{"id":64,"slug":65,"title":66,"cover_image":67,"image_url":67,"created_at":68,"category":26},"b308c85e-ee9c-4de6-b702-dfad6d8da36f","circle-agent-stack-ai-payments-en","Circle launches Agent Stack for AI payments","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1778870450891-zv1j.png","2026-05-15T18:40:31.462625+00:00",{"id":70,"slug":71,"title":72,"cover_image":73,"image_url":73,"created_at":74,"category":26},"f7028083-46ba-493b-a3db-dd6616a8c21f","why-nebius-ai-pivot-is-more-real-than-hype-en","Why Nebius’s AI Pivot Is More Real Than Hype","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1778823055711-tbfv.png","2026-05-15T05:30:26.829489+00:00",{"id":76,"slug":77,"title":78,"cover_image":79,"image_url":79,"created_at":80,"category":26},"b63692ed-db6a-4dbd-b771-e1babdc94af7","nvidia-backs-corning-factories-with-billions-en","Nvidia backs Corning factories with billions","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1778822444685-tvx6.png","2026-05-15T05:20:28.914908+00:00",[82,87,92,97,102,107,112,117,122,127],{"id":83,"slug":84,"title":85,"created_at":86},"d35a1bd9-e709-412e-a2df-392df1dc572a","ai-impact-2026-developments-market-en","AI's Impact in 2026: Key Developments and Market Shifts","2026-03-25T16:20:33.205823+00:00",{"id":88,"slug":89,"title":90,"created_at":91},"5ed27921-5fd6-492e-8c59-78393bf37710","trumps-ai-legislative-framework-en","Trump's AI Legislative Framework: What's Inside?","2026-03-25T16:22:20.005325+00:00",{"id":93,"slug":94,"title":95,"created_at":96},"e454a642-f03c-4794-b185-5f651aebbaca","nvidia-gtc-2026-key-highlights-innovations-en","NVIDIA GTC 2026: Key Highlights and Innovations","2026-03-25T16:22:47.882615+00:00",{"id":98,"slug":99,"title":100,"created_at":101},"0ebb5b16-774a-4922-945d-5f2ce1df5a6d","claude-usage-diversifies-learning-curves-en","Claude Usage Diversifies, Learning Curves Emerge","2026-03-25T16:25:50.770376+00:00",{"id":103,"slug":104,"title":105,"created_at":106},"69934e86-2fc5-4280-8223-7b917a48ace8","openclaw-ai-commoditization-concerns-en","OpenClaw's Rise Raises Concerns of AI Model Commoditization","2026-03-25T16:26:30.582047+00:00",{"id":108,"slug":109,"title":110,"created_at":111},"b4b2575b-2ac8-46b2-b90e-ab1d7c060797","google-gemini-ai-rollout-2026-en","Google's Gemini AI Rollout Extended to 2026","2026-03-25T16:28:14.808842+00:00",{"id":113,"slug":114,"title":115,"created_at":116},"6e18bc65-42ae-4ad0-b564-67d7f66b979e","meta-llama4-fabricated-results-scandal-en","Meta's Llama 4 Scandal: Fabricated AI Test Results Unveiled","2026-03-25T16:29:15.482836+00:00",{"id":118,"slug":119,"title":120,"created_at":121},"bf888e9d-08be-4f47-996c-7b24b5ab3500","accenture-mistral-ai-deployment-en","Accenture and Mistral AI Team Up for AI Deployment","2026-03-25T16:31:01.894655+00:00",{"id":123,"slug":124,"title":125,"created_at":126},"5382b536-fad2-49c6-ac85-9eb2bae49f35","mistral-ai-high-stakes-2026-en","Mistral AI: Facing High Stakes in 2026","2026-03-25T16:31:39.941974+00:00",{"id":128,"slug":129,"title":130,"created_at":131},"9da3d2d6-b669-4971-ba1d-17fdb3548ed5","cursors-meteoric-rise-pressures-en","Cursor's Meteoric Rise Faces Industry Pressures","2026-03-25T16:32:21.899217+00:00"]