DataFolks
Chapter 2

Couriers, Packets, and Digital Dabbawalas

Your datafolk hit the road. Using Mumbai's legendary dabbawala system as a lens, this chapter traces how data moves across the internet, splitting, multiplying, and leaving copies at every stop.

The moment you tap “Log In,” something small but important happens. Not on the screen, but underneath it. A quiet departure.

A piece of you leaves.

It does not leave as a photograph or a sentence or a neatly packed file. It leaves as fragments. Tiny, structured scraps. A device identifier here. A timestamp there. A cookie that says you were here before. Together, these fragments make up a travelling version of you, one that rarely waits for permission. This is your datafolk in motion.

To understand how datafolk move, it helps to stop thinking about the internet as a cloud. Clouds are vague and polite. The internet is neither. It is closer to a city filled with couriers, routes, checkpoints, warehouses, and rules. Data does not float. It is carried.

And there is no better way to understand how it’s carried than to start with a lunchbox.

The Dabbawalas of Mumbai

Every morning in Mumbai, around 5,000 dabbawalas pick up freshly cooked lunches from homes across the city.1 Each tiffin box, each dabba, travels through a chain of hands, changes trains at Churchgate or Dadar, and arrives at the right office desk by 12:30 PM. The system moves 200,000 lunches daily with an error rate so low that Harvard Business School studied it. One mistake per six million deliveries. Your favourite food delivery app could never.

Here’s the crucial detail: no single dabbawala sees the entire journey. Most cannot read the addresses they carry. Yet the system works, because everyone follows the same code.

The markings on each dabba tell the story. A combination of colours, symbols, and numbers painted on the lid encodes the origin station, the destination station, the building, and the floor. Each dabbawala reads only the part relevant to their leg of the journey. At the origin station, one reads the colour that indicates which train to board. At the destination station, another reads the symbol that points to the neighbourhood. At the final building, someone reads the number that specifies the floor. The dabba itself carries its routing instructions.

Something very similar happens to your datafolk when they travel through the internet, which makes the dabbawalas a good first step to understanding what happens to your data every time you tap a button on your phone.

The comparison is not decorative. It is structurally precise. Both systems break a delivery into legs, use coded labels for routing, rely on intermediaries who don’t need to understand the full journey, and achieve remarkable efficiency through standardisation. The dabbawala network and the internet are, at their core, the same design pattern, just operating at different scales and in different centuries.

What Is a Packet?

When your phone sends data across the internet, it doesn’t open a direct connection and pour the data through like water through a hose. Instead, the data is broken into small chunks, typically around 1,500 bytes each. Each chunk is wrapped in some extra information and sent out as an independent unit.

That unit is a packet.

Think of it this way: if you wanted to send a 300-page book through the dabbawala system, you wouldn’t send the whole book as one package. It’s too heavy, too unwieldy, and if it gets lost, you’ve lost everything. You’d also lose a friend, because nobody appreciates being handed a 300-page book and told “deliver this by lunch.”

Instead, you’d tear out each page, put it in its own envelope with instructions on the cover, and send them separately. Each envelope can take a different route. Each one arrives independently. The recipient reassembles the pages in order.

That’s packet switching. And it’s the reason the internet works.
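For readers who like to see the mechanics, the tear-out-the-pages idea can be sketched in a few lines of Python. This is a toy model, not how any real network stack is written: it assumes 1,460 bytes of payload per packet (a 1,500-byte packet minus a 20-byte IPv4 header and a 20-byte TCP header, with no options), and it numbers packets 0, 1, 2 and so on for readability. The names `Packet`, `split` and `reassemble` are made up for illustration.

```python
import random
from dataclasses import dataclass

# Assumption: 1,500-byte packets minus 20-byte IPv4 and 20-byte TCP headers.
PAYLOAD_SIZE = 1460


@dataclass
class Packet:
    seq: int        # position of this chunk in the original message
    payload: bytes  # the chunk itself


def split(message: bytes, size: int = PAYLOAD_SIZE) -> list[Packet]:
    """Tear the 'book' into numbered pages."""
    return [Packet(seq=i, payload=message[offset:offset + size])
            for i, offset in enumerate(range(0, len(message), size))]


def reassemble(packets: list[Packet]) -> bytes:
    """Put the pages back in order, whatever order they arrived in."""
    return b"".join(p.payload for p in sorted(packets, key=lambda p: p.seq))


book = b"Once upon a time in Mumbai... " * 500  # 15,000 bytes
packets = split(book)
random.Random(42).shuffle(packets)  # different routes, so out-of-order arrival
assert reassemble(packets) == book
print(len(packets), "packets")  # prints: 11 packets
```

The receiver never needs the packets to arrive in order; the sequence number alone is enough to rebuild the message.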

Anatomy of a Packet

Every packet has two parts: a header and a payload. The header is the label on the lunchbox. The payload is the lunch.

The Header

The header is the metadata that tells the network how to handle this packet:

Source address: your device’s IP address, the return address on the envelope. It tells the network where this packet came from, so the response knows where to go back to. Like writing your home address on the dabba lid, except considerably less colourful.

Destination address: the IP address of wherever this packet is headed. When you watch YouTube, the destination is one of Google’s video servers. When you send a WhatsApp message, it’s Meta’s servers. When you’re arguing with a stranger on Twitter, it’s, well, probably better not to think about the infrastructure supporting that particular activity.

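Sequence number: which piece of the original message this packet represents. If your video was broken into 10,000 packets, this number tells the receiver “I am packet #4,827, please put me between #4,826 and #4,828 and do try not to lose me.” (Strictly speaking, this field belongs to TCP rather than to the basic IP header, and it counts bytes rather than whole packets, but the job is the same.)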

Protocol: the rules this packet follows. The two main protocols are:

  • TCP: reliable delivery. If a packet gets lost, the sender notices the missing acknowledgement and sends it again. Used for web pages, messages, file downloads. The responsible older sibling of internet protocols.
  • UDP: fast delivery, no guarantees. If a packet gets lost, too bad, it’s gone, we’ve moved on. Used for video calls, online gaming, live streaming. The younger sibling who shows up late with no explanation and somehow gets away with it.

TTL (Time to Live): a countdown that starts at a number (commonly 64, or 128 on Windows) and decreases by one at each router the packet passes through. When it hits zero, the packet is discarded. This prevents lost packets from bouncing around the network forever, like a dabba that missed every handoff and is now just riding the local train indefinitely, confusing everyone.
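The TTL countdown is simple enough to simulate. In this sketch (hypothetical names, no real networking involved), each router on the route subtracts one, and the router that brings the counter to zero drops the packet instead of forwarding it. A route that doubles back on itself shows why the rule matters: without it, a packet caught in a routing loop would circle forever. The example address comes from the 203.0.113.0/24 block reserved for documentation.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    destination: str
    ttl: int = 64  # common default on Linux and macOS; Windows typically starts at 128


def forward(packet: Packet, route: list[str]) -> str:
    """Walk the packet along a list of routers, decrementing TTL at each one."""
    for hop, router in enumerate(route, start=1):
        packet.ttl -= 1
        if packet.ttl == 0:
            # A real router would discard the packet here and usually send an
            # ICMP "time exceeded" message back to the source.
            return f"dropped at hop {hop} ({router})"
    return f"delivered with ttl={packet.ttl}"


# A normal trip through 12 routers.
print(forward(Packet("203.0.113.7"), [f"router-{i}" for i in range(12)]))
# prints: delivered with ttl=52

# A misconfigured route where two routers keep passing the packet back and forth.
print(forward(Packet("203.0.113.7"), ["router-a", "router-b"] * 100))
# prints: dropped at hop 64 (router-b)
```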

The Payload

The payload is the actual data: a fragment of your message, a slice of an image, a piece of a video frame. On its own, a single packet’s payload is often meaningless. It’s one page torn from a book. But combined with all the other packets, it reconstructs the complete data.

Your entire internet experience, every video, every message, every payment, every doom-scroll session at 2 AM, is packets. All the way down.

Why Packets? A Brief History of Not Reserving the Wire

Before the internet, we had the telephone network. When you called someone, the system created a dedicated circuit, an actual physical path of wires, between you and the other person. That circuit was held open for the entire call, reserved exclusively for you.

This worked fine for voice calls. But it was spectacularly wasteful. Think about a phone conversation: there are pauses, silences, moments when you’re listening, moments when you’re wondering why your aunt called you at this hour. During all that, the dedicated wire is sitting idle. You’re paying for a reserved lane on the highway while your car is parked and you’re at the dhaba having tea.

If you tried to run the internet on circuit switching, you’d need a dedicated wire between every pair of communicating devices. That’s billions of simultaneous connections. It’s physically and economically impossible. It’s also an infrastructure engineer’s anxiety dream.

In the 1960s, researchers including Paul Baran and Donald Davies independently came up with a radical idea: don’t reserve the wire.2 Instead, break the data into small packets, label each one with its destination, and let them share the wires.

Packets from different senders interleave on the same cable. Your YouTube video packets share the same fibre optic cable as someone else’s WhatsApp messages and a third person’s UPI payment. Each packet finds its way to the right destination based on its header, just like dabbas from different households sharing the same train and the same dabbawalas.
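Here is a toy Python sketch of that sharing. It assumes a single cable that takes turns, one packet per sender per round (real links use queues and far more elaborate scheduling), and it uses made-up destination labels instead of IP addresses. What it shows is that packets from different senders mix freely on the wire, and the header alone is enough to sort them out again at the far end.

```python
from collections import defaultdict
from itertools import zip_longest

# Hypothetical senders, each with a few packets: (destination, payload).
streams = {
    "you": [("youtube", f"video-{i}") for i in range(4)],
    "friend": [("whatsapp", f"msg-{i}") for i in range(2)],
    "stall": [("npci", "upi-payment")],
}


def interleave(streams):
    """Share one cable: take turns, one packet from each sender per round."""
    cable = []
    for round_ in zip_longest(*streams.values()):
        cable.extend(p for p in round_ if p is not None)
    return cable


def deliver(cable):
    """At the far end, sort packets by the destination in their header."""
    inboxes = defaultdict(list)
    for destination, payload in cable:
        inboxes[destination].append(payload)
    return dict(inboxes)


cable = interleave(streams)
print(cable[:4])
print(deliver(cable))
```

Nobody reserved the cable for anyone, yet every inbox ends up with exactly its own packets.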

Sharing the wires is why the internet is efficient. It is also why it’s chaotic: packets can arrive out of order, get lost, or take different routes. TCP exists to tame this chaos, which is fundamentally the job of a very patient project manager; UDP simply accepts it and keeps moving.
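The two attitudes can be compared on a deliberately lossy channel. This Python sketch rests on heavy simplifying assumptions: 30% of packets vanish at random, acknowledgements are never lost, and the “TCP” side uses stop-and-wait (send one packet, wait for its acknowledgement, resend on timeout), whereas real TCP keeps many packets in flight with a sliding window. The function names are hypothetical.

```python
import random

LOSS_RATE = 0.3  # assumption: 30% of packets are dropped in transit


def transmit(packet, rng):
    """Return the packet if it survives the trip, or None if the network lost it."""
    return None if rng.random() < LOSS_RATE else packet


def send_udp(packets, rng):
    """UDP-style: fire each packet once and move on."""
    return [p for p in packets if transmit(p, rng) is not None]


def send_tcp(packets, rng, max_attempts=50):
    """TCP-style (stop-and-wait): resend each packet until it is acknowledged."""
    delivered, resends = [], 0
    for p in packets:
        for _ in range(max_attempts):
            if transmit(p, rng) is not None:
                delivered.append(p)  # receiver acknowledges; move to the next packet
                break
            resends += 1  # no acknowledgement before the timeout, so send it again
        else:
            raise ConnectionError(f"packet {p} lost {max_attempts} times; giving up")
    return delivered, resends


rng = random.Random(2024)
message = list(range(20))
udp_received = send_udp(message, rng)
tcp_received, resends = send_tcp(message, rng)
print(f"UDP delivered {len(udp_received)} of 20 packets")
print(f"TCP delivered {len(tcp_received)} of 20 packets after {resends} resends")
```

UDP’s count will usually fall short of 20; the TCP-style sender always gets there, paying for it in resends and waiting time.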

The Journey Begins: A UPI Payment

Let’s follow a single action through the system. You are standing at a chai stall in Koramangala, Bangalore. You scan the shopkeeper’s PhonePe QR code and enter ₹20. You tap “Pay.” Your phone tells you the transaction is complete. The chai tastes the same as always. Nothing seems to have happened except that your bank balance dropped by twenty rupees.

But beneath that simple tap, your datafolk just took a journey longer and more complex than any dabbawala’s route.

The moment you tapped “Pay,” your phone assembled a payment request. Not a single message, but a bundle of fragments: your UPI ID (a kind of address), the shopkeeper’s UPI ID, the amount, a timestamp, and your phone’s device identifier. These details travelled, encrypted, in the payload of one or more packets, while each packet’s header carried the IP addresses that routers along the way could read, just as each dabba carries painted codes that dabbawalas interpret.

This request did not travel directly to your bank. It could not; the internet does not work that way. Instead, your payment app handed it to your phone’s operating system, which sent it out over your mobile network as packets. The network’s nearest tower received them and passed them to the telecom operator’s regional gateway. From there, the packets entered the internet backbone, the digital equivalent of Mumbai’s railway network, where they hopped between routers operated by multiple companies until they reached the payment app’s servers. Those passed the request on, through the app’s partner bank, to the National Payments Corporation of India (NPCI), which operates the UPI system.

At NPCI, the request was validated and routed to your bank, which checked your PIN and balance and debited your account, and then to the shopkeeper’s bank, which credited theirs. Multiple messages travelled back: confirmations, balance updates, transaction logs. Your phone’s “Success” screen appeared only after this entire round trip was completed, typically in under two seconds.

In those two seconds, your datafolk split into multiple copies, passed through at least a dozen different systems operated by different organisations, and left traces in each one. The chai stall’s QR code provider logged the transaction. Your phone’s UPI app recorded it. Your telecom operator logged the connection metadata. NPCI stored the transaction details. Both banks updated their records. The shopkeeper’s payment provider noted the incoming payment.

A single tap. At least seven copies of your datafolk created. And you noticed nothing but the chai cooling in your hand.
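That tally can be written down as a small Python model. Everything in it is illustrative: the UPI IDs and device ID are placeholders, and which fields each party keeps is an assumption made for the example, not a description of any organisation’s actual retention policy. One detail it tries to capture is that the telecom operator, which only carries the encrypted packets, ends up with connection metadata rather than payment details.

```python
from dataclasses import dataclass, field


@dataclass
class System:
    name: str
    log: list = field(default_factory=list)

    def record(self, event: dict, fields: tuple) -> None:
        """Keep this system's own copy of the fields it can see."""
        self.log.append({k: event[k] for k in fields})


# One ₹20 payment; all identifiers are placeholders.
payment = {
    "payer": "you@examplebank",
    "payee": "chaistall@examplebank",
    "amount_inr": 20,
    "time": "2024-03-01T08:30:00+05:30",
    "device_id": "device-123",
}

PAYMENT = ("payer", "payee", "amount_inr", "time")

# Who keeps what is an assumption for illustration only.
journey = [
    (System("Chai stall's QR code provider"), ("payee", "amount_inr", "time")),
    (System("Your UPI app"), PAYMENT + ("device_id",)),
    (System("Telecom operator"), ("time", "device_id")),  # metadata only
    (System("NPCI"), PAYMENT),
    (System("Your bank"), PAYMENT),
    (System("Shopkeeper's bank"), PAYMENT),
    (System("Shopkeeper's payment provider"), PAYMENT),
]

for system, fields in journey:
    system.record(payment, fields)

copies = sum(len(system.log) for system, _ in journey)
print(f"One tap, {copies} copies")  # prints: One tap, 7 copies
```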

What Each Intermediary Sees

Here’s what makes this interesting, and a little unsettling.

Even when the payload is encrypted (and it usually is, thanks to HTTPS), the header is visible. It has to be, the network needs to read the destination address to route the packet, just like a dabbawala needs to read the lid to know where the dabba goes.
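To make “the header is visible” concrete, the following Python sketch builds a minimal 20-byte IPv4 header with the standard library’s `struct` and `socket` modules, attaches 100 random bytes as a stand-in for an HTTPS-encrypted payload, and then reads the packet the way an intermediary would. The addresses come from ranges reserved for documentation, and the helper names are invented for this example. Whatever the payload contains, the source, destination and protocol fields come out in plain sight.

```python
import os
import socket
import struct

# IPv4 header without options: version/IHL, TOS, total length, ID, flags/fragment,
# TTL, protocol, checksum, source address, destination address (20 bytes total).
IPV4_HEADER = "!BBHHHBBH4s4s"


def ipv4_checksum(header: bytes) -> int:
    """One's-complement sum of the header's 16-bit words, as IPv4 specifies."""
    total = sum(struct.unpack("!10H", header))
    while total > 0xFFFF:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_packet(src: str, dst: str, payload: bytes, ttl: int = 64, proto: int = 6) -> bytes:
    """Pack a header (protocol 6 = TCP, 17 = UDP) and append the payload."""
    fields = [(4 << 4) | 5, 0, 20 + len(payload), 0, 0, ttl, proto, 0,
              socket.inet_aton(src), socket.inet_aton(dst)]
    fields[7] = ipv4_checksum(struct.pack(IPV4_HEADER, *fields))
    return struct.pack(IPV4_HEADER, *fields) + payload


def what_an_intermediary_sees(packet: bytes) -> dict:
    """Parse only the header; the payload stays an opaque blob."""
    _, _, length, _, _, ttl, proto, _, src, dst = struct.unpack(IPV4_HEADER, packet[:20])
    return {
        "source": socket.inet_ntoa(src),
        "destination": socket.inet_ntoa(dst),
        "protocol": {6: "TCP", 17: "UDP"}.get(proto, proto),
        "ttl": ttl,
        "bytes": length,
    }


encrypted_payload = os.urandom(100)  # stand-in for ciphertext; unreadable either way
packet = build_packet("192.0.2.10", "198.51.100.20", encrypted_payload)
print(what_an_intermediary_sees(packet))
# prints: {'source': '192.0.2.10', 'destination': '198.51.100.20', 'protocol': 'TCP', 'ttl': 64, 'bytes': 120}
```

The intermediary never touches the last 100 bytes, yet it knows who is talking to whom, over which protocol, and how much.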

This means everyone between you and the destination can see:

Who | What they see
Your Wi-Fi router | Every IP address you connect to, when, and how much data you exchange. It’s the nosiest device in your home, and you gave it admin access.
Your ISP (Jio, Airtel, BSNL) | Same as the router, plus your account identity. They may not know you watched a specific YouTube video, but they know you connected to YouTube’s servers for 45 minutes at 11 PM.
Internet exchange points | Aggregated traffic patterns. Not individual content, but flow data: what’s going where and how much of it.
NPCI | Both sides of every UPI transaction in India: sender, receiver, amount, time, banks, apps. One of the most complete pictures of everyday Indian financial activity ever assembled.

This is metadata, data about data. And metadata is often more revealing than the content itself.

Consider: nobody needs to read your messages to know that you called a divorce lawyer at 2 AM, then a real estate agent the next morning. The pattern of connections tells the story. Nor does anyone need to read your UPI payloads to know that you make a payment to the same wine shop every Friday evening. The metadata (IP addresses, timestamps, frequency) is enough to write your biography. An unflattering one.
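A short Python sketch shows how little it takes. The records below are invented and contain no amounts and no message content, only a timestamp and a payee identifier per payment; the “three or more times” threshold and the 5 PM to 10 PM evening window are arbitrary assumptions. Counting recurring (payee, weekday, time-of-day) combinations is already enough to surface a habit.

```python
from collections import Counter
from datetime import datetime

WEEKDAYS = ("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday")

# Invented payment metadata: (timestamp, payee). No amounts, no content.
records = [
    ("2024-03-01 19:05", "wineshop@examplebank"),
    ("2024-03-04 08:30", "chaistall@examplebank"),
    ("2024-03-06 13:10", "pharmacy@examplebank"),
    ("2024-03-08 18:45", "wineshop@examplebank"),
    ("2024-03-11 08:35", "chaistall@examplebank"),
    ("2024-03-15 19:30", "wineshop@examplebank"),
    ("2024-03-22 20:10", "wineshop@examplebank"),
]


def recurring_habits(records, min_count=3):
    """Return (payee, weekday, time-of-day) patterns seen at least min_count times."""
    counts = Counter()
    for stamp, payee in records:
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M")
        part_of_day = "evening" if 17 <= when.hour < 22 else "other"
        counts[(payee, WEEKDAYS[when.weekday()], part_of_day)] += 1
    return [pattern for pattern, n in counts.items() if n >= min_count]


print(recurring_habits(records))
# prints: [('wineshop@examplebank', 'Friday', 'evening')]
```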

The Cables Under the Sea

Your ₹20 UPI payment probably stayed within India, bouncing between data centres in Mumbai, Chennai, and Bangalore. But most of your internet activity doesn’t.3

When you open YouTube, your packets might travel to Google’s nearest edge server, possibly in Mumbai or Singapore. When you use Instagram, your data may cross the ocean to Meta’s data centres, many of which are in the United States. When you search for something on Google, depending on what you’re searching and where the relevant index is cached, your query might hop through an undersea fibre optic cable.

There are currently over 500 submarine cables crisscrossing the world’s oceans, carrying approximately 99% of intercontinental data traffic. Some of these cables are as thin as a garden hose, sitting on the ocean floor, carrying the collective internet activity of entire nations. The idea that your 2 AM Wikipedia deep-dive about capybara social behaviour travels through a tube on the literal ocean floor is, frankly, one of the more humbling facts about modern technology.

India’s international connectivity lands primarily at Mumbai’s Versova and Chennai’s submarine cable landing stations. Eleven of India’s submarine cables come ashore at Versova, a six-kilometre stretch of coastline in suburban Mumbai that carries the vast majority of India’s international data traffic. Your datafolk’s journey abroad almost certainly passes through this unremarkable patch of beach.

Data Localisation and the Border Problem

Packets don’t have passports. They don’t respect borders. A packet from Bangalore to Delhi might route through Singapore if that’s the path the networks in between prefer. Your data is a cosmopolitan traveller who never asked for a visa.
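Why would a domestic packet leave the country at all? Real routing between networks is decided by the Border Gateway Protocol, which follows business agreements and policy rather than pure distance, so the following Python sketch is only a toy: Dijkstra’s shortest-path algorithm (the idea behind link-state protocols such as OSPF inside a single network) run over invented link costs. In this made-up scenario the direct domestic link is congested, so the cheapest path detours abroad.

```python
import heapq

# Invented link costs (think latency plus congestion). Not real network data.
LINKS = {
    ("Bangalore", "Delhi"): 40,  # direct domestic link, congested in this scenario
    ("Bangalore", "Mumbai"): 10,
    ("Mumbai", "Delhi"): 28,
    ("Bangalore", "Chennai"): 5,
    ("Chennai", "Singapore"): 12,
    ("Singapore", "Delhi"): 18,
}


def cheapest_path(links, start, goal):
    """Dijkstra's algorithm over an undirected graph of weighted links."""
    graph = {}
    for (a, b), cost in links.items():
        graph.setdefault(a, []).append((b, cost))
        graph.setdefault(b, []).append((a, cost))

    frontier = [(0, start, [start])]  # (cost so far, node, path taken)
    settled = set()
    while frontier:
        cost, node, path = heapq.heappop(frontier)
        if node == goal:
            return cost, path
        if node in settled:
            continue
        settled.add(node)
        for neighbour, step in graph.get(node, []):
            if neighbour not in settled:
                heapq.heappush(frontier, (cost + step, neighbour, path + [neighbour]))
    return None  # goal unreachable


print(cheapest_path(LINKS, "Bangalore", "Delhi"))
# prints: (35, ['Bangalore', 'Chennai', 'Singapore', 'Delhi'])
```

Raise the Singapore link’s cost and the same code keeps the packet in India; which path wins depends entirely on numbers the sender never sees.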

Border-blind routing creates a legal problem. The physical location of a server largely determines which country’s laws apply to the data on it. A server in Mumbai is subject to Indian law. A server in Virginia is subject to American law, even if it stores data about Indian citizens, collected by an Indian company, about transactions that happened in India.

In April 2018, the Reserve Bank of India issued a circular that changed this equation for financial data: all payment system operators must store their data exclusively in India.4 Not merely a copy, but the primary data. On Indian servers. Under Indian law.

This is data localisation, and it’s one of the defining debates of our time. We’ll explore it in depth in a later chapter. But the seed is here: your datafolk, once born and set in motion, can end up anywhere. The question of who controls where they come to rest, and who gets to look at them once they do, is ultimately a question of power.

Cookies, Trackers, and the Passengers You Didn’t Invite

So far, we’ve talked about datafolk you knowingly created, a UPI payment, a YouTube video, a login. But your datafolk also pick up hitchhikers.

When you visit a website, it often drops a cookie on your device, a small file that remembers you. First-party cookies are generally useful: they keep you logged in, remember your language preference, save your shopping cart. They’re the friendly neighbourhood shopkeeper who remembers your usual order.

Third-party cookies are different. These are placed not by the website you’re visiting, but by advertisers, analytics companies, and data brokers who have code embedded on that website. When you visit a news article, the article’s website might drop dozens of third-party cookies. Each one is a tiny datafolk birth, a new tracker that can follow you across the internet, building a profile of your browsing habits across different websites.

You read a health article about migraines. A tracker notes it. You search for headache medication. A different tracker notes it. You visit a pharmacy’s website. A third tracker notes it. Now three different companies know you have migraines, and your personalised ad experience is about to become very, very specific. You will see migraine medication advertisements for the next three months. You will see them everywhere. You will begin to wonder if the ads are causing the migraines.

This is cross-site tracking, and it’s the engine that powers most of the internet’s advertising economy. Your datafolk didn’t just travel from your phone to a server. They multiplied at every stop, and each copy wandered off to tell someone about your headache.
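The core of cross-site tracking fits in a few lines of Python. In this sketch, `Tracker` stands in for a hypothetical third-party script (the domain `adtrack.example` is made up) embedded on three unrelated sites. Because the browser sends the tracker’s cookie back to the tracker’s own domain on every site that embeds it, one random ID quietly ties all three visits into a single profile. Real browsers add rules this model ignores, and some now block or partition third-party cookies by default.

```python
import uuid
from collections import defaultdict


class Tracker:
    """A hypothetical third-party tracker whose script is embedded on many sites."""

    def __init__(self, domain: str):
        self.domain = domain
        self.profiles = defaultdict(list)  # tracking ID -> pages where it was seen

    def on_page_load(self, cookie_jar: dict, page_url: str) -> None:
        # The cookie is stored under the tracker's domain, not the site's, so the
        # same ID is sent back no matter which site the user is visiting.
        tracking_id = cookie_jar.setdefault(self.domain, str(uuid.uuid4()))
        self.profiles[tracking_id].append(page_url)


cookie_jar = {}  # the browser's cookies, keyed by domain
tracker = Tracker("adtrack.example")
visits = [
    "https://news.example/health/migraines",
    "https://search.example/?q=headache+medication",
    "https://pharmacy.example/painkillers",
]
for url in visits:
    tracker.on_page_load(cookie_jar, url)

for tracking_id, pages in tracker.profiles.items():
    print(tracking_id, "->", pages)
```

Three unrelated websites, one profile: the tracker never needed your name, only a cookie it could recognise everywhere it was embedded.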

The Chain Reaction

This is the takeaway of this chapter, and it’s worth stating plainly: sharing data is rarely a single action. It is a chain reaction.

When you tap “Pay” for a ₹20 chai, you didn’t send one piece of data to one place. You triggered a cascade of packets across multiple networks, creating copies at every intermediary, generating metadata that reveals your patterns, and leaving traces that persist long after the chai is finished and you’ve moved on with your day.

When you visit a website, you didn’t just load a page. You announced your presence to dozens of trackers, each of which sent your information to different servers in different countries operated by different companies under different privacy laws.

When you installed an app and tapped “Allow” on the permissions popup, you didn’t just grant access to your camera or contacts. You opened a channel that can continuously generate datafolk for as long as the app is installed, often whether you’re using it or not.

Every digital action is a departure. Every departure creates copies. Every copy has a destination you didn’t choose and a lifespan you don’t control.

Your datafolk are out there, riding the cables, bouncing between routers, sitting in databases you’ll never see. They are dabbas that never make the return trip home.

The next question is: where do they end up, and who’s waiting for them when they arrive?

Try It Yourself

The Packet Tracer experiment in the lab lets you send a packet from your phone to a web server and watch it hop through each network node. Toggle the packet header to see what metadata each intermediary can read, even when the payload is encrypted.


Previously: How Datafolk Are Born, from census takers to Aadhaar, the origins of data collection. Next chapter: The Invisible Bazaar, where your datafolk are valued, traded, and profiled.

Reference

Glossary

Packet
A small unit of data sent across a network, typically around 1,500 bytes. Like a page torn from a book, wrapped in an envelope with routing instructions. Meaningless alone, essential in aggregate.
IP Address
A numerical label assigned to each device on a network, your device's mailing address on the internet. IPv4 looks like 192.168.1.1. IPv6 looks like a cat walked across a keyboard.
TCP
Transmission Control Protocol, a reliable delivery method that ensures every packet arrives, and in the right order. The postal service of the internet: slower, but nothing gets lost. Used for web pages, emails, and anything where every byte matters.
UDP
User Datagram Protocol, a fast delivery method with no guarantees. The courier who throws the package at your door and sprints away. Used for video calls, live streaming, and online gaming where speed matters more than perfection.
Router
A device that reads packet headers and forwards them toward their destination, the dabbawala of the internet. It doesn't read your data; it reads the address on the envelope.
DNS
Domain Name System, the internet's phone book. Converts human-readable addresses (google.com) to IP addresses (142.250.182.14). Without it, you'd need to memorise numbers instead of names, and the internet would feel even more hostile.
Cookie
A small text file a website places on your device to remember you between visits. First-party cookies are helpful (staying logged in). Third-party cookies are the ones following you around the internet like a particularly persistent auto-rickshaw driver.
Metadata
Data about data, the envelope rather than the letter. Who sent it, who received it, when, how large, from where. Often more revealing than the contents. As former NSA and CIA director Michael Hayden put it in 2014: 'We kill people based on metadata.'


Sources

  1. Thomke, Stefan. 'Mumbai's Models of Service Excellence.' Harvard Business Review, November 2012.
  2. Baran, Paul. 'On Distributed Communications: Introduction to Distributed Communications Networks.' RAND Corporation, 1964.
  3. TeleGeography. Submarine Cable Map.
  4. Reserve Bank of India. 'Storage of Payment System Data.' RBI circular, April 2018.
