Chapter 1: Seedling

How Datafolk Are Born

From census takers with clipboards to apps with cookies: tracing the origins of data collection, and how identity becomes something that can be recorded, stored, and occasionally used against you.

You were born twice.

The first time was messy, loud, and involved your mother doing all the work. The second time was quieter. Someone typed your name into a form, assigned you a number, and just like that, you existed in a database. Your datafolk was born.

That second birth might have happened in the hospital itself. A nurse entered your weight, your time of arrival, and your parents’ names into a system. Or it might have happened a few weeks later, when your father stood in a government office and filled out a birth certificate application while you screamed in the background, blissfully unaware that you were being committed to bureaucratic immortality.

Either way, the pattern is the same. A human being (complicated, contradictory, occasionally smelly) gets reduced to a set of fields in a system. Name. Date. Gender. Address. Parent names. Done. You are now data.

Welcome to the world. Please fill out this form.

The Very First Data Collection (It Was a Census, Obviously)

Humans have been counting each other for thousands of years, mostly to figure out who to tax and who to send to war. The ancient Babylonians did it. The Romans did it. The word “census” literally comes from the Latin censere: to assess, to tax. Not “to understand” or “to celebrate.” To tax.

The British conducted India’s first modern census in 1871. They counted 238 million people across the subcontinent, categorising them by caste, religion, language, occupation and (because the British Empire was nothing if not thorough) “infirmities.” They had a column for whether you were blind, deaf, or (and I quote the actual census form) an “idiot.” Victorian data collection: comprehensive and unflinching.

But here’s the thing about counting people. The moment you start recording, you start deciding what matters. The census didn’t just count Indians; it categorised them. And those categories, chosen by colonial administrators in offices thousands of miles away, hardened into identities that persist to this day. Caste categories that were fluid and contextual became rigid database entries. Communities that defined themselves in complex ways got reduced to checkboxes.

This is the first lesson of data collection: the form shapes the person, not the other way around. When a system asks you to pick “Male” or “Female” and offers nothing else, it isn’t reflecting reality; it’s constructing it. When a school admission form asks for your “mother tongue” and gives you a dropdown of 22 options, it isn’t capturing the richness of Indian linguistic life; it’s compressing it into a schema.
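To make the compression concrete, here is a minimal Python sketch of a form schema that only accepts a fixed set of answers. Everything in it is hypothetical: the `AdmissionForm` fields, the two-option `Gender` list, and the short `ALLOWED_TONGUES` set (a stand-in for a real dropdown, not the actual 22-language list). Any answer outside the allowed values is either refused outright or flattened into a catch-all.

```python
from dataclasses import dataclass
from enum import Enum


class Gender(Enum):
    # Hypothetical form: only two options exist, so the schema
    # simply cannot represent any other answer.
    MALE = "Male"
    FEMALE = "Female"


@dataclass(frozen=True)
class AdmissionForm:
    # Hypothetical school admission record: a person reduced to three fields.
    name: str
    gender: Gender
    mother_tongue: str


# Hypothetical dropdown: a short illustrative stand-in, not a real official list.
ALLOWED_TONGUES = {"Hindi", "Bengali", "Tamil", "Marathi", "Other"}


def fill_form(name: str, gender: str, mother_tongue: str) -> AdmissionForm:
    """Coerce a person's own answers into the schema's fixed choices."""
    # Gender(...) raises ValueError for any value not defined in the enum:
    # pick one of the boxes, or the form is rejected.
    g = Gender(gender)
    # Any language missing from the dropdown is silently squashed into "Other".
    tongue = mother_tongue if mother_tongue in ALLOWED_TONGUES else "Other"
    return AdmissionForm(name=name, gender=g, mother_tongue=tongue)


record = fill_form("Asha", "Female", "Bhojpuri")
print(record)
```

In this sketch, a Bhojpuri speaker is stored as speaking "Other", and an answer like "Non-binary" never reaches the database at all: `fill_form` raises `ValueError` before a record exists. The schema, not the person, decides what can be said.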

Your datafolk is always smaller than you. That’s by design.

The Punch Card Revolution (And Why You Should Care)

In 1890, the United States faced a problem. The 1880 census had taken nearly eight years to process, by hand, with clerks tallying marks on paper. The country was growing fast. At that rate, the 1890 census wouldn’t be finished before the 1900 census began. A bureaucratic nightmare, which is the worst kind of nightmare because nobody even gets to scream.

Enter Herman Hollerith, an engineer with an elegant idea: what if you could encode census data as holes punched in a card, and then run the cards through a machine that counted the holes electrically?[1]

It worked. The 1890 census was tabulated in roughly one year. Hollerith became famous. His company eventually merged with others to form a little outfit called International Business Machines. You know them as IBM.

Punch cards turned data processing from an artisanal craft into an industrial operation. Suddenly, you could sort thousands of records by any field (age, occupation, location) in hours instead of months. You could cross-reference. You could find patterns. A census went from “how many people live here” to “what kind of people live here, and what are they doing.”
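The core trick (one hole per answer, then count holes) can be sketched in a few lines of Python. This is an analogy, not a model of Hollerith’s hardware: the card `LAYOUT`, the field names, and the sample records are all invented for illustration, and the real machines used a fixed physical grid with electrical counters. What the sketch does show is why punching data made sorting by any field and cross-referencing two fields cheap.

```python
from collections import Counter

# Hypothetical card layout: each (field, value) pair owns one hole position.
# Real Hollerith cards used a fixed physical grid; this mapping is only an analogy.
LAYOUT = {
    ("sex", "M"): 0, ("sex", "F"): 1,
    ("occupation", "farmer"): 2, ("occupation", "clerk"): 3,
    ("state", "Ohio"): 4, ("state", "Texas"): 5,
}
# Reverse lookup: hole position -> (field, value).
DECODE = {pos: fv for fv, pos in LAYOUT.items()}


def punch(record: dict) -> frozenset:
    """Encode a record as the set of hole positions punched on its card."""
    return frozenset(LAYOUT[(field, value)] for field, value in record.items())


def read(card: frozenset, field: str):
    """Recover a field's value by finding which of its holes is punched."""
    for pos in card:
        f, value = DECODE[pos]
        if f == field:
            return value
    return None


def cross_tab(cards, field_a: str, field_b: str) -> Counter:
    """Count cards by a pair of fields, like a bank of tabulator counters."""
    return Counter((read(c, field_a), read(c, field_b)) for c in cards)


# Invented sample records, one card per person.
records = [
    {"sex": "M", "occupation": "farmer", "state": "Ohio"},
    {"sex": "F", "occupation": "clerk", "state": "Ohio"},
    {"sex": "M", "occupation": "farmer", "state": "Texas"},
    {"sex": "M", "occupation": "farmer", "state": "Ohio"},
]
cards = [punch(r) for r in records]

# Sort the deck by any field, then cross-reference two fields at once.
by_state = sorted(cards, key=lambda c: read(c, "state"))
print(cross_tab(cards, "occupation", "state"))
```

Once every answer is a hole in a known position, the question “how many farmers live in Ohio?” stops being a clerk’s afternoon and becomes a single pass through the deck.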

This is not an entirely heartwarming story.

In the 1930s and 1940s, the same punch card technology was used by Nazi Germany to identify, track, and ultimately locate Jewish, Roma, and other persecuted populations across Europe.[3] IBM’s German subsidiary, Dehomag, provided the machines. The census data, originally collected for administrative purposes, became a tool of genocide.

The technology was neutral. The punch card didn’t care what data you put on it. But the humans operating the system were not neutral at all.

This is the second lesson: data collected for one purpose can always be used for another. A census built to allocate resources becomes a tool for surveillance. A customer loyalty programme built to offer discounts becomes a profiling engine. An app that asks for your location to show nearby restaurants may also sell that data to advertisers. The birth of your datafolk is a one-way door. Once recorded, data doesn’t go back.

How Modern Datafolk Are Born

Let’s fast-forward to right now. You wake up and reach for your phone. It’s 7:14 AM and you haven’t even opened your eyes properly, but you’ve already created data.

Your alarm went off, and your phone logged the time. Your sleep-tracking app noted that you moved at 6:52, 7:03, and 7:11 before finally accepting that morning is, unfortunately, real.

You unlock your phone with Face ID or a fingerprint. Your biometric data is matched against a stored template, and the phone logs an authentication event: time, method, success or failure.

You open Instagram. The app logs your session start time, your device model, your OS version, your IP address (which reveals your approximate location), your language setting, and your account ID. It reads the cookies from your last session. Before you’ve seen a single post, Instagram already knows you’re back. It missed you. Sort of.
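Written down as structured data, a session-start event like the one just described might look roughly like this. Treat it as a sketch: every field name is hypothetical and simply mirrors the list in the paragraph, since Instagram’s actual logging schema is not public. The account ID, device, and IP address (taken from a range reserved for documentation) are made up.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class SessionStartEvent:
    # Hypothetical field names mirroring the text; not Instagram's real schema.
    account_id: str
    timestamp: str            # ISO 8601, in UTC
    device_model: str
    os_version: str
    ip_address: str           # can be mapped to an approximate location
    language: str
    returning_session: bool   # True if a token from an earlier session was sent


def session_start(account_id: str, device_model: str, os_version: str,
                  ip_address: str, language: str,
                  prior_token: Optional[str]) -> SessionStartEvent:
    """Build the record an app might log before showing a single post."""
    return SessionStartEvent(
        account_id=account_id,
        timestamp=datetime.now(timezone.utc).isoformat(),
        device_model=device_model,
        os_version=os_version,
        ip_address=ip_address,
        language=language,
        returning_session=prior_token is not None,
    )


event = session_start("user_12345", "Pixel 7", "Android 14",
                      "203.0.113.7", "en-IN", prior_token="abc123")
print(json.dumps(asdict(event), indent=2))
```

None of these fields involve you doing anything beyond opening the app. That is the point: the record exists before the feed does.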

You scroll for four minutes. Every post you pause on, every reel you watch past the three-second mark, every story you skip, every ad you linger on for a fraction of a second too long: all logged. A complete record of your 7:15 AM attention span, which is to say: a record of a person trying to wake up.

You order chai on Swiggy. Your address, your order history, your payment method, the time of day, the restaurant you chose, the items you rejected: all recorded. Swiggy now knows you order chai before anything else, which makes you predictable, which makes you profitable.

By 7:30 AM, you’ve generated hundreds of data points without consciously doing anything. Each one is a small birth, a tiny datafolk spawned from an action so routine you’ve already forgotten it.

Here’s the uncomfortable arithmetic: you probably touch your phone 150 to 200 times a day, and each touch generates a dozen or more data points. That’s roughly 2,000 to 3,000 data points per day, just from your phone. Add your laptop, your smart TV, your fitness band, your UPI transactions, your metro card, your CCTV appearances, and you’re generating a continuous stream of datafolk, all day, every day, whether you intend to or not.
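The estimate is just a multiplication, but it hides one assumption worth making explicit: how many data points a single touch produces. The sketch below assumes 13 to 15 per touch, a figure back-solved to match the paragraph’s range rather than measured, so treat the output as a back-of-envelope number, not a statistic.

```python
def daily_data_points(touches_per_day: int, points_per_touch: int) -> int:
    """Back-of-envelope estimate: phone touches times data points per touch."""
    return touches_per_day * points_per_touch


# 150 to 200 touches come from the text; 13 to 15 points per touch is
# an illustrative assumption chosen to reproduce the text's range.
low = daily_data_points(150, 13)
high = daily_data_points(200, 15)
print(f"roughly {low:,} to {high:,} data points per day")
```

Change either assumption and the total moves proportionally. Double the points per touch (say, because an app logs scroll position as well as taps) and you double the total.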

You are, to put it bluntly, a data factory that also sometimes eats biryani.

The Aadhaar Moment

No discussion of datafolk births in India is complete without Aadhaar.

Set up in 2009 (the first numbers were issued in 2010), Aadhaar is a 12-digit identity number linked to your biometric data: fingerprints (all ten) and iris scans (both eyes). As of 2024, over 1.37 billion Aadhaar numbers have been issued,[2] covering approximately 99% of India’s adult population.

Aadhaar is, in a very literal sense, a mass datafolk birth event. The largest biometric database in human history, built in a country where many people previously had no government-issued ID at all. For millions, Aadhaar was the first time they existed in a formal system, the first time a database acknowledged that they were real.

This is genuinely remarkable. A woman in rural Jharkhand who couldn’t prove her identity to receive her pension? Aadhaar solved that. A migrant worker who lost his ration card? Aadhaar could link him to his entitlements across state borders. The promise of Aadhaar was inclusion: giving a datafolk to people who’d been invisible to the state.

But Aadhaar also demonstrates every tension we’ve discussed. Your Aadhaar number is a universal identifier: it can link your bank account, your phone number, your tax returns, your vaccination records, your gas subsidy, and your electoral roll into a single graph. One number, connecting everything. Convenient? Absolutely. But also a single key that unlocks your entire life.
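Why a shared identifier is such a powerful key is easiest to see as a join. In this minimal Python sketch, each dictionary stands in for a separate system (a bank, a telecom operator, a tax office, a ration shop) that happens to store the same 12-digit number. The systems, fields, and number are all invented for illustration; the point is only that once every table shares one key, merging them into a single profile takes a loop, not an investigation.

```python
from collections import defaultdict

# Hypothetical, separately held records that all carry the same identifier.
# The 12-digit number and every field below are invented for illustration.
bank = {"999988887777": {"account": "SB-001", "balance_band": "low"}}
telecom = {"999988887777": {"phone": "+91-90000-00000"}}
tax = {"999988887777": {"pan_linked": True}}
rations = {"999988887777": {"ration_card": "RC-42"}}


def link_by_id(**sources):
    """Merge every source's record for each identifier into one profile."""
    profiles = defaultdict(dict)
    for source_name, table in sources.items():
        for uid, record in table.items():
            # Prefix each field with its source so no value is overwritten.
            for field, value in record.items():
                profiles[uid][f"{source_name}.{field}"] = value
    return dict(profiles)


linked = link_by_id(bank=bank, telecom=telecom, tax=tax, rations=rations)
print(linked["999988887777"])
```

Without a common key, each of these systems would need fuzzy matching on names and addresses to find you in the others. With one, the join is exact.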

The data collected for inclusion can also be used for exclusion. Failed biometric authentication (a construction worker’s fingerprints too worn to scan, an elderly person’s iris too clouded) can mean denied rations, denied pensions, denied existence. The system that was meant to see you can also decide it doesn’t recognise you.

Why This Matters

Much of the data you’ve ever generated probably still exists somewhere. That hospital entry from the day you were born. Your school admission records. Your first email address (yes, the embarrassing one). Your Aadhaar enrolment. Your Swiggy order history. Your 3 AM YouTube watch history, which we will not discuss.

These are your datafolk. Some were born because you chose to create them: you signed up, you opted in, you tapped “I Agree” without reading the terms (don’t worry, nobody reads them; the last person who read an entire Terms of Service document was a lawyer in 2006, and she described the experience as “harrowing”).

Some were born without your knowledge. The CCTV camera at the metro station. The Wi-Fi probe requests your phone sent out, announcing its MAC address (or, on most recent phones, a randomised stand-in) to every access point in range. The tracking pixel in that email you opened. Data collection doesn’t require your participation. It barely requires your presence.

The question isn’t whether you have datafolk. You do, more than you can count. The question is who else has them, what they’re doing with them, and whether you have any say in the matter.

In the next chapter, your datafolk start to move. They split, multiply, and travel through networks that span continents. Understanding how they travel is the first step to understanding where they end up, and who’s watching them along the way.


Next: Couriers, Packets, and Digital Dabbawalas. Your datafolk hit the road. Or, more accurately, the fibre-optic cable.

Reference

Glossary

Structured Data
Data organised into predefined categories and formats: think spreadsheet rows and columns rather than a handwritten diary. Computers love it. Humans produce it reluctantly.
Identifier
Any piece of information that singles you out from the crowd: a name, a number, a fingerprint. Your Aadhaar number is an identifier. So is your Instagram handle. One is 12 digits; the other is usually something you regret choosing at 15.
Schema
The blueprint that defines what kind of data goes where, like a form with pre-printed fields. The schema decides whether you're a Name + DOB + Address, or just a User ID + Email. You don't get to choose.
PII
Personally Identifiable Information: data that can be traced back to a specific individual. Name, phone number, Aadhaar number, email. The kind of information you'd rather not find on a billboard.
Cookie
A small text file a website stores on your device to remember you between visits. Invented in 1994, it is the digital equivalent of a shopkeeper who never forgets a face, except this shopkeeper also tells his friends.


Sources

  1. Truesdell, Leon E. The Development of Punch Card Tabulation in the Bureau of the Census, 1890–1940. US Government Printing Office, 1965.

  2. UIDAI Aadhaar Dashboard. Unique Identification Authority of India.

  3. Black, Edwin. IBM and the Holocaust: The Strategic Alliance Between Nazi Germany and America's Most Powerful Corporation. Crown Publishers, 2001.
