Meeting Your Datafolk

You've never met your digital double, but it's been living a busy life. This introduction invites you to notice the quiet trail you leave behind every day, and to meet the datafolk assembled from your clicks, preferences, and habits.

Let’s start with something you did this morning.

You woke up. You reached for your phone. You checked the time, maybe scrolled a notification or two, possibly opened Instagram before your eyes were fully cooperating. Then you got out of bed, made tea or coffee (we don’t judge), and continued with your day.

Nothing remarkable happened. Certainly nothing that felt like an event. And yet, in those first five minutes of consciousness, you generated somewhere between 50 and 200 data points. Each one is a tiny record of your existence, your location, your device, your habits, your attention, stored on servers you’ve never seen, in cities you may never visit, by companies whose names you might not recognise.

You didn’t sign anything. You didn’t agree to anything (well, you did once, three years ago, when you tapped “I Agree” on a terms-of-service document that was longer than most novels and considerably less entertaining). But the recording happened anyway, because that’s how the system works. It records. That is its nature. Asking the internet not to collect data is like asking a river not to be wet. You can have opinions about it, but the river doesn’t care.

Here is the thing nobody tells you: you have a double.

Your Other Self

Somewhere in the infrastructure of the internet, spread across databases in Mumbai, servers in Virginia, data centres in Singapore, there exists a version of you. It is not a copy. It is not a photograph. It is something stranger: an assembled version, stitched together from thousands of small observations, none of which are particularly interesting on their own, but which together form a portrait that is, frankly, uncomfortably accurate.

This assembled version knows what time you wake up (your phone logs it). It knows what you eat for breakfast (Swiggy knows). It knows your commute route (Google Maps knows, and so does your telecom provider). It knows how you spend money (your bank and UPI app know). It knows what you read, what you watch, what you skip, what you linger on for three seconds too long. It knows when you’re bored (your scrolling pattern changes). It knows when you’re anxious (you Google symptoms at 2 AM, and yes, everyone does this, and no, it’s never just a headache, according to the internet it’s always something terrible).
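If you like to see ideas as code, here is a minimal sketch of how scattered observations might be stitched into a single profile. Everything in it is hypothetical: the `Observation` record, the `user_id` key, the source names, and the sample values are invented for illustration, and real systems link identities in far messier ways. The point is only that each record is dull on its own, and the portrait appears once they are grouped.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """One small, boring record. All fields are hypothetical."""
    user_id: str    # assumed shared identifier across systems
    source: str     # which system recorded the observation
    attribute: str  # what was observed
    value: str      # the observed value


def assemble_profile(observations, user_id):
    """Group one user's observations by attribute, keeping the source of each."""
    profile = defaultdict(list)
    for obs in observations:
        if obs.user_id == user_id:
            profile[obs.attribute].append((obs.source, obs.value))
    return dict(profile)


# Invented sample data: four records about one person, one about someone else.
observations = [
    Observation("user-42", "phone", "wake_time", "06:45"),
    Observation("user-42", "food_app", "breakfast", "idli"),
    Observation("user-42", "maps", "commute", "home -> office"),
    Observation("user-99", "phone", "wake_time", "09:10"),
    Observation("user-42", "search", "late_night_query", "headache causes"),
]

profile = assemble_profile(observations, "user-42")
for attribute, sightings in profile.items():
    print(attribute, sightings)
```

No single line in that list says much. The grouped result, keyed to one identifier, starts to look like a person.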

This version of you has no body, no voice, no feelings. It doesn’t know it exists. But it makes decisions on your behalf, or rather, decisions are made about you based on its profile. What ads you see. What content appears in your feed. What price you’re offered for a flight. Whether your loan application is approved. Whether you’re flagged for additional screening at an airport.

This book calls that version your datafolk.

The word is deliberately playful. “Data” because that’s what it’s made of. “Folk” because it’s a character, a version of you that lives in systems and has its own journey, its own story, its own adventures (mostly in databases, which are less exciting than they sound, but bear with me). Your datafolk is not you, but it represents you in every digital system you’ve ever touched. It speaks for you in rooms you’re not allowed to enter.

And it has been busy.

The Scale of the Thing

India has the world’s largest digital public infrastructure. This is not a boast; it is a fact with consequences.

UPI processed over 100 billion transactions in 2023 [1]. Each one generated data at five different entities. That’s at least 500 billion data records from payments alone. From one system. In one year. In one country. If you printed each record on a Post-it note, you would run out of Post-it notes. You would also run out of surfaces to stick them on. You would run out of planet.
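The arithmetic behind that number is short enough to check yourself. This sketch uses only the two figures from the paragraph above; the one added assumption, flagged in the code, is that each entity stores exactly one record per transaction. The function and constant names are made up for the example.

```python
# Both figures come from the text and are lower bounds ("over 100 billion").
UPI_TRANSACTIONS_2023 = 100_000_000_000  # transactions in 2023
ENTITIES_PER_TRANSACTION = 5             # entities that record each payment


def records_generated(transactions: int, entities_per_transaction: int) -> int:
    """Count records, assuming every entity logs each transaction exactly once."""
    return transactions * entities_per_transaction


total_records = records_generated(UPI_TRANSACTIONS_2023, ENTITIES_PER_TRANSACTION)
print(f"{total_records:,} records")  # 500,000,000,000 records
```

If an entity keeps more than one record per payment (logs, backups, analytics copies), the real total is higher, which is why 500 billion is best read as a floor.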

Aadhaar covers over 1.37 billion enrolled individuals [2]. Each enrolment includes ten fingerprints, two iris scans, a photograph, and demographic details. The largest biometric database in human history, stored on servers in Bangalore and Manesar, behind security protocols that the government assures us are very robust. (The government assures us of many things.)

DigiLocker stores digital copies of official documents. CoWIN tracked vaccinations for a billion-plus people. ONDC is building an open commerce network. Account Aggregator is connecting financial data across institutions. The India Stack, the collective name for this infrastructure, is vast, ambitious, and growing faster than most people’s ability to understand it.

Every one of these systems creates datafolk. Every interaction leaves a trace. Every trace is stored, processed, and, in ways that are rarely transparent, used.

The global numbers are no less staggering. Humanity generated approximately 120 zettabytes of data in 2023 [3]. A zettabyte is a trillion gigabytes. If you stored one zettabyte on standard Blu-ray discs (25 GB each, 1.2 mm thick), the stack would be roughly 48,000 km tall, longer than a full lap around the Earth. We generated 120 of those: enough discs to reach the Moon and back about seven times.
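Zettabytes are hard to feel, so here is a small conversion sketch using decimal (SI) prefixes. The 120 ZB figure is from the text; the 128 GB phone is an assumption chosen purely for scale, and the names are invented for the example.

```python
BYTES_PER_GIGABYTE = 10**9    # SI decimal prefix: giga
BYTES_PER_ZETTABYTE = 10**21  # SI decimal prefix: zetta


def zettabytes_to_gigabytes(zettabytes: int) -> int:
    """Convert whole zettabytes to gigabytes using decimal prefixes."""
    return zettabytes * BYTES_PER_ZETTABYTE // BYTES_PER_GIGABYTE


one_zb_in_gb = zettabytes_to_gigabytes(1)    # a trillion gigabytes
data_2023_gb = zettabytes_to_gigabytes(120)  # the 2023 figure from the text

# Assumption for scale only: a phone with 128 GB of storage.
PHONE_STORAGE_GB = 128
phones_needed = data_2023_gb // PHONE_STORAGE_GB

print(f"1 ZB   = {one_zb_in_gb:,} GB")
print(f"120 ZB = {data_2023_gb:,} GB")
print(f"128 GB phones needed: {phones_needed:,}")
```

Under that assumption, storing one year of data would take several hundred billion phones, many times more than there are people to carry them.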

Most of that data is about people. About you. About your datafolk.

Why This Book Exists

This book is not a warning. It is not a manifesto. It is not a guide to “protecting your privacy in 10 easy steps” (there are no easy steps; if there were, someone would have turned them into an app and that app would be collecting your data).

This book is an attempt to make the invisible visible.

Most Indians interact with digital infrastructure daily, scanning QR codes, tapping UPI, unlocking phones with fingerprints, sharing locations with cab apps, without any mental model for how these systems work. What happens when you scan that QR code? Where does your Aadhaar data live? Who can see your UPI transaction history? What is a packet, and why should you care? These aren’t academic questions. They affect 1.4 billion people. They affect you.

The problem is not that these systems are malicious. Most of them are genuinely useful. UPI is a marvel of financial inclusion. Aadhaar has given identity to millions who previously had none. The internet, for all its dysfunction, remains humanity’s most powerful tool for sharing knowledge, connecting people, and watching cat videos.

The problem is that these systems are opaque. They operate beneath the surface of daily life, processing your data in ways you can’t see and rarely understand. And opacity, in a system that affects a billion people, is not a design flaw; it’s a power imbalance.

This book exists to reduce that imbalance. Not by telling you what to think, but by showing you how the systems work. Once you can see the machinery, you can ask better questions. And better questions are the beginning of agency.

How This Book Works

Each chapter follows your datafolk through a different stage of their journey:

Chapter 1 traces their birth: how identity became something that could be recorded and stored, from ancient censuses to Aadhaar. You’ll learn why the form shapes the person, and why data collected for one purpose always finds other uses.

Chapter 2 puts them in motion, using Mumbai’s legendary dabbawala system as a lens for understanding how data moves across networks. It covers packets, protocols, routers, and the surprisingly physical infrastructure (submarine cables, data centres) that carries your digital life.

Chapter 3 takes them to market: the invisible bazaar where your datafolk are valued, traded, and profiled by ad-tech ecosystems and recommendation engines you’ve never heard of.

Chapters 4 through 7 follow them through prediction engines, data breaches, privacy tools, and finally ask: what does a healthier relationship with our datafolk look like?

Each chapter uses Indian systems, Indian metaphors, and Indian stakes. The dabbawala system. UPI payments. Aadhaar authentication. IRCTC bookings. WhatsApp forwards. The RBI’s data localisation mandate. This is not a book about Silicon Valley that happens to be sold in India. This is a book about Indian digital life, written for Indian readers.

The companion interactive lab lets you experiment with the concepts yourself. Trace a packet. Watch a UPI payment hop through six systems. See how data localisation changes where your information lives. The lab is not an afterthought; it’s where understanding becomes intuition.

A Note on Tone

You may have noticed that this book is not solemn.

This is deliberate. The subject of data and privacy has been written about extensively in a tone that ranges from “academic paper” to “the end is nigh.” Both tones have their place. But this book takes a different approach: it assumes you are curious before you are frightened, and it believes that a reader who finds the dabbawala-to-packet metaphor delightful is more likely to keep reading than one who feels lectured about surveillance capitalism before page 20.

The goal is not to make you paranoid. It is to make you literate. There is a difference. A paranoid person deletes all their apps and moves to a cave. A literate person understands the trade-offs, asks better questions, and makes informed choices, while still using UPI for chai, because honestly, who carries cash anymore?

Your datafolk will accompany you for the rest of your life. They were created without your full understanding, and they will outlive most of your passwords. Learning to see them, to understand where they go, who holds them, and what is done with them, is not about rejecting technology. It is about learning to live with it more consciously.

Let’s begin.


Next: How Datafolk Are Born, from census takers with clipboards to apps with cookies, tracing the origins of data collection.

Reference

Glossary

Datafolk
Your digital double, an assembled version of you that lives inside databases, algorithms, and servers. Built from clicks, preferences, transactions, and habits. It doesn't sleep, it doesn't forget, and it has terrible taste in ads.
Digital Footprint
The trail of data you leave behind as you move through digital systems. Unlike actual footprints, these don't fade in the rain. They get copied, sold, and analysed by people you've never met.
India Stack
India's layered digital public infrastructure: Aadhaar (identity), UPI (payments), DigiLocker (documents), and more. The largest digital governance experiment in history, affecting over a billion people who mostly just want their UPI to work.
Platform
A digital service that connects users and collects data from the interaction, such as Google, Instagram, Swiggy, or PhonePe. The word makes them sound neutral, like a train platform. They are not neutral. They are the train, the track, and the ticket inspector, all at once.

Sources

  1. NPCI UPI Product Statistics, 2024. National Payments Corporation of India.
  2. UIDAI Aadhaar Dashboard, 2024. Unique Identification Authority of India.
  3. Statista. Volume of data/information created, captured, copied, and consumed worldwide from 2010 to 2025.
