
Pop-Tarts & Predictions
Golden Hook & Introduction (11 min)
Joe: Most of human progress, from the scientific method to a child learning about the world, is built on asking one simple question: 'Why?'
Lewis: Right, it’s our fundamental driver. Why did that happen? What’s the cause?
Joe: But the authors of the book we're discussing today argue that in our modern world, 'why' is becoming an obsolete question. They claim the future belongs to those who ignore 'why' and focus only on 'what.'
Lewis: That feels deeply wrong. And a little bit terrifying.
Joe: It’s a deeply unsettling and incredibly powerful idea. That's the central, mind-bending argument in Big Data: A Revolution That Will Transform How We Live, Work, and Think by Viktor Mayer-Schönberger and Kenneth Cukier.
Lewis: And it's worth noting who these authors are. This isn't just a tech-bro fantasy. You have Viktor Mayer-Schönberger, a professor of Internet Governance at Oxford, and Kenneth Cukier, a long-time data editor for The Economist. They approached this from both academic and journalistic angles.
Joe: Exactly. The book became a massive bestseller in the early 2010s because it was one of the first to explain to a general audience that 'big data' wasn't just about having more information. It was about a fundamental, and sometimes uncomfortable, shift in how we think.
Lewis: A shift that starts with this controversial idea of ditching 'why.'
The 'What' Over the 'Why': The Revolution of Correlation
Joe: Let's dive right into that. For centuries, we operated in a world of data scarcity. If you wanted to understand something, you had to start with a hypothesis—a theory about why something was happening—and then collect a small, precise sample of data to test it.
Lewis: Like a clinical trial. You have a theory about a drug, you test it on a small group.
Joe: Precisely. But now, we live in a world of data abundance. We can collect and analyze massive, messy datasets about almost anything. The authors call this the shift to "N=all"—analyzing all the data, not just a sample. And when you do that, you can let the data speak for itself. You don't need a hypothesis anymore.
Lewis: Okay, that sounds abstract. Give me a real-world example.
Joe: The book gives a classic one: Walmart. In 2004, their data analysts were sifting through mountains of transaction data, looking for patterns. They found that when a hurricane was forecast to hit a coastal area, sales of two things spiked unexpectedly.
Lewis: Let me guess. Plywood and batteries?
Joe: Good guess, but no. It was flashlights... and strawberry Pop-Tarts.
Lewis: Strawberry Pop-Tarts? That makes absolutely no sense. Why?
Joe: And that’s the revolutionary part! Walmart’s analysts didn't know why. They didn't care why. They just knew what was happening. So, when a hurricane warning was issued, they moved pallets of strawberry Pop-Tarts to the front of their stores, right next to the hurricane supplies. And sales went through the roof.
Lewis: Wow. So they acted on a correlation without needing a cause. But that feels so unsatisfying. Aren't we just becoming mindless pattern-matchers? It reminds me of what the book calls the 'McNamara Fallacy.'
Joe: You're hitting on a key risk. Robert McNamara, as Secretary of Defense during the Vietnam War, was obsessed with quantifiable data. He measured success by 'body count.' If the numbers went up, he thought we were winning. He was measuring the 'what' but completely missed the 'why'—the political and social reality on the ground, which led to disaster.
Lewis: Exactly! So how is the Pop-Tart example any different?
Joe: The authors argue it's about the stakes and the application. For selling Pop-Tarts, knowing 'what' is enough. It's low-risk, high-reward. For fighting a war, you absolutely need the 'why.' The trick is knowing when correlation is sufficient. They point to the original Google Flu Trends. By correlating search terms like "fever" and "cough" with CDC data, Google could predict flu outbreaks faster than the government. They didn't need to know the biological 'why' of the virus, just the 'what' of people's search behavior.
Lewis: But that system eventually failed, didn't it? It started over-predicting the flu because it couldn't distinguish between people who were actually sick and people who were just worried because the flu was in the news.
Joe: It did, and that's a perfect illustration of the limits. Correlation is powerful, but it's not magic. It's a tool, and like any tool, it can be misused or misunderstood. But the core idea remains: in a big data world, we can often get incredible value just by knowing 'what' is happening.
The Hidden Value: Datification and Digital 'Exhaust'
Lewis: Okay, so if we accept this 'what over why' idea, how does it actually create value? It's not just about predicting Pop-Tart sales, right?
Joe: Exactly. This leads to the second big idea in the book: 'datification.' It's a clunky word, but a powerful concept. It’s the process of turning aspects of the world that we never thought of as 'data' into quantifiable information that a computer can analyze.
Lewis: What do you mean?
Joe: Think about Google's project to scan all the world's books. The first step was just taking pictures of the pages—that's digitization. But the magic happened when they used Optical Character Recognition to turn those pictures into text. That was datification. Suddenly, the words weren't just for human reading; they were data that could be searched, indexed, and analyzed on a massive scale.
Lewis: And that data has value beyond just letting me search for a quote.
Joe: Immense value. This leads to the concept of 'data exhaust.' It's the digital trail we leave behind, often unintentionally, that can be repurposed for something completely new. The best example from the book is the spell checker.
Lewis: Right, I remember using Microsoft Word in the 90s. The spell checker was pretty good.
Joe: It was, and Microsoft spent millions of dollars and years of effort creating it. They hired linguists to build a massive, curated dictionary. Google, on the other hand, built a superior spell checker for almost free.
Lewis: How?
Joe: By using our data exhaust. Every time someone misspelled a word in the Google search bar—say, they typed "definately"—and then clicked on the suggestion "definitely," they were providing a free, tiny piece of data. They were teaching Google's algorithm the correct spelling. Google just collected billions of these corrections.
Lewis: So our mistakes became their asset. That's brilliant and a little bit sneaky.
Joe: It's the core of the big data business model. The data you provide for one purpose—your search query—is repurposed for a secondary, incredibly valuable use.
Lewis: That’s a great way to put it. So data exhaust is like the digital equivalent of sawdust. A lumber mill's main product is wood, but the sawdust is a byproduct that you can collect and press into particleboard, a whole new product. Our misspelled searches are the sawdust.
Joe: That's a perfect analogy. And it’s everywhere. Every 'like' on Facebook is data exhaust that trains their algorithm. Every flight search on Farecast, the company that first predicted airline prices, was data exhaust used to refine their predictions. The authors argue that the companies that will win are the ones that are best at collecting and repurposing this digital sawdust.
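Editor's aside: the spell-checker mechanism Joe describes — tallying which suggestion users actually click after typing a misspelling — can be sketched as a simple frequency count. The tiny log below is invented for illustration; Google's real system is vastly larger and more sophisticated.

```python
from collections import Counter, defaultdict

# Invented 'data exhaust': pairs of (what the user typed,
# the suggested spelling they clicked).
correction_log = [
    ("definately", "definitely"),
    ("definately", "definitely"),
    ("definately", "defiantly"),
    ("recieve", "receive"),
    ("recieve", "receive"),
]

# Tally clicks for each misspelling -> correction pair.
votes = defaultdict(Counter)
for typo, clicked in correction_log:
    votes[typo][clicked] += 1

def suggest(word: str) -> str:
    """Propose the most-clicked correction, or leave the word alone."""
    if word in votes:
        return votes[word].most_common(1)[0][0]
    return word  # no exhaust collected for this word

print(suggest("definately"))  # prints "definitely" — two clicks beat one
print(suggest("recieve"))     # prints "receive"
```

No linguist curates anything here: the users' own mistakes and clicks, collected as a byproduct of search, are the entire training signal.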
The 'Minority Report' Problem: Prediction, Punishment, and the Need for Control
Lewis: This is all incredibly powerful, but it's also starting to sound a bit scary. If you can predict flu outbreaks and Pop-Tart sales, you can also predict human behavior. Where does this all lead?
Joe: It leads directly to the dark side of big data, which the authors spend a lot of time on. The most chilling application is what they call 'predictive punishment.' It’s essentially the plot of the movie Minority Report.
Lewis: Where they arrest people for crimes they are about to commit.
Joe: Exactly. The book points out this isn't science fiction anymore. Police departments are already using 'predictive policing' algorithms to identify crime 'hotspots' and even individuals who are statistically likely to commit a crime. There are companies that generate a 'medication adherence score' for you based on your consumer data, predicting how likely you are to take your prescribed medicine.
Lewis: Wait, so your credit score could affect your healthcare? That's insane.
Joe: It is, and it highlights the danger. We are being judged not on our actions, but on our propensities. And the authors remind us this isn't a new problem. Data has always been a tool of power. They cite chilling historical examples: the Nazis used the famously efficient Dutch civil registries to identify and round up Jewish citizens with devastating effectiveness. In the U.S., the Census Bureau provided data that helped the government intern Japanese-Americans during World War II.
Lewis: Wow. So we're building this incredibly powerful machine without an off-switch or a steering wheel. What do we do? Do the authors offer any hope?
Joe: They do. They argue that our old models of control, like 'notice and consent' for privacy, are broken. You can't consent to a future use of your data that hasn't been invented yet. Instead, they propose a shift in responsibility. The burden shouldn't be on us to protect our data, but on the companies using it to do so responsibly.
Lewis: And how do you enforce that?
Joe: This is their most interesting idea. They propose creating a new class of professionals: 'algorithmists.'
Lewis: Algorithmists? Like an auditor for algorithms?
Joe: Precisely. Just like we have financial auditors to check a company's books, or a newspaper has an ombudsman to investigate reader complaints, an algorithmist would be an independent expert who can audit these 'black box' systems. If an algorithm denies you a loan or flags you as a risk, the algorithmist could investigate. They would check if the data was biased, if the model was fair, and if the conclusions were sound.
Lewis: That actually makes a lot of sense. It introduces human accountability back into the machine. It’s a check on the 'dictatorship of data' they warn about.
Joe: It's about ensuring that these powerful tools are used to augment human judgment, not replace it entirely.
Synthesis & Takeaways
Joe: When you pull it all together, you see this three-part revolution the book lays out. First, a new way of knowing, where we prioritize the 'what' of correlation over the 'why' of causation. Second, a new source of value, created by datifying the world and repurposing our digital exhaust. And third, a new set of dangers that force us to rethink privacy, justice, and control.
Lewis: It's a huge shift. The book came out over a decade ago, and it feels more relevant than ever. It was quite polarizing; some critics felt it was too optimistic and business-friendly, while many readers found it to be a crucial, accessible guide to a changing world.
Joe: I think its real legacy is that it forced us to have this conversation. The book's ultimate message isn't that data is king, but that we're at a critical juncture. We've built this incredibly powerful engine, and now we have to decide how to govern it.
Lewis: It leaves you with a big question. The authors say that in a world saturated with data, human qualities like creativity, intuition, and even the capacity for error become more valuable, not less. That's where the truly new ideas, the ones the data can't predict, come from.
Joe: The spark of invention comes from what the data doesn't say.
Lewis: Exactly. So the real challenge isn't just managing the data, but making sure we don't let it manage us.
Joe: It's a huge topic, and we've only scratched the surface. We'd love to hear what you think. Does this feel more like an opportunity or a threat? Let us know your thoughts.
Lewis: This is Aibrary, signing off.