
Big Data: A Revolution That Will Transform How We Live, Work, and Think
Introduction
Narrator: In 2009, as the H1N1 flu virus spread across the globe, public health officials at the Centers for Disease Control and Prevention (CDC) were in a race against time. Their traditional methods for tracking outbreaks relied on reports from doctors, a system that lagged one to two weeks behind the virus's actual spread. But a different group was getting results faster. Engineers at Google, by analyzing billions of search queries for terms like "flu symptoms" and "fever," were able to predict the spread of the virus in near real-time, outperforming the official government models. This wasn't just a minor improvement; it was a fundamental shift in how we can understand and react to the world.
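The core of that approach is simple: weekly search volume for flu-related terms tracks the official case counts closely enough to serve as a real-time proxy. A minimal sketch with made-up weekly numbers (not actual Google or CDC figures) shows the kind of correlation involved:

```python
from statistics import mean

# Illustrative weekly numbers (invented for this sketch): search
# volume for flu-related queries versus cases reported, with lag,
# through official channels.
searches = [120, 180, 260, 410, 530, 480]
cases    = [14,  22,  35,  52,  70,  61]

def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs) ** 0.5
    vy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (vx * vy)

# A correlation near 1.0 is what made search queries usable as a
# near real-time stand-in for the lagging official reports.
print(round(pearson(searches, cases), 3))
```

When the correlation is this strong, today's search volume predicts this week's outbreak picture without waiting for doctors' reports to arrive.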
This power to find new insights from massive, often messy, sets of information is the subject of Big Data: A Revolution That Will Transform How We Live, Work, and Think by Viktor Mayer-Schönberger and Kenneth Cukier. The book argues that we are at the beginning of a profound transformation, one that changes our relationship with information and forces us to rethink everything from business strategy to the nature of justice itself.
The End of Sampling
Key Insight 1
Narrator: For most of human history, understanding the world meant working with small, carefully chosen samples. When the U.S. Census Bureau faced the 1880 census, the sheer volume of data was so overwhelming that it took eight years to tabulate, rendering the information obsolete. The solution for the 1890 census was Herman Hollerith’s punch-card machine, a technological marvel that was still a laborious, expensive way to process all the data. This reliance on sampling was a necessary shortcut in an age of information scarcity.
Big Data upends this tradition. With modern computing power, it is now possible to analyze all the data, a concept the authors refer to as "N=All." This allows for a level of granularity and insight that samples could never provide. For example, when economists Steven Levitt and Mark Duggan analyzed eleven years of sumo wrestling data—over 64,000 bouts—they didn't need a sample. By looking at the entire dataset, they uncovered a subtle pattern of corruption. They found that wrestlers with a 7-7 record, who desperately needed one more win to maintain their rank and income, had an unusually high success rate against opponents who had already secured their winning record. A small sample would have missed this anomaly, but analyzing the entire dataset made the pattern clear.
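The test itself is straightforward once every bout is on the table: isolate final-day bouts where one wrestler sits at 7-7 and the opponent is already safe, and compare win rates. A minimal sketch with invented bout records (not the study's actual data) illustrates the filter:

```python
from collections import namedtuple

# Hypothetical final-day bouts (illustrative, not the real dataset):
# wrestler A's record, opponent B's record, and whether A won.
Bout = namedtuple("Bout", "a_wins a_losses b_wins b_losses a_won")

bouts = [
    Bout(7, 7, 8, 6, True),   # 7-7 wrestler beats an already-safe 8-6
    Bout(7, 7, 9, 5, True),
    Bout(7, 7, 8, 6, True),
    Bout(7, 7, 8, 6, False),
    Bout(6, 8, 8, 6, False),  # baseline bout: A has nothing at stake
    Bout(6, 8, 9, 5, True),
]

def win_rate(subset):
    subset = list(subset)
    return sum(b.a_won for b in subset) / len(subset)

# Win rate of 7-7 wrestlers against opponents who have already
# secured a winning record (8 or more wins).
on_bubble = [b for b in bouts if (b.a_wins, b.a_losses) == (7, 7)
             and b.b_wins >= 8]
print(win_rate(on_bubble))  # markedly above 0.5 in the real data
```

With N=All, the anomaly is a single filter and a division; with a sample, the handful of 7-7 bouts that happened to be drawn would be too few to reveal anything.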
Embracing the Mess
Key Insight 2
Narrator: The second major shift in the Big Data era is a move away from an obsession with perfect, pristine data. In the past, when data was scarce, accuracy was paramount. Today, the authors argue, it is often better to have a massive, messy dataset than a small, perfect one.
The story of Google Translate provides a powerful illustration. In the 1990s, IBM developed a machine translation program called Candide, which was trained on a carefully curated and translated corpus of three million sentences. It was a high-quality, but small, dataset. Google took a different approach. It fed its translation system with billions of pages of text from the entire internet—a messy, uncurated collection full of errors, slang, and imperfect translations. The result? Google's system, powered by a far greater quantity of messy data, became vastly more accurate and versatile than its predecessors. The sheer volume of data compensated for its lack of quality, proving that in the world of Big Data, more is often better than better.
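Why does volume beat curation? Because at scale, errors are rare and inconsistent while correct usage is frequent and consistent, so a simple majority vote recovers the right answer. A toy sketch (with invented alignment counts, nothing like a real translation model) shows the principle:

```python
from collections import Counter

# Toy word-alignment counts from a noisy "web-scale" parallel corpus
# (invented numbers): most observed translations are right, a
# minority are typos, mismatches, or slang.
observed = {
    "house": Counter({"casa": 930, "cassa": 40, "hogar": 30}),
    "dog":   Counter({"perro": 870, "pero": 90, "can": 40}),
}

def translate(word):
    """Pick the most frequently observed translation: with enough
    volume, the consistent signal swamps the inconsistent noise."""
    return observed[word].most_common(1)[0][0]

print(translate("house"))
```

In a three-million-sentence corpus a systematic error can dominate a rare word's counts; across billions of pages, the noise cancels out.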
The Power of Correlation Over Causation
Key Insight 3
Narrator: For centuries, the quest for knowledge has been a quest for causation—the search for the "why" behind events. Big Data, however, often prioritizes correlation—understanding "what" is happening. Knowing that two things are related can be incredibly valuable, even if the reason for the connection remains a mystery.
The classic example comes from Walmart. By mining its vast transaction data, the retail giant discovered that just before a hurricane, sales of not only flashlights and batteries but also strawberry Pop-Tarts would spike. Walmart didn't need to understand the causal reason for this craving. They simply needed to know the correlation. Armed with this knowledge, they began stocking Pop-Tarts near hurricane supplies before a storm, and sales soared. Similarly, Amazon's recommendation engine, which drives a third of its sales, doesn't need to know why people who buy a certain book also tend to buy a specific kitchen gadget. It only needs to know that the correlation exists to make a successful recommendation. This shift from "why" to "what" allows for faster, more efficient decision-making based on data-driven predictions.
Datafication: Turning the World into Data
Key Insight 4
Narrator: Big Data isn't just about analyzing existing information; it's about a process the authors call "datafication." This is the act of taking phenomena that were previously unquantified and turning them into data. This concept is older than computers. In the 19th century, a U.S. Navy officer named Matthew Fontaine Maury was injured and assigned to a desk job. There, he discovered a trove of old ship logs, which were considered useless paper records. Maury, however, saw their potential. He and his team "datafied" the information, extracting details on wind, currents, and weather, and used it to create the first reliable charts of ocean winds and currents. He turned seemingly worthless information into a tool that revolutionized sea travel.
Today, this process is accelerating. Social media platforms datafy our friendships and sentiments. LinkedIn datafies our professional experience. Even a car seat can be datafied. Researchers in Japan developed a system with 360 pressure sensors in a car seat that could identify a driver with 98% accuracy, simply by quantifying their unique posture. Datafication is about finding new ways to measure the world, unlocking the latent value in all aspects of our lives.
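Once posture is a vector of pressure readings, identifying the driver reduces to a standard pattern-matching step: compare the current reading against each enrolled driver's stored profile and pick the closest. A minimal nearest-profile sketch, with invented names and only four sensors standing in for the 360 in the real system:

```python
import math

# Illustrative enrollment data (made up): averaged pressure readings
# per driver over a few sensors. The real system used 360 sensors.
profiles = {
    "alice": [0.9, 0.2, 0.7, 0.1],
    "bob":   [0.3, 0.8, 0.4, 0.6],
}

def identify(reading):
    """Nearest-profile match: return the enrolled driver whose stored
    pressure profile has the smallest Euclidean distance to the
    current seat reading."""
    return min(profiles, key=lambda name: math.dist(profiles[name], reading))

print(identify([0.85, 0.25, 0.65, 0.15]))  # closest to alice's profile
```

The point of the anecdote survives the simplification: once a phenomenon is turned into numbers, decades of generic techniques for comparing numbers apply to it immediately.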
The Hidden Value of Data Exhaust
Key Insight 5
Narrator: One of the most powerful concepts in Big Data is the value of "data exhaust"—the digital traces we leave behind as byproducts of our online activities. This seemingly worthless data can be repurposed to create immense value.
Consider the spell checker. Microsoft spent millions developing its spell checker for Word, relying on a meticulously curated dictionary. Google, in contrast, built a superior system by recycling its data exhaust. When users typed a misspelled word into the search bar, like "flourescent," and then clicked on the corrected suggestion, "fluorescent," they were providing Google with a free, high-quality signal about the correct spelling. This feedback loop, powered by billions of user interactions, allowed Google to build a spell checker that was more comprehensive and accurate, covering nearly every living language at a fraction of the cost. This illustrates a core principle of Big Data value: the secondary use of data is often more valuable than its primary purpose.
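The mechanism is a tally over the exhaust: each (typed, clicked-correction) pair in the logs is a vote, and the most-voted correction wins. A minimal sketch with invented log entries (not Google's actual pipeline):

```python
from collections import Counter, defaultdict

# Hypothetical query-log exhaust: what the user typed, and the
# corrected suggestion they then clicked.
log = [
    ("flourescent", "fluorescent"),
    ("flourescent", "fluorescent"),
    ("flourescent", "florescent"),   # occasional bad click: noise
    ("recieve", "receive"),
]

corrections = defaultdict(Counter)
for typed, clicked in log:
    corrections[typed][clicked] += 1

def suggest(word):
    """Return the correction users most often accepted for `word`,
    or the word unchanged if the log has never seen it."""
    if word in corrections:
        return corrections[word].most_common(1)[0][0]
    return word

print(suggest("flourescent"))  # "fluorescent" wins by vote count
```

No dictionary is curated and no linguist is consulted: the byproduct of billions of searches, collected for an entirely different purpose, becomes the training data for free.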
The Dark Side of Prediction
Key Insight 6
Narrator: For all its benefits, Big Data carries significant risks. The authors warn of a "dictatorship of data," where we blindly follow algorithms, and the danger of "punishments based on propensities." This is the world imagined in the film Minority Report, where individuals are arrested for crimes they are predicted to commit in the future. This scenario moves from fiction to reality with the rise of predictive policing, where algorithms identify crime "hotspots" or even individuals likely to offend.
This approach threatens the very foundations of justice, which holds people accountable for their actions, not their predicted tendencies. It undermines free will and the presumption of innocence. The authors argue that while data can be a powerful tool for managing risk, it is a poor tool for assigning individual blame. The danger lies in using correlations to make causal judgments about people's lives. To prevent this, society must establish new safeguards that protect human agency and ensure that we are judged for what we do, not for what an algorithm predicts we might do.
Conclusion
Narrator: Ultimately, Big Data reveals that this revolution is not merely about technology; it is a fundamental shift in human understanding. The book's most important takeaway is that we are moving from a world that demands to know "why" to a world that can act powerfully just by knowing "what." This transition from causation to correlation, from small, precise samples to massive, messy datasets, unlocks unprecedented opportunities for innovation and progress.
However, this power comes with a profound challenge. As we become more capable of predicting human behavior, we must consciously decide how to use that knowledge. Will we use it to create a more efficient, safer, and healthier world, or will we allow it to erode privacy, freedom, and individual responsibility? The authors suggest that the solution lies not in halting progress, but in developing new forms of governance, including a new class of professionals they call "algorithmists" to audit these powerful systems. The ultimate question the book leaves us with is not whether we can use Big Data, but whether we have the wisdom to control it.