Big Data: Big Competition, Big Potential, and Big Rewards

It is Survivor, The Celebrity Apprentice, and statistics mixed together. Big-data analytics firm Kaggle is holding big competitions that are the Olympics for math geeks and tech whizzes. More than 300 teams are battling each other to create a model that predicts which insurance policies customers will buy, based on their shopping history. Allstate (ALL) sees big potential in finding the answer because the longer it takes for a quote to pop up on its website, the more likely customers are to shop elsewhere. The blue-chip insurance firm is dangling a big reward of $50,000. The deadline is May 19.

Kaggle has 14 competitions currently underway. They range from categorizing sentiment in sentences on the movie-review website Rotten Tomatoes to business problems to reconstructing connections between neurons in the brain. So far, 176 teams are facing off to create a model that can predict store sales based on historical markdown data from Wal-Mart (WMT) stores in different regions. The world’s largest retailer will crown the winners with job offers.

The most crowded competition of all, with 1,264 teams, is the “Kaggle Getting Started Competition.” Mental athletes have to devise a method using Excel, Python and other software tools to predict which Titanic passengers were most likely to survive the 1912 shipwreck. The contestants get access to learning tutorials. The prize to be had is knowledge.

Kaggle specializes in conducting predictive analytics with an army of scientists around the world. Clients include NASA, Ford (F), Facebook (FB) and a variety of other blue-chip names. The San Francisco-based startup founded in 2010 helped NASA find a way to more accurately record images of dark matter. It forecasted demand to improve sales for a beverage maker with $10 billion in revenue. Researchers studying endangered whales contacted Kaggle to develop an algorithm to accurately detect noises made by different species.

Analyzing goldmines of big data from the likes of Amazon (AMZN) and Google (GOOG) and turning them into useful information can bring big monetary rewards. It is also big competition in the information age, and it has big potential for finding answers to the most vexing problems in health care and education.

Crunching raw data into useful information is akin to turning raw oil in the ground into gasoline, says Andreas Weigend, formerly the chief scientist at Amazon. One example is deciding whether people are creditworthy by looking at their behavior on Tencent’s communications platform, WeChat. Knowing that they hang out with drug dealers reveals how likely they are to repay a loan, says the independent big-data consultant. Just as refining oil can lead to an explosion, collecting big data opens the floodgates for credit card data leaks and other security breaches.

Civis Analytics, based in Washington, D.C., and Chicago, uses big data to solve problems in education and health care. On the education front, Civis is using data to find low-income, college-bound students who intend to go to colleges deemed beneath their potential and get them to go to better ones. On the insurance front, it is trying to identify two kinds of people: the uninsured and those with health risks such as cardiovascular disease. Dan Wagner, who served as the chief analytics officer for President Obama’s 2012 reelection campaign, founded the firm.

Researchers at the Icahn Institute for Genomics and Multiscale Biology in New York City are collecting medical and scientific data such as a cancer patient’s genome sequence, which spits out a terabyte of data. Eric Schadt, a professor and the organization’s director, says they’re feeding the data into highly sophisticated mathematical models to predict whether people are prone to a disease and which treatments would be best for them. A major obstacle to using big data in health-care research is complying with patient-privacy laws.

One of Kaggle’s biggest frustrations is that companies want to keep their datasets to themselves, thereby limiting potential in a big way. William Cukierski, a data scientist at Kaggle, is working hard to convince companies that there are big rewards to be had by disclosing their big data and that doing so is not a big deal for their competitive position.

By Trang Ho


Bloomberg Businessweek
