In Praise of Small Data
Larry Maloney | November 25, 2014
“Big Data” has taken the business world by storm. The digital age and the spread of social networks have enticed companies into making huge investments in computer power, servers and networking technology to acquire enormously complex data sets. One recent survey found that 73% of companies have already invested or plan to invest in Big Data in the next 24 months, up from 64% in 2013.
But what are the real payoffs from this pursuit of Big Data? Software developer Ugur Kadakal is skeptical. A Ph.D. engineer and founder of Pagos, a Boston-area software firm, Kadakal says companies would realize greater benefits from analyzing smaller, more relevant sets of data. He shared his thoughts on the value of “Small Data” with Engineering 360 contributing editor Larry Maloney.
Maloney: More and more companies are targeting Big Data as a pathway to smarter business strategies. Just what is included under the Big Data umbrella?
Kadakal: Let me first define “Small Data,” because what is not Small Data falls into the domain of Big Data. Fundamentally, Small Data is the information that companies gather from their essential business operations: human resources, sales, production, procurement, inventory and the like. Big Data goes beyond this and includes such things as social media, email, web site traffic and continuous log data on the functioning of equipment ranging from computers and factory machines to devices that a company sells, be it a vending machine or a heart pacemaker.
Maloney: What else distinguishes these two types of data?
Kadakal: Literature on Big Data often refers to the “three Vs”: volume, velocity and variety. For Small Data, volume involves data streams primarily in the realm of megabytes and gigabytes. It’s the kind of data that you can handle with equipment you purchase from Best Buy. In contrast, Big Data volume keeps moving ever upward, from terabytes to petabytes and beyond. Rather than a single computer, you often need a server farm to handle it all.
The second V – velocity – refers to how fast the data is processed. A good example of Small Data velocity is pulling a data extract from your customer relationship management (CRM) system for a particular time period, perhaps to answer questions you might have on service complaints. Another example would be the financial data that your company analyzes monthly or quarterly.
Big Data tends to be acquired in real time, such as the ongoing data that flows from monitoring sensors placed on machines at a factory or power plant. That real-time Big Data can trigger real-time actions, such as automated shutdown of a machine to avoid catastrophic damage or worker injury. On the other hand, that real-time data on machine function can also become valuable Small Data, if it is analyzed by plant engineers on a regular basis to devise better maintenance schedules.
Finally, there’s the category of “variety.” In Small Data, the data tends to be in structured formats for ready analysis, such as a CRM database or an Excel file. Much of Big Data, however, is unstructured. For example, how are you going to analyze trends from the thousands of email messages that flow through a company every year, or from the mountains of documents or engineering drawings that sit on people’s computers?
Maloney: To what extent are businesses using Big Data effectively?
Kadakal: Without a doubt, there can be a lot of value in Big Data, but unfortunately very few companies are using it effectively. It’s a new field with many unknowns, and most companies aren’t yet getting a good return on their investment. Before plunging into large investments in Big Data technology, companies should make sure that they are doing all they can with the Small Data they are gathering. Many large companies, even those that are the most data-centric, still don’t realize the full potential of Small Data.
It’s very easy to overlook or misinterpret Small Data that you deal with every day. I like to use the example of player salaries in the National Basketball Association. My company used our Visart visualization software to analyze salary data from 1991 to 2014. The data set consists of 11,000 rows and takes up 400 kilobytes in an Excel file.
A quick analysis would lead you to believe that Michael Jordan was the highest paid player ever, since the software displayed major spikes in 1997 and 1998 when Jordan earned $30 million and $33 million, respectively. No one came close to that until Kobe Bryant earned $30 million in 2014.
But further analysis reveals that, in terms of total career earnings, Jordan barely makes the top 100. Kevin Garnett is the #1 career earner. Companies can make similar wrong assumptions if they don’t do a good job of analyzing trends and relationships in the Small Data they gather.
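To make the distinction concrete, here is a minimal Python sketch of the two readings of the same salary table: ranking by the single highest-paid season versus ranking by total career earnings. The rows, the player names ("Player A", "Player B") and the helper functions are hypothetical and purely illustrative; they are not the NBA data set or the Visart analysis described above.

```python
from collections import defaultdict

# Hypothetical rows: (player, season, salary in $ millions).
# Illustrative figures only, NOT real NBA salary data.
rows = [
    ("Player A", 1997, 30.0),
    ("Player A", 1998, 33.0),
    ("Player A", 1999, 2.0),
    ("Player B", 1997, 12.0),
    ("Player B", 1998, 14.0),
    ("Player B", 1999, 16.0),
    ("Player B", 2000, 18.0),
    ("Player B", 2001, 20.0),
]


def peak_salary(rows):
    """Highest single-season salary per player (the 'spike' view)."""
    peaks = {}
    for player, _season, salary in rows:
        peaks[player] = max(peaks.get(player, 0.0), salary)
    return peaks


def career_earnings(rows):
    """Sum of all season salaries per player (the career view)."""
    totals = defaultdict(float)
    for player, _season, salary in rows:
        totals[player] += salary
    return dict(totals)


def top(by):
    """Return the player with the largest value in a {player: value} dict."""
    return max(by, key=by.get)


peaks = peak_salary(rows)
totals = career_earnings(rows)

print("Highest single season:", top(peaks))
print("Highest career total: ", top(totals))
```

In this toy table the two questions have different answers: the player with the largest one-year spike is not the one with the largest career total, which is the same kind of mismatch the salary example illustrates.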
Maloney: What can companies do to ensure that they fully exploit Small Data?
Kadakal: Make sure that you focus on the data sets that are most critical to the success of your business or department. Once you’ve gathered this data, use it. Too often, this valuable data just sits there. Empower your own managers to analyze the data. It also helps to employ tools that will make your data come alive, such as visualization software.
More Resources:
IHS Quarterly Big (data) Insights
Harvard Business Review "You Might Not Need Big Data After All"
Visart Data Visualization Software