As part of its perpetual quest to reinvent and perfect its business model, IBM has made an aggressive push into the analytics market over the last half-dozen or so years. The company's slick, though occasionally confusing, ad campaigns (remember those ads with the mysterious red box being unveiled?) often announce its new initiatives, though it is not always clear whether a new announcement is indeed a major one. In the analytics space, however, Big Blue does mean business. The announcement of its sizable new business analytics and optimization division is clearly intended to prove as much. Shortly after that announcement, IBM also unveiled a new stream computing platform called "System S" to much fanfare. The breathless enthusiasm of business journalists, technology bloggers and investment analysts has been palpable. But what exactly does this technological advancement do, and what does it mean for your business?
To answer this question, let's begin by briefly dissecting what IBM has introduced. Imagine that you are receiving a continuous stream of data, such as stock prices on the Nasdaq. These figures must be analyzed quickly so that the proper buy and sell orders can be placed. Suppose that you also need to base your decisions not just on the Nasdaq prices but also on the figures coming in from dozens of other exchanges.
Just for fun, let's say that you also want to include up-to-the-second weather information for 20 cities in your decision process. We can safely say that there are very few human beings in the world who could look at all of this information in the blink of an eye, make the best possible decision, and then repeat the exact same process a split-second later with new information. This example highlights the best application for stream computing. In the stock price analysis outlined above, we needed to take in information arriving on a continuous basis from a number of different sources, evaluate it, and deploy the result in a desired manner. Let's say that we wanted to analyze the correlation between Motorola's stock price movements and the price movements of 50 other stocks on 50 different exchanges, as well as the outdoor temperatures of every single world capital. IBM's breakthrough allows you to do this and also update your analysis in real time as new information is received. That's the good news.
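To make the idea concrete, here is a minimal sketch in Python of the kind of continuous analysis described above: a sliding window over two incoming streams whose correlation is recomputed each time a new pair of observations arrives. This is purely illustrative and is not System S code; the window size and the particular streams being compared are assumptions made for the example.

```python
# Minimal sketch (not System S code): maintain a sliding window over two
# incoming streams and recompute their correlation on every new observation.
from collections import deque
from math import sqrt

class StreamingCorrelation:
    def __init__(self, window_size=100):
        # (x, y) pairs; the oldest pair is dropped automatically once full
        self.window = deque(maxlen=window_size)

    def update(self, x, y):
        """Add the latest observation from each stream and return the current correlation."""
        self.window.append((x, y))
        n = len(self.window)
        if n < 2:
            return None
        xs = [p[0] for p in self.window]
        ys = [p[1] for p in self.window]
        mean_x, mean_y = sum(xs) / n, sum(ys) / n
        cov = sum((a - mean_x) * (b - mean_y) for a, b in self.window)
        var_x = sum((a - mean_x) ** 2 for a in xs)
        var_y = sum((b - mean_y) ** 2 for b in ys)
        if var_x == 0 or var_y == 0:
            return None
        return cov / sqrt(var_x * var_y)

# Usage: feed in, say, Motorola's price alongside a temperature reading each tick
corr = StreamingCorrelation(window_size=60)
for price, temperature in [(22.1, 18.0), (22.4, 18.2), (22.3, 17.9)]:
    print(corr.update(price, temperature))
```

The point of the sketch is simply that each new tick updates the analysis immediately, rather than waiting for a batch job to rerun over historical data.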
The bad news is that there are real limits to the insight that can be obtained this way. There are really only two areas in which stream computing can be applied: link detection and itemset detection. The two are related: link detection tries to identify events that are correlated, while itemset detection tries to identify events that are co-incident (e.g., the prices of GE and Sony stock went up, and the price of Motorola stock went up at the same time).
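For readers who want to see what "co-incidence" means in practice, here is a toy sketch of itemset detection: counting how often groups of events fire together in the same tick of a stream. The event names and the support threshold are illustrative assumptions, not anything specific to System S.

```python
# Toy itemset (co-incidence) detection: count how often groups of events
# occur together in the same tick, keeping only frequently co-occurring sets.
from itertools import combinations
from collections import Counter

def coincident_itemsets(ticks, min_support=2, max_size=3):
    """ticks: iterable of sets of events observed together, e.g. {'GE_up', 'SNE_up'}."""
    counts = Counter()
    for events in ticks:
        for size in range(2, max_size + 1):
            for combo in combinations(sorted(events), size):
                counts[combo] += 1
    return {itemset: n for itemset, n in counts.items() if n >= min_support}

ticks = [
    {"GE_up", "SNE_up", "MOT_up"},
    {"GE_up", "MOT_up"},
    {"GE_up", "SNE_up", "MOT_up"},
]
print(coincident_itemsets(ticks))
```

Notice that nothing in this counting tells you *why* the events move together or what will happen next; it only tells you that they have moved together so far.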
Itemset detection algorithms have progressed to the point where they can be accurate and effective without taxing a computer's resources too heavily. IBM's accomplishment lies in deploying these algorithms for real-time itemset detection on an unprecedented scale, but ultimately, the business insight that can be extracted from correlation and co-incidence is limited. Frankly, IBM's press release statement that System S allows users to "…create a forward-looking analysis of data from any source" is a bit of a fairy tale. Any competent manager will tell you that "forward-looking analysis" based only on correlations in historical data is not a recipe for success. With itemset detection (which is closely related to, and sometimes confused with, association mining), you can now look at more data more quickly than ever before, but you ultimately can't do much new with it.
Truly forward-looking analysis requires going several steps beyond itemset detection, because making accurate predictions about events that have never happened requires more than analyzing past correlations. For example, how could you predict the highest price at which a customer would be willing to buy a product they have never purchased before, using only their previous purchase information? Or how about predicting the effect an advertisement will have in a market you have never advertised in before? Integrating techniques better geared for this kind of analysis, such as principal components analysis and collaborative filtering, makes these types of predictive analysis possible, not to mention significantly more robust than correlative analytics. The downside of integrating all of these techniques into your analysis is that it becomes extremely expensive computationally. Other related breakthroughs are required to make this feasible in a business setting (see Cheating Your Way Into Business Visibility). To make better-informed decisions with an understanding of the likelihood of possible outcomes, managers and policymakers alike need a way to reduce the uncertainty inherent in those outcomes. The aforementioned "virgin sale" problem is a micro-example of exactly this challenge. Sadly, being able to comb through data more quickly does not, by itself, get us to that higher end-state. IBM can claim a notable achievement in computing, but hailing it as a quantum leap for business decision support understates the true challenge of making better business decisions.
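As a rough illustration of how collaborative filtering attacks the "virgin sale" question, here is a toy sketch that estimates what a customer might pay for a product they have never bought by weighting the prices paid by customers with similar purchase histories. The data, the similarity measure, and the prediction rule are assumptions made for the example; they do not describe any IBM product.

```python
# Toy collaborative filtering: estimate the price a customer might pay for an
# unpurchased product from similar customers' purchase histories.
from math import sqrt

# Historical prices paid: customer -> {product: price}
history = {
    "alice": {"widget": 10.0, "gadget": 25.0},
    "bob":   {"widget": 12.0, "gadget": 27.0, "gizmo": 40.0},
    "carol": {"widget": 9.0,  "gizmo": 35.0},
}

def similarity(a, b):
    """Cosine similarity over the products two customers have in common."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[p] * b[p] for p in shared)
    norm_a = sqrt(sum(a[p] ** 2 for p in shared))
    norm_b = sqrt(sum(b[p] ** 2 for p in shared))
    return dot / (norm_a * norm_b)

def predict_price(customer, product):
    """Similarity-weighted average of prices other customers paid for the product."""
    me = history[customer]
    num, den = 0.0, 0.0
    for other, purchases in history.items():
        if other == customer or product not in purchases:
            continue
        w = similarity(me, purchases)
        num += w * purchases[product]
        den += w
    return num / den if den else None

# Alice has never bought a gizmo; estimate from bob and carol, weighted by similarity.
print(predict_price("alice", "gizmo"))
```

Even this tiny example hints at why such analysis gets expensive: every prediction compares the customer against every other customer's history, and doing that continuously, at stream scale, is a very different problem from counting co-occurrences.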