Cheating Your Way into Business Visibility
Christian Bonilla | June 12th, 2009
Several weeks ago, I wrote a post about how the pace at which the world is accumulating information exceeds our ability to critically evaluate it. For companies that make thousands or millions of marketing decisions every day in the form of price offerings, advertising placements and so on, this translates into making decisions that perpetually involve a greater amount of uncertainty relative to the amount of information we have. The root cause of this problem is a technological one: we do not have the computing power to slice and dice massive datasets in order to glean insight in time to support decisions. An even deeper explanation is that the gap between the rate of information accumulation in businesses and the pace of information transfer improvements will continue to widen at an increasing rate. This poses serious challenges to the capabilities offered by Business Intelligence, not to mention our ability to determine optimal prices.
An astute reader pointed out that solving this problem depends at least as much on software as hardware. Indeed, it is a deft blend of hardware and software optimization that provides the best hope for making vast reams of information immediately accessible. At the core of this information accessibility problem lies the most inescapable of all culprits – the Laws of Physics. Although the laws are immutable, there is hope.
No matter how fast the chips powering our computers become, there is a bottleneck between hard disk storage and main memory, or RAM. This condition is referred to as being “I/O bound” (I/O stands for input/output – essentially how fast information can be transferred from the disk to the processing units in a computer). Within a computer’s main memory, all activities are performed electronically, which essentially means varying levels of “rather fast.” The major disadvantage of typical disk storage systems is that reading information from them requires mechanical motion (full disclosure: this paradigm of mass storage is currently facing a major disruption from solid-state drives (SSDs), although SSDs are significantly more expensive today).

The Culprit - Read Head on a Hard Disk Drive
This mechanical motion significantly increases the time it takes to read and write information, slowing system performance. The computer’s throughput is thus bound not by how fast electrons move, but by how fast the disk can rotate and the disk head can be repositioned – to the tune of roughly 200 MB per second. We are physically bound to this mechanical limit, but some clever tricks, ranging from the conventional to the profoundly innovative, can give us a quantum leap in the rapid accessibility of information.
Step 1: Cheat the Laws of Physics
If we are ever going to use data to make analytically-driven business decisions, we have to get into the technical weeds just a bit. A good first step is for us to put all of the information in our enterprise databases into what is referred to as a “RAID stack.” RAID stands for Redundant Array of Independent Disks. What this allows us to do (specifically, in what is termed a “RAID-0” configuration), is break up blocks of data and spread them across multiple disk drives. Breaking up information in this manner greatly improves I/O performance by distributing the load across many channels and drives. For reasons we won’t go into here, there is a limit to how many disks we can distribute data across and still get the desired results. Ultimately the attainable performance improvement tops out at 1.6 GB/s (8 disks at 200 MB/s).

Striping Data in a RAID-0 Array
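To make the striping idea concrete, here is a minimal sketch in Python. It is only an illustration of round-robin block placement and the idealized arithmetic quoted above, not how a real RAID controller is implemented. The function names `stripe` and `aggregate_mb_per_s` are hypothetical, and the throughput figure assumes every disk can be read in parallel with no controller overhead.

```python
from typing import List

DISK_MB_PER_S = 200  # per-disk rate quoted in the post
NUM_DISKS = 8        # array width used in the post's example


def stripe(blocks: List[bytes], num_disks: int) -> List[List[bytes]]:
    """Place block i on disk i % num_disks (round-robin striping)."""
    disks: List[List[bytes]] = [[] for _ in range(num_disks)]
    for i, block in enumerate(blocks):
        disks[i % num_disks].append(block)
    return disks


def aggregate_mb_per_s(num_disks: int, per_disk: int = DISK_MB_PER_S) -> int:
    """Idealized throughput when all disks stream at once (an assumption)."""
    return num_disks * per_disk


# Sixteen made-up blocks spread over eight disks: two blocks per disk.
blocks = [f"block-{i}".encode() for i in range(16)]
disks = stripe(blocks, NUM_DISKS)
print([len(d) for d in disks])        # blocks held by each disk
print(aggregate_mb_per_s(NUM_DISKS))  # MB/s for the whole array
```

Because consecutive blocks land on different disks, a long sequential read keeps all eight drives busy at once, which is where the 8 x 200 MB/s figure comes from.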
The computer’s RAM has a read/write speed of around 7 GB/s, but if we can only write information into memory at 1.6 GB/s, then we are still under-saturating RAM and the CPU. Each CPU core can perform calculations at roughly 10 GB/s, which means that a dual quad-core chip architecture (8 cores) allows for 80 GB/s of calculations. We are still not close to maximizing the computer’s potential throughput. Whether or not we can devise a way to maximize the potential of our machines, however, is the difference between only having the bandwidth to perform deep analytics on the top 10% of your product catalog on the one hand, and being able to quickly analyze not only all of your products, but all possible customer-product combinations on the other. Optimizing your business, in fact, depends on the ability to overcome this challenge.
A second way to circumvent I/O boundedness is through compression. By compressing the text stored in the database tables, we can easily achieve a 5:1 gain in throughput as well. This means that the 1.6 GB picked up by the disk heads each second now gets unpacked into 8 GB by the CPU (although the performance cost of decompression can be high in some cases, there are advances that we can use to get us past this hurdle as well). So by combining two well-known tools right off the bat, we have already achieved a roughly 40-fold performance improvement over a single 200 MB/s disk in how quickly we can get 8 GB of data to the CPU. Think about how much more of your customer data this allows you to analyze in the same amount of time. But we still have a long way to go before we max out how much information the computer is capable of processing.
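The arithmetic behind these figures can be checked with a short Python sketch. The 5:1 ratio and the throughput numbers are the post's own. The sample text is made up, and `zlib` is used only as a stand-in for whatever compression a real database engine would apply, so the measured ratio says nothing about real sales data.

```python
import zlib

SINGLE_DISK_MB_PER_S = 200  # one disk, from the post
STRIPED_MB_PER_S = 1600     # 8 disks x 200 MB/s, from the post
ASSUMED_RATIO = 5           # the post's 5:1 compression figure

# Uncompressed data delivered per second once the CPU unpacks it.
effective_mb_per_s = STRIPED_MB_PER_S * ASSUMED_RATIO
speedup = effective_mb_per_s / SINGLE_DISK_MB_PER_S
print(effective_mb_per_s, speedup)

# Round trip on made-up, highly repetitive text to show lossless compression.
sample = b"2007-01-01,Store 12,SKU5893,19.99\n" * 1000
packed = zlib.compress(sample)
measured_ratio = len(sample) / len(packed)
assert zlib.decompress(packed) == sample
print(f"measured ratio on the toy sample: {measured_ratio:.0f}:1")
```

Striping and compression multiply rather than add: eight disks times a 5:1 ratio is a 40-fold gain over a single uncompressed disk, provided the CPU can decompress fast enough to keep up.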
Step 2: Cheat the Limitations of Storage
Let’s say that over a two-year span, a company records 25 million individual transactions in its database, and that we are interested in knowing the total sales it made of a specific item: SKU5893. This section describes how a typical database would go about answering that query.
The manner in which we typically store data is not always the most conducive to high-performance computing. Most databases store information as collections of rows. Each row denotes a single unit of interest, such as a sales transaction. Each row has a number of columns that describe the attributes of that transaction, such as the date, customer’s name, item purchased, price, and so on. This can present some problems when data needs to be accessed in certain ways. If you wanted to find all the records involving a specific SKU, it might require scanning across 20 or more columns before getting to the column you need. The value in that column then has to be checked against the desired value to see if it matches, and then this search-and-check process is repeated for the next row until we have searched through the entire table. In massive data sets, all of this search time adds up and creates a crippling performance bottleneck. But there is a way out.
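The search-and-check loop described above can be sketched in a few lines of Python. This is a toy model, not how any particular database engine works: the rows are made up, the four-column layout is an assumption, and `fields_touched` is a hypothetical counter standing in for the data that has to come off the disk.

```python
from typing import List, Tuple

# (date, customer, item, price): made-up rows for illustration only
Row = Tuple[str, str, str, float]

rows: List[Row] = [
    ("1/1/2007", "A. Smith", "SKU5893", 10.0),
    ("1/1/2007", "B. Jones", "SKU1111", 4.5),
    ("1/2/2007", "C. Lee", "SKU5893", 10.0),
]

ITEM, PRICE = 2, 3  # column positions within a row


def total_sales_row_scan(table: List[Row], sku: str) -> Tuple[float, int]:
    """Sum prices for one SKU, counting every field read along the way."""
    total = 0.0
    fields_touched = 0
    for row in table:
        fields_touched += len(row)  # the whole row is read, needed or not
        if row[ITEM] == sku:        # check the one column we care about
            total += row[PRICE]
    return total, fields_touched


total, touched = total_sales_row_scan(rows, "SKU5893")
print(total, touched)
```

Only two of the four fields in each row matter to the query, yet every field is read. With 20 or more columns and millions of rows, that wasted reading is the bottleneck.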
It may seem to you that there has got to be a way to store sales data more manageably, and there is. By taking advantage of what is called vertical fragmentation, we can do just that. Imagine that our transaction table only has five fields: Location, Customer Name, Item, Price, and Date, as in the example below.

Row-Based Table
With tens of thousands of transactions each day, this table quickly accumulates a lot of rows. However, if we decide to orient this table by its columns instead of its rows, we would get the following:

We now have five two-column tables after adding a unique id field to each column that maps back to the information for each row. Several important results come from this new orientation. First, notice how much repetition of data we have in certain tables such as Date and Location. For instance, the value “1/1/2007” will be repeated thousands of times in the Date table. The gains we can achieve by compressing vertically fragmented tables far exceed what we can achieve with row-based tables, because the column-oriented data model better supports run-length encoding (and other compression techniques). The second crucial point is that vertical fragmentation enables us to send to the CPU only the information that it needs to see. The disk head does not need to scan across columns of data that it doesn’t need – so the 200 MB/s that it is able to read is focused only on the columns necessary for the query. Tack on another sizable multiple in I/O improvements.
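Both points can be illustrated with a small Python sketch. The rows are made up, the helper names (`to_columns`, `run_length_encode`, `run_length_decode`) are hypothetical, and the position of a value in each list plays the role of the unique id that ties the columns back to a row. It is a toy model under those assumptions, not a description of any particular column store.

```python
from itertools import groupby
from typing import Dict, List, Tuple

FIELDS = ("location", "customer", "item", "price", "date")

# Made-up transactions, ordered by date as a sales log typically would be.
rows = [
    ("Store 1", "A. Smith", "SKU5893", 10.0, "1/1/2007"),
    ("Store 1", "B. Jones", "SKU1111", 4.5, "1/1/2007"),
    ("Store 2", "C. Lee", "SKU5893", 10.0, "1/1/2007"),
    ("Store 2", "D. Kim", "SKU2222", 7.0, "1/2/2007"),
]


def to_columns(table) -> Dict[str, list]:
    """Vertically fragment: one list per field, list index acts as the row id."""
    return {name: [row[i] for row in table] for i, name in enumerate(FIELDS)}


def run_length_encode(values: list) -> List[Tuple[object, int]]:
    """Collapse consecutive repeats into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


def run_length_decode(runs: List[Tuple[object, int]]) -> list:
    """Expand (value, count) pairs back into the original column."""
    return [value for value, count in runs for _ in range(count)]


columns = to_columns(rows)

# Point 1: repetitive columns shrink under run-length encoding.
date_runs = run_length_encode(columns["date"])
print(date_runs)

# Point 2: the query reads only the item and price columns.
total = sum(
    price
    for item, price in zip(columns["item"], columns["price"])
    if item == "SKU5893"
)
print(total)
```

The location, customer, and date columns are never touched by the query, and the date column is stored as two runs instead of four values. On a real table with thousands of transactions per day, both effects grow with the data.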
Step 3: Cheat the Laws of…Math
Now we’re getting somewhere. Data striping, compression, and vertical fragmentation provide a huge boost to the volume of information that we can access and process – indeed we are now getting to and probably exceeding the CPU’s number-crunching ability of 80GB/s. This brings us to our final bottleneck: we can’t speed up how quickly the chip can do the math required for heavy BI analytics.
The solution lies not in the CPU, but in many CPUs. Distributed computing allows us to bring more CPUs into the mix while also feeding them from their own disks – what is referred to as a “shared nothing” architecture. If you use ten machines, you can distribute a billion rows of sales data evenly across all ten machines, leaving 100 million rows on each. Now each machine is executing at warp speed on only a portion of the database, completing our search for sales of a specific SKU in a fraction of the normal processing time.
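A rough Python sketch of the idea follows. Threads stand in for separate machines here, which is purely an assumption for illustration: a real shared-nothing cluster would keep each shard on its own disk and send only the partial results over the network. The function names and the generated rows are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

Sale = Tuple[str, float]  # (item, price)


def partition(rows: List[Sale], num_nodes: int) -> List[List[Sale]]:
    """Deal rows out evenly so each node owns its own shard."""
    return [rows[i::num_nodes] for i in range(num_nodes)]


def local_total(shard: List[Sale], sku: str) -> float:
    """Work done independently on one node, using only its own rows."""
    return sum(price for item, price in shard if item == sku)


def distributed_total(rows: List[Sale], sku: str, num_nodes: int) -> float:
    """Scatter the rows, compute partial sums in parallel, then combine."""
    shards = partition(rows, num_nodes)
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        partials = list(pool.map(lambda shard: local_total(shard, sku), shards))
    return sum(partials)


# 1,000 made-up rows in place of a billion; every fourth one is SKU5893.
rows = [("SKU5893" if i % 4 == 0 else "SKU0001", 2.0) for i in range(1000)]
print([len(shard) for shard in partition(rows, 10)])  # rows per node
print(distributed_total(rows, "SKU5893", 10))
```

The query works well in this setting because a sum can be split into partial sums: each node returns one number, and the coordinator only has to add ten numbers together.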
Integrating all of the techniques that we have covered can be summed up in the following steps:
- Distribute a database across multiple machines, so that each gets some fraction of the total number of rows in the original
- On each machine, vertically fragment the database section stored on it
- Stripe the data across multiple independent disks on each machine
- Finally, compress that data
We have now achieved massively parallel and high performance processing. Each machine now runs through all of the information it has and sends the relevant information to the CPU, which can finally hit the processing limits of current chip architectures. The I/O curse long since in our rear-view mirror, we can finally begin to unlock the incredible amount of information latent in our very own data.
Sentrana’s Role
I should note that the above story is an idealization. Real-world implementation of the interconnecting innovations that I have outlined here confronts serious challenges of its own: decompression taxes the CPU; RAID is expensive and the energy drain of all those disks can be significant; management of multiple machines that are co-operating in a cluster is itself a sophisticated systems administration task, and the list goes on. Though valid concerns all, the more important point is that these are all solvable problems. These real-world implementation problems are the ones that Sentrana’s research has focused on in order to put business intelligence and predictive analytics at a person’s fingertips. Truly great business insights are like scientific discoveries – they stem from first asking an important question and then breaking it down into manageable pieces so that it can be answered. In order to support those moments of intuition in which the momentous questions are first asked, we have to be as fast as that other great computer at every decision-maker’s disposal.