<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Sentrana Blog &#187; processing speed</title>
	<atom:link href="http://blog.sentrana.com/tag/processing-speed/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.sentrana.com</link>
	<description>Turning complexity into competitive advantage</description>
	<lastBuildDate>Thu, 02 Sep 2010 20:01:00 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Cheating Your Way into Business Visibility</title>
		<link>http://blog.sentrana.com/2009/06/12/cheating-your-way-into-business-visibility/</link>
		<comments>http://blog.sentrana.com/2009/06/12/cheating-your-way-into-business-visibility/#comments</comments>
		<pubDate>Fri, 12 Jun 2009 15:06:12 +0000</pubDate>
		<dc:creator>Christian Bonilla</dc:creator>
				<category><![CDATA[Managers View]]></category>
		<category><![CDATA[Tech Trends]]></category>
		<category><![CDATA[BI]]></category>
		<category><![CDATA[business intelligence and predictive analytics at a person’s fingertips]]></category>
		<category><![CDATA[cheat the laws of physics]]></category>
		<category><![CDATA[computer architecture]]></category>
		<category><![CDATA[data visibility]]></category>
		<category><![CDATA[database]]></category>
		<category><![CDATA[enterprise data management]]></category>
		<category><![CDATA[high-performance computing]]></category>
		<category><![CDATA[I/O]]></category>
		<category><![CDATA[maximizing the computer’s potential throughput]]></category>
		<category><![CDATA[optimal prices]]></category>
		<category><![CDATA[pricing]]></category>
		<category><![CDATA[processing speed]]></category>
		<category><![CDATA[RAID stack]]></category>
		<category><![CDATA[sentrana research]]></category>
		<category><![CDATA[the i/o curse]]></category>

		<guid isPermaLink="false">http://blog.sentrana.com/?p=269</guid>
		<description><![CDATA[Better business visibility vis-à-vis cheating A) the laws of physics, B) the limitations of storage, and C) the laws of math. Implementing some clever tricks – ranging from the conventional to the profoundly innovative – can give us a quantum leap in the rapid accessibility of information. ]]></description>
			<content:encoded><![CDATA[<p class="MsoNormal"><strong> <span style="font-weight: normal; ">Several weeks ago, I wrote a post about <a href="http://blog.sentrana.com/2009/03/29/what-happens-when-we-cant-keep-up-with-information/" target="_self">how the pace at which the world is accumulating information exceeds our ability to critically evaluate it</a>. For companies that make thousands or millions of marketing decisions every day in the form of price offerings, advertising placements and so on, this translates into making decisions that perpetually involve a greater amount of uncertainty relative to the amount of information we have. The root cause of this problem is a technological one: we do not have the computing power to slice and dice massive datasets in order to glean insight in time to support decisions. An even deeper explanation is that the gap between the rate of information accumulation in businesses and the pace of information transfer improvements will continue to widen at an increasing rate. This poses serious challenges to the capabilities offered by Business Intelligence, not to mention our ability to determine optimal prices. <span id="more-269"></span><br />
</span></strong></p>
<p class="MsoNormal"><span>An astute reader pointed out that solving this problem depends at least as much on software as on hardware. Indeed, a deft blend of hardware and software optimization offers the best hope for making vast reams of information immediately accessible. At the core of this information accessibility problem lies the most inescapable of all culprits: the Laws of Physics. Although the laws are immutable, there is hope.</span></p>
<p class="MsoNormal">No matter how fast the chips powering our computers become, there is a bottleneck between hard disk storage and main memory (RAM). A system limited by this bottleneck is said to be “I/O bound” (I/O stands for input/output: essentially, how fast information can be transferred from the disk to the processing units in a computer). Within a computer’s main memory, all activities are performed electronically, which essentially means varying degrees of “rather fast.” The major disadvantage of typical disk storage systems is that reading information from them requires mechanical motion. (A caveat: this paradigm of mass storage is currently facing a major disruption from solid-state drives (SSDs), although SSDs are significantly more expensive today.)</p>
<div id="attachment_270" class="wp-caption alignleft" style="width: 359px"><img class="size-full wp-image-270" src="http://blog.sentrana.com/wp-content/uploads/2009/06/hard_disk.jpg" alt="The Culprit - Read Head on a Hard Disk Drive" width="349" height="263" /><p class="wp-caption-text">The Culprit - Read Head on a Hard Disk Drive</p></div>
<p class="MsoNormal"><span>This mechanical motion significantly increases the time it takes to read and write information, slowing system performance. The computer’s throughput is thus bound not by how fast electrons move, but by how fast the disk can rotate and the disk head can be repositioned, to the tune of roughly 200 MB per second. We are physically bound to this mechanical limit, but some clever tricks, ranging from the conventional to the profoundly innovative, can give us a quantum leap in the rapid accessibility of information.</span></p>
<p class="MsoNormal"><strong><span>Step 1: Cheat the Laws of Physics</span></strong></p>
<p class="MsoNormal"><span>If we are ever going to use data to make analytically driven business decisions, we have to get into the technical weeds just a bit. A good first step is to put all of the information in our enterprise databases into what is referred to as a “RAID stack.” RAID stands for Redundant Array of Independent Disks. What this allows us to do, specifically in what is termed a “RAID-0” configuration, is break up blocks of data and spread them across multiple disk drives. Breaking up information in this manner greatly improves I/O performance by distributing the load across many channels and drives. For reasons we won’t go into here, there is a limit to how many disks we can distribute data across and still get the desired results. In practice, the attainable throughput tops out at about 1.6 GB/s (8 disks at 200 MB/s each).</span></p>
<div id="attachment_271" class="wp-caption alignright" style="width: 591px"><img class="size-full wp-image-271" src="http://blog.sentrana.com/wp-content/uploads/2009/06/raid.jpg" alt="Striping Data in a RAID-0 Array" width="581" height="265" /><p class="wp-caption-text">Striping Data in a RAID-0 Array</p></div>
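<p class="MsoNormal">To make striping concrete, here is a minimal Python sketch. It assumes a simple round-robin layout with one block per stripe unit and perfectly parallel disks; real RAID controllers use configurable stripe sizes, and the function names here are hypothetical. The 200 MB/s and 8-disk figures are the ones quoted above.</p>

```python
# Minimal sketch of RAID-0 striping, assuming a simple round-robin layout:
# logical block i lands on disk (i mod n) at stripe row (i div n).
DISK_MB_PER_S = 200  # per-disk read rate quoted in the post
NUM_DISKS = 8        # the post's practical ceiling for one array


def stripe_location(block, num_disks):
    """Return (disk index, row on that disk) for a logical block number."""
    return block % num_disks, block // num_disks


def stripe(blocks, num_disks):
    """Spread a sequence of blocks across num_disks lists, round-robin."""
    disks = [[] for _ in range(num_disks)]
    for i, block in enumerate(blocks):
        disk, _row = stripe_location(i, num_disks)
        disks[disk].append(block)
    return disks


def aggregate_throughput(num_disks, per_disk_rate):
    """Ideal case: every disk reads its share in parallel, so rates add."""
    return num_disks * per_disk_rate


print(stripe(list("ABCDEFGHIJ"), 4))
# [['A', 'E', 'I'], ['B', 'F', 'J'], ['C', 'G'], ['D', 'H']]
print(aggregate_throughput(NUM_DISKS, DISK_MB_PER_S))  # 1600 (MB/s)
```

<p class="MsoNormal">The additive throughput is the best case; it only holds while the controller and bus can keep every disk busy at once.</p>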
<p class="MsoNormal">The computer’s RAM has a read/write speed of around 7 GB/s, but if we can only move information into memory at 1.6 GB/s, we are still under-saturating both RAM and the CPU. Each CPU core can process data at roughly 10 GB/s, which means that a dual quad-core chip architecture (8 cores) allows for 80 GB/s of calculations. We are still nowhere close to maximizing the computer’s potential throughput. Whether we can find a way to tap that potential, however, makes the difference between having only enough bandwidth to perform deep analytics on the top 10% of your product catalog and being able to quickly analyze not only all of your products but all possible customer-product combinations. Optimizing your business, in fact, depends on overcoming this challenge.</p>
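<p class="MsoNormal">The reasoning here is a classic bottleneck calculation. A minimal Python sketch, using only the figures above and assuming that disk, memory, and CPU behave as a pipeline whose slowest stage sets the overall pace:</p>

```python
# Minimal bottleneck model built from the post's round figures, in GB/s.
# Assumption: data flows through the stages as a pipeline, so the
# slowest stage sets the overall pace.
STAGES_GB_PER_S = {
    "disk (8-disk RAID-0)": 1.6,
    "RAM": 7.0,
    "CPU (8 cores at 10 GB/s each)": 8 * 10.0,
}


def bottleneck(stages):
    """Return the (name, rate) of the slowest stage."""
    name = min(stages, key=stages.get)
    return name, stages[name]


name, rate = bottleneck(STAGES_GB_PER_S)
cpu_rate = STAGES_GB_PER_S["CPU (8 cores at 10 GB/s each)"]
print(name)                                        # disk (8-disk RAID-0)
print(f"CPU utilization: {rate / cpu_rate:.0%}")   # CPU utilization: 2%
```

<p class="MsoNormal">Under this idealized model the eight cores sit idle about 98% of the time, which is the gap the remaining steps try to close.</p>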
<p class="MsoNormal">A second way to circumvent I/O boundedness is compression. By compressing the text stored in database tables, we can often achieve a further 5:1 gain in effective throughput. This means that the 1.6 GB per second picked up by the disk heads is unpacked into 8 GB per second by the CPU (decompression can carry a high performance cost in some cases, but there are advances that can get us past this hurdle as well). So by combining two well-known tools right off the bat, we have already cut the time it takes to get 8 GB of data to the CPU roughly 40-fold: a single 200 MB/s disk needs about 40 seconds to deliver it uncompressed, while the striped, compressed array delivers it in about one second. Think about how much more of your customer data this allows you to analyze in the same amount of time. But we still have a long way to go before we max out how much information the computer is capable of processing.</p>
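<p class="MsoNormal">The arithmetic of combining striping with compression is easy to check. In this minimal Python sketch the 5:1 ratio is the post's assumption, not a measured value; the zlib part uses Python's standard-library compressor on made-up, highly repetitive rows only to show that such text compresses well. Real ratios depend entirely on the data and the codec.</p>

```python
import zlib

DISK_GB_PER_S = 1.6  # striped array from Step 1
RATIO = 5.0          # the post's assumed 5:1 compression ratio


def effective_rate(disk_rate, ratio):
    """Each compressed GB read from disk expands to `ratio` GB in memory."""
    return disk_rate * ratio


def seconds_to_deliver(gb, rate):
    """Time to move `gb` gigabytes of uncompressed data at `rate` GB/s."""
    return gb / rate


print(effective_rate(DISK_GB_PER_S, RATIO))  # 8.0 (GB/s of uncompressed data)
print(seconds_to_deliver(8, 0.2))            # 40.0 (one 200 MB/s disk)
print(seconds_to_deliver(8, effective_rate(DISK_GB_PER_S, RATIO)))  # 1.0

# Illustration only: hypothetical, repetitive table rows compress well.
sample = ("1/1/2007,Store 12,SKU5893,19.99\n" * 1000).encode()
packed = zlib.compress(sample)
assert zlib.decompress(packed) == sample  # lossless round trip
print(len(sample) / len(packed))  # measured ratio; depends on the data
```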
<p class="MsoNormal"><strong><span>Step 2: Cheat the Limitations of Storage</span></strong></p>
<p class="MsoNormal"><span>Let’s say that over a two-year span, a company records 25 million individual transactions in its database, and that we are interested in knowing the total sales it made of a specific item: SKU5893. This section describes how a typical database would go about answering that query.</span></p>
<p class="MsoNormal">The manner in which we typically store data is not always the most conducive to high-performance computing. Most databases store information as collections of rows. Each row denotes a single unit of interest, such as a sales transaction, and has a number of columns that describe the attributes of that transaction, such as the date, customer’s name, item purchased, price, and so on. This layout can present problems when data needs to be accessed in certain ways. To find all the records involving a specific SKU, the database might have to scan across 20 or more columns before reaching the one it needs. The value in that column then has to be checked against the desired value to see if it matches, and this search-and-check process is repeated row after row until the entire table has been searched. In massive data sets, all of this search time adds up and creates a crippling performance bottleneck. But there is a way out.</p>
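<p class="MsoNormal">A row store answers the SKU5893 question by visiting every record. This minimal Python sketch mimics that access pattern in memory; the <code>Sale</code> type and the sample rows are hypothetical, and a real database reads whole pages of rows from disk rather than Python objects. The point is the same either way: every row must be read, including the fields the query never uses.</p>

```python
from typing import NamedTuple


class Sale(NamedTuple):
    """Hypothetical row type following the post's five-field example."""
    location: str
    customer: str
    item: str
    price: float
    date: str


def total_sales_row_scan(rows, sku):
    """Full table scan: visit every row and check its Item field."""
    total = 0.0
    examined = 0
    for row in rows:           # every row is touched, whatever the query
        examined += 1
        if row.item == sku:    # search-and-check on one field
            total += row.price
    return total, examined


table = [
    Sale("Store 12", "A. Smith", "SKU5893", 19.99, "1/1/2007"),
    Sale("Store 12", "B. Jones", "SKU1020", 5.49, "1/1/2007"),
    Sale("Store 7", "C. Lee", "SKU5893", 19.99, "1/2/2007"),
]
total, examined = total_sales_row_scan(table, "SKU5893")
print(f"{total:.2f} from {examined} rows examined")  # 39.98 from 3 rows examined
```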
<p class="MsoNormal">Surely there has got to be a more manageable way to store sales data, right? By taking advantage of what is called vertical fragmentation, we can do just that. Imagine that our transaction table has only five fields: Location, Customer Name, Item, Price, and Date, as in the example below.</p>
<div id="attachment_275" class="wp-caption alignnone" style="width: 464px"><img class="size-full wp-image-275" src="http://blog.sentrana.com/wp-content/uploads/2009/06/table_1c_blog.jpg" alt="Row-Based Table" width="454" height="151" /><p class="wp-caption-text">Row-Based Table</p></div>
<p>With tens of thousands of transactions each day, this table quickly accumulates a lot of rows. However, if we orient this table by its columns instead of its rows, we get the following:</p>
<p><img class="alignnone size-full wp-image-276" src="http://blog.sentrana.com/wp-content/uploads/2009/06/table_2b_blog.jpg" alt="table_2b_blog" width="519" height="137" /></p>
<p class="MsoNormal">By adding a unique ID field to each column that maps its values back to the original rows, we now have five two-column tables. Several important results come from this new orientation. First, notice how much repetition of data there is in certain tables, such as Date and Location. For instance, the value “1/1/2007” will be repeated thousands of times in the Date table. The gains we can achieve by compressing vertically fragmented tables far exceed what we can achieve with row-based tables, because this data model is much better suited to run-length encoding and other compression techniques. The second crucial point is that vertical fragmentation lets us send the CPU only the information it needs to see. The disk head does not need to scan across columns of data that the query doesn&#8217;t need, so the 200 MB/s it is able to read is focused only on the columns the query requires. That adds another large multiple to our I/O gains: a query that touches only 2 of 20 columns reads roughly a tenth as much data.</p>
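<p class="MsoNormal">Vertical fragmentation and run-length encoding fit in a few lines of Python. In this minimal sketch the rows are invented, the (id, value) pairs play the role of the two-column tables described above, and <code>itertools.groupby</code>-based run-length encoding stands in for whatever codec a real column store would use. The query at the end reads only the Item and Price fragments.</p>

```python
from itertools import groupby

FIELDS = ("Location", "Customer Name", "Item", "Price", "Date")
rows = [  # hypothetical sample transactions, already in date order
    ("Store 12", "A. Smith", "SKU5893", 19.99, "1/1/2007"),
    ("Store 12", "B. Jones", "SKU1020", 5.49, "1/1/2007"),
    ("Store 12", "C. Lee", "SKU5893", 19.99, "1/1/2007"),
    ("Store 7", "D. Park", "SKU7741", 3.25, "1/2/2007"),
]


def fragment(rows, fields):
    """Vertical fragmentation: one (row id, value) table per column."""
    return {field: [(rid, row[i]) for rid, row in enumerate(rows)]
            for i, field in enumerate(fields)}


def run_length_encode(values):
    """Collapse runs of adjacent equal values into (value, count) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]


columns = fragment(rows, FIELDS)
dates = [value for _, value in columns["Date"]]
print(run_length_encode(dates))  # [('1/1/2007', 3), ('1/2/2007', 1)]

# The SKU query touches only two fragments, joined on the row id.
price_by_id = dict(columns["Price"])
total = sum(price_by_id[rid] for rid, item in columns["Item"]
            if item == "SKU5893")
print(f"{total:.2f}")  # 39.98
```

<p class="MsoNormal">Run-length encoding pays off here because transactions arrive in date order, so equal dates sit next to each other; on an unsorted column the runs would be short and the savings small.</p>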
<p class="MsoNormal"><strong><span> <span>Step 3: Cheat the Laws of…Math</span></span></strong></p>
<p class="MsoNormal"><span>Now we’re getting somewhere. Data striping, compression, and vertical fragmentation provide a huge boost to the volume of information that we can access and process; indeed, we are now approaching, and probably exceeding, the CPU&#8217;s number-crunching capacity of 80 GB/s. This brings us to our final bottleneck: we can’t speed up how quickly the chip can do the math required for heavy BI analytics.</span></p>
<p class="MsoNormal"><span>The solution lies not in the CPU, but in many CPUs. Distributed computing allows us to bring more CPUs into the mix while feeding each of them from its own disks, an approach referred to as a “shared nothing” architecture. If you use ten machines, you can distribute a billion rows of sales data evenly across them, leaving 100 million rows on each. Each machine now executes at warp speed on only a portion of the database, completing our search for sales of a specific SKU in a fraction of the normal processing time.</span></p>
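<p class="MsoNormal">A shared-nothing query splits into independent local work plus a cheap final combine. In this minimal Python sketch, threads stand in for separate machines and round-robin slicing stands in for however a real system assigns rows to nodes (often by hashing a key); the function names and sample data are hypothetical.</p>

```python
from concurrent.futures import ThreadPoolExecutor


def partition(rows, num_nodes):
    """Round-robin split: node k gets rows k, k + n, k + 2n, ..."""
    return [rows[k::num_nodes] for k in range(num_nodes)]


def local_total(node_rows, sku):
    """Each node scans only its own slice; nothing is shared between nodes."""
    return sum(price for item, price in node_rows if item == sku)


def distributed_total(rows, sku, num_nodes=10):
    parts = partition(rows, num_nodes)
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        partials = list(pool.map(local_total, parts, [sku] * num_nodes))
    return sum(partials)  # the coordinator only adds up small partial sums


sales = [("SKU5893", 2.0), ("SKU1020", 1.0)] * 50  # 100 hypothetical rows
print(distributed_total(sales, "SKU5893"))  # 100.0
```

<p class="MsoNormal">The threads only model the structure of the computation: under CPython they will not deliver a real speedup on CPU-bound work, whereas separate machines with their own disks and CPUs would.</p>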
<p class="MsoNormal"><span>Integrating all of the techniques that we have covered can be summed up in the following steps:</span></p>
<ol type="1">
<li class="MsoNormal"><span>Distribute a database across multiple machines, so that each gets some fraction of the total number of rows in the original</span></li>
<li class="MsoNormal"><span>On each machine, vertically fragment the database section stored on it</span></li>
<li class="MsoNormal"><span>Stripe the data across multiple independent disks on each machine</span></li>
<li class="MsoNormal"><span>Finally, compress that data</span></li>
</ol>
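<p class="MsoNormal">The four steps can be strung together in a small Python sketch that lays out a table the way the list describes and then answers the SKU query from that layout. Everything here is a hypothetical in-memory stand-in: lists play the role of machines and disks, JSON plus zlib plays the role of a real columnar codec, and rows are assigned round-robin. Following the list's order, each column is striped and then each stripe is compressed; in practice RAID usually sits below the database, so it is the already-compressed blocks that get striped.</p>

```python
import json
import zlib

FIELDS = ("Location", "Customer Name", "Item", "Price", "Date")


def build_layout(rows, num_machines, disks_per_machine):
    """Return one dict per machine: column name -> list of compressed stripes."""
    layout = []
    for m in range(num_machines):
        local_rows = rows[m::num_machines]             # 1. distribute rows
        machine = {}
        for col, field in enumerate(FIELDS):
            values = [row[col] for row in local_rows]  # 2. vertical fragment
            stripes = [values[d::disks_per_machine]    # 3. stripe across disks
                       for d in range(disks_per_machine)]
            machine[field] = [zlib.compress(json.dumps(s).encode())  # 4. compress
                              for s in stripes]
        layout.append(machine)
    return layout


def read_column(machine, field):
    """Decompress one column's stripes and re-interleave them into row order."""
    stripes = [json.loads(zlib.decompress(blob)) for blob in machine[field]]
    values = []
    for i in range(max(len(s) for s in stripes)):
        for s in stripes:
            if len(s) > i:
                values.append(s[i])
    return values


def local_sku_total(machine, sku):
    """Answer the SKU query on one machine using only two columns."""
    items = read_column(machine, "Item")
    prices = read_column(machine, "Price")
    return sum(p for item, p in zip(items, prices) if item == sku)


rows = [("Store %d" % (i % 3), "Customer %d" % i,
         "SKU5893" if i % 2 == 0 else "SKU1020", 1.5, "1/1/2007")
        for i in range(20)]
layout = build_layout(rows, num_machines=2, disks_per_machine=4)
print(sum(local_sku_total(m, "SKU5893") for m in layout))  # 15.0
```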
<p class="MsoNormal"><span>We have now achieved massively parallel, high-performance processing. Each machine runs through all of the information it holds and sends the relevant information to the CPU, which can finally hit the processing limits of current chip architectures. With the I/O curse long since in our rear-view mirror, we can finally begin to unlock the incredible amount of information latent in our very own data.</span></p>
<p class="MsoNormal"><strong><span>Sentrana’s Role</span></strong></p>
<p class="MsoNormal">I should note that the above story is an idealization. Real-world implementations of the interconnected innovations I have outlined here confront serious challenges of their own: decompression taxes the CPU; RAID is expensive, and the energy drain of all those disks can be significant; managing multiple machines that cooperate in a cluster is itself a sophisticated systems administration task; and the list goes on. Though these are all valid concerns, the more important point is that they are solvable problems. These real-world implementation problems are the ones that Sentrana’s research has focused on in order to put business intelligence and predictive analytics at a person’s fingertips. Truly great business insights are like scientific discoveries: they stem from first asking an important question and then breaking it down into manageable pieces so that it can be answered. To support those moments of intuition in which momentous questions are first asked, we have to be as fast as that other great computer at every decision-maker’s disposal.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.sentrana.com/2009/06/12/cheating-your-way-into-business-visibility/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>What Happens When We Can’t Keep Up with Information</title>
		<link>http://blog.sentrana.com/2009/03/29/what-happens-when-we-cant-keep-up-with-information/</link>
		<comments>http://blog.sentrana.com/2009/03/29/what-happens-when-we-cant-keep-up-with-information/#comments</comments>
		<pubDate>Sun, 29 Mar 2009 21:48:33 +0000</pubDate>
		<dc:creator>Christian Bonilla</dc:creator>
				<category><![CDATA[Tech Trends]]></category>
		<category><![CDATA[data storage]]></category>
		<category><![CDATA[demand for data storage]]></category>
		<category><![CDATA[economic downturn]]></category>
		<category><![CDATA[financial services]]></category>
		<category><![CDATA[HDDs]]></category>
		<category><![CDATA[historical data]]></category>
		<category><![CDATA[information]]></category>
		<category><![CDATA[microprocessor]]></category>
		<category><![CDATA[Moore's Law]]></category>
		<category><![CDATA[processing speed]]></category>
		<category><![CDATA[revenue optimization]]></category>
		<category><![CDATA[solid state and flash memory shipments]]></category>

		<guid isPermaLink="false">http://blog.sentrana.com/2009/03/29/what-happens-when-we-can%e2%80%99t-keep-up-with-information/</guid>
		<description><![CDATA[I ran into a former colleague the other day who, as it turns out, recently left his job and presently spends his days building options pricing models and trading from home on his own accounts. In turn, I described to him some of the recent work that we have done in revenue optimization and particularly [...]]]></description>
			<content:encoded><![CDATA[<p>I ran into a former colleague the other day who, as it turns out, recently left his job and presently spends his days building options pricing models and trading from home on his own accounts. In turn, I described to him some of the recent work that we have done in revenue optimization and particularly the breakthroughs that we have engineered for processing data. His face scrunched up a bit, and his response was uncharacteristically blunt: “You can always process numbers quickly if you need to,” he smirked.</p>
<p>Not so, in fact. When you start asking extremely detailed questions that require combing through years of detailed historical data and then performing mathematical transformations on each of those figures, you find out rather quickly where the limits of processing speed lie, when your results take a week or so to finish compiling. The thing is that most of us never push up against the processing speed frontier. We can see that every year computers get faster, chips get smaller, and Excel seems to have more rows. Moore’s Law prevails. The trouble is that, all the while, the data universe is expanding at a rate that screams past advances in processing capability, and that rate does not fluctuate with the economic downturn. Consider the markets for microprocessors, which allow us to perform those calculations and manipulate data, and for hard drives, which allow us to store it. Microprocessor sales have been dealt a sharp blow by the global downturn as computer sales have slowed, but worldwide shipments of hard disk drives (HDDs) roughly maintained 2007 levels even in the worst quarters of the recession (and the drives themselves hold more data). Solid state and flash memory shipments were down, but the evidence suggests that this is due to consumers substituting HDDs for other types of memory rather than storing less information. The demand for data storage, while not completely recession-proof, is nonetheless of the hardier variety.</p>
<p>Simply put, information of all kinds accumulates faster than we can analyze it. We are losing the race, and the gap is widening, not shrinking. As for what this ultimately means, I will now make a rather dour point. A fashionable explanation for the recession among both politicians and many “Main Street” types is that greed is what did us in. The greed of the bankers, the hedge funds, the fat cats, the small cats, whoever &#8211; greed is the culprit. But that doesn’t explain everything by a long shot. Even the greediest person doesn’t want the party to end and the money to stop coming in. Is it possible that they simply weren’t able to ask the questions that would have kept certain debt instruments from ever being created? Financial services employees have more information available to them than decision makers in any other industry, and still here we find ourselves. Think about how many times each day similarly misinformed decisions are made inside corporations all across the world. The information is there, but more often than not we are letting it rot on the docks.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.sentrana.com/2009/03/29/what-happens-when-we-cant-keep-up-with-information/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
	</channel>
</rss>
