Chapter 2: The Problem

If you are old enough, think of the world before you had access to an Internet search engine; if you have always had access to the Internet, think of your world without it. Think of the length of time it would take you to answer certain types of questions. For example, if you have to make a purchasing decision, how would you gather information? If you have to travel some place new, how would you receive directions, and how detailed or accurate would they be? What kinds of questions would simply go unanswered, and at what costs?

Having considered the value of rapid access to information, now consider the limits to the types of information available through an Internet search engine. When was the last time you entered a number as a search term, except for an address or a year? It’s likely not often. The data we search for in an Internet search engine is mostly textual, not quantitative.

Have you ever wondered about the value of the results in the search engine? Don’t get me wrong; I am not suggesting there is no value to the information available on the Internet. But consider that, by and large, search engine results come from information people gave away for free, or at most sold for advertising. By definition, the value cannot be consistently high, because the highest-value data isn’t given away or sold at advertising rates.

Lastly, the data tends, by and large, to be what is called unstructured data. The words on this page are unstructured. Structured data, on the other hand, tends to look much more like what is gathered in a spreadsheet, with each row representing an occurrence and column headings describing what is in each cell of the row. Race results or donor lists are typical examples of structured data that can be found on the Internet.

The amount of structured data not released on the Internet likely dwarfs the amount that is released. This is because the majority of structured data is deemed high value, and by its nature isn’t easily digestible by search engines. Thus it is retained by the owners of the systems that gather it: businesses, governments, and organizations.

So if an Internet search engine provides valuable information, but that information tends to be textual (not quantitative), of questionable or moderate value (rather than consistently high value), and unstructured (rather than structured), then where do we search for quantitative, high-value, structured data?

As individuals, the place we most often search for this kind of data is perhaps our own financial data: our Quicken accounts, or checking or credit card registers. However, most of the time these “searches” are different from an Internet search. We certainly do search periodically for a specific transaction, a single row of data. Perhaps we need to know if we paid a certain bill. If our data were accessible by a search engine, it might be very useful in these situations.

Greater value, however, often comes from using this data in a different way, something a search engine cannot usually help with. This value comes from analyzing the results of our spending or the results of our investing. It answers questions like “Where does most of my money go?” or “How much have I paid in taxes in the last five years?” A search engine cannot answer these types of questions based upon our raw financial data, even if we were to put it on the Internet, because the answers require aggregation.

Aggregation is the process of combining rows of structured data, accumulated over time, to produce a balance. Search engines, by and large, do not do this. Rather, they present each occurrence as an answer (hence the numerous pages of results returned for common terms). However, seeing each 401(k) transaction or each cell phone payment provides little insight; it is the cumulative results that provide insights.
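The difference between searching and aggregating can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the transaction rows, the amounts (held in cents), and the function names `search` and `aggregate` are all hypothetical and are not taken from any particular financial system.

```python
from collections import defaultdict

# Hypothetical rows of structured data: (date, category, amount in cents).
transactions = [
    ("2024-01-05", "groceries", 8250),
    ("2024-01-12", "cell phone", 4500),
    ("2024-02-03", "groceries", 9125),
    ("2024-02-12", "cell phone", 4500),
    ("2024-03-09", "groceries", 7700),
]

def search(rows, term):
    """Search-engine style: return every individual row matching the term."""
    return [row for row in rows if term in row[1]]

def aggregate(rows):
    """Aggregation: combine rows into one accumulated total per category."""
    totals = defaultdict(int)
    for _date, category, amount in rows:
        totals[category] += amount
    return dict(totals)

matches = search(transactions, "cell phone")   # a list of separate occurrences
totals = aggregate(transactions)               # one balance per category
largest = max(totals, key=totals.get)          # "Where does most of my money go?"
```

The search returns each payment as its own answer, while the aggregation collapses all of the rows into the handful of balances that actually answer the question.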

The business world is not without systems capable of answering quantitative questions. In fact, these are some of the oldest systems in business: the financial systems. They are the bedrock of the capital markets. They are well established, highly controlled, and, by and large, very reliable. Unreliability has serious implications for those involved, and it eventually gets noted when earnings are restated or businesses are investigated upon failure.

Yet for all the usefulness of these systems, those who use them regularly would not describe them as being as simple to use as an Internet search engine. Why is that? There are two fundamental reasons.

The first is that when these systems were constructed, they were designed to provide information on a small set of attributes. For example, the General Ledger typically answers financial questions only about legal entities, cost or profit centers, and perhaps products or types of customer. A different system must be used to understand the performance of departments or individuals, and a third for specialized questions about compliance with regulations. And we haven’t even touched on systems which accumulate risk, trading positions, or a host of other accumulated financial metrics.

Said another way, it would be as if one Internet search engine indexed all documents on the Internet, but only for the letters A – G, another engine covered the letters H – L, another specialized one covered the letters M and N, and so forth. Questions which use more than one group of letters cannot be answered by searching just one engine.

The second reason is that decisions about how to aggregate the data were predetermined when the systems were turned on. Some systems accumulate results monthly, others perhaps less frequently and some more frequently. Some accumulate average daily balances, others do simple addition. Once the data is aggregated for one period or using one method, it cannot be “disaggregated” or even converted into another form, since the underlying detail is typically long gone.
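A small Python sketch may make the loss of detail concrete. Everything here is hypothetical: the two accounts, the five-day period, and the function names are assumptions chosen for illustration only. Both accounts produce the same period total, yet their average daily balances differ, so a system that kept only the total cannot later be converted to the other measure.

```python
from itertools import accumulate
from statistics import mean

# Hypothetical daily deposits for two accounts over the same five-day period.
account_a = [100, 0, 0, 0, 0]   # everything deposited on day 1
account_b = [0, 0, 0, 0, 100]   # everything deposited on day 5

def period_total(deposits):
    """Simple addition: the only figure a total-based system retains."""
    return sum(deposits)

def average_daily_balance(deposits):
    """Mean of the running balance at the end of each day."""
    return mean(accumulate(deposits))

# Identical stored totals...
same_total = period_total(account_a) == period_total(account_b)

# ...but different average daily balances, which only the detail can reveal.
adb_a = average_daily_balance(account_a)
adb_b = average_daily_balance(account_b)
```

Once the daily rows are discarded, the two accounts are indistinguishable to the total-based system, which is the sense in which the data cannot be “disaggregated.”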

Thus, although in the last 15 years almost everyone has come to understand the efficiencies that might come with better information from an Internet search engine, access to quantitative results has not progressed at the same speed. Some have argued that “the epochal shift from qualitative to quantitative perception…” in the late Middle Ages and Renaissance “made modern science, technology, business practices, and bureaucracy possible” [1]. Organizations that recognize that the world is again ripe for a similar leap forward in the ability to effectively use all this quantitative data will be best positioned for the world that will undoubtedly emerge.

[1] Alfred W. Crosby, The Measure of Reality: Quantification and Western Society, 1250–1600 (Cambridge University Press, 1997), preface material.