So what does an Internet search engine for quantitative analysis look like? How must the next generation of systems be constructed to unlock the potential power of quantitative data?
The answer may surprise some. It does not begin from scratch, but rather with today’s trusted systems of record – in summary. These systems have been developed over decades, the processes honed through incremental improvements and rigorous quality audits. By and large these systems capture accurate results – in summary. These are today’s financial, risk, management, and other similar systems.
The answer continues by finding the detailed business events that feed these existing systems. These detailed business events may go by other names, including economic events, recorded acts of commerce, or perhaps simply transactions. All of today’s quantitative analytical systems begin with these events as inputs. When originally captured, these business events contained, or at least had hope of containing or providing the basis for deriving, all the attributes – the letters of the alphabet A through Z – used in the subsequent aggregation and analytical processes that were implemented.
To get to a quantitative analytical engine as powerful as an Internet search engine, we will need to keep these business events at a much lower level of detail than any of the systems we build today. If we want to understand accumulated quantities for occurrences that contain the combinations of A and T, we have to be able to find transactions where A and T occurred together, and then accumulate the quantities.
We cannot approach the problem like many of today’s quantitative systems and attempt to accumulate all combinations of events for every combination of A through Z before someone asks the question, and then present the results if that question ever gets asked. If an Internet search engine followed a similar approach, it would be programmed to anticipate every question that might ever be asked, with all answers stored prefixed by the question: instead of searching for answers, the index would search for questions and then present the stored answers. Imagine the complexity of attempting to anticipate every question, let alone updating all the stored answers every time something changed on the Internet.
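To give a rough sense of the scale involved, the following sketch counts the questions that would have to be anticipated. It assumes, purely for illustration, that each letter A through Z stands for a single attribute and that a "question" is any non-empty set of attributes; it ignores the many values each attribute can take, which would make the real number far larger.

```python
from math import comb

ATTRIBUTE_COUNT = 26  # the letters A through Z, used as stand-ins for attributes

# Every non-empty subset of attributes is a distinct "question" that
# would need its own pre-accumulated answer.
subsets = 2 ** ATTRIBUTE_COUNT - 1

# Just the two-attribute questions, such as "A and T together".
pairs = comb(ATTRIBUTE_COUNT, 2)

print(f"non-empty attribute combinations: {subsets:,}")
print(f"two-attribute combinations: {pairs:,}")
```

Even under these simplifying assumptions there are tens of millions of combinations, each of which would have to be recalculated whenever new events arrive.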
Instead, our new and improved quantitative analytical engine, similar to a search engine, must keep the raw materials for answering the questions. Our question can then work with any combination of attributes to select events of interest for analysis. There is no other way.
This introduces a problem though: as we have noted, quantitative analysis is not like an Internet search because these business events must be accumulated. Thus our question typically is not looking for one transaction or event, but the accumulated results of many events. In addition to the steps needed for textual search engine processing, which starts with Select, and a somewhat analogous step of Sort, we need the truly unique processing step of Summarize.
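The three steps can be sketched in a few lines of code. This is a minimal illustration only, not how any particular product implements them; the event records, the attribute names `A` and `T`, the `amount` field, and the function `select_sort_summarize` are all hypothetical.

```python
from itertools import groupby

# Hypothetical business events: each carries some attributes and a quantity.
events = [
    {"A": "retail", "T": "east", "amount": 100.0},
    {"A": "retail", "T": "west", "amount": 40.0},
    {"A": "wholesale", "T": "east", "amount": 250.0},
    {"A": "retail", "T": "east", "amount": 60.0},
    {"B": "other", "amount": 999.0},  # lacks A and T, so it is never selected
]


def select_sort_summarize(events, keys, measure="amount"):
    """Accumulate `measure` for each combination of the requested attributes."""

    def key_of(event):
        return tuple(event[k] for k in keys)

    # Select: keep only events that carry every requested attribute.
    selected = [e for e in events if all(k in e for k in keys)]
    # Sort: bring events with the same attribute values together.
    ordered = sorted(selected, key=key_of)
    # Summarize: accumulate the quantity within each group.
    return {
        key: sum(e[measure] for e in group)
        for key, group in groupby(ordered, key=key_of)
    }


totals = select_sort_summarize(events, ("A", "T"))
for key, total in totals.items():
    print(key, total)
```

Because the raw events are kept, the same function can answer a question about `A` alone, or about any other combination of attributes, without anything having been accumulated in advance.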
These additional steps are performed in the existing systems – using data that has been reduced in size by summarization. These steps require much more computer processing power than the single step of Select. That is because many of the questions require handling many more business events. Many quantitative questions require all the business events for an entire month, quarter, year or more. Even for medium-sized enterprises this might mean several million business events.
Although computers continue to get faster, building a system capable of performing such work can be extremely expensive. It would be very expensive for a single company to own the infrastructure necessary for a private copy of an Internet search engine. Yet as we have noted, the value of this quantitative data is so high that it must be kept within the company – a publicly accessible search engine cannot provide the answers. So each company must establish its own such environment, but how is it possible to ever get to Internet search engine levels of effectiveness for quantitative data?
The answer lies in creating a consolidated “data supply chain.” Today’s organizations have many systems which capture business events, select relevant events for the questions they are intended to answer, sort those events, and summarize them to provide the answers. Each of these is a “data supply chain.” As noted, they have one for the letters A through G, another for the letters H through L, another specialized one for the letters M and N, and so forth. A consolidated data supply chain would capture and keep the business events at a much lower level of detail, and allow selection, sorting, and summarization to be done in response to questions asked of the data for any of the letters.
A consolidated data supply chain would eliminate the cost of maintaining multiple systems all of which accumulate quantitative data, replaced by a single system which serves the data purposes of the other systems. These cost savings, coupled with increased revenues from better decision making, and greater efficiencies and effectiveness from the quantitative search engine approach, will pay for the system for those organizations that undertake to build it.
The SAFR software and teams have been building such systems since before the advent of the Internet. SAFR is: (1) an information and reporting systems theory, (2) refined by 25 years of practical experience in creating solutions for a select group of the world’s largest businesses, (3) distilled into a distinctive method to unlock the information captured in business events, (4) through the use of powerful, scalable software for the largest organizations’ needs, (5) in a configurable solution addressing today’s transparency demands.
The Theory
Companies expend huge sums of money to capture business events in information systems. Business events are the stuff of all reporting processes. Focusing reporting systems on exposing combinations of business events can turn data into information much more rapidly, with much greater accuracy, and at much lower per-report cost than any other available approach.
The Experience
Although analysis of business events holds the answers to business questions, as noted they aren’t to be trifled with, particularly for the largest organizations. Reporting processes – particularly financial reporting processes – accumulate millions and billions of business events. In fact, the balance sheet is an accumulation of all the financial business events from the beginning of the company! Such volumes mean that unlocking the information embedded in business events requires fundamentally different approaches. The 25 years of experience building SAFR in services engagements have exposed, principle by principle, piece by piece, and layer by layer, what may be the only viable way yet identified.
The Method
This experience has been captured in a method of finding and exposing business events, within the context of the existing critical reporting processes. It uses today’s recognized financial data supply chain, like a compass pointing north, to constrain, inform, and guide identification of additional reporting details. It facilitates definition of the most important questions to be answered, and configuring repositories to provide those answers consistently. It also explains how to gradually turn on the system without endangering existing critical reporting processes.
The Software
The infrastructure software is a hard asset with hundreds of thousands of lines of source code and a feature set rivaling some of the best known commercial software packages.
The Scan Engine is the heart of SAFR, performing in minutes what other tools require hours and days to do. The Scan Engine is a parallel processing engine, generating IBM z/OS machine code. In one pass through a business event repository, it creates many business event “views” or outputs providing rich understanding. It categorizes, through join processes, the business events orders of magnitude more efficiently than other tools. Built for business event analysis, in numerous tests on existing infrastructure at a number of companies, it has consistently achieved a throughput of a million records a minute. It is highly extensible to complex problems. Desired views are defined in the SAFR Developer Workbench or rule based processes in the SAFR Analyst Workbench or in custom developed applications. The Scan Engine executes as a scheduled process, resolving all specified requests in one execution.
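The idea of resolving many views in one pass through the event repository can be illustrated with a small sketch. This is not SAFR's implementation, which as described above is a parallel engine generating z/OS machine code; it only shows the single-pass principle. The function `scan_once`, the view definitions, and the event fields are hypothetical.

```python
from collections import defaultdict


def scan_once(events, views):
    """Resolve every view while reading the events exactly once.

    `views` maps a view name to a (predicate, key_function) pair.
    Returns a mapping of view name to {key: accumulated amount}.
    """
    results = {name: defaultdict(float) for name in views}
    for event in events:  # the only pass over the repository
        for name, (predicate, key_fn) in views.items():
            if predicate(event):
                results[name][key_fn(event)] += event["amount"]
    return {name: dict(totals) for name, totals in results.items()}


events = [
    {"region": "east", "product": "loan", "amount": 100.0},
    {"region": "west", "product": "loan", "amount": 50.0},
    {"region": "east", "product": "deposit", "amount": 30.0},
]

views = {
    "by_region": (lambda e: True, lambda e: e["region"]),
    "loans_by_region": (lambda e: e["product"] == "loan", lambda e: e["region"]),
    "by_product": (lambda e: True, lambda e: e["product"]),
}

# Passing an iterator makes the point: it can only be consumed once.
outputs = scan_once(iter(events), views)
for name, totals in outputs.items():
    print(name, totals)
```

The cost of reading the events, which for large repositories dominates, is paid once regardless of how many views are requested.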
The Indexed Engine provides one-at-a-time view resolution through on-line access to Scan Engine and other process outputs. It uses Scan Engine performance techniques. Report structure and layout are dynamically defined in the Analyst Workbench. The Indexed Engine creates reports in a fraction of the time required for other tools. Managed Insights users select parameters to drill down to increasing levels of business events, and perform multidimensional analysis through the Viewpoint Interfaces. The Insight Viewer enables discovery of business event meaning in an iterative development mode.
The Solution
The SAFR Infrastructure Software has been configured over a decade for a number of the largest financial services organizations to provide an incredibly scalable Financial Management Solution (FMS).
The Arrangement Ledger (AL) is fed daily business events from the source systems. The business events become complete journal entries at the customer-contract level in the Accounting Rules Engine. Rules are under the control of Finance rather than embedded in programs in source systems, enabling Finance to react to changes in financial reporting standards.
The business event journal entries are posted by the Arrangement Ledger on a daily basis, while it simultaneously performs multi-currency, intercompany eliminations, GAAP reclassification, year-end close and other processing. It accepts and properly posts backdated entries, and summarizes daily activity to pass to the General Ledger. The General Ledger provides another control point for the traditional accounting view of the data. The Arrangement Ledger performs reclassification and accepts summary level GL adjustments, keeping the arrangement detail aligned with the summary General Ledger.
AL also accepts customer/contract descriptive information with hundreds of additional attributes to describe each customer-contract, and counterparty or collateral descriptive attributes. This enables “pivoting” financial balances by a nearly unlimited set of attributes, not just the traditional accounting code block. Extract processes produce various summaries, ultimately numbering in the thousands, to support information delivery for not only traditional accounting but also statutory, regulatory, management, and risk reporting. The SAFR one-pass capability enables AL to load the repository, generate data, generate new business events, and create extracts all in one process. The SAFR AL outputs are loaded into the information-rich Financial Data Store and its attendant reporting applications.
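A small sketch may make the "pivoting" idea concrete. It assumes journal entries kept at the customer-contract level and a separate table of descriptive attributes per contract; the field names, the sample data, and the function `pivot_balances` are hypothetical and are not taken from the Arrangement Ledger itself.

```python
from collections import defaultdict

# Hypothetical journal entries held at the customer-contract level.
journal_entries = [
    {"contract_id": "C1", "account": "loans", "amount": 1000.0},
    {"contract_id": "C2", "account": "loans", "amount": 500.0},
    {"contract_id": "C1", "account": "interest", "amount": 25.0},
    {"contract_id": "C3", "account": "loans", "amount": 300.0},
]

# Hypothetical descriptive attributes for each contract.
contract_attributes = {
    "C1": {"region": "east", "product": "mortgage"},
    "C2": {"region": "west", "product": "mortgage"},
    "C3": {"region": "east", "product": "auto"},
}


def pivot_balances(entries, attributes, by):
    """Accumulate balances by any mix of entry fields and contract attributes."""
    balances = defaultdict(float)
    for entry in entries:
        # Join: enrich the entry with its contract's descriptive attributes.
        described = {**entry, **attributes.get(entry["contract_id"], {})}
        key = tuple(described.get(name) for name in by)
        balances[key] += entry["amount"]
    return dict(balances)


by_region = pivot_balances(journal_entries, contract_attributes, ("region",))
by_product_account = pivot_balances(
    journal_entries, contract_attributes, ("product", "account")
)
print(by_region)
print(by_product_account)
```

Because the join happens against contract-level detail, a new descriptive attribute becomes available for pivoting without any change to how the journal entries were posted.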
Conclusion
The principles built up line upon line in this solution and outlined hereafter will help build the next generation of analytical and reporting systems. These principles include (1) the organizing nature of focusing on the stability of business events, rather than attempting to predict future desired analyses; (2) this in turn demanding greater levels of detail in reporting applications; (3) which delays aggregation processes until much closer to report time; (4) requiring greater scale of processing; (5) which can only be made cost effective by analytical process consolidation; (6) and all of this made practical by the tempering act of summarizing historical business events when necessary – a true balancing act.
There is no other course. These processes are employed in today’s systems haphazardly, and often at cross purposes to the principles involved, because those principles are not recognized and understood. Employing them explicitly in the next generation of systems can fundamentally change decision making processes for individuals and organizations. And it can do so in the near term, with great benefits to society. But these new systems look very different from what we have today.
Understanding these principles requires being grounded in the basics of reporting and analytics, starting with the original such system: accounting. To do that, we need to return to school.