The subsystem architecture theory says that operational systems usually capture and process business events. In Rick’s lectures about the system, he would point out that the transaction files, which are archived off as soon as they are created, could be used instead of the posted balances. In July of 1995 I was asked to help build a pharmaceutical company litigation support database. A number of pharmaceutical companies were being sued for antitrust practices. They had agreed to join up and provide data that could be analyzed to show that their pricing practices were not anti-competitive.
Find the Event File
The first step to apply SAFR to any problem is to identify the event file. The event file contains business events; or more simply, transactions. Most systems have them stored somewhere. It is normally made up of a series of fields that simply have codes in them like 8456, QRM, etc., and have at least one amount somewhere in the record. They aren’t very meaningful in and of themselves, but they contain the raw data from business events.
The pharmaceutical companies created computer tapes of their sales files for the prior 5 years. These were pretty raw dumps of data; there was very little programmer time involved in making them. However, the records were identifiable, and representatives from each company could tell us what each field meant. In some cases the source system had changed during those five years, or sales were recorded in more than just one system, so we received multiple record formats for one company.
SAFR Components
SAFR at the time was composed of three major divisions, the Developer Workbench, the Scan Engine which produced multiple outputs called Viewpoints, and the Insight Viewer.
Our work begins with the Developer Workbench.
Define Metadata
SAFR as a tool was designed to be data independent. In other words, it was designed to read different types of files without having to change or write specific programs. The tool’s programs adjust to interpret the data in the file. The first thing to do is to define the file to SAFR, and its associated structure or layout. We call these Logical Files and Logical Records (LRs); logical because they may be used to describe records in many different files.
Logical Records are composed of many fields or attributes. SAFR is able to interpret different kinds of field formats. For example, on mainframes numeric data is often stored in a format called “packed.” SAFR can make sense of the numbers in these fields if we tell it how to interpret them.
The file definition, the LR, and the fields are part of the SAFR metadata. Metadata is data that describes data. A system administrator creates these once, and they are used over and over and over again.
Test Metadata by Creating a View
The next step on the project was to test to make sure the metadata was defined correctly. To do this, we construct a SAFR View. Back in The Ivory Tower, Eric passed out a piece of paper with a very simple architecture diagram on it. It described SAFR, and it looked somewhat like this:
Eric explained that each report generated is simply that user’s view of those business events. The accounting view of the data is independent from other legitimate views. Thus the Views don’t change the business events; they simply summarize and report them.
The business event repository isn’t necessarily some mysterious structure with hundreds of different database tables all interlinked and updated by a web of processes. Rather as in the case of the litigation support warehouse, it could be thought of containing a single type of record, with all the desired attributes on it. In data warehouses this is often called the fact table. This generalization isn’t quite accurate for the pharmaceutical data, in that each company’s record format was slightly different, but the concept is not much different.
Creating a view requires the following steps:
Step 1: Specify view format and level
What output do we want SAFR to produce? Do we want a file that can be processed in another system, do we want a report, a file to download? Do we want each record read to be an output record, or do we want to summarize the inputs records creating fewer, accumulated output records? We specify those on the View Properties screen.
Step 2: Select the event logical record
Which event file in the business event repository has the data we want to report on? For example, let’s assume we want a phone list of people on our street, and we have an event file with names and address of all the people in the country.
Step 3: Specify the logical file
Let’s assume the actual names and addresses are broken into files by state. We select the logical file containing our state.
Step 4: Select output fields
Next we select which fields from the LR we want on the output report. Let’s assume we want name and phone number on our report.
Step 5: Specify sort order
Then we specify in what order we want the fields sorted on the output. For example, a typical name and address report is sorted in ascending order by last name, then first name.
Step 6: Filter business events
Last we specify which records to include or exclude. Let’s assume we only want people who live on our street; so we’ll specify that the address field should contain the value “Maple Ave.”. Note that this field doesn’t have to be shown on our report, but is in the event file.
The next step in the process is to run the SAFR Scan Engine. This set of programs translates the metadata and view into a program. The program then reads the event file from top to bottom and produces the desired output.
Test Metadata
On the project, the first view we created after defining the metadata was to test if it had been created correctly. These views were very simple. We selected every field on the event file, in the order defined on the LR, and output them to another file. We ran the Scan Engine, and inspected the output file. If the output records looked like the input records, then the metadata (and the documentation provided from the pharmaceutical companies) was correct.
Finding the event file is the first step, even if not working with legacy transaction systems. Even in a database world, at the bottom of the database there are tables, and those tables are files. By examining the database structure and the SQL one can determine what the event file must be to start the process; it’s often in the “from” clause.
Pharmaceutical Litigation Support Project Results
This was the first step in the pharmaceutical litigation support project. Having established the ability to read and produce reports from the raw data, the team continued over four and a half months to process data for 17 different companies, reading 10 billion and extracting 5.9 billion records, performing 8.5 billion joins, and processing 6,000 views.1
The starting step in building a business event based insight system is locating the event file. This often starts with the journal entries for an existing ledger, defining that to SAFR, and then testing that the metadata has been created correctly. Now, the next step is to balance the event file to a known report.