Chapter 27. Find More Detailed Events

Starting work at the insurance company happened a year before the financial warehouse effort. Rick received an e-mail from Mike Schroeck about an opportunity at this insurance company. Rick never liked to commit his people full-time to other partners’ projects; he was very careful about that. I don’t know why I responded the way I did when he told me about the e-mail. I said, “Rick, I want to go do this.” Little did I know what a journey I had started. It lasted, on and off, for nearly seven years.

I was met on the project by Lynn Groves Zuccala and Anthony Boles, and later by Peter Corbett. They had worked with the insurance company for a number of months helping them craft a vision for the reporting environment. The company had concluded “…to build a single, consistent source of financially balanced data to be used by multiple business users for decision support, data mining, and internal/external reporting.”1 They determined that keeping event level data would provide the most flexible reporting framework possible.

The project needed to find an event file, at the policy and loss level, for statistical (rather than financial) reporting. Mark Kimmell, another project team member, worked on this with Doug Kunkel. They were both guided by Vanessa Menke from the client who really understood the data. Mark had started looking for the detailed events in this same Fire Divisional Processing system that Jim ended up using later for the financial warehouse “Show me the money” effort. The file they had to work with was anything but typical.

Lower Level Detail

The file at the heart of their home-grown “ERP” system had the flavor of a home-grown database. The data was stored in a unique format called Stacked Data Elements (SDE), and a set of utilities created by the company had to be used to interpret the file. This required Doug to write a program, called a user exit or read exit, that the SAFR Scan Engine called to read the file before the data was presented to the views. This program exposed the richness of the information contained in this file.
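The actual SDE layout and the company’s utilities were proprietary, so purely as an illustrative sketch (in Python, with an invented length-prefixed encoding), a read exit is essentially an adapter the Scan Engine calls once per record, decoding the raw bytes before any view sees them:

```python
import struct

def read_exit(raw: bytes) -> dict:
    """Decode one hypothetical 'stacked data elements' record.

    Assumed (invented) layout: repeated [2-byte element id][2-byte length][value]
    entries until the record is exhausted. A real read exit would call the
    company's own SDE utilities instead of this hand-rolled decoder.
    """
    fields, offset = {}, 0
    while offset < len(raw):
        elem_id, length = struct.unpack_from(">HH", raw, offset)
        offset += 4
        fields[elem_id] = raw[offset:offset + length]
        offset += length
    return fields

# The engine would call the exit once per record and present the decoded
# fields to every view in the pass.
record = struct.pack(">HH", 1, 4) + b"POLX" + struct.pack(">HH", 2, 2) + b"07"
print(read_exit(record))  # {1: b'POLX', 2: b'07'}
```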

Mark and Doug found that this file had the policy and loss level transactions in it. It recorded changes to policies, the structure and parameters of the policy and each individual loss, policy renewals and payments, loss estimates and payments – all the business events at the core of the fire and casualty insurance business. This work required a number of months because of its more detailed nature and expansive scope. But the results were remarkable.

Jim later took this same file and found the additional records which were summary journal entries for the financial warehouse project. Whereas on Jim’s report the entries were summarized to something like the business unit, cost center, account level, Mark’s set of views could add policy number as well, where applicable.

Through this means, the company could produce policy level trial balances. The attributes on these trial balances weren’t limited to the small set of items on the journal entries; they could be by term, or risk, or policy, any of the elements that were captured by the policy and loss admin systems.

Insurance Statistical Warehouse Project Results

The insurance company created detailed Operational Data Stores (ODSs) which maintained significant descriptions of the property and casualty policies, along with their premium and loss transactions and balances, for seven years of data. It isn’t just the ability to put the data into these systems that is significant, but also the ability to get it out nightly for a variety of reporting purposes, including regulatory reporting. The environment also balances and feeds the financial reporting environment; thus regulatory reporting at the most detailed levels is consistent and balanced with all financial reporting.

The following are statistics compiled by Kevin Bly, the system architect for the Fire ODS, and Lyn Kilhoffer, the system architect for the Auto ODSs, on August 8th, 2001, about five years after the start of the project on the first Fire system.2

Fire

The Daily process ran on a 10 CPU machine capable of 1,934 MIPS, but limited to 80% of capacity by a workload governor.

  • Extract started at 03:15 AM and completed at 03:45 AM: 30 minutes of elapsed time
  • 21 parallel threads were processed

CPU time:
  • 00:28:54 of clock time elapsed
  • 02:55:02 of CPU time elapsed
  • Equates to 6.05 CPU engines running at full capacity during those 30 minutes of elapsed time

Files allocated:
  • 906 input files were allocated
  • 493 output files were allocated
  • Totaling 1,399 files allocated for I/O

Records and bytes processed:
  • 2.4 billion records read (190 Gigabytes)
  • 72 million records written (10 Gigabytes)
  • Totaling 1.4 million records (118.35 Megabytes) processed per second

Joins performed:
  • 873 million ODS joins were performed
  • 105 million non-ODS joins were performed
  • Totaling 978 million joins (563,624 joins per second)

Auto

The Daily process ran on a 10 CPU machine capable of 1,934 MIPS, but limited to 80% of capacity by a workload governor.

  • Extract started at 02:08 AM and completed at 02:37 AM: approximately 30 minutes of elapsed time
  • 38 parallel threads were processed at any given time (169 total threads processed)

CPU time:
  • 00:29:15 of clock time elapsed
  • 01:11:08 of CPU time elapsed
  • Equates to 2.43 CPU engines running at full capacity during those 30 minutes of elapsed time

Files allocated:
  • 1,029 input files were allocated
  • 529 output files were allocated
  • Totaling 1,558 files allocated for I/O

Records and bytes processed:
  • 258 million records read (67 Gigabytes)
  • 264 million records written (72 Gigabytes)
  • Totaling 297,333 records (81.21 Megabytes) processed per second

Joins performed:
  • 293 million ODS joins were performed
  • 310 million non-ODS joins were performed
  • Totaling 603 million joins (343,255 joins per second)
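For the curious, the “CPU engines” figures above are simply total CPU time divided by elapsed clock time, which is easy to verify from the raw numbers:

```python
def hms_to_seconds(h, m, s):
    """Convert an hh:mm:ss duration to seconds."""
    return h * 3600 + m * 60 + s

# Fire: 02:55:02 of CPU time consumed in 00:28:54 of clock time
fire_engines = hms_to_seconds(2, 55, 2) / hms_to_seconds(0, 28, 54)
# Auto: 01:11:08 of CPU time consumed in 00:29:15 of clock time
auto_engines = hms_to_seconds(1, 11, 8) / hms_to_seconds(0, 29, 15)

print(fire_engines, auto_engines)  # roughly 6.05 and 2.43 engines
```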

I am not aware of any batch process performing a larger workload in a shorter amount of time anywhere.3

The following diagram depicts the architecture components of the financial and statistical ODSs.

Figure 57. Insurance Company Financial & Statistical Architecture Diagram


Efficiently comparing values from two different event files (the summary and the detailed event file) requires slightly different views than Jim’s sub-totaled values from one file. Jim had to swap between two different reports and check whether the subtotals matched. A more efficient approach is to have a column in the report show any difference. Doing so requires building something that looks like two different views reading the two different LRs, the summary and detailed LRs, but with the results combined into one output.

This is called a multiple source view, but we’ll describe it as if it were two views. Here are the common elements of both of them:

  • Specify view format and level: Summary hardcopy (printable) report.
  • Select the event logical record: Source 1: Summary event file and Source 2: Detailed event file.
  • Select output fields: Select the common fields, such as business unit or ledger account and the amount from each logical record.
  • Specify sort order: Select business unit, cost center, account.
  • Filter business events to include or exclude: none, if all the records are to be included in the report.
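The bullets above describe what the multiple source view computes. As a rough sketch in Python (with invented keys and amounts), combining both sources on the common sort keys and adding a difference column might look like this:

```python
from collections import defaultdict

# Hypothetical extract records from the two logical records (LRs). Keys are
# (business unit, cost center, account); amounts are in cents to stay exact.
summary_lr = [
    ("BU1", "CC10", "4000", 150_00),
    ("BU1", "CC10", "4100", 75_00),
]
detail_lr = [
    ("BU1", "CC10", "4000", 100_00),  # policy-level pieces that should
    ("BU1", "CC10", "4000", 50_00),   # roll up to the summary entry
    ("BU1", "CC10", "4100", 80_00),   # deliberately off by 5.00
]

def multi_source_compare(summary, detail):
    """Combine both sources on the common sort keys; show any difference."""
    totals = defaultdict(lambda: [0, 0])  # key -> [summary amt, detail amt]
    for *key, amt in summary:
        totals[tuple(key)][0] += amt
    for *key, amt in detail:
        totals[tuple(key)][1] += amt
    # One output row per key: summary amount, detail amount, difference
    return {k: (s, d, s - d) for k, (s, d) in sorted(totals.items())}

for key, (s, d, diff) in multi_source_compare(summary_lr, detail_lr).items():
    print(key, s, d, diff)
```

A zero in the difference column means the detail ties out to the summary for that key; any non-zero value points directly at the key that is out of balance.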

To build this view, it is important to understand the difference between two key phases of the Scan Engine: the Extract and the Format phases.

Figure 58. Performance Engine Phases
The following table summarizes some of the key differences between the extract phase processes and the format phase.
Output record count, in general, with no filtering:
  • Summary views: the Extract Phase produces one event record out for every record read; the Format Phase produces one record out for every sort key combination.
  • Detail views: both phases produce one event record out for every record read.

Filter logic:
  • Extract Phase: against input event file fields.
  • Format Phase: against view columns.

Calculating and subtotaling:
  • Logic: the Extract Phase works against input event file fields; the Format Phase works against view columns.
  • Level: the Extract Phase works at the detailed event file record level; the Format Phase at detail or summary level.

Record order:
  • Extract Phase: random, as sequenced in the event files and through parallelism.
  • Format Phase: in sorted order by the view sort fields.

Joins:
  • Extract Phase: detailed, multi-level joins, used in filter and calculation logic and column assignment.
  • Format Phase: only at summary level, for sort titles on limited view output types; not used in filtering or calculations.

Table 59. Extract and Format Phase Differences

So the extract phase works on event file records, and the format phase works on extracted records. Because the Format phase works on extracted outputs, common fields from different files can be combined into a single output. This is a different, very efficient kind of “join” processing called a sort/merge.
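A sort/merge can be sketched in a few lines of Python (with hypothetical account keys): both extract files arrive already sorted, so a single forward pass over the merged streams is enough to combine them, with no random access or index lookups.

```python
from heapq import merge
from itertools import groupby
from operator import itemgetter

# Two extract files, each already in sorted order by key after the sort step.
extract_a = [("acct1", 10), ("acct2", 5)]
extract_b = [("acct1", -10), ("acct3", 7)]

# Merge the sorted streams, then aggregate each run of equal keys.
totals = {key: sum(amount for _, amount in rows)
          for key, rows in groupby(merge(extract_a, extract_b,
                                         key=itemgetter(0)),
                                   key=itemgetter(0))}
print(totals)  # {'acct1': 0, 'acct2': 5, 'acct3': 7}
```

Because the inputs are pre-sorted, the merge never needs to hold more than the current record from each stream in memory, which is what makes the technique so efficient at scale.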

The following figures show the view definitions and results of processing from the Extract and Format phases.

Figure 60. View Definition and Extract Phase output from Summary file
Figure 61. View Definition and Extract Phase output from Detail file
Figure 62. Format Phase output after summarization and calculation

This approach to processing isn’t strange to anyone who has written many batch programs. In fact, the type of program described in Figure 32 performed this same function: it read two different types of records and combined them into a single output based upon common keys and summarization.

These same approaches, using multi-sourced SAFR views, can be used to do union, full outer, and right outer joins in many cases.


At a later client, the team working with the client developed a formal method for transitioning from higher level summaries to lower level sources. This process was called tranching, since subsets of the full set of lower level sources, or feeds, can be brought in in “tranches” or groups; there is no need to convert all the sources at once, nor does each need to be done individually, one at a time. The process was fairly simple in concept, but very powerful. It was used not just in the analysis phase of the project, but in production, to change the level of detail available within the financial reporting environment: effectively a new GL, but at a very detailed level. It looked something like the following:

  1. Identify the more detailed source to substitute for a source already provided to a legacy general ledger environment.
  2. Turn the detailed source into journal entries at the detailed level, with effectively the same information as the summary level but more attributes. Test if the detail has the same results as the summaries did.
  3. Capture the summary balances in the General Ledger for the source on a specified day and create reversal entries of all the balances, with the offset going to a due to/from account.
  4. Capture the balances from the source system on the same specified day, and create opening entries for all the balances with the offset going to the opposite side of the due to/from account.
  5. Load both sets of journal entries created from point in time balances onto the reporting environment, effectively removing the summary balances, and initializing the detailed balances.
  6. Stop the daily summary level feed to the legacy general ledger.
  7. Start the daily feed of the detailed data into the new reporting environment.
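Steps 3 through 5 can be sketched as follows. The clearing account name, account numbers, and amounts are all invented, and a real implementation would carry full journal-entry attributes, but the shape of the cutover entries is the point:

```python
DUE_TO_FROM = "1999-DueToFrom"  # hypothetical due to/from clearing account

def cutover_entries(summary_balances, detail_balances):
    """Build the one-time journal entries for a tranche cutover.

    summary_balances: {account: balance} captured from the legacy GL (step 3).
    detail_balances:  {(account, policy): balance} captured from the source
                      system on the same day (step 4). Both sides offset to
                      the due to/from account, which nets to zero only if the
                      detail truly equals the summaries.
    """
    entries = []
    for account, bal in summary_balances.items():           # step 3: reversals
        entries.append((account, -bal))
        entries.append((DUE_TO_FROM, bal))
    for (account, policy), bal in detail_balances.items():  # step 4: openings
        entries.append((f"{account}/{policy}", bal))
        entries.append((DUE_TO_FROM, -bal))
    return entries                                          # step 5: load both

summary = {"4000": 225.0}
detail = {("4000", "P-001"): 150.0, ("4000", "P-002"): 75.0}
entries = cutover_entries(summary, detail)
net = sum(amt for acct, amt in entries if acct == DUE_TO_FROM)
print(net)  # 0.0 when the detail ties out to the summaries
```

A non-zero balance left in the due to/from account is the control signal that the new detailed feed does not yet reproduce the summary it replaces.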

In this way, more detailed feeds can be substituted for existing summary feeds without disrupting the reporting environment. These more detailed feeds can then be used to provide a richer reporting environment. Because this is a controlled process, it can go on, tranche by tranche, for years as the organization works on improving its financial reporting environment. Ultimately, all summary feeds can be replaced with more detailed equivalents.

So, after finding the event file and balancing it to a known source, we expanded the elements available to report on by searching for more detailed files and repeating the balancing process. The next step is to begin to define reference data to describe and classify those business events.

1 Client data warehouse strategy, 1997.
2 Statistics from presentation by Kevin Bly to Decision Support and Technical Leadership Group, Oct. 1, 2001.
3 The company found that the one pass architecture of SAFR meant that actually making the back-up copies during the daily process was more efficient than having a separate process read and write the data again through a backup utility at a different time of the day. So the daily critical path process includes the back-up process as well.