Chapter 26. Balance the Event File

Two years or so later, in 1997 I was working at a very large US insurer. They had determined to replace their legacy general ledgers, and had independently come to the idea of an event-based architecture simply by considering how accounting works. I worked with Lloyd Jackson, an IT employee at the insurance company, for three or four months determining how might be the best way to go about the financial data warehouse project.

I remember being surprised when I learned I was to participate in the formal oral proposal from Price Waterhouse to implement an ERP GL there; I had never before seen the ERP software. In our discussions, we were open about that. Rather, I was to be part of the reporting team, and discussed the approach Rick had written in the white paper about offloading the heavy lifting to another reporting tool.

PW didn’t win the proposal; another consulting company did. But our previous work at the insurer proved we could find the data needed for reporting. The custom GL team was struggling with how to begin. I suggested the best approach was to find the event file within the legacy general ledger. We named the effort after a line in a recently released movie and called it: “Show me the money!”.

Preliminary Analysis

Show me the money means balancing the event file to a known report or source. Transaction files sometimes are not complete; special processes may generate certain partial information, or additional transaction files may be processed periodically.

This insurance company had a ledger system that was imbedded as part of what might be called a home grown thirty-year old ERP system. The system flowed policy and loss level data from policy administration systems into the legacy ledger in a series of files and a network of programs. The system was so integrated, and had required so few changes that real in-depth knowledge of how the system worked was mostly lost. The best system diagrams were large printed paper copies.

Trying to recreate the system knowledge could have required hundreds of hours of tracing through listings of programs. The event based approach, and having a tool capable of some level of ETL like functions, meant this wasn’t needed.

The business users still knew the key reports to look at, and the IT team could trace those reports to the program that generated them. Those reports showed, on a daily basis, the dollars processed in summary by a few key attributes. The IT team identified from the scripting (JCL in this case) what files were read by the program. They then saved one day’s worth of files to use as the event files.

Jim Hladyshewsky, a new project team member, was drafted to define the files to SAFR without having much more than a ten minute introduction to the tool. His willingness to dive into the deep end with only a part-time lifeguard around was remarkable. He created the metadata, and then tested it to make sure it was accurate.

The next step in the process was to create a view that summarized the file. Working with the legacy system program, he looked over the names of the fields on the file and tried to guess which ones were on the report. In a few cases, he had to search the program to confirm if a field ended up on the report.

Using the steps to create views, Jim:

  • Specified view format and level: Summary hardcopy (printable) report to recreate the data on the control report.
  • Selected the event logical record: Fire System journal entry.
  • Specified logical file: All the files collected because we assumed they all went into the legacy process that created the report.
  • Selected output fields: Fields he thought showed up on the report, or at least some portion of them, starting with the highest level summary field and the dollar value.
  • Specified sort order: He selected the highest level field he thought corresponded to the first field on the report, and specified the view should summarize the dollar value by that field.
  • Filtered business event to include or exclude. Jim didn’t specify any filter criteria, thinking perhaps all the records were included in the report.

Jim then ran the Scan Engine and looked at the output. We had taken our first sample of the data.

Refining the Sample

As I remember, nothing matched on the two reports from the initial run. I think the dollars on the SAFR report were significantly overstated. Thus, we began the process of guessing at what might be wrong.

I don’t remember the exact steps Jim did; but from doing this multiple times I can describe his approach. I think Jim and I looked at the output and identified a row with the lowest dollar value. Perhaps we modified the view, adding a column with a constant of 1, sub-totaled to give us a count of the number of records from the event file that made up the dollar value we were seeing. We ran that view and saw that the row with the lowest dollar value also had the lowest record count. Perhaps the output might have looked like the following:

Figure 55. Event File Analysis Sample Output

From this report, if all amounts don’t match the values on the legacy system, Jim might have chosen to analyze the one record for the value 522339999, CC111, 123, since it is only one record to look up. He might then have created another view, this one with the following characteristics:

  • Specify view format and level: a detailed view of the input records read by SAFR. This means that SAFR doesn’t actually change the records it reads. It only applies filter criteria to the event records.
  • Select the event logical record: Fire System journal entry.
  • Specify logical file: All the files collected because we assumed they all went into the legacy process that created the report.
  • Filter business events to include or exclude. Whereas the prior view Jim hadn’t specified any selection or filtering criteria, this time Jim told SAFR that it should only select records with the value we saw on the report for the lowest dollar summary row.

The selection criteria might have looked like this:

Figure 56. Sample Filter Logic Text

Jim then ran the Scan Engine, with both views, producing two outputs but having only passed through the file once; (1) the control report view which we hadn’t changed so the results were exactly the same, and (2) the new view which selected only detailed records for one particular type of journal entry.

Jim then looked at the output file. He might have noted something about that record that told him why it was different than the number on the report. Perhaps a different amount field needed to be used, or two fields needed to be combined in some way.

Another test for another set of records might have selected a score of records. As he scanned the actual amount field, perhaps one record had the correct amount on it that matched the legacy system report. He then compared that record to the other records and noted one field was different on this one record. He then modified the control report view so instead of selecting every record in the file, it only included these records with the value he had just noted. He would have run the view again, and noted this time that one row matched exactly, and other rows in the report were closer in value.

Another useful approach, if faced with dead-ends, is to build a new view with permutations of the sort keys. In other words, instead of sorting by business unit, cost center, account, sort by business unit, account, cost center. These permutations give something similar to a 3D perspective on the data, and can help identify how to reduce the data down to a manageable set to investigate.

And thus began the process of determining what events produced the report. Working with two other team members, who knew no more about the system than he did, in about three weeks he was able to reproduce the report for one day’s data.

Note carefully that this process took about three weeks. If the control report chosen is a trial balance for a business unit, a critical set of business event records will have been identified.

The next step in the process was to collect the files for a month, and run them through the same refined view, refine it a bit more by identifying a special file that was received only at the end of the month, and prove that all the financial events were accounted for. In this way, one piece of the balanced based system—the control reports—were replaced with a very simple event-based system.

Insurance Financial Warehouse Project Results

Proving our ability to understand legacy code demonstrated our ability to do the integration work for the new financial system. These steps, these techniques, were taught to me by Jay. I watched him do these same steps over and over again as we did various benchmarks. Those benchmarks required understanding quickly what an existing process produced, and reproducing it in a more efficient way. It is amazing how consistent these legacy processes are, because processing patterns don’t change all that much. Having learned from Jay, I passed that knowledge onto Jim.

Other feeds had to be discovered for the Auto Company; Wendy Lucas, Fred Horwitz, and a team led the charge on this effort. A similar process had to be followed in reverse to make files for downstream feeds. Scott Penland, Andrea Orth, Michael Shapiro, and others led the charge here. In each case, determining what story the data told was critical to a consistent plot. Lynn Groves Zuccala and Laurie Lesniak helped manage the overall effort.

So having discovered and balanced event files, the next step is to expand the elements available to report, by searching for more detailed files and repeating the balancing process.