Chapter 42. Reference File Phase

Another of my early memories with Doug was one day during the large retailer project when Doug happened to be in the office in Sacramento. I am sure he was under a great deal of pressure to add features to SAFR to support the project. I went into his office and somehow complained about the workload Randall and I were carrying trying to finish up the software on our own. He looked up and said very quickly something to the effect of, “Life is pretty hard for everyone.” I understood what he meant immediately. Stop feeling sorry for yourself and get back to work. So I did.

Core Image

There are always resource constraints. The concept of a “core image” file for in-memory processing has been around for decades; it was developed to work around limits on computer memory.

Computer programmers have always known that memory access is much faster than disk access, but also that memory is much more expensive than disk. So, probably to solve some problem in which a lot of data (for the day) had to fit into memory for rapid access, someone suggested removing the parts of the data known not to be needed in a preprocess, before the data is loaded into memory for use.

SAFR performs this process in the reference file phase. The reference file phase analyzes the logic table to determine all the joins that will be performed, what the keys to the reference tables are, what effective date keyword constants need to be translated, and which data fields on each table will be used. It then reads the actual reference data and creates a copy, as small as possible, that can be loaded into memory. It also builds a control file and a control report that tell the extract phase how much memory will be required to load the data.
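
The trimming idea can be sketched in a few lines of Python. This is only an illustration of the concept, not SAFR code: the function name, the field names, and the sample account records are all hypothetical.

```python
def build_core_image(records, key_fields, needed_fields, start_date_field=None):
    """Trim reference records down to what the joins need.

    Hypothetical helper: the output layout is key fields first, then the
    optional start date, then only the answer fields the views use.
    """
    layout = list(key_fields)
    if start_date_field is not None:
        layout.append(start_date_field)
    layout += [f for f in needed_fields if f not in layout]
    return [tuple(rec[f] for f in layout) for rec in records]


# Made-up reference data; "branch" and "opened" are not used by any view.
accounts = [
    {"acct": "111", "title": "Checking account", "branch": "SAC", "opened": "1999-01-01"},
    {"acct": "222", "title": "Savings account", "branch": "SFO", "opened": "2001-06-15"},
]

core_image = build_core_image(accounts, key_fields=["acct"], needed_fields=["title"])
for row in core_image:
    print(row)
```

Every input record survives, but each one is shorter, which is what lets more reference data fit in memory.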

Logic Table Phase

The reference file process actually begins in the Logic Table phase. Think back to the extract-only view we created above. Remember that it read the event file and extracted only the required columns of data into the extract file. When any of the views need to perform joins, the Logic Table program dynamically creates similar views.

First, after the VDP Builder selects the views required for this run, the Logic Table program creates for these views the logic table we have been analyzing, called the XLT, for Extract Logic Table.

The Logic Table program then reads this XLT. Every time it sees a logical record (LR) that is not the event file, it knows a join is needed.

The Logic Table program then creates a JLT, or Join Logic Table. For each new LR, it creates an extract-only view that reads the look-up LR as an event file. The Logic Table program looks at the logical record definition to determine the keys to the target LR. For each key field, it creates a column, a DTE, that extracts that key data and places it in the reference file phase extract file for that reference file. If the LR is effective dated, it places the start date field in the first column after the key.

It also keeps track of all the fields from that reference file that the views require as data, or answer, fields, for example, the customer name or an account description. In other words, each “L” field in a DTL or CFLC in view 3263 would be another column in this generated extract-only view. In the JLT, each of these fields is a DTE.
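
The derivation of these generated views can be sketched as follows. This is a simplified model under stated assumptions: the row dictionaries, the LR names, and the `lr_definitions` structure are invented for illustration and do not reflect the real XLT or JLT record formats.

```python
def build_join_views(xlt_rows, event_lr, lr_definitions):
    """Derive one extract-only view (a list of columns) per looked-up LR.

    Hypothetical model: each XLT row names the LR and field it touches.
    Any LR other than the event file implies a join.
    """
    views = {}
    for row in xlt_rows:
        lr = row["lr"]
        if lr == event_lr:
            continue  # reads of the event file need no join view
        definition = lr_definitions[lr]
        if lr not in views:
            columns = list(definition["keys"])           # key fields first
            if definition.get("start_date"):
                columns.append(definition["start_date"])  # then the start date
            views[lr] = columns
        if row["field"] not in views[lr]:
            views[lr].append(row["field"])                # then answer fields
    return views


# Made-up logic table rows and LR definitions.
xlt = [
    {"function": "DTE", "lr": "EVENT", "field": "amount"},
    {"function": "DTL", "lr": "ACCT_TITLE", "field": "title"},
    {"function": "CFLC", "lr": "COST_CENTER", "field": "description"},
    {"function": "DTL", "lr": "ACCT_TITLE", "field": "title"},
]
definitions = {
    "ACCT_TITLE": {"keys": ["acct"]},
    "COST_CENTER": {"keys": ["cc"], "start_date": "start_date"},
}

join_views = build_join_views(xlt, "EVENT", definitions)
print(join_views)
```

Each resulting column list corresponds to the DTE columns of one generated view: keys, then the start date where the LR is effective dated, then the answer fields.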

Reference File Phase

The same program that executes the XLT, GVBMR95, also executes the JLT. The event files for this phase are the reference files. No lookups are performed in this phase; each column is a DTE. When the JLT is executed, the outputs from these views are a set of files known as the RED, or Reference Extract Data. These are the core image files.

Here is an example of the RED for the Account Title Master reference file.

Figure 107. Sample RED File
The RED is prefixed by 16 bytes (four fullwords) of data that are used in the Extract Phase after the data is loaded into memory, so the key begins in position 17. On this reference file, the key is 3 bytes in length. The values used in CFLC or DTL functions begin at position 20, with the “checking account” on row 1.
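
The layout just described can be expressed as a small slicing routine. This is a sketch only: the sample record is fabricated, the prefix is filled with zeros because its contents are not described here, and ASCII bytes are used purely for readability.

```python
PREFIX_LEN = 16  # four fullwords ahead of the key, per the description above


def split_red_record(record: bytes, key_len: int):
    """Split a RED-style record into prefix, key, and answer values."""
    prefix = record[:PREFIX_LEN]
    key = record[PREFIX_LEN:PREFIX_LEN + key_len]
    values = record[PREFIX_LEN + key_len:]
    return prefix, key, values


# Hypothetical record: zeroed prefix, a made-up 3-byte key, then the title.
sample = bytes(PREFIX_LEN) + b"111" + b"Checking account"
prefix, key, values = split_red_record(sample, key_len=3)

# Convert zero-based offsets to the one-based positions used in the text.
key_position = PREFIX_LEN + 1
values_position = PREFIX_LEN + len(key) + 1
print(key_position, values_position, key, values)
```

With a 3-byte key, the one-based positions work out to 17 for the key and 20 for the first answer field, matching the figure.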

The reference file phase also builds a view to produce a control record for each reference file. This view puts out only one record per reference file LR. Each of these views is also in the same JLT; they read the same event file, just like views 3261 and 3262. This view accumulates the number of reference file records that have been written to the RED. The logic table contains a counter: a DIM4 to define a four-byte binary field, an ADDC to add a constant of 1 to the counter for each reference file record read, and a SETC to set the accumulator to zero before the next file is processed. The extract file containing these records is called the REH, for Reference Extract Header records.
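
The counter logic can be mimicked in Python to show the order of operations. The class and method names below simply echo the logic table function names; they are not a SAFR API, and the per-file record counts are invented (chosen only so that they total 12).

```python
class Accumulator:
    """Stand-in for the four-byte binary counter that DIM4 defines."""

    def __init__(self):
        self.value = 0

    def addc(self, constant):
        # ADDC: add a constant to the counter
        self.value += constant

    def setc(self, constant):
        # SETC: set the counter to a constant
        self.value = constant


def count_reference_records(files):
    """Return one (file name, record count) header per reference file."""
    counter = Accumulator()
    headers = []
    for name, records in files.items():
        for _ in records:
            counter.addc(1)                    # one per reference record read
        headers.append((name, counter.value))  # the single REH-style record
        counter.setc(0)                        # reset before the next file
    return headers


# Hypothetical record counts per reference file.
files = {"ACCTIT": range(5), "CCDESC": range(4), "LEDESC": range(3)}
headers = count_reference_records(files)
records_written = sum(count for _, count in headers) + len(headers)
print(headers, records_written)
```

The header for a file can only be written once all of that file's records have been counted, which is why ordering matters in this phase.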

A small program produces a control report from these records to show what the memory utilization will be in the extract phase for loading the reference files.

Figure 108. REH Report
The following is the GVBMR95 control report for the reference file phase.

Figure 109. Reference File Phase GVBMR95 Control Report
It shows that the program read three different input files: the ACCTIT, CCDESC, and LEDESC files. Two views executed against each, for a total of six views. It read 12 records and wrote 15: 12 reference file records and 3 REH records.1

Custom Processes

The RED and REH file formats are open. Other processes could be used to build these files and present them to the extract engine. In fact, in some instances with voluminous reference files, specific views could be built and maintained to output these record structures. In the standard SAFR processes, every record in the reference file is written to the output RED (albeit likely shorter, with fewer fields and with the key and effective date at the front). These custom views, by contrast, could include selection logic, or join data together in preprocesses before the main SAFR extract process is executed.

Additionally, Doug created a small subroutine, GVBUR45, containing the SAFR lookup algorithm. This routine can be called from any program to perform lookups. It accepts an REH file, and multiple associated REDs. It must be called once to initialize the process, and once to clean up at the conclusion. The parameters passed to it during mainline processing look much like the values passed by a join and the LK functions. In this way, the SAFR lookup capabilities can be used outside of SAFR.
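
The calling pattern (initialize once, look up many times, clean up once) might look like the sketch below. The actual algorithm inside GVBUR45 is not described here, so a binary search over sorted keys and start dates is used as an assumed stand-in; the class, its methods, and the sample rows are all hypothetical.

```python
from bisect import bisect_right


class LookupTable:
    """Effective-dated lookup over in-memory reference rows (illustrative)."""

    def __init__(self, rows):
        # Initialization: rows are (key, start_date, value), held sorted.
        self._rows = sorted(rows)
        self._index = [(key, start) for key, start, _ in self._rows]

    def lookup(self, key, effective_date):
        # Find the latest row for this key starting on or before the date.
        pos = bisect_right(self._index, (key, effective_date))
        if pos == 0:
            return None
        found_key, _, value = self._rows[pos - 1]
        return value if found_key == key else None

    def close(self):
        # Cleanup: release the in-memory data.
        self._rows = []
        self._index = []


# Made-up reference rows with ISO dates, which sort correctly as strings.
table = LookupTable([
    ("111", "2000-01-01", "Checking account"),
    ("111", "2010-01-01", "Checking account (new)"),
    ("222", "2000-01-01", "Savings account"),
])
found = table.lookup("111", "2005-06-30")
missing = table.lookup("333", "2005-06-30")
print(found, missing)
table.close()
```

Keeping the key and start date together at the front of each row is what makes a single sorted search sufficient in this sketch.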

Our next step in the process will be to understand the Extract Files.


1 Note that the reference file phase is executed in single-threaded mode, as shown by the message giving the number of parallel threads executed. This is because the REH views must not write their data until after all RED records have been written. The REH views contain a counter of the RED records, and must wait to be executed until after the RED views are complete. This can most easily be accomplished by running in single-threaded mode.