Chapter 43. Extract Files

I remember one day when Doug was in the office in Sacramento asking him a question, and him needing to go look at the source code. So we walked into his office and he booted up his PC. At the time, before Windows 95, our PCs had Windows 3.1 which ran on top of DOS. I was a bit surprised after the machine completely booted up with the Windows graphical interface to watch Doug go to the top of the screen and close the graphical user interface. He wanted to get back to the DOS prompt so he would be able to work. This fact speaks a great deal about removing layers between him and the machine.

The Format phase program is a generalized control break engine, a very different kind of program than the extract program. Control break logic is what performs a “group by” function in a database, or a “sum fields” function in a sort utility. It collapses a set of records with the same key together to produce a subtotal of some kind, and then performs the same function on the next set of rows.

The Extract Record was created to support this control break logic. It is vital to understand it’s structure before understanding the remaining logic table functions, and the remaining processes of the SAFR Scan Engine.

Extract Record Overview

I have created view 3264 and specified that it use the Format Phase. This changes some of the function codes that are used in the extract phase.

The extract records are written to Extract Files, temporary files between the Extract Phase and the Format Phase. The Extract Record has the following general layout.

Figure 110. Extract Record Structure
  • The control area contains values that describe the rest of the record, like the number of SAFR view columns on this record, and the length of the sort key. The last field in the control area is the view number. We’ll explain why later.
  • The Sort Key or SK area contains the values the user has specified the output file should be sorted and/or grouped by.
  • The Data Transform or DT area contains alphabetic and alphanumeric and numeric column data that is not used in a format time calculation, including sub-totaling, or format time selection logic.
  • The Calculated Transform or CT area contains column data that is either used in a format time calculation, including sub-totaling, or format time selection logic.

With this understanding, let’s look at the logic table for a view requiring the format phase.

Logic Table Phase

Figure 111. Format Phase View Logic Table
This logic table has an SKL function, a build sort key from looked-up values. It also has a SKC, build sort key from constant in case the lookup fails. It also has a CTE, a calculated transform from an event file field. It also has a WRXT, a write the extract record, rather than the WRIN for the input record, or WRDT for just the DT section of the extract record.

The following graph shows these additional functions.1

Figure 112. Format Phase Logic Table Functions

Extract Phase

The following is the Extract Phase Logic Table Trace for view 3264 for the first event file record.

Figure 113. Format Phase View Logic Table Trace
CT data in the extract file are stored in packed format in order to save CPU cycles converting the data into and out of other formats. z/OS, the mainframe operating system, performs numeric calculations most efficiently on data in packed formats. Since all CT data is used in a calculation of some kind, it is converted into packed format as soon as possible. To see the CT values in the trace, you must display the output in hex mode to read the packed numbers.

The following is the first extract record for this view shown in hex format:

Figure 114. Sample Extract Record
The following is the GVBMR95 control report for all the views we have constructed so far, run in one pass.
Figure 115. Extract Record Example GVBMR95 Control Report
View 3264 extracted 13 of the 18 event records read. But any view sent to the Format phase also receives a Header Record for each thread (each input file) processed, with control information about that view. Thus the total record count of 14 records, written to EXTR001 DD Name includes one header record from reading a single input file.

Only one view wrote to EXTR001. But each extract file also receives a Control Record which has a count of all the records written to the extract file. This record is the last record written to the extract file. Thus the total records written to the extract file is 15.2

At the conclusion of the Extract Phase, the following extract records are contained in the file, plus the header and control records.

Figure 116. Sample Extract File
Note that these extracted records are in random order. All are for K & N jones family, because our view only selected those records. But the account name, the second sort key is not in alphabetical order, and the rows with the same sort keys have not been collapsed, as shown in what will be the output from the format phase below.
Figure 117. Sample Format Phase Output
And that brings us to the next step in the process, the need to sort the extract file.

1 Additional class tests are also shown. CNE is Class Test Numeric in event record, CHE is Class Test High-values (hex Fs) in event record, CXE is Class Test Null (hex zeros) in event record, and CSE is Class Test Spaces (hex 40)in event record.
2 Also, note that view 3264 performed no lookups, although the logic table contained Join functions. This is because these same joins were performed in view 3263, and thus did not need to be repeated for view 3264, in this case saving the CPU time for 18 joins. This is join optimization at work.