Chapter 50. Piping, Tokens, and the Write Verb

I noted earlier in Optimize For Performance how the concept of piping came to be. It is useful when data needs to be passed from one or more views to one or more additional views. Tokens are another method to accomplish this function for special purposes. The Write statement is often used in conjunction with piping or other advanced SAFR processes.

Piping

Piping isn’t too difficult a concept. Without piping, one or more views in a SAFR thread read an event file and write to an extract file. Another execution of GVBMR95, often called another pass of the data, allows one or more additional views to read that extracted data. Defining the output extract file from the first set of views as a pipe, and then reading that file as input to the next set of views, allows SAFR to connect the two processes and run them in one GVBMR95 execution.

Doug accomplished this quite quickly. When a program requires data from disk, it calls an operating system module, called a system service, access method, or something in the PC world akin to a DLL, to get that data from disk. This program on a mainframe talks to a channel. So reading data is really just a subroutine call to another program.

When piping, SAFR simply calls a very simple little module that keeps track of the reading status and the writing status between the two threads. When the pipe “reader” thread starts up, it calls the little program to tell it to get data. The module knows that the pipe “writer” thread has to execute to fill up the buffer, so it simply says, “OK, I’ll be back in a bit with the data.” The pipe reader gets swapped off the CPU to wait for data while the pipe writer executes.

When the pipe writer has filled an output buffer and it is ready to be written to disk, instead of SAFR calling the BSAM access method to write the data to disk, it calls this same little module and tells it to write the data. The module says, “OK, be back in a minute” and the pipe writer is swapped off the CPU. This module then posts to the pipe reader that it has data for it to read. The pipe reader wakes up and begins processing the data.

This little dance continues until the pipe writer says to write the last buffer, and the pipe reader has read what was written. There is a small degree of parallelism occurring because of the overlapped IO, whereby the pipe reader is reading buffers that are different from the buffers being written by the pipe writer. But this parallelism is only as long as it takes to process one buffer of data in either of the threads. The little module in the middle is said to be emulating an access method.
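The hand-off between the two threads can be sketched in Python. This is only an illustration of the idea, not SAFR's implementation: the names Pipe, pipe_writer, pipe_reader, and run_pipe are hypothetical, a bounded queue stands in for the little module in the middle, and a buffer is simply a list of records.

```python
import queue
import threading


class Pipe:
    """Hypothetical stand-in for the module that emulates an access method."""

    def __init__(self, max_buffers=1):
        # Bounded queue: the writer waits when it is full,
        # the reader waits when it is empty.
        self._buffers = queue.Queue(maxsize=max_buffers)

    def write(self, buffer):
        self._buffers.put(buffer)       # the writer may be suspended here

    def close(self):
        self._buffers.put(None)         # sentinel: the last buffer was written

    def read(self):
        return self._buffers.get()      # the reader sleeps until data is posted


def pipe_writer(pipe, event_records, buffer_size):
    buffer = []
    for record in event_records:
        buffer.append(record)
        if len(buffer) == buffer_size:  # buffer full: hand it to the pipe
            pipe.write(buffer)
            buffer = []
    if buffer:                          # partial last buffer
        pipe.write(buffer)
    pipe.close()


def pipe_reader(pipe, results):
    while True:
        buffer = pipe.read()
        if buffer is None:              # the writer has finished
            break
        results.extend(buffer)


def run_pipe(event_records, buffer_size=4):
    pipe = Pipe()
    results = []
    reader = threading.Thread(target=pipe_reader, args=(pipe, results))
    writer = threading.Thread(target=pipe_writer,
                              args=(pipe, event_records, buffer_size))
    reader.start()
    writer.start()
    writer.join()
    reader.join()
    return results
```

With one writer, one reader, and a first-in, first-out queue, the reader sees the records in the order they were written, and nothing touches disk in between. The overlap is limited to the reader working on one buffer while the writer fills the next.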

The pipe reader views must use an LR that matches the output from the pipe writer views to interpret the data, just the same as if they were reading a physical file. This often means that the views writing to the pipe are extract only views, only writing the DT fields to the pipe, but this isn’t a requirement. Because extract phase summarization accumulates values in the CT columns, it isn’t very common to use it in the midst of piping processes. Rather, typically the last set of views reading the last piped data are “reporting” type views which would use extract phase summarization.

One purpose for piping is to consolidate in one view, the pipe writer, logic that would otherwise need to be applied in many views. The pipe reader views become like subroutines or called modules that act only on the output of the first view, not the original input data. This simply improves the maintainability of the views so logic isn’t replicated in multiple places; it doesn’t necessarily improve performance.

Note that if more than one view is writing to the extract file, the record count in the extract file will likely not equal the record count in the original event file. Selection criteria, filtering out certain event records, will certainly change it. Alternatively, if the same event record is selected and written by more than one pipe writer view, the same event record will be duplicated in the pipe. Thus one input event record can become multiple output records.

Piped Allocation Process Example

This fact was used to allocate event records to lower levels of detail for the insurance company. Programs were created that generated SAFR views which wrote to pipes. Thus an expense record for, say, the CEO of the company, could be turned into many new records each ultimately representing the portion of the CEO’s salary attributable to an insurance policy, an extreme example. This would be done by allocating to lower and lower levels of detail in successive pipes.

For example, the first pipe writer views would have a column with the division number in it. There would be one view per division. They would each select the CEO’s salary record, and multiply it by the ratio of that division’s value to the total value of all divisions. These records would be written to a pipe. In the next to last pipe, the view for each insurance policy could have a column with the insurance policy number, and another column with a calculation multiplying the allocated salary records for the division by the proportion of that insurance policy’s value to the total of all insurance policy values.

The last pipe reader thread would have reporting views within it. These would almost all use extract phase summarization because no one really cared how much of the CEO’s salary was attributable to an individual policy. But they were interested in accumulating the total cost per insurance policy in different ways. These different summaries were the only things written to disk.

Thus there is a huge record explosion occurring in memory, but never going to disk, as one input event record is turned into thousands of allocated records, which are then collapsed into scores of summarized output records. It was quite a remarkable design.
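The arithmetic of the explosion and collapse can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the salary, the division and policy values, and the function names allocate and summarize are invented, and each "view" is assumed to read every record in its input pipe and write one scaled copy.

```python
from collections import defaultdict
from fractions import Fraction


def allocate(records, key_name, values):
    """One pipe writer view per key; each view writes a scaled copy of every record."""
    total = sum(values.values())
    written = []
    for key, value in values.items():       # one view per division or policy
        for record in records:              # each view reads every input record
            copy = dict(record)
            copy[key_name] = key
            copy["amount"] = record["amount"] * Fraction(value, total)
            written.append(copy)
    return written


def summarize(records, key_name):
    """Stand-in for extract phase summarization in the last reader thread."""
    totals = defaultdict(Fraction)
    for record in records:
        totals[record[key_name]] += record["amount"]
    return dict(totals)


# Hypothetical figures, chosen only to keep the arithmetic easy to follow.
event = [{"amount": Fraction(1000)}]                     # the salary record
by_division = allocate(event, "division", {"D1": 60, "D2": 40})
by_policy = allocate(by_division, "policy", {"P1": 10, "P2": 30, "P3": 60})
summary = summarize(by_policy, "policy")
```

One input record becomes two division records and then six policy records, which collapse back to three summary rows. The amounts in the summary still add up to the original salary, which is the property an allocation process has to preserve.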

The following is a sample GVBMR95 control report showing view 3459 writing to a pipe, and view 3458 reading from the pipe. This is done by looking for linkages between DD Names and numbers of records written and read for threads. View 3459 writes to the DD Name F0003459. This is the file handle for the output side. At the bottom of the control report, we can see that the total records written to this DD Name, 18, are written to a pipe. The total records written are the same as the total records read in the top of the report from the pipe DD Name, EVNTPIPE, the file handle of the reading thread. In the middle of the report, View 3458 is shown reading from EVNTPIPE DD Name.

GVBMR95PipingControlReport
Figure 135. GVBMR95 Piping Control Report

The following is a portion of the trace output showing the end of the pipe writer thread, which contained view 3459, and the beginning of the pipe reader view 3458. Note the WRDT function in view 3459, an extract only view, writing records to the pipe.

SamplePipeTraceOutput
Figure 136. Sample Pipe Trace Output

Tokens

All threads run asynchronously. That means when they will and won’t be executing is unpredictable. There are certain problems where coordination between the views or, more often, between the pipe reader views and other data in the original event file is needed (see next two chapters). Piping can’t be used for this. Tokens can be used to solve this problem. They are not often used, but we will describe how they work.

Tokens can be thought of as working storage or temporary variables. The token writing view creates a variable that can be used by other views within the same thread. Unlike pipes, which pass data between threads, tokens are available only within a thread.

Similar to piping, there are token writer views, and token user views. Let’s deal with a single token writer view first. The token writer operates upon event data like any other view, but its output is a token. A token record is simply written to memory. The output must match an LR that will be used to interpret the record similar to a pipe output. Any view that writes a token is moved to the top of the thread, so its output can be used by any other view in that thread.

Similar to piping, the token reader views can read the token output instead of the regular event file. This is accomplished by a little sleight of hand in the thread. GVBMR95 simply changes the address of the event record to point to the single token record (always a single record) written. Thus there will never be more tokens written in a thread than there are event file records. Unlike pipes, tokens do not accomplish the data explosion needed for an allocation process. If selection criteria caused the token writer to not write a record, then the token reader will read one fewer record than the event file contains.

In addition to reading a token, the token can also be used as a look-up. Thus a view might still read the event file record as its input, but form a key of some kind to “join” to the token. This isn’t a true lookup in the sense of searching for many records to find the one with a matching key. There is always only one record available—the record written as a token. If no record is written as a token, then the lookup is treated as not found.

There can actually be multiple token writers, even though there is only one token record in a thread. All token writers are moved to the front of the thread. They are executed in order by their view ID. The last token writer to write a record wins; that record will be used by all token using views.
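The rules for tokens can be summarized in a small Python sketch. It is a model of the behavior described here, not of GVBMR95 itself: run_thread and the three view functions are hypothetical, the record layouts are invented, and the token is assumed to be cleared before each event record is processed.

```python
def run_thread(event_records, token_writers, token_users):
    """token_writers: {view_id: f(event) -> token record or None}
    token_users:   {view_id: f(event, token) -> output record or None}"""
    outputs = {view_id: [] for view_id in token_users}
    for event in event_records:
        token = None                              # at most one token per event record
        for view_id in sorted(token_writers):     # writers run first, in view ID order
            written = token_writers[view_id](event)
            if written is not None:
                token = written                   # the last writer to write wins
        for view_id in sorted(token_users):       # users run serially after the writers
            result = token_users[view_id](event, token)
            if result is not None:
                outputs[view_id].append(result)
    return outputs


def writer_3460(event):
    # Selection criteria: write a token only for non-zero amounts.
    if event["amount"] != 0:
        return {"key": event["key"], "doubled": event["amount"] * 2}
    return None


def lookup_3461(event, token):
    # Reads the event record and "joins" to the token; not found if none was written.
    found = token is not None and token["key"] == event["key"]
    return (event["key"], token["doubled"] if found else "NOT FOUND")


def reader_3462(event, token):
    # Reads the token in place of the event record; nothing to read if none was written.
    return token


events = [{"key": "A", "amount": 5}, {"key": "B", "amount": 0}, {"key": "C", "amount": 7}]
out = run_thread(events, {3460: writer_3460}, {3461: lookup_3461, 3462: reader_3462})
```

In this run the token reader produces two records from three event records, because the writer's selection criteria skipped record B, while the lookup view still produces three records, one of them marked as not found.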

Thus logic can be consolidated in token views, similar to the purpose of piping. Any token user views can also be aware of all the other aspects of the thread. Tokens are most often used in conjunction with user exits, discussed in the next chapter, which have visibility to the aspects external to the thread. Tokens, though, accomplish no parallelism. The token writers and token readers execute serially.

In the following example, view 3460 is in the pipe reader thread. It performs a number of look-ups to create a single record that is pre-joined by the pipe writer view. This might be done because it is expected that doing all the joins up front in the thread will make the logic in the using views easier to understand. View 3461 looks up to the token, and view 3462 reads the token, both in the same pipe reader thread.

GVBMR95TokenControlReport
Figure 137. GVBMR95 Token Control Report

The following is a sample of the trace for these two views, with descriptions of the functions in the logic table for each.

TokenTraceOutput
Figure 138. Token Trace Output

Write Verb

The write verb allows a SAFR developer a finer level of control over when and where an extract record is written.1 In all the examples thus far, the views have an implicit Write statement, telling GVBMR95 to write the extract record after it has built the last column of the view and tested for extract summarization. Where the extract record is written is controlled by the View Properties. For extract only views, the default DD Name is “Fnnnnnnn” where nnnnnnn is the view number padded with 0’s to the left. Views can also be directed to write to a specific physical file in the SAFR metadata.
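As a small illustration of the default naming rule, the following Python function builds the DD Name from a view number. The function name default_dd_name is hypothetical; the rule itself is the one stated above.

```python
def default_dd_name(view_number):
    """Return 'F' followed by the view number, zero-padded on the left to 7 digits."""
    return f"F{view_number:07d}"
```

For view 3459, the pipe writer in the earlier control report, this gives F0003459.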

Suppose a single view may be able to satisfy multiple needs even though the output records are not exactly the same. Two or more outputs may share many of the same output fields, but one requires another column or two. Inserting a WRITE function after the first record has been built causes GVBMR95 to write that record. Then additional columns can be added and another write verb inserted to write all the data from the first record plus the additional columns.
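A minimal Python sketch of this pattern follows. It only mimics the order of operations; the function run_view, the field names, and the output names FIRST and SECOND are hypothetical and are not SAFR syntax.

```python
def run_view(event, extracts):
    # Build the columns shared by both outputs.
    record = {"account": event["account"], "amount": event["amount"]}
    extracts["FIRST"].append(dict(record))     # first write: the columns built so far

    # Add the extra column needed only by the second output.
    record["cost_center"] = event["cost_center"]
    extracts["SECOND"].append(dict(record))    # second write: first record plus new column


extracts = {"FIRST": [], "SECOND": []}
for event in [{"account": "1000", "amount": 25, "cost_center": "CC1"}]:
    run_view(event, extracts)
```

The event record is read once, and each output receives a record with its own set of columns.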

Thus the write verb allows multiple copies of the same extracted record to be written for different purposes. One company actually makes on-site and off-site backups in the midst of the business function SAFR process because it eliminates two additional IOs of the repository. The write verb can also control the target file based upon a condition, or split one input record into many.

Another common use is in partitioning the output to continue parallelism through SAFR processes. Remember that the standard SAFR configuration is that the same view in multiple threads writes to the same extract file. When using piping or writing the output to the repository, it isn’t desirable to collapse them into single partitions on output. No parallelism will be achieved in subsequent processes.

The write verb also allows conditional calls to a write exit, discussed in the next chapter. This allows for write exit parameters to be changed based on column logic. Another less used reason is to break up a multiple occurring segment of a record into individual records. This would be similar to a record containing an “occurs” clause in COBOL.
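Breaking up a multiple occurring segment can be pictured as follows. This Python sketch is an analogy only: split_occurrences, the policy record, and its field names are all invented for the example.

```python
def split_occurrences(record, segment_field):
    """Write one output record per occurrence of a repeating segment."""
    # Fields outside the repeating segment are copied to every output record.
    common = {name: value for name, value in record.items() if name != segment_field}
    written = []
    for index, occurrence in enumerate(record[segment_field], start=1):
        out = dict(common)
        out["occurrence"] = index
        out.update(occurrence)       # the fields of this one occurrence
        written.append(out)          # stands in for one write per occurrence
    return written


policy = {
    "policy": "P1",
    "coverages": [{"code": "FIRE", "limit": 100}, {"code": "THEFT", "limit": 50}],
}
records = split_occurrences(policy, "coverages")
```

One input record with two occurrences becomes two flat output records, each carrying the common fields.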

In the following example, view 3463 is a copy input view reading both the EVENT1 and EVENT2 physical files. It contained the following logic text:

If {LEGAL_ENTITY} =  522349731 Then
	WRITE(SOURCE=INPUT,DESTINATION=EXTRACT=02)
Else 
	WRITE(SOURCE=INPUT,DESTINATION=EXTRACT=03)
EndIf

The file associated with DD Name DEST002 contains all the data for one legal entity and DEST003 contains all the data for the other legal entity. Note that because our input event files were already partitioned this way, the output files will be the same as the input files.

Let’s suppose that these two files are the weekly files, and we have a third input file that contains the new records for the day, a mixture of legal entities. If our view also reads this file, but writes only the two new versions of the weekly file, then in a single pass we have read the daily and weekly files, other views could have produced outputs from all of this data, and we have performed the weekly master file update process. This is a common use of the write verb: updating the master file while maintaining partitioning.
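The master file update can be modeled in Python as follows. This is a sketch of the routing idea only: update_weekly_masters, the record layout, and the second legal entity value are hypothetical, and the keys 2 and 3 merely stand in for the two extract destinations named in the logic text.

```python
TARGET_ENTITY = 522349731     # the legal entity tested in the logic text


def update_weekly_masters(weekly_1, weekly_2, daily):
    """Read all three inputs once and write two new, still partitioned, weekly files."""
    new_masters = {2: [], 3: []}
    for source in (weekly_1, weekly_2, daily):
        for record in source:
            target = 2 if record["LEGAL_ENTITY"] == TARGET_ENTITY else 3
            new_masters[target].append(record)
    return new_masters


weekly_1 = [{"LEGAL_ENTITY": 522349731, "id": 1}]
weekly_2 = [{"LEGAL_ENTITY": 111111111, "id": 2}]          # hypothetical other entity
daily = [{"LEGAL_ENTITY": 111111111, "id": 3}, {"LEGAL_ENTITY": 522349731, "id": 4}]
masters = update_weekly_masters(weekly_1, weekly_2, daily)
```

The daily records, a mixture of legal entities, land in the correct partition, so the next process can still read the two weekly files in parallel.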

The control report shows the results of using the write verb in view 3463.

GVBMR95WriteVerbControlReport
Figure 139. GVBMR95 Write Verb Control Report

Note that view 3463 now writes to two different output files.


Next we’ll examine what to do if SAFR native capabilities are not up to solving a specific problem.

1 I am indebted to Stephen Frobish and Andrea Orth for portions of this material.