Chapter 48. Multi-Threading

I have heard Rick advocate using all available means to solve a problem. He meant we should not rule any approach out. Certain computing problems call for the same approach. High system performance means being able to apply all available computing resources. One of the most perishable computing resources is a CPU cycle: unused CPU cycles perish with the passing of each nanosecond.

Almost all modern computers, including many PCs, have multiple CPUs. Such computers are capable of parallel processing. Utilizing a box with 10 CPUs at 90% requires that 9 of those processors be busy at the same time. Although I don’t believe the terminology is precise, and it is somewhat platform and language dependent, for purposes of our discussion we’ll say parallel processing can be either multi-process parallelism or multi-thread parallelism. SAFR used both approaches.

Multi-process Parallelism

The format engine, GVBMR88, and its associated programs execute in multi-process parallel mode; in other words, multiple “meetings” are started and held in separate “rooms.” In mainframe terms, we simply submit separate jobs, and they don’t share memory in any way. The extract engine, GVBMR95, is a multi-threaded parallel processing engine. A thread might be thought of as one of our sub-meetings, with its own agenda, meeting in the same room as all the other sub-meetings and sharing the white board and binders. Each CPU works one particular “meeting agenda” while the main, or initially started, program (what Doug calls the Mother Task) waits and watches. The following diagram shows both types of processes.

Figure 129. SAFR Parallelism

The first three steps of the Scan Engine are executed essentially once, in serial mode. They produce three outputs: the VDP, the Logic Table, and the Core Image Reference Data files.

The extract engine takes in these three files. So far in describing GVBMR95, we have executed only a single SAFR thread. The view processing has therefore primarily demonstrated the ability to minimize IO: the data was read once to produce all the outputs. CPU use has also been minimized by generating efficient machine code and optimizing lookups.

We also executed only a single Format Phase job, which sorted the extract file containing the data extracted for all the views that required the Format Phase. Each row was prefixed by the view ID, so when the sort was complete, all the rows for each view were grouped together in sorted order.
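
The effect of the view ID prefix can be sketched in a few lines of Python. This is a minimal illustration, not SAFR's actual extract record format: the tuple layout (view ID, sort key, amount) and the names `extract_rows` and `format_phase_sort` are hypothetical. Sorting on the view ID first and the view's own key second is what leaves each view's rows contiguous and in order.

```python
from itertools import groupby

# Hypothetical extract rows: (view_id, sort_key, amount).
# In SAFR the view ID is a prefix on each extract record; here it is
# simply the first element of a tuple.
extract_rows = [
    (2, "B", 10),
    (1, "Z", 5),
    (2, "A", 7),
    (1, "C", 3),
]

def format_phase_sort(rows):
    """Sort on (view_id, sort_key) so each view's rows end up contiguous
    and in key order, then group the sorted rows by view."""
    ordered = sorted(rows, key=lambda r: (r[0], r[1]))
    return {view_id: list(group)
            for view_id, group in groupby(ordered, key=lambda r: r[0])}

for view_id, rows in format_phase_sort(extract_rows).items():
    print(view_id, rows)
```

Because `groupby` only merges adjacent items, the sort must come first; that mirrors why the single Format Phase job sorts the whole extract file before any view's rows are processed.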

However, GVBMR95 can write output to multiple files. In the early days of the system, a standard set of extract files was typically established for a SAFR run, and the select phase assigned views to those extract files in round-robin fashion. Thus views 1, 7, and 13 would be assigned to extract file one, views 2, 8, and 14 to extract file two, and so on. The Format Phase is now used much less frequently, because of extract-only views among other things, so Format Phase configuration tends to be much more customized.
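
A minimal sketch of that round-robin assignment follows, assuming six extract files (the count implied by views 1, 7, and 13 sharing a file) and views taken in ascending ID order. The function name `assign_round_robin` is hypothetical and does not correspond to any SAFR component.

```python
def assign_round_robin(view_ids, num_extract_files):
    """Assign views to extract files in round-robin order.

    Returns a dict mapping extract file number (1-based) to the list of
    views written to it, taking views in the order given.
    """
    assignment = {f: [] for f in range(1, num_extract_files + 1)}
    for position, view_id in enumerate(view_ids):
        assignment[position % num_extract_files + 1].append(view_id)
    return assignment

# Assumed: six extract files and views numbered 1 through 14.
files = assign_round_robin(range(1, 15), 6)
print(files[1])  # [1, 7, 13]
print(files[2])  # [2, 8, 14]
```

The appeal of this scheme is that it needs no knowledge of the views; the drawback is that it balances view counts, not data volumes, which is one reason later configurations became more customized.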

We could instead have each view write to its own extract file and then execute multiple Format Phase processes. If the views running in the extract phase shown in Figure 118 in Chapter 44 were modified so that each wrote to its own extract file, the control report would look like the following:

Figure 130. Parallel Format Phase GVBMR95 Control Report

Note that the DD name at the end of each view’s entry is different. After the extract phase, four Format Phase jobs would be executed, one to process each extract file.

Multi-process parallelism is not unusual. Like running many meetings simultaneously in separate conference rooms, it is used quite frequently, with fairly unsophisticated tools, to solve particular problems more quickly.

The downside of multi-process parallelism is that memory cannot be shared between processes. To be shared among processes, data must pass through the operating system or, much more frequently, be written to disk. Even within SAFR, for example, the core image reference data used in creating sort titles in GVBMR88 must be loaded into each Format Phase. Multi-process parallelism thus allows more CPU resources to be applied to a problem, but it isn’t the ultimate in efficiency.

Partitioned Event Files

Suppose we make one change to the last example we ran: we split the input journal entries into two files instead of one. We place all the journal entries for one legal entity (family) in one file and the rest in another. The data itself has not changed; if we concatenated the two files (in other words, told the operating system to present both files, one right after the other, to the program as if they were one file), the views would process them without any other adjustments.

No change is required to the views to cause GVBMR95 to perform multi-threaded parallel processing. Views read Logical Record/Logical File combinations. The only change required is to define another Physical File under the existing Logical File in the SAFR metadata.

With this change, GVBMR95 generates machine code for each physical file. After generating the machine code, GVBMR95, acting as the mother task, instructs the operating system to execute these two programs and to notify it when they have finished. The GVBMR95 control report for this execution is shown below.
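
The mother-task pattern can be sketched with Python's `concurrent.futures`. This is only an analogy under stated assumptions: the partition names, the record layout, and the functions `extract_thread` and `mother_task` are hypothetical, and a simple total stands in for the machine code GVBMR95 actually generates. The sketch shows the structure (one worker per physical file running the same logic, with the mother task waiting for all of them), not SAFR's implementation.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical journal-entry partitions: one list per physical file,
# split by legal entity as in the example above.
partitions = {
    "JOURNAL_ENTITY_A": [("A", 100), ("A", 250)],
    "JOURNAL_ENTITY_B": [("B", 75), ("B", 25), ("B", 50)],
}

def extract_thread(file_name, records):
    """Stand-in for the code generated for one physical file: apply the
    same view logic (here, a simple total) to every record in the file."""
    total = sum(amount for _entity, amount in records)
    return file_name, len(records), total

def mother_task(partitions):
    """Start one worker per physical file, then wait for all of them.

    Note: in CPython the global interpreter lock prevents CPU-bound
    threads from running Python code on several CPUs at once, so this
    illustrates the structure of the work, not the speedup."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        futures = [pool.submit(extract_thread, name, recs)
                   for name, recs in partitions.items()]
        # result() blocks until the corresponding worker has finished.
        return [f.result() for f in futures]

for name, count, total in mother_task(partitions):
    print(f"{name}: {count} records read, total {total}")
```

Because both partitions hold the same kind of record, both workers run identical logic; only the input differs.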

Figure 131. Parallel Processing GVBMR95 Control Report

The first line shows how many threads were executed in parallel. Note that the results of each view are now shown by physical event file, including lookups found and not found, records written (including a header record for each physical file read per view), and the extract file to which the data was written.

This one change may allow the operating system to process our data twice as fast, if it runs our threads on different CPUs at the same time and both CPUs perform the same amount of work. In that case, the elapsed time of the process could be cut in half.1
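
A rough model of that “might” treats the start-up and shut-down work, which is always single threaded, as a fixed cost and divides only the event-file processing among the threads. This is the reasoning behind Amdahl’s law, not a SAFR formula, and the timings below are invented for illustration, not measurements.

```python
def elapsed_time(serial_secs, parallel_secs, threads):
    """Estimate elapsed time when only the event-file processing is split
    evenly across `threads` CPUs; start-up and shut-down stay serial.
    Assumes perfectly balanced work and one free CPU per thread."""
    return serial_secs + parallel_secs / threads

# Hypothetical timings: 2 s of single-threaded start-up and shut-down
# plus 100 s of event-file processing.
one = elapsed_time(2, 100, 1)   # 102.0
two = elapsed_time(2, 100, 2)   # 52.0
print(f"speedup: {one / two:.2f}x")  # speedup: 1.96x

# Tiny event files: the serial work dominates and the gain nearly vanishes.
tiny = elapsed_time(2, 0.01, 1) / elapsed_time(2, 0.01, 2)
print(f"speedup: {tiny:.2f}x")  # speedup: 1.00x
```

The second case is the situation described in the footnote: when the threads finish almost instantly, splitting the files barely changes the total run time.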

The two files above both contain the same kinds of records, so the programs generated to read each of them would be nearly byte-for-byte identical. This is because we simply defined a new physical file under the logical file; the views are exactly the same. But multi-threading isn’t limited to reading multiple files containing records described by the same LR. In Find More Detailed Events, when discussing sort/merge processes, we described a multiple source view. A multiple source view can read different logical files, defined by different LRs, and combine the data into a single output, particularly when the two LRs include a common set of fields for sorting and summarizing. The two files may be read in parallel in GVBMR95.
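
The idea of a multiple source view can be sketched as two different record layouts that share a few fields, combined and summarized on those shared fields. The LR names, fields, and data here are hypothetical; the sketch reads the sources one after the other, whereas GVBMR95 may read each file in its own thread.

```python
from collections import defaultdict
from dataclasses import dataclass

# Two hypothetical record layouts (LRs) that share the fields
# `account` and `amount` but otherwise differ.
@dataclass
class JournalEntry:
    account: str
    amount: int
    journal_id: str

@dataclass
class HistoryBalance:
    account: str
    amount: int
    period: str

def multi_source_view(*sources):
    """Combine records from different LRs into one output, sorted and
    summarized on the fields they have in common."""
    totals = defaultdict(int)
    for source in sources:      # read sequentially here, one source at a time
        for rec in source:
            totals[rec.account] += rec.amount
    return sorted(totals.items())

journal = [JournalEntry("1000", 50, "J1"), JournalEntry("2000", 20, "J2")]
history = [HistoryBalance("1000", 500, "P01"), HistoryBalance("3000", 5, "P01")]
print(multi_source_view(journal, history))
# [('1000', 550), ('2000', 20), ('3000', 5)]
```

Only the common fields matter to the output; the fields unique to each layout (`journal_id`, `period`) are simply ignored by this view.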

Partitions can thus be created for various reasons. They may have no business meaning and exist simply to increase parallelism, as in the example above. Or they might carry meaning, such as holding historical event data or a different record type. The generated code in each thread may be unique, because some views might read two different logical files while other views do not. Each thread is effectively a custom-developed program for extracting data from its event file.
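
One way to picture why each thread’s program can differ is to work out which views end up in each thread from the metadata alone. In this sketch, assumed for illustration, each view lists the logical files it reads and each physical file belongs to one logical file; the view and file names and the function `plan_threads` are hypothetical, not SAFR metadata.

```python
# Hypothetical metadata: the logical files each view reads, and the
# logical file each physical file belongs to.
views = {
    "V1": {"JOURNAL"},
    "V2": {"JOURNAL", "HISTORY"},   # a multiple source view
    "V3": {"HISTORY"},
}
physical_files = {
    "JOURNAL_A": "JOURNAL",
    "JOURNAL_B": "JOURNAL",
    "HISTORY_1": "HISTORY",
}

def plan_threads(views, physical_files):
    """Return, for each physical file (one thread each), the views whose
    logic would be generated into that thread's program."""
    return {
        pf: sorted(v for v, lfs in views.items() if lf in lfs)
        for pf, lf in physical_files.items()
    }

for pf, view_list in plan_threads(views, physical_files).items():
    print(pf, view_list)
# JOURNAL_A ['V1', 'V2']
# JOURNAL_B ['V1', 'V2']
# HISTORY_1 ['V2', 'V3']
```

The two journal partitions get the same view list, like the split journal files earlier, while the history partition gets a different one because only some views read that logical file.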

Parallelism can get out of hand, though, as the next chapter shows. Next, let’s turn to understanding how to control performance and really get the most out of parallelism when needed.

1 Note that in the example given, the time would be nowhere near half, because the thread processing is almost instantaneous given the small size of the event files. The start-up work (reading the Logic Table, VDP, and reference data files, and generating code) and the shut-down work (closing the extract files and printing the control report) are always single threaded.