Chapter 34. Consider Complex Joins

Over the years, join processing in SAFR has been enhanced to perform more and more functions. Understanding these capabilities is important to creating a SAFR solution.

Although the primary focus of our reporting is business event, the reference files used in reporting have to be maintained as well; and these changes to reference data are business events. For example, employees are promoted, addresses change, departments are reorganized, and customer accounts are closed.

Date Effective Joins

These changes in reference files do have tremendous impact on how the business events are reported. Since its earliest days SAFR has had a unique capability to perform date effective joins.

Sometimes when this term is used it means having a date be part of a key that is matched byte for byte against another record which has the exact same date in it. This means that business events can be combined that occur on the date of the reference data change, but business events on other days cannot without updating either the business event or the reference data. Alternatively, the requirement is sometimes met by having an additional field that marks which reference file record is currently “active”, and a constant is used to force the join to that particular record. SAFR can use either of these approaches, but its native date effective join capabilities are more sophisticated.

To trigger SAFR date effective joins, one needs to mark the target reference file LR—the LR being looked up to—as a date effective LR. To do so, one field on the reference file can be selected as a start date. This field is technically not part of the key in defining a join in the SAFR metadata; no value has to be passed to this field in the join.

However, this field must be present on the reference file and must be the last field in the sort order of the reference file. The value in the effective date field can be used to make reference file records unique. In other words, duplicates may exist in the file for all other fields if the start date field is different. This is the only exception to the SAFR rule of no duplicates in join processing.

Which effective date to use in a join—what specific value to search for in that last position—can be passed to a join in a number of ways. The syntax for the join statement used in Logic Text allows for specifying what effective date the report is as of. A value in a date field on the event record can be passed, a hard coded date inside quotation marks in the Join statement can be used, or if nothing is specified, the RUNDATE, a parameter passed to the Scan Engine as the effective date of that run can be used1.

SAFR does not require an exact match for effective date. As it performs its search, the last step in the process is to test if the value passed as the effective date is less than the date on the record to be returned. If so, it returns that record. If not, it tests the record before that to ensure the key is the same. If it is not, it returns a not found condition.

The reference file can also contain an End Date field. If two records have the same key except for the start date and no end date field is specified, the start date of the later record is effectively the end date of the earlier record. If an end date is specified, SAFR tests to ensure that the effective date supplied is between the start and end date of the record located. If it is not, then a not found condition is returned to the view. This creates the ability to make gaps between when rows are effective.

Date Effective Reporting

These types of date effective joins allow very sophisticated reports. The cookie manufacturer maintained the sales, material, and customer hierarchies as date effective files whereby any changes were captured as new rows with an effective start date. They would test potential reorganizations by adding contemplated changes as rows to the table, and run reports with future effective dates at the same time other reports were being run for the current date. They could then see the impact of the reorganizations before deciding to make them permanent.

This ability presents two different reporting options:

  • Recast: Joined attribute changes result in the entire historical events being associated with the latest attributes. Report totals are not able to provide views of the data in different attributes. This option would typically use either the current (RUNDATE) or hard coded date option. Thus if the reference data contains my home state and I move to a new state, using current date option all of my amounts for all time are shown under the new state. Alternatively, the view could use a hard coded effective date of when I was in the prior state, and all balances, including today’s would be shown as if I were in the prior state.
  • Switch: Business events are joined to the attributes which were applicable at the time the business event occurred. This option would typically use a field containing a date on the event record as the effective date, typically the date when the business event occurred or balance was created. Using this option a portion of my balance would be shown in two places, under both states depending on the date of the business event field. One would need to add the two categories together to equal the amounts shown under one of the states in the recast method.

Another approach that allows generation of both switch and recast type reports requires that events be generated when attributes in reference files change. This is called reclass processing. Join attribute changes result in explicit events being generated by the system to move the sum of the business events (historically) from older set of attributes to the latest set of attributes. Thus report totals can provide historical views of data in different attributes (like switch) and in addition the summation of the events at any point in time will include the events that moved the sum of business events from one attribute to another. This approach tends to be more complete and accurate from an accounting perspective. See Reclassification for additional details.

Reference File Reports

Reports against standard types of reference files themselves are possible, but not tremendously common. Reference files tend to be less voluminous than event files, and the number of reports produced is significantly less. Typical reference files also usually do not contain amounts which can be meaningfully summarized. The one exception is that counts of records of a particular type are sometimes produced, like how many employees are at grade 5. Although there aren’t many reports of such a size and consistent frequency that demand a solution like SAFR, there is nothing about SAFR which prevents it from reading reference files to produce reports. However, SAFR will read the data twice if the file is used as an event file and as a reference file.

At times the simple idea, of reference files loaded into memory to do lookups with event files having all the volume, breaks down. In these cases the reference files, usually filled with the static data describing customers and arrangements, etc., become very large, just slightly smaller than the event files. Because of their size, SAFR is an effective means to report against them. However, having to read the file twice, once for use as a reference file and once as an event file is anathema to SAFR performance principles. Building an efficient solution requires careful modeling of the repository.


Next:  Chapter 35. Model the Repository

Previous:  Chapter 33. Define Processes

Parent Topic:  Part 4. The Projects

Table of Contents

1 Note that RUNDATE is not the same as system date. It is important to be able to set the date for which the reports should be produced as of in cases of reruns on a different day than when they should have been produced.