In my training I learned how to write a batch reporting program in COBOL, a ubiquitous language for such processes. I believe we may have even printed the report on something called green bar paper. The paper contained shaded light green bars to help trace information across the rows. The principles behind these reports are still valid today.
Reporting is a process of classifying attributes. Among the earliest arithmetic skills children are taught is how to classify objects. Very young children begin by making piles, even though those piles may not be intuitive to an adult. One pile might include “things that have some shiny part to them,” and thus a toy car with a shiny plastic window and a mirror end up in the same pile. Over time and with maturity, commonly understood and measured characteristics are used, such as size, shape, and color.
Making piles reduces the number of objects the child has to perceive at any one time. If asked what colored objects exist—a request for a report—the child can name the piles. To the question of how many big objects exist, the count of objects in one pile is the response. The child understands the nature of the objects more easily by classifying them.
Later, we use more complex schemes by making enough piles for the combinations of all of the possible values for all of the possible groups. For example, if we wanted to classify objects by shape (round, long, or flat), size (large, medium, or small), and color (red, yellow, or blue), we would make 3 × 3 × 3 = 27 piles: objects that are flat, red, and large in one pile, objects that are yellow, medium sized, and round in another, and so on.
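The 27 piles can be sketched in a few lines of Python. This is only an illustration of the classification scheme described above; the object names and the `classify` helper are hypothetical.

```python
from itertools import product

shapes = ["round", "long", "flat"]
sizes = ["large", "medium", "small"]
colors = ["red", "yellow", "blue"]

# One pile per combination of attribute values: 3 * 3 * 3 = 27 piles.
piles = {combo: [] for combo in product(shapes, sizes, colors)}

def classify(obj):
    """Place an object (a dict of attributes) into the pile that matches it."""
    key = (obj["shape"], obj["size"], obj["color"])
    piles[key].append(obj)

# Hypothetical objects, used only to show the mechanics.
classify({"name": "plate", "shape": "flat", "size": "large", "color": "red"})
classify({"name": "ball", "shape": "round", "size": "medium", "color": "yellow"})

print(len(piles))  # 27
```

Every pile exists whether or not any object lands in it, which is why the number of piles depends only on the number of possible values and not on the number of objects.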
The different characteristics of physical objects (color, size, shape) might be called attributes by IT. In business systems, IT assigns attributes to many things, including customers, employees, even IT systems. Common accounting attributes include business unit, cost or profit center, and account. We use these groups to decide what the business events mean, what action should be taken because of them, or what relationship they should have to other events.
Because business events aren’t “real” objects, they don’t have a natural set of characteristics. What are legal entities, accounts, and cost centers? They aren’t naturally occurring things, but something we make up and assign to business events to “plan, execute, control, or evaluate” them.1 A “legal entity” might be a corporation, a made-up person that can own property, sue, and do other things real people can do before the law. A “cost center” is a group of people, often with someone in charge and accountable for a budget. And perhaps an “account” is what everyone agrees the business event means as interpreted by the accounting standards. Our example financial and car maintenance system had planned about 25 different things (attributes) we wanted to track.
Of course, comprehending nine piles of things is possible, but comprehending nine hundred thousand piles is not. Because the number of piles is the product of the number of possible values for each attribute, the numbers grow quickly. Quickly finding the three piles containing yellow objects (large, medium, and small) and adding the numbers together to determine the number of yellow objects is possible, although more tedious than having all the yellow in one place. Dealing with larger numbers of piles becomes impractical.
Today’s businesses of course have millions, and in some cases billions, of events daily. Understanding the events themselves has long been impossible. Major businesses are run today by keeping track of various piles.
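As a rough sketch of the arithmetic, the following Python adds up the three yellow piles and shows how the pile count multiplies as attributes are added. The counts in `pile_counts` are invented for illustration.

```python
from math import prod

# Hypothetical counts for the nine piles formed by color x size.
pile_counts = {
    ("red", "large"): 4, ("red", "medium"): 2, ("red", "small"): 7,
    ("yellow", "large"): 3, ("yellow", "medium"): 5, ("yellow", "small"): 1,
    ("blue", "large"): 6, ("blue", "medium"): 0, ("blue", "small"): 2,
}

# "How many yellow objects?" means finding three piles and adding them.
yellow_total = sum(
    count for (color, _size), count in pile_counts.items() if color == "yellow"
)
print(yellow_total)  # 9

def number_of_piles(values_per_attribute):
    """The pile count is the product of the value counts of each attribute."""
    return prod(values_per_attribute)

print(number_of_piles([3, 3]))     # 9
print(number_of_piles([3, 3, 3]))  # 27
```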
And every new question someone asks—the need for a report—has the potential to require a new pile.
Answers in a Single Row
The reports I created in my training class, and most of those still produced today, are usually composed of rows of information with numbers on the right-hand side and the descriptions of those numbers to the left. Most spreadsheets still look much like this. Here is something that looks much like the reports I would have created in my class.
Reports are useful when they contain answers to questions. Sometimes the answers come from a single row on a report: how many flat, red, large objects there are, or how much the expenses were for the sales department of XYZ Company last year.
These hard copy reports contained a great deal of information. Because the sort keys, those values on the left of the report, were in sorted order, a reader can scan them in order and quickly find the legal entity, cost center, or account total of interest. The report, in a certain sense, contains a number of different reports within it: account totals by cost center and legal entity, cost center totals by legal entity, and legal entity totals. For each of these it contains the debits, the credits, and the grand total. In more recent IT trends, these three things may have ended up being three or even nine different reports or database tables.
But the end of the reporting process might still result in the need for one row of information.
This need to find one row of information is somewhat ironic, since the subsystem architecture begins with data captured as business events, each often recorded on a single row. Thus the basic business system starts with a single row, and it ends with a single row as well.
Although single rows of data are needed at both ends of the process, the two rows are very different things. There is work that must be performed in the middle to create the reporting record. This is because the row at the beginning most often represents a single business event: a deposit, a sale, a receipt of inventory, a payroll payment. But the row at the back of the system represents an accumulation of many business events: total deposits, total revenue, total inventory received, total payroll expense. Somewhere between the beginning and the end of the system, these business events must be accumulated before the row that answers the question can be displayed.
And note there is, by definition, a passage of time between the business event and the reporting record, because the reporting record is an accumulation of business events. This accumulation can be done one record at a time, as each business event occurs, or it can be done after batches of business events have occurred.
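The two timing options can be illustrated with a small Python sketch. The event rows, the `kind` field, and the `accumulate` helper are assumptions made for the example; the point is only that both routes arrive at the same reporting rows.

```python
from collections import defaultdict
from decimal import Decimal

# Hypothetical business events: one row per event.
events = [
    {"kind": "deposit", "amount": Decimal("100.00")},
    {"kind": "sale", "amount": Decimal("40.00")},
    {"kind": "deposit", "amount": Decimal("25.50")},
    {"kind": "sale", "amount": Decimal("10.00")},
]

def accumulate(rows, totals=None):
    """Add each event's amount to the reporting row for its kind."""
    totals = defaultdict(Decimal) if totals is None else totals
    for row in rows:
        totals[row["kind"]] += row["amount"]
    return totals

# Option 1: update the reporting rows one record at a time, as events occur.
as_they_occur = defaultdict(Decimal)
for event in events:
    accumulate([event], as_they_occur)

# Option 2: wait until the whole batch has arrived, then accumulate it.
end_of_day = accumulate(events)

print(end_of_day["deposit"])  # 125.50
print(end_of_day["sale"])     # 50.00
```

Either way, the reporting row for total deposits does not exist until at least one deposit event has been processed, which is the passage of time noted above.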
The Posting Process
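Of course, this accumulation is what the posting process does. Historically, it did this at the end of the day, after the batch of business events had been received. The structure of these programs has probably changed very little since it was standardized under structured programming, not long after the first such programs were written. The program reads two files (the journal entry file and Yesterday’s Ledger Balance file) and writes one (Today’s Ledger Balance file). Its basic structure is a sequential match of these two inputs, which the comparison logic assumes are both in account-number order.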
The parts of the program are as follows:
- Perform initialization. This section starts the program and gets ready for processing. It also usually reads the first record from the journal entry file, and the first record from the ledger file.
- Process records. This process compares the account number on the journal entry with the account number on the ledger record. A comparison has three possible results: less than, equal, or greater than.
  - If the ledger account is lower, then updates to the ledger account are complete. The program writes the ledger balance for this account to Today’s Ledger Balance file. The program then reads the next ledger record from Yesterday’s Ledger Balance file and returns to the comparison.
  - If they match, the program updates the Ledger Balance record with the amount from the journal entry. It does not write the new record to Today’s Ledger Balance file yet, but rather reads a new journal entry record and returns to the comparison.
  - If the ledger account is higher, then the journal entry is for a brand new account that we don’t have on yesterday’s file. So the program creates a new Ledger Balance record and adds the journal entry amount to it. Again, it does not write the new record, but rather reads a new journal entry record and returns to the comparison.
Logic has to be added to force the program to continue until the end of each file is reached. In other words, it has to continue to copy all the ledger balances out to the new file even after it reaches the end of the journal entry file, and it has to ensure that all new accounts found in the journal entry file are created on Today’s Ledger Balance file even after it reaches the end of Yesterday’s Ledger Balance file.
- Perform termination. Having read the last input record from both files, this section prints out a control report and the program cleans up after itself and stops processing.2
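The steps above can be sketched in Python as a sequential match between two sorted inputs. This is a minimal illustration under stated assumptions, not the original program: the files are modeled as in-memory lists of `(account, amount)` tuples already sorted by account, amounts are whole cents, and the names `post` and `held` are invented for the example. The `held` variable is one way to set aside yesterday's record while a brand new account is being built, a detail the description leaves open.

```python
def post(yesterday_ledger, journal_entries):
    """Merge sorted journal entries into sorted ledger balances.

    Both inputs are lists of (account, amount) tuples sorted by account.
    Returns today's ledger balances and a small control report.
    """
    ledger_iter = iter(yesterday_ledger)
    journal_iter = iter(journal_entries)
    today = []
    held = None  # yesterday's record set aside while a new account is built

    def next_ledger():
        nonlocal held
        if held is not None:
            record, held = held, None
            return record
        return next(ledger_iter, None)

    # Initialization: read the first record from each file.
    ledger = next_ledger()
    entry = next(journal_iter, None)
    posted = 0

    # Process records until both files are exhausted.
    while ledger is not None or entry is not None:
        if entry is None or (ledger is not None and ledger[0] < entry[0]):
            # Ledger account is lower: updates are complete, write it out.
            today.append(ledger)
            ledger = next_ledger()
        elif ledger is not None and ledger[0] == entry[0]:
            # Match: add the journal amount, but do not write yet.
            ledger = (ledger[0], ledger[1] + entry[1])
            entry = next(journal_iter, None)
            posted += 1
        else:
            # Ledger account is higher (or ledger file ended): new account.
            held = ledger
            ledger = (entry[0], entry[1])
            entry = next(journal_iter, None)
            posted += 1

    # Termination: produce a simple control report.
    control = {"entries_posted": posted, "balances_written": len(today)}
    return today, control


yesterday = [(100, 50), (200, 10), (400, 5)]
journal = [(100, 25), (100, -5), (300, 7), (500, 1), (500, 2)]
today, control = post(yesterday, journal)
print(today)  # [(100, 70), (200, 10), (300, 7), (400, 5), (500, 3)]
```

Treating an exhausted file as "no record" in the comparisons is what forces the loop to keep going until both files are finished, which covers the end-of-file logic described above.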
After the update program, a report using the updated balances was produced by another program. It simply read the new Ledger Balances, and created a report of the new balances. Because it only read the ledger balances, only the attributes that were part of that record were available for reporting; balances by other attributes on the journal entries could not be reported.
The posting process fixes, at the time it is created, which answers the system can provide: whatever attributes are assigned to the summary file (effectively the ledger in our accounting example) are the only things that can be used in reporting. Other attributes are immediately dropped and thus unavailable for use in answering any other questions. In effect, the piles are predefined at the time the system is started, and the system can create no others.
Alternatively, what if the posting process were defined so that it dropped no attributes? This could be done, usually with significant performance implications, because almost no summarization would happen. Each business event is almost unique simply because it has recorded on it the date and time the business event happened. This would also result in the posting process answering almost no questions; it simply moves the problem of accumulating the rows to get to the answer to another place in the system. Ultimately these rows have to be accumulated to present the single row answer to a question.3
So the fundamental question is where in the system architecture to accumulate business events into reporting answers. That depends on which approach is taken to the reporting problem.
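The trade-off between the two designs can be seen in a short Python sketch. The journal entries, attribute names, and the `summarize` helper are all hypothetical, and amounts are in whole cents.

```python
from collections import defaultdict

# Hypothetical journal entries; every event carries its own timestamp.
journal = [
    {"account": "6100", "cost_center": "SALES", "timestamp": "2024-01-05T09:15:00", "amount": 1200},
    {"account": "6100", "cost_center": "OPS", "timestamp": "2024-01-05T10:02:00", "amount": 800},
    {"account": "6200", "cost_center": "SALES", "timestamp": "2024-01-05T11:40:00", "amount": 300},
    {"account": "6200", "cost_center": "SALES", "timestamp": "2024-01-05T13:27:00", "amount": 450},
]

def summarize(entries, key_attributes):
    """Accumulate amounts by the chosen attributes; all others are dropped."""
    totals = defaultdict(int)
    for entry in entries:
        key = tuple(entry[name] for name in key_attributes)
        totals[key] += entry["amount"]
    return dict(totals)

# Posting by account only: few rows, but cost center can no longer be reported.
by_account = summarize(journal, ["account"])

# Posting with no attributes dropped: the timestamp keeps every row distinct.
by_everything = summarize(journal, ["account", "cost_center", "timestamp"])

print(len(journal), len(by_account), len(by_everything))  # 4 2 4
```

With only the account retained, the four events collapse to two rows and the cost center is gone. With every attribute retained, the output has as many rows as the input, so the accumulation still has to happen somewhere else.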