As part of the trend toward client/server applications, Price Waterhouse caught the wave of ERP system implementations. Like most major systems projects, these tend to focus first on getting data into the system, through the operational systems that do data capture. All of the project time gets eaten by defining and capturing business events. Then, when it comes time to produce reports, the system has performance problems.
Through the mid-1990s, many SAFR projects started by following a common pattern:
- Get a call from a project that had accumulated a lot of data but could not report on it
- Run a benchmark with SAFR to show that a different approach would work
- Build the reporting solution around SAFR.
We have already noted the Global Investment Bank example above, which followed this pattern. Before talking more about the solutions to these problems, let's highlight other examples of the tell-tale signs of analytic challenges.
In the fall of 1995 a major US cookie manufacturer installed an ERP system to manage all aspects of its manufacturing, HR and financial operations. The system captured data and provided adequate information for managing most operational aspects of the business. But it could not deliver the detailed sales analysis necessary to support the highly detailed, store-specific sales process.
The reporting requirements were substantial. The company required detailed information at the customer/line item level over various time slices from daily to year-to-date, for over 60,000 customer stores and 2,000 items, selecting and summarizing by over 100 fields. The company sold over a million items per week, and they needed comparative information for each time period for at least the last two years. They also needed comparative data to be reorganized to reflect changes in organizational structure over time.
The team attempted to write a query—one query—against the main reporting table. The query ran for more than 24 hours without completing. This would have solved only one of the hundreds or thousands of permutations that ultimately would be needed. They canceled the query and called Rick.
Rick analyzed the reporting requirements. After the fact he wrote a white paper that summarized the results of the project.
Implementation experience is demonstrating that current [ERP] reporting and data warehousing facilities effectively support most information delivery requirements. In some cases, primarily where substantial data volumes are involved, a reporting architecture that includes additional components is required to meet needs that are outside the scalability bound of baseline [ERP] facilities.
In other words, operational reporting, mostly satisfied by indexed access, works well within the framework of the ERP tool set.
Although the number of high data-basis operational reports tends to be dramatically smaller than the number of low data-basis reports, the data basis and size for these reports tends to be dramatically larger. The experience at A-REAL-Co. clearly bears this out: in the original analysis of high data-basis operational reporting requirements, only 45 sales analysis reports were identified, but the data basis for these reports turned out to be approximately 23 gigabytes per week on average1 (peak loads at end of period are much higher than the average indicates). Examples of these reports are included below:
- FS Sales Force Ranking
- FS Item Customer Pch: SIUs
- FS Item Customer Pch: Cases
- Food Service Ranking – ITEM
- Food Service Ranking – Customer
- Food Service Comparison CASES
- Food Service Comparison $$$
- Food Service Budget – PTD to HYTD
- Item Sales to Budget (Div.)
- Diversified Retail Sales – Final
- SBU Item Sales for Period: (by Sales Org)
- SBU Item Sales for Period: (SBU-Brand-Item) Corp Type
- Item Sales to Budget (Store Door)
- Budget Buster RANKED
- Final Sales BT
- Budget Buster
- Military Account Item
- Account Item – Trend & YTD
- Customer Purchase (Store Detail)
- Gain & Loss, w/ Allow.
- FS Cat. Grouping
As should be evident from the titles, these do not represent “nice to have” reporting requirements, but are blocking and tackling period-to-period comparisons of key volume statistics needed to run the company. The 23 gigabytes also does not include A-REAL-Co.-specific requirements for data reorganization due to material, sales and other account hierarchy changes, legacy systems data integration, replacement of reporting currently outsourced and substantial ad hoc requirements that are anticipated. Unless a detailed analysis of the high data-basis operational reporting requirements is done, it is easy to dismiss these kinds of basic reporting requirements as something that will be handled in background processes when time is available.
Note that Rick isn’t talking about optional reporting, reports that don’t matter if they don’t get done today. These are operational reports that have to be done to run the business every day, and yet they require summarization of a large number of business events—the definition of data basis.
In order to understand whether a given reporting load could be handled in background when time is available, it is necessary to have some basic metrics about how fast this kind of processing can be done in a given environment. Assuming an HP T-500 8-Way world with EMC disk (the A-REAL-Co. configuration), extract processing can be done at a rate of about 6 megabytes/minute (if table joins are required, calculations need to be adjusted to reflect processing at a rate of about 3,000 joins/minute). These numbers are consistent with information from [the ERP vendor] on performance in reporting and general Price Waterhouse experience with UNIX SMP machines running third-party relational data bases. In the A-REAL-Co. case, this would mean approximately 65 hours per week of continuous processing just to complete the initial extract, assuming no table joins were required. Taking into account table joins would mean the basic extract processing would take about 260 hours of continuous single-threaded processing. Further, a general rule of thumb is that reports with a data basis of several hundred thousand records will require several hours to process. Considering that the 23 weekly A-REAL-Co. gigabytes represent about 210 million extract records, something on the order of 2,000 processing hours per week would be a reasonable estimate (albeit a practical impossibility to implement) of the end-to-end processing load, given the A-REAL-Co. environment.2
In other words, the work involved in producing these reports would completely consume the servers purchased to do the operational tasks; there was no additional capacity to accumulate the needed business events and produce the reports.
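The capacity arithmetic in the white paper is simple enough to reproduce. The sketch below uses only the rates quoted above (about 6 megabytes per minute for extracts, with the paper's join figures implying roughly a fourfold increase in work); the constants come from the paper, not from any new measurement:

```python
# Reproduce the white paper's back-of-the-envelope capacity estimate.
WEEKLY_DATA_BASIS_MB = 23 * 1024      # ~23 GB of data basis per week
EXTRACT_RATE_MB_PER_MIN = 6           # extract-only throughput on the HP T-500
JOIN_PENALTY = 4                      # joins made extracts roughly 4x slower

extract_hours = WEEKLY_DATA_BASIS_MB / EXTRACT_RATE_MB_PER_MIN / 60
print(f"extract only: ~{extract_hours:.0f} hours/week")                 # ~65 hours
print(f"with joins:   ~{extract_hours * JOIN_PENALTY:.0f} hours/week")  # ~262, the paper's "about 260"
```

Even before adding report formatting and the rest of the end-to-end load, the weekly extract alone exceeds the hours in a week.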
This paper was one of the key elements for following engagements. The methodology and approach it outlined to analyze data basis are discussed in depth in Part 4. The results of the project are described in Define Summary Structures.
In March of 1996, I was asked to help warn a Fortune 100 PC chip manufacturer about the experience of the cookie maker. They said they didn't have significant volumes, and so they wouldn't have a problem. We'll talk more about this example in Maintain Focus.
US Insurance Provider
After assisting the chip manufacturer for a couple of months to solve their problem, I led the effort to convince a Fortune 100 insurance company that its data volumes for reporting and allocation processes would likely overwhelm the ERP financial implementation. After two years of work by scores of people trying to get the ERP allocation engine to work, the team constructed a SAFR process that generated all the outputs, but at much greater speed. Programs were developed that generate over 6,000 SAFR processes based upon over 7,000 ERP rules. SAFR executes these views to scan the financial repository, selecting records eligible for allocation. It then allocates these costs through four allocation layers, such as products and geographical units. At 1999 year-end, this process read over 50 million records, selecting nearly 3 million that were eligible for allocation. These 3 million records were exploded into over 290 million virtual allocation records, of which 186 million summarized records were written to physical files. The process ran in 7½ hours of wall clock time and 28½ hours of CPU time.
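The "explosion" of eligible records through allocation layers can be pictured with a much-simplified sketch. The layer names and driver percentages below are hypothetical, not the insurer's actual rules; the point is only how a few layers multiply record counts:

```python
# Much-simplified sketch of layered allocation: each eligible record is
# exploded across every layer's drivers. Layers and shares are made up.
layers = [
    {"product-X": 0.6, "product-Y": 0.4},        # layer 1: products
    {"region-east": 0.5, "region-west": 0.5},    # layer 2: geographic units
]

def allocate(records, layers):
    """Explode each (path, amount) record across every layer's drivers."""
    for targets in layers:
        records = [
            (path + (target,), amount * share)
            for path, amount in records
            for target, share in targets.items()
        ]
    return records

out = allocate([((), 100.0)], layers)
assert len(out) == 4                                   # 1 record -> 2 x 2 pieces
assert abs(sum(amt for _, amt in out) - 100.0) < 1e-9  # totals are preserved
```

With four layers and thousands of rules, the multiplication explains how 3 million eligible records became over 290 million virtual allocation records.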
In 2009, after five years of work and tens of millions of US dollars (if not a hundred million), a major global bank determined that neither they nor the ERP vendor could make the ERP GL posting and other financial functions process much more than 6,000 records a minute. This was substantially below the 140,000 rows a minute published by the vendor in a benchmark for a similar hardware configuration. A major difference was the functionality exercised in each test: the customer had chosen to use almost all features of the product simultaneously, including daily accounting periods, large numbers of key values, multi-currency processing, etc. By comparison, SAFR performed the same functions at, on average, a million records a minute.
People find it difficult to believe that major software products from the largest software vendors in the world are capable of only these kinds of results, despite the millions of dollars spent by those implementing them. The root cause is that the automation pattern of the tools does not match what the computers actually have to do, and these tools are stacked layer upon layer, resulting in greater and greater inefficiency. Let's analyze some of these layers.
In the fall of 1994, I was surprised to learn one day that Doug Kunkel, the key SAFR developer, had added the ability for SAFR to read database tables instead of just sequential files, and that he had added the ability to do this in two different ways. One way went through the database facilities, in effect presenting the entire database to SAFR through SQL. The other read the file in which the database actually stores the data, called a VSAM file.3
I was the first to test and use this feature at a benchmark for a large transportation company in November. In that test, I saw SAFR using the direct VSAM method extract 2 million rows from SQL tables in about the same CPU and wall time as a BMC load utility, admittedly at the time the fastest way to get data into and out of a SQL table. The BMC utility also accessed the underlying VSAM files. So SAFR was effectively as fast in resolving a single query as the load utility.4
In 1999, a SAFR customer wanted to compare these access methods against the standard approach of going through SQL. They devised a test, and here are the results:5
Using SQL, the only way anyone ever really gets at data in the database, was seven times slower than direct access. So to begin with, the first layer of software, SQL, has longer instruction path lengths than other approaches. Even the lowest-level technology is not all that efficient.
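The path-length difference is easy to feel even in a toy setting. The sketch below is illustrative only, not a reconstruction of the 1999 test: it contrasts fetching rows one at a time through a query interface with a single sequential pass over the same data already laid out in order:

```python
import sqlite3
import time

# Illustrative only: per-row SQL access vs. one sequential scan of the
# same values. The table and column names are made up for this sketch.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, amount REAL)")
con.executemany("INSERT INTO sales VALUES (?, ?)",
                [(i, float(i % 100)) for i in range(100_000)])

t0 = time.perf_counter()
total_sql = 0.0
for i in range(100_000):                      # one query per row
    (amt,) = con.execute("SELECT amount FROM sales WHERE id=?", (i,)).fetchone()
    total_sql += amt
t1 = time.perf_counter()

rows = [float(i % 100) for i in range(100_000)]  # data in sequential form
t2 = time.perf_counter()
total_scan = sum(rows)                        # single pass, no query layer
t3 = time.perf_counter()

assert total_sql == total_scan                # same answer either way
print(f"per-row SQL: {t1 - t0:.3f}s, sequential scan: {t3 - t2:.3f}s")
```

The exact ratio depends on the environment, but the query layer's extra instructions per row always show up once volumes are large.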
Commercialized software is built upon the premise that software does not need to be created from scratch for each customer. However, unlike the PC market with its ubiquitous Windows operating system, the server market was not dominated by any one operating system. Software vendors require a large enough potential market to justify the investment in building the software, so they needed a way to make their products work across the various operating systems used by large businesses. Many hit upon the idea of using SQL.6
SQL was standardized well enough that they could have their packages generate and issue SQL, and then the database vendors had the responsibility for working with the various operating systems. Thus the software vendors did not need to manage small differences between the operating systems.
The result, though, was that another level of abstraction was introduced into the system. The ERP package generated SQL, which the database then used to generate machine code. This approach works just fine for indexed access, the processes at either end of the architecture.
In other cases, most notably batch processes, the work pattern automated by the ERP package has little to do with the work the machine will actually have to do. In fact, I suspect that in some cases it may be geared toward reducing the work effort of programmers at the software vendors: an easy-to-read and easy-to-debug form of language might be used, even though its form has little to do with what the computer actually does.
Certainly there is value to be gained by purchasing the investment in software made by others, even if that investment requires additional computing capacity and larger machines. I remember Rick saying to a customer about this time that on many projects the ERP aspects of the program were critical, particularly for transaction processing. Certainly data that is not captured cannot be used in reporting, and as we'll see in the next part of this book, consistency in the coding structures used in that data is critical to exposing connections and patterns in reporting. It is, however, critical to recognize the scale limitations of the tools for reporting. Even for the largest organizations in the world, the scale of payroll analysis typically does not exceed the capacity of the tools or make them too expensive. But for analysis of higher-volume transaction processing areas, there are times when choosing a more efficient method, with more up-front cost in development time, will be more appropriate.
The weekly data basis would have been over 20 times the 23 gigabytes if all processing had been done using transaction-level data (basic transaction volume is 120,000 order lines per day @ 1,608 bytes per line). Based on the reporting requirements identified, summary file structures were defined that would reduce the total data manipulation load necessary to produce the reports. As new reporting requirements emerged during development, summary files were repeatedly modified and regenerated from the transaction detail as necessary to support efficient production processing. It is expected that the summary file structures will continue to be modified and regenerated over time as new reporting requirements evolve.
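The summary-structure idea can be sketched in a few lines. The field names below are hypothetical, not the actual file layouts; the point is that detail is collapsed once, by the keys the reports actually use, so each report reads one row per key instead of one row per business event:

```python
from collections import defaultdict

# Hypothetical sketch: collapse order-line detail into a pre-aggregated
# summary keyed by the fields reports actually select and summarize on.
order_lines = [
    # (customer, item, period, cases) -- illustrative records
    ("store-001", "item-A", "1996-W01", 12),
    ("store-001", "item-A", "1996-W01", 8),
    ("store-002", "item-B", "1996-W01", 5),
]

summary = defaultdict(int)
for customer, item, period, cases in order_lines:
    summary[(customer, item, period)] += cases   # one row per key, not per event

assert summary[("store-001", "item-A", "1996-W01")] == 20
assert len(summary) == 2                         # 3 detail lines -> 2 summary rows
```

Because the summaries are regenerated from retained transaction detail, new keys can be added later without losing history, which is exactly the regeneration cycle the footnote describes.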
4 SAFR (module STGMR95) extracted 2,047,630 rows from 32 physical partitions in the following time:
IEF373I STEP /STEP65 / START 94314.0706
IEF374I STEP /STEP65 / STOP 94314.0717 CPU 4MIN 49.51SEC SRB 0MIN
17.16SEC VIRT 4036K SYS 664K EXT 22272K SYS 9792K
The BMC Load utility (ADUUMAIN) extracted 2,036,921 rows from the same 32 partitions:
IEF373I STEP /UNLDSTEP/ START 94303.0632
IEF374I STEP /UNLDSTEP/ STOP 94303.0738 CPU 4MIN 12.45SEC SRB 0MIN
14.53SEC VIRT 236K SYS 472K EXT 7420K SYS 8964K
The difference in the records extracted is unexplained but may be attributable to activity in the database between the runs. Earlier tests with small test files showed no differences in the outputs when compared byte for byte. Copies of control reports from November 10, 1994 are in the author's possession.
Note that the single pass architecture of SAFR means that the incremental time for the second extract of the same data for different criteria performed at the same time is minimal.
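The single-pass idea behind that footnote can be sketched simply. The record layout and view names below are invented for illustration; what matters is that the event data is read once while every view applies its own selection criteria and collects its own output:

```python
# Minimal sketch of a single-pass engine: one scan of the event file,
# with each "view" filtering and collecting independently. The record
# fields and view definitions here are made up for illustration.
records = [{"acct": a, "amt": a * 10} for a in range(1, 7)]

views = {
    "even_accts": lambda r: r["acct"] % 2 == 0,
    "big_amts":   lambda r: r["amt"] >= 40,
}
outputs = {name: [] for name in views}

for rec in records:                  # the data is read exactly once
    for name, selects in views.items():
        if selects(rec):
            outputs[name].append(rec)

assert len(outputs["even_accts"]) == 3   # accounts 2, 4, 6
assert len(outputs["big_amts"]) == 3     # amounts 40, 50, 60
```

Adding a second extract against the same data means adding another entry to `views`, not another scan, which is why the incremental cost is minimal.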