This week’s episode of Conversations with Kip concludes our discussion of Apache Spark and a Data Supply Chain. In the first episode, we discussed data input choices; in the second, we discussed consolidating the data after any shuffle that might be needed. This week we discuss the steps of a Consolidated Data Supply Chain, in which business events often create other transactions that must also be applied to balances before our analytical process can run.

There is a fundamental difference in nature between transactions (better called Business Events) and balances. Balances accumulate transactions to give a position as of a point in time. They also make computer systems more efficient, because later processes can read the balance instead of the full volume of transactions behind it.

Data supply chains create and update balances. Balances can then be used to generate new transactions, like interest payable, or multi-currency effects. These new transactions need to update other balances.
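The cycle described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the process discussed in the episode and not Spark code: the account names, the `post` and `generate_interest` helpers, the 50 basis point rate, and the use of integer cents are all hypothetical.

```python
from collections import defaultdict


def post(balances, transactions):
    """Accumulate (account, amount_in_cents) business events into balances."""
    updated = defaultdict(int, balances)
    for account, amount in transactions:
        updated[account] += amount
    return dict(updated)


def generate_interest(balances, rate_bps, target_account):
    """Create new transactions from balances (hypothetical flat-rate interest)."""
    return [
        (target_account, balance * rate_bps // 10_000)
        for account, balance in balances.items()
        if account != target_account and balance > 0
    ]


# Step 1: business events create and update balances.
events = [("loan-1", 100_000), ("loan-2", 50_000), ("loan-1", 20_000)]
balances = post({}, events)

# Step 2: balances generate new transactions (here, interest payable).
interest = generate_interest(balances, rate_bps=50, target_account="interest-payable")

# Step 3: the new transactions update other balances.
balances = post(balances, interest)
print(balances)
```

Even in this toy form, the second step depends on the completed output of the first, which is what separates this kind of processing from a single scan over transactions or balances.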

Can Apache Spark be used to perform all these steps efficiently, rather than only the much simpler process of scanning transactions or balances to produce analytical outputs? That is yet to be demonstrated. Watch the 97th episode of Conversations with Kip, the best financial system vlog there is, here.

Next in the Series: Apache Spark Parallelism and Financial Analytics