A Primer on Managing Data Bitemporally

In systems that require advanced logging and reproducibility of reports between runs for auditing purposes, a straightforward update, insert, or delete may be counterproductive. In such circumstances, a bitemporal model is necessary. Adam Machanic explains how it works.

Adapted from Expert SQL Server 2005 Development (Apress, 2007), Chapter 10, “Working with Temporal Data”

A central truth that we need to embrace to be successful as database developers is that not all data is as great as it could be (or as we might wish it to be). Sometimes, we’re forced to work with incomplete or incorrect data, and correct things later as a more complete picture of reality becomes available.

Modifying data in the database is simple enough: a call to a DML statement and the work is done. But in systems that require advanced logging and reproducibility of reports between runs for auditing purposes, a straightforward update, insert, or delete may be counterproductive: such a modification can destroy the possibility of re-creating the same output on consecutive runs of the same query.

To avoid doing a simple update in the case of invalid data, some systems use the idea of offset transactions. An offset transaction uses the additive nature of summarization logic to fix the data in place. For instance, consider a financial reporting system with a table that describes customer transactions. The system should record a particular transaction as $1500.00 worth of business, but a data entry clerk accidentally keys it in as $1600.00. Adding a ($100.00) offset transaction to the table later means that any aggregation over the amounts produces the correct output.

Following is a highly simplified representation of what such a table might look like:

CREATE TABLE Transactions
(
   TransactionId INT,
   Customer VARCHAR(50),
   TransactionDate DATETIME,
   TransactionType VARCHAR(50),
   TransactionAmount DECIMAL(9,2)
)

On June 12, 2005, customer Smith deposited $500. However, due to a teller’s key error that was not caught in time, by the time the reporting data was loaded the amount that made it into the system was $5000:

INSERT Transactions
VALUES
(1001, 'Smith', '2005-06-12', 'DEPOSIT', 5000.00)

The next morning, the erroneous data is detected. Although the transaction row itself could be updated in-place, this would destroy the audit trail, and so an offset transaction must be issued. There are a few ways of handling this scenario. The first method is to issue an offset transaction dated the same as the incorrect transaction:

INSERT Transactions
VALUES
(1001, 'Smith', '2005-06-12', 'OFFSET', -4500.00)
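
With the offset in place, any summary that relies on addition nets out correctly. As a quick check (my own illustration, not part of the original example), a grouped aggregate now returns the intended $500.00 for Smith:

SELECT
   Customer,
   SUM(TransactionAmount) AS NetAmount   -- 5000.00 + (-4500.00) = 500.00
FROM Transactions
GROUP BY Customer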

Back-dating the offset fixes the problem in summary reports that group by any dimension (transaction number, customer, date, or transaction type), but fails to keep track of the fact that the error was actually caught on June 13. Properly dating the offset record is imperative for data auditing purposes:

INSERT Transactions
VALUES
(1001, 'Smith', '2005-06-13', 'OFFSET', -4500.00)

Unfortunately, proper dating does not fix all of the issues, and it introduces new ones. After properly dating the offset, a query of the data for customer Smith, covering all business done through June 12, does not include the correction; only by including data from June 13 would the query return the correct figures. And although a correlated query could be written to return the correct summary report for June 12, the data is left in a somewhat strange state for ranges that begin after June 12, e.g. June 13-15: unless June 12 is also brought into consideration, the offset record is orphaned.
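
For illustration, here is one shape such a correlated query might take (a sketch of my own, not from the original text): report every non-offset transaction dated through June 12, but compute each net amount across all rows sharing the transaction number, so that later-dated offsets are picked up:

SELECT
   T1.TransactionId,
   T1.Customer,
   (
      -- pull in every row for this transaction, including any
      -- offsets dated after the reporting range
      SELECT SUM(T2.TransactionAmount)
      FROM Transactions AS T2
      WHERE T2.TransactionId = T1.TransactionId
   ) AS NetAmount
FROM Transactions AS T1
WHERE
   T1.TransactionDate <= '2005-06-12'
   AND T1.TransactionType <> 'OFFSET'

Every report against the table has to carry this kind of extra logic, and every variation has to agree on how offsets are matched to their base transactions.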

To get around these and similar issues, a bitemporal model is necessary. In a bitemporal table, each transaction has two dates:

  1. The actual date that the transaction took place, and
  2. A “valid” date, which represents the date as of which we believe the data to be correct.

The following modified version of the Transactions table shows the new column:

CREATE TABLE Transactions
(
   TransactionId INT,
   Customer VARCHAR(50),
   TransactionDate DATETIME,
   TransactionType VARCHAR(50),
   TransactionAmount DECIMAL(9,2),
   ValidDate DATETIME
)

When inserting the data for Smith on June 12, a valid date of June 12 is also applied:

INSERT Transactions
VALUES
(1001, 'Smith', '2005-06-12', 'DEPOSIT', 5000.00, '2005-06-12')

Effectively this row can be read as, “as of June 12, we believe that transaction 1001, dated June 12, was a deposit for $5000.00.” On June 13, when the error is caught, no offset record is inserted. Instead a corrected deposit record is inserted, with a new valid date:

INSERT Transactions
VALUES
(1001, 'Smith', '2005-06-12', 'DEPOSIT', 500.00, '2005-06-13')

This row indicates that as of June 13, transaction 1001 has been modified. But the important difference is that the transaction keeps its correct date of June 12, so a report on transactions that occurred on June 13 will not pick up the correction row. In addition, this model eliminates the need for offset transactions. Rather than using offsets, queries should always find the last update for any given transaction within the valid range.

To understand this a bit more, consider a report run on August 5, looking at all transactions that occurred on June 12. The person running the report wants the most “correct” version of the data; that is, all available corrections should be applied. This is done by taking the transaction data for each transaction from the row with the maximum valid date:

SELECT
   T1.TransactionId,
   T1.Customer,
   T1.TransactionType,
   T1.TransactionAmount
FROM Transactions AS T1
WHERE
   T1.TransactionDate = '2005-06-12'
   AND T1.ValidDate =
   (
      SELECT MAX(ValidDate)
      FROM Transactions AS T2
      WHERE T2.TransactionId = T1.TransactionId
   )

By modifying the subquery, it is possible to get “snapshot” reports based on data before updates were applied. For instance, assume that this same report was run on the evening of June 12. The output for Smith would show a deposit of $5000.00 for transaction 1001. To reproduce that report on August 5 (or any day after June 12), change the ValidDate subquery:

SELECT
   T1.TransactionId,
   T1.Customer,
   T1.TransactionType,
   T1.TransactionAmount
FROM Transactions AS T1
WHERE
   T1.TransactionDate = '2005-06-12'
   AND T1.ValidDate =
   (
      SELECT MAX(ValidDate)
      FROM Transactions AS T2
      WHERE
         T2.TransactionId = T1.TransactionId
         AND T2.ValidDate <= '2005-06-12'
   )

Note that, in this case, the subquery could have been eliminated altogether, and the search argument could have become AND T1.ValidDate = '2005-06-12'. However, the subquery is needed any time you're querying a range of dates, so it's a good idea to leave it in place for ease of maintenance.
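
For instance, to reproduce a June 12-15 report as it would have appeared at the end of June 15 (dates chosen here purely for illustration), the same pattern applies, and the subquery selects the correct version of each transaction individually:

SELECT
   T1.TransactionId,
   T1.Customer,
   T1.TransactionType,
   T1.TransactionAmount
FROM Transactions AS T1
WHERE
   T1.TransactionDate BETWEEN '2005-06-12' AND '2005-06-15'
   AND T1.ValidDate =
   (
      SELECT MAX(ValidDate)
      FROM Transactions AS T2
      WHERE
         T2.TransactionId = T1.TransactionId
         AND T2.ValidDate <= '2005-06-15'
   )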

Using this same pattern, data can also be booked in the future, before it is actually valid. It’s common when doing wire transfers, credit card payments, and other kinds of electronic funds transactions to be able to set the “posting date” on which the business will actually be executed. By working with the valid date, Smith can make a request for an outgoing transfer on June 14, but ask that the transfer actually take place on June 16:

INSERT Transactions
VALUES
(1002, 'Smith', '2005-06-16', 'TRANSFER', -1000.00, '2005-06-14')

Since the transaction date is June 16, a report dealing with transactions that occurred between June 1 and June 15 will not show the transfer. But a business manager can query on June 15 to find out which transactions will hit in the coming days or weeks, and audit when the data entered the system.
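
Such a query is just the bitemporal pattern inverted; a sketch (my own illustration, ignoring later corrections for brevity): find everything already entered as of June 15 whose transaction date has not yet arrived:

SELECT
   TransactionId,
   Customer,
   TransactionDate,
   TransactionAmount
FROM Transactions
WHERE
   TransactionDate > '2005-06-15'   -- will hit in the coming days
   AND ValidDate <= '2005-06-15'    -- already known to the system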

The examples shown above use only one table, for the sake of simplicity. In a real system, a base transactions table would be used, along with a running ledger table in which the actual transaction data would be persisted. The base table would contain information like the customer involved and the transaction type, while the ledger table would include such attributes as the transaction date, valid date, and amounts:

CREATE TABLE Transactions
(
    TransactionId INT,
    Customer VARCHAR(50),
    TransactionType VARCHAR(50),
    BaseTransactionDate DATETIME
)

CREATE TABLE TransactionLedger
(
    TransactionId INT,
    TransactionDate DATETIME,
    ValidDate DATETIME,
    TransactionAmount DECIMAL(9,2)
)

It’s important to differentiate between the BaseTransactionDate in the Transactions table and the TransactionDate in the TransactionLedger table. The former is there for record-keeping purposes only: when did we first hear about this transaction? The latter is there for bitemporal queries: when is the actual transaction date? As an example, consider an online credit card payment system. You log in and ask to have your payment posted next Tuesday (future posting dates are currently available on all three of the online credit card payment systems to which I belong). Next Monday arrives and you realize that your paycheck isn’t going to clear on time, so you log in and change the posting date to Thursday. In this case, the change took place on Monday, so the transaction date needs to be updated to the new posting date, and the valid date needs to be updated to Monday. However, the base transaction date does not change: the request still happened the week before, and that fact should never change.
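
In ledger terms, the sequence might look like this (transaction number, amount, and calendar dates are my own illustration, chosen to match the scenario above: the request is made on Wednesday, June 8, to post on Tuesday, June 14, then moved on Monday, June 13, to Thursday, June 16):

-- Original request, entered June 8: post a $250.00 payment on June 14
INSERT Transactions
VALUES
(1003, 'Smith', 'PAYMENT', '2005-06-08')

INSERT TransactionLedger
VALUES
(1003, '2005-06-14', '2005-06-08', -250.00)

-- June 13: posting date moved to June 16; a new ledger row carries
-- both the new transaction date and the new valid date, while the
-- base transaction date in Transactions is left untouched
INSERT TransactionLedger
VALUES
(1003, '2005-06-16', '2005-06-13', -250.00)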

As a final note, when modeling bitemporal data you may want to investigate the possibility of implementing cutoff rules, after which changes to transactions can no longer be made. For example, the system may have a policy whereby transactions are closed after 90 days. In this case, a simple CHECK constraint ensuring that the ValidDate falls within 90 days of the TransactionDate would do the trick. Another example would be data that has been used to generate an official report, such as for a government agency. In that case, you’d want a rule that no transaction can be back-dated to before the report was run (lest it change the data in the report); that requires a trigger, in order to verify the date against a table containing the report run history.
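
A minimal sketch of the 90-day rule as a CHECK constraint on the ledger table (the constraint name is mine):

ALTER TABLE TransactionLedger
ADD CONSTRAINT CK_ValidDate_Within_90_Days
CHECK (ValidDate <= DATEADD(DAY, 90, TransactionDate))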

Summary

Modeling data bitemporally allows for an auditable, accurate representation of historical or future knowledge as data is updated. This can be tremendously useful in many scenarios, especially in the realm of financial reconciliation, where you can be forced to deal with backdated contracts that change the terms of previous transactions, or with business booked in the future to avoid certain types of auditing issues.