SQLServerCentral Editorial

The Rate of Change of Data

,

When you get into a muddle with your data, it manifests in many ways. Caching is one symptom. Often, people tackle problems with the scaling of data-driven websites by introducing caching. This means that, instead of getting the data from the database, it is held, temporarily, on the web server. It is worth looking at what is going on with caching.

For example, I once worked on a booking website that cached on the web server the data for a calendar of the year, with a 10-second refresh rate. It didn't make a lot of sense, seeing that the dates for religious and national holidays are fairly-firmly fixed. Unless some election propels an unlikely eccentric into power who goes on to create four new national holidays, we're pretty safe with even a one-year refresh rate for a calendar.

Some data should never change, such as account entries. I've seen many databases where the previous years' trading is recalculated from the individual entries on every report. If that sort of data ever changes retrospectively in your organization, I'd be fascinated to know why. I once came across a database (well, ten linked databases actually, one for each year!) that recalculated a whole ten year grant-allocation history every time anyone enquired. The whole topic of aggregation tables, along with several other aspects to database science, were a mystery to the team in charge of it.

I'm not suggesting that web server caching should hold data for very long, but that the database itself should deal as intelligently as possible with repeated calls for the same information.

Generally, I like to create data-driven websites where each entity has its safe caching range defined within the database. After all, the rate of change of the data, and the characteristics of the way it changes, are aspects of your data that it is important to know about, if you are interested in performance and scaling. Where values change constantly you never cache them, and if they change randomly, you need a 'stale' toggle to trigger a refresh all the way to the application.

Nowadays, I do the preliminary caching within the database, in a variety of ways, but I also expose the metadata about the caching requirements of the database entities, which then allows the individual applications who need to access the data to do caching as well

Some might argue that this is putting "application logic" in the database. I disagree. Surely, it is the database that should know about the data and be the primary source, not just of the current values but also of the volatility of the data, the rate of change, the type of information (personal, financial, organization, and so on). It makes sense to hold this information in just one place and there are plenty of reasons for favouring the database as that place.

Phil Factor.

Rate

5 (1)

You rated this post out of 5. Change rating

Share

Share

Rate

5 (1)

You rated this post out of 5. Change rating