One of the constructs that I find myself building quite
often is a process that periodically retrieves an incremental amount of data.
So many reports that I build, whether for myself or for others, are looking for
the new data from the last x amount of time. It might be the sales for the last
day, changes on a particular table, etc., but usually it is the change in some
data over the last day.
Of course, this is usually a fairly easy report structure to
build. I have routines that determine the previous day’s date, reset the
time to midnight (00:00:00), and then select all data that is new since this
date. Since most of the tables I am working with have some sort of datetime
field to track changes, this results in a query that looks something like:
declare @dt datetime
select @dt = cast( substring( cast( dateadd( day, -1, getdate()) as char( 20)), 1, 12) as datetime)
select * from sales where SaleDate > @dt
Here @dt is the previous day’s date with the time
set to midnight.
This works great for the first week that it is implemented.
The next Monday when someone comes into work and runs this report, they usually
see a smaller report than they are expecting. Why?
Well, in a few e-commerce applications I have worked with,
there is a bunch of activity on Friday and Saturday. I guess it’s the end of
the week and people do more surfing and purchasing. If we are calculating the
previous day’s date, then on Monday, we get all the sales for Sunday. This
leaves out the sales from Friday and Saturday.
No problem, we can easily change our code that calculates
the beginning date to account for weekends. With the following check,
we can decide whether we need to get one or three days’ worth of data.
declare @dt datetime, @d smallint
if datepart( weekday, getdate()) = 2
    select @d = -3
else
    select @d = -1
select @dt = cast( substring( cast( dateadd( day, @d, getdate()) as char( 20)), 1, 12) as datetime)
This checks for a Monday (datepart = 2) and sets the
number of days to go back in the next statement: three days to cover the weekend, otherwise one.
Since most of my processes are automated and I send an
incremental amount of data to a user automatically on some schedule, this used
to be my preferred solution. I have used this and it usually works until one of
two things happens. Either the task fails for some reason and it takes a day or
two to get fixed, or a holiday occurs. If the task breaks on Monday and it does
not run again until Tuesday (this usually happens when I go on vacation), the user
gets a report on Tuesday that shows Monday’s sales but is missing all the weekend
sales. After a holiday, the user gets two (or more) reports instead of a
single consolidated report.
I came up with this solution while building an extract of
data for a business partner. They wanted to receive an incremental load of mail
information every two weeks rather than a complete load of data. Since our mail
information was growing rapidly, this was a reasonable request and so I was
tasked with building this.
I used one of the techniques above, which worked for a month
and then the task broke. By the time I had fixed it, a couple of days had gone
by and I had to generate the report manually; otherwise my task would have
missed a few days of data. It was then that I decided to build my process
tracking table. Here is the code:
create table PrcssTrckr (
    PrcssNm varchar( 80),
    LastXfr datetime,
    LastStat tinyint
)
Nothing fancy, but the columns are defined as follows:

|PrcssNm|Process Name|This is an ad-hoc field with a unique index that describes which process I am tracking.|
|LastXfr|Last Transfer Date|This is the last date when I transferred the data using this particular process.|
|LastStat|Last Status|I keep this column updated with an exit code from the process. When it breaks, I can then check this to see what the result code was. The result codes vary from process to process and are not necessarily stored procedure return values.|
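Since PrcssNm carries a unique index, setting that up is one statement; a minimal sketch (the index name is my own):

create unique index ndx_PrcssTrckr_PrcssNm on PrcssTrckr( PrcssNm )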
How I Use This
This table allows me flexibility in a few ways. One, I can
use the same data set, like mail information, for a number of different
processes. Suppose I have snail mail and email extraction processes that run on
the same customer information. For these two data feeds, I would add two rows
to my PrcssTrckr table as follows:
PrcssNm        LastXfr             LastStat
-------------- ------------------- --------
SnailMailFeed  01-01-1900 00:00:00 0
EmailFeed      01-01-1900 00:00:00 0
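A minimal sketch of the inserts that would seed these two rows:

insert into PrcssTrckr ( PrcssNm, LastXfr, LastStat )
values ( 'SnailMailFeed', '01-01-1900', 0 )

insert into PrcssTrckr ( PrcssNm, LastXfr, LastStat )
values ( 'EmailFeed', '01-01-1900', 0 )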
With these initial rows in place, we start the
snail mail export process, which runs a query something like the following:
select name, address, city, state, country, zip
from Customer c, PrcssTrckr p
where c.created > p.LastXfr
and p.PrcssNm = 'SnailMailFeed'
This returns a list of all members (since none were created
before the beginning of the last century). After this data is extracted, the
LastXfr date is set to GETDATE() and the table looks like the following:
PrcssNm        LastXfr             LastStat
-------------- ------------------- --------
SnailMailFeed  02-10-2001 08:15:00 0
EmailFeed      01-01-1900 00:00:00 0
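Updating that bookmark is a single statement; a sketch, run after the extract completes cleanly:

update PrcssTrckr
set LastXfr = getdate(), LastStat = 0
where PrcssNm = 'SnailMailFeed'

One subtlety: capturing getdate() into a variable before the extract starts avoids missing rows created while the extract is running.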
Now we know that this process last completed at the time
stamped on the first row. If we now run the same process tomorrow, the query
returns an incremental feed of data containing customers created between 2-10-01
and 2-11-01. If I were to wait and run this every other day, then the
incremental feed would include two days of customers.
The process now tracks its own progress and always returns
an incremental feed of data. This is independent of the email process which
could be running on a completely different schedule.
This also allows the process to fail and, once it is fixed,
still return the correct incremental amount of data to the user.
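Putting the pieces together, a scheduled job might record both the transfer time and the exit status; a sketch, where usp_SnailMailExtract is a hypothetical wrapper for the extract itself:

declare @rc int
-- run the extract; the procedure is assumed to return 0 on success
exec @rc = usp_SnailMailExtract

if @rc = 0
    -- success: advance the bookmark and clear the status
    update PrcssTrckr
    set LastXfr = getdate(), LastStat = 0
    where PrcssNm = 'SnailMailFeed'
else
    -- failure: record the code (assumed to fit the tinyint column)
    -- and leave LastXfr alone so the next successful run covers the gap
    update PrcssTrckr
    set LastStat = @rc
    where PrcssNm = 'SnailMailFeed'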
Over time, I have found myself using this technique for many different
processes, from data extractions to customized reports. Anywhere
incremental data feeds are needed, this has proven to be a valuable technique.
It has even ensured correct reports when my server was down for a few days
after our company moved to a new office. As with most techniques, I am sure
that many of you will find even more and better uses for this than I have.
As always, I welcome feedback.
©dkRanch.net February 2001