Process Tracking

,

Process Tracking

Introduction

One of the constructs that I find myself building quite

often is a process that periodically retrieves an incremental amount of data.

So many reports that I build, whether for myself or for others, are looking for

the new data since some x amount of time. It might be the sales for the last

day, changes on a particular table, etc., but usually it is the change in some

data over the last day.

Of course, this is usually a fairly easy report structure to

build. I have routines that determine the previous day’s date, resetting the

time to midnight (00:00:00) and then select all data that is new since this

date. Since most of the tables I am working with have some sort of datetime

field to track changes, this results in a query that looks something like:

declare @dt
select @dt = cast( substring( 
             cast(dateadd( day, -1, getdate()) as char( 20)), 1, 12) as
             datetime)
select *
 from sales
 where SaleDate > @dt

Where @dt is the previous day’s date, calculated with a time

set to some value.

The Problem

This works great for the first week that it is implemented.

The next Monday when someone comes into work and runs this report, they usually

see a smaller report than they are expecting. Why?

Well, in a few e-commerce applications I have worked with,

there is a bunch of activity on Friday and Saturday. I guess it’s the end of

the week and people do more surfing and purchasing. If we are calculating the

previous day’s date, then on Monday, we get all the sales for Sunday. This

leaves out the sales from Friday and Saturday.

No problem, we can easily change our code that calculates

the beginning date to account for weekends. With the following case statement,

we can decide whether we need to get one day or three days worth of data.

declare @dt datetime, @d tinyint
if datepart( weekday, getdate()) = 2
  select @d = -1
else
  select @d = -3
select @dt = cast( substring( cast( dateadd( day, -1, getdate()) as char( 20))
                   , 1, 12) as datetime)

This will check for a Monday (datepart = 2) and then set the

number of days to go back for the next statementl.

Since most of my processes are automated and I send an

incremental amount of data to a user automatically on some schedule, this used

to be my preferred solution. I have used this and it usually works until one of

two things happens. Either the task fails for some reason and it takes a day or

two to get fixed, or a holiday occurs. If the task breaks on Monday and it does

not run again until Tuesday ( this usually happens when I go on vacation) the user

gets a report on Tues that shows Mon sales, but is missing all the weekend

sales. After a holiday, the user gets two (or more ) reports instead of a

single consolidated report.

The Solution

I came up with this solution while building an extract of

data for a business partner. They wanted to receive an incremental load of mail

information every two weeks rather than a complete load of data. Since our mail

information was growing rapidly, this was a reasonable request and so I was

tasked with building this.

I used one of the techniques above which worked for a month

and then the task broke. By the time I had fixed it, a couple of days had gone

by and I had to manually generate the report, otherwise my task would have

missed a few days of data. It was then that I decided to build my process

tracking table. Here is the code

Create table PrcssTrckr
(    PrcssNm varchar( 80),
     LastXfr datetime,
     LastStat tinyint
)

Nothing fancy, but the coumns are defined as follows:

PrcssNm ProcessName This is an ad-hoc field with a unique index that describes which process I am

tracking.

LastXfr Last Transfer date This is the last date when I transferred the data using the particular

process.

LastStat Last Status I keep this column updated with an exit code from the process. When it

breaks, I can then check this to see what the result code was. The result codes

vary from process to process and are not necessarily stored procedure return

codes.

How I Use This

This table allows me flexibility in a few ways. One, I can

use the same data set, like mail information, for a number of different

processes. Suppose I have snail mail and email extraction processes that run on

the same customer information. For these two data feeds, I would add two rows

to my PrcssTrckr table as follows:

PrcssNm           LastXfr                 LastStat
--------------    -------------------     ------------------------------
SnailMailFeed     01-01-1900 00:00:00     0
EmailFeed         01-01-1900 00:00:00     0

These would be the initial rows inserted. Now we start the

snail mail export process which runs a query something like the following:

select name,address, city, state, country, zip
 from Customer c, PrcssTrckr p
 where c.created > p.lastxfr
 and p.PrcssNm = 'SnailMailFeed'
 

This returns a list of all members (since non were created

before the beginning of the last century. After this data is extracted, the

LastXfr date is set to GETDATE() and the table looks like the following:

PrcssNm           LastXfr                 LastStat
--------------    -------------------     ------------------------------
SnailMailFeed     02-10-2001 08:15:00     0
EmailFeed         01-01-1900 00:00:00     0

Now we know that this process last completed the time

stamped on the first row. If we now run the same process tomorrow, the query

returns an incremental feed of data containing customers created between 2-10-01

and 2-11-01. If I were to wait and run this every other day, then the

incremental feed would include two days of customers.

The process now tracks its own progress and always returns

an incremental feed of data. This is independent of the email process which

could be running on a completely different schedule.

This also allows the process to fail and when it is fixed,

still return the correct incremental amount of data to the user.

Conclusion

Over time, I find myself using this technique for many different

processes, from data extractions to customized reports. Anyplace the

incremental data feeds are needed, this has proven to be a valuable technique.

It has even ensured correct reports when my server was down for a few days

after our company moved to a new office. As with most techniques, I am sure

that many of you will find even more and better uses for this than I have.

As always, I welcome feedback.

Steve Jones

©dkRanch.net February 2001


Return to Steve Jones Home

Rate

Share

Share

Rate