Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 

Process Tracking

By Steve Jones,

Process Tracking

Introduction

One of the constructs that I find myself building quite often is a process that periodically retrieves an incremental amount of data. So many reports that I build, whether for myself or for others, are looking for the new data since some x amount of time. It might be the sales for the last day, changes on a particular table, etc., but usually it is the change in some data over the last day.

Of course, this is usually a fairly easy report structure to build. I have routines that determine the previous day’s date, resetting the time to midnight (00:00:00) and then select all data that is new since this date. Since most of the tables I am working with have some sort of datetime field to track changes, this results in a query that looks something like:

declare @dt
select @dt = cast( substring( 
             cast(dateadd( day, -1, getdate()) as char( 20)), 1, 12) as
             datetime)

select *
 from sales
 where SaleDate > @dt

Where @dt is the previous day’s date, calculated with a time set to some value.

The Problem

This works great for the first week that it is implemented. The next Monday when someone comes into work and runs this report, they usually see a smaller report than they are expecting. Why?

Well, in a few e-commerce applications I have worked with, there is a bunch of activity on Friday and Saturday. I guess it’s the end of the week and people do more surfing and purchasing. If we are calculating the previous day’s date, then on Monday, we get all the sales for Sunday. This leaves out the sales from Friday and Saturday.

No problem, we can easily change our code that calculates the beginning date to account for weekends. With the following case statement, we can decide whether we need to get one day or three days worth of data.

declare @dt datetime, @d tinyint
if datepart( weekday, getdate()) = 2
  select @d = -1
else
  select @d = -3

select @dt = cast( substring( cast( dateadd( day, -1, getdate()) as char( 20))
                   , 1, 12) as datetime)

This will check for a Monday (datepart = 2) and then set the number of days to go back for the next statementl.

Since most of my processes are automated and I send an incremental amount of data to a user automatically on some schedule, this used to be my preferred solution. I have used this and it usually works until one of two things happens. Either the task fails for some reason and it takes a day or two to get fixed, or a holiday occurs. If the task breaks on Monday and it does not run again until Tuesday ( this usually happens when I go on vacation) the user gets a report on Tues that shows Mon sales, but is missing all the weekend sales. After a holiday, the user gets two (or more ) reports instead of a single consolidated report.

The Solution

I came up with this solution while building an extract of data for a business partner. They wanted to receive an incremental load of mail information every two weeks rather than a complete load of data. Since our mail information was growing rapidly, this was a reasonable request and so I was tasked with building this.

I used one of the techniques above which worked for a month and then the task broke. By the time I had fixed it, a couple of days had gone by and I had to manually generate the report, otherwise my task would have missed a few days of data. It was then that I decided to build my process tracking table. Here is the code

Create table PrcssTrckr
(    PrcssNm varchar( 80),
     LastXfr datetime,
     LastStat tinyint
)

Nothing fancy, but the coumns are defined as follows:

PrcssNm ProcessName This is an ad-hoc field with a unique index that describes which process I am tracking.
LastXfr Last Transfer date This is the last date when I transferred the data using the particular process.
LastStat Last Status I keep this column updated with an exit code from the process. When it breaks, I can then check this to see what the result code was. The result codes vary from process to process and are not necessarily stored procedure return codes.

How I Use This

This table allows me flexibility in a few ways. One, I can use the same data set, like mail information, for a number of different processes. Suppose I have snail mail and email extraction processes that run on the same customer information. For these two data feeds, I would add two rows to my PrcssTrckr table as follows:

PrcssNm           LastXfr                 LastStat
--------------    -------------------     ------------------------------
SnailMailFeed     01-01-1900 00:00:00     0
EmailFeed         01-01-1900 00:00:00     0

These would be the initial rows inserted. Now we start the snail mail export process which runs a query something like the following:

select name,address, city, state, country, zip
 from Customer c, PrcssTrckr p
 where c.created > p.lastxfr
 and p.PrcssNm = 'SnailMailFeed'
 

This returns a list of all members (since non were created before the beginning of the last century. After this data is extracted, the LastXfr date is set to GETDATE() and the table looks like the following:

PrcssNm           LastXfr                 LastStat
--------------    -------------------     ------------------------------
SnailMailFeed     02-10-2001 08:15:00     0
EmailFeed         01-01-1900 00:00:00     0

Now we know that this process last completed the time stamped on the first row. If we now run the same process tomorrow, the query returns an incremental feed of data containing customers created between 2-10-01 and 2-11-01. If I were to wait and run this every other day, then the incremental feed would include two days of customers.

The process now tracks its own progress and always returns an incremental feed of data. This is independent of the email process which could be running on a completely different schedule.

This also allows the process to fail and when it is fixed, still return the correct incremental amount of data to the user.

Conclusion

Over time, I find myself using this technique for many different processes, from data extractions to customized reports. Anyplace the incremental data feeds are needed, this has proven to be a valuable technique. It has even ensured correct reports when my server was down for a few days after our company moved to a new office. As with most techniques, I am sure that many of you will find even more and better uses for this than I have.

As always, I welcome feedback.

Steve Jones
©dkRanch.net February 2001
Return to Steve Jones Home

Total article views: 4637 | Views in the last 30 days: 4
 
Related Articles
FORUM

incremental load process

incremental load process

FORUM

Increment data while select

Increment data while select

ARTICLE

Building an Incremental Load Package – Level 3

The Stairway to BIML continues, with a lesson on how you might build a more complex package: an incr...

ARTICLE

Speeding Up Incremental ETL Processes in SSIS by Using MD5 Hashes

Speeding up incremental ETL processes in SSIS by tracking changes to records in an MD5 Hash

FORUM

Preview while building report - action on web

Links when building report in preview, do no function correctly when processed off server

Tags
 
Contribute

Join the most active online SQL Server Community

SQL knowledge, delivered daily, free:

Email address:  

You make SSC a better place

As a member of SQLServerCentral, you get free access to loads of fresh content: thousands of articles and SQL scripts, a library of free eBooks, a weekly database news roundup, a great Q & A platform… And it’s our huge, buzzing community of SQL Server Professionals that makes it such a success.

Join us!

Steve Jones
Editor, SQLServerCentral.com

Already a member? Jump in:

Email address:   Password:   Remember me: Forgotten your password?
Steve Jones