What are the alternatives to CDC for capturing and identifying the delta for a warehouse extraction?

  • davepc

    Old Hand

    Points: 320

    Hi,

    I'm putting together a spec for a warehouse that extracts data from 12+ sites. I've told them that CDC is the way to go and that they'll need SQL Server 2008 Enterprise to do this. They won't support R2.

    They say that this will cost them 600K and to look at other options.

    I know it can be accomplished (albeit inefficiently) with shadow tables and triggers, but I would like to know what other options there may be.

    Note: I need to capture ALL change, including that between loads, so a LastModifiedDateTime field added to each table won't cut it.

    Thanks in advance.

  • Andy Leonard

    SSCrazy Eights

    Points: 9865

    Hi Dave,

    Bret Flippen wrote a good article on this here[/url]. I implemented his suggestions recently at a client and, during an exchange on Twitter, Brett directed me to a cool custom component on CodePlex that generates hashes in a data flow. It's called the SSIS Multiple Hash[/url] and works well.

    Hope this helps,

    Andy

    Andy Leonard, Chief Data Engineer, Enterprise Data & Analytics

  • davepc

    Old Hand

    Points: 320

    Thanks Andy. I'll check it out.

  • Koen Verbeeck

    SSC Guru

    Points: 258832

    Andy Leonard (2/25/2011)


    Bret Flippen wrote a good article on this here[/url].

    Thanks for the good reference!

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • Phil Parkin

    SSC Guru

    Points: 243282

    A question to the OP:

    Consider the following scenario ...

    Package runs at 1pm.

    1) Change made to record at 1.05pm.

    2) Another change made to same record at 1.10pm.

    Package runs again at 1.15pm.

    Your requirement is to pick up 'all changes', so I guess that means both (1) and (2), even if (2) overwrites (1) - is that correct? Otherwise, Last Modified would work OK wouldn't it?

    If the answer to your question can be found with a brief Google search, please perform the search yourself, rather than expecting one of the SSC members to do it for you.

  • davepc

    Old Hand

    Points: 320

    Hi Phil,

    Yes. LastModifiedDateTime would work if you don't need to capture all change and don't allow deletions (which is a different story).

    If you allow deletions and don't just mark them as deleted (i.e. IsActive = false), then CDC or triggers can keep track of the deletions for you.

    I'm just curious about what others have done in similar situations.

    Cheers,

    Dave

  • Koen Verbeeck

    SSC Guru

    Points: 258832

    davepc (2/28/2011)


    Hi Phil,

    Yes. LastModifiedDateTime would work if you don't need to capture all change and don't allow deletions (which is a different story).

    If you allow deletions and don't just mark them as deleted (i.e. IsActive = false), then CDC or triggers can keep track of the deletions for you.

    I'm just curious about what others have done in similar situations.

    Cheers,

    Dave

    What is your exact definition of "all changes"?

    If a row is updated twice, you want the two seperate updates?

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • davepc

    Old Hand

    Points: 320

    Hi Koen,

    Yes. I want the two updates.

    They will both be entered into the warehouse and have appropriate EffectiveFrom and EffectiveTo datetime fields. The latest one has a null EffectiveTo value.

    They will also get a surrogate key that will be used as a foreign key for related data that falls in the same datetime range.

    I know a warehouse isn't an auditing tool, but I want to capture the context of reference data at the time it was used. It's important for this particular warehouse as it's medical based.

    Cheers,

    Dave

  • Koen Verbeeck

    SSC Guru

    Points: 258832

    With the built-in utilities from SQL Server Standard edition, you're stuck with the triggers as you said before, since change_tracking doesn't take all updates into account.

    CDC does...

    If you can modify the system that inputs the data, you can add OUTPUT clauses everywhere, but as soon as one forgets them in a new query, it breaks your system.

    So I guess triggers are your only option (unless I'm missing something). Or convince them to go for CDC 🙂

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • vijay

    Newbie

    Points: 1

    Hi Andy,

    It is really helpful if you share that link where i can get the solution that Brett provided.

Viewing 10 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply