Does Change Data Capture require a table have a primary key

  • or can it record before and after column changes based on the LSN only?

    An extract from a file based legacy accounting system is performed every night. The system does not have a primary key because transactions are managed through program code. (the more things change...). The extract is copied to text in Unix and FTP'd to Windows, where the file is loaded into SQL Server by kill & fill. Because of the expense of modifying the source system, there is enormous inertia/resistance to injecting a primary key at the source, so kill & fill it stays.

    In reading about Change Data Capture, it seemed to me that column level insert update and delete are stored in helper tables that remember the before and after content of each column tracked. In my reading I have seen many references to the LSN to decide when and what to record as changed, but I have not seen any refereference to the necessity of a primary key for Change Data Capture to work. This is in contrast to replication, where the requirement for the existence of a primary key is made plain.

    Is it possible to use Change Data Capture against a table without a primary key?

    I would love to use it to change the extract from kill and fill to incremental.

    thanks very much for your help

    drew

  • To my knowledge the inbuilt CDC component in SQL server doesnt require a Primary Key column.

    The reason is this is simply a log of the activity, so if you insert, update then delete a record you will get 4 rows in the CDC table, the Original Insert, the Update Before, the Update After and finally the Delete.

    From a data gathering perspecive you also have the opyion of either retrieving all the different changes or just the last change.

    Hope this clarifies things.

    _________________________________________________________________________
    SSC Guide to Posting and Best Practices

  • CDC requires a unique index. That makes sense given what it has to be able to do.

    I think by default it uses the PK as the unique index, but you can specify a different unique index when you enable CDC on the table and SQL will use that instead of the PK. So a PK should not be required, although a unique index will be.

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • Scott,

    Do you have a link to that, as I didnt think that either a unique or PK were required for CDC as I cant find anything on Technet or in the CDC documentation that states one of these must be present.

    _________________________________________________________________________
    SSC Guide to Posting and Best Practices

  • A unique index is required if the table is to support net changes as well as all changes

    http://msdn.microsoft.com/en-us/library/cc627369%28v=sql.105%29.aspx

    A capture instance will always include a table valued function for returning all change table entries that occurred within a defined interval. This function is named by appending the capture instance name to "cdc.fn_cdc_get_all_changes_".

    If the parameter @supports_net_changes is set to 1, a net changes function is also generated for the capture instance. This function returns only one change for each distinct row changed in the interval specified in the call.

    To support net changes queries, the source table must have a primary key or unique index to uniquely identify rows. If a unique index is used, the name of the index must be specified using the @index_name parameter. The columns defined in the primary key or unique index must be included in the list of source columns to be captured.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • First, thank you Jason, Scott and Gail for your attention, time and advice.

    Second, NUTS! the requirement of a pk or unique index torpedo's my intention, as the source does not create uniqueness and attempts to create the minimum requirement, a unique index, consistently fail on the existence of duplicates.

    The stewards of the accounting system (Finance and Operations) assert that it is not possible to have either constraint on the table in the accounting system.

    When we get the source data on the SQL side, trying to uniqueify the table using the most important columns (claim number, date of service, date submitted, member, provider, diagnosis, procedure) consistently fails on duplicates because a provider can decide to rebill for the same exact event. Adjustments and reversals also complicate the uniqueification.

    At the risk of the lash, I could try to create a 'covering' index to uniqueify the table, but the table is twelve million rows deep and 138 columns wide.

    Third, I must be dumber than dirt because in tinkering with CDC I came to realize that the kill and fill method leaves me no where...CDC does not permit truncation and deletion indeed marks the target rows as deleted, but that means the source data has nothing in the target to compare it to and that the new source data is all there is!

    This makes me to think we must use two tables for source and target, target holding yesterday's extract, and source holding today's, but since there is no key and no index we'd need to compare each source row and column to each target row and column with something like a Merge When Not Matched, Outer Join or Not In but against 138 columns?...sounds like a non-starter.

    I do not want to give up replacing kill and fill with something more rational, but my pea brain is not big enough to wrap itself around the problem.

    thanks again

    drew

  • add a new column int identity and add the key on it , or create a checksum value for each roe

    Jayanth Kurup[/url]

  • My initial reaction is while that would create uniqueness based on the id, the content of the row is still duplicate.

    I will try it out and see if that gets me home.

    thanks very much

  • drew.georgopulos (1/15/2013)


    Second, NUTS! the requirement of a pk or unique index torpedo's my intention, as the source does not create uniqueness and attempts to create the minimum requirement, a unique index, consistently fail on the existence of duplicates.

    Maybe read what I said again.

    A unique index is needed if you want to enable the table for the net change function as well as the all changes function. If all you want is all changes, then the requirement for a unique index (for the net changes) falls away

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • i missed that...sorry to have been so dense.

    i'll try again at work

    thanks very much

  • Hello,

    I'm looking use CDC in my ETL process and want to make sure I understand things correctly. So CDC does not need a Primary Key or Unique Key to identify changed data as my source do not have Primary keys or Unique keys nor can I add them.

    Please advise, thank you.

    Data

  • so if the source table doesn't have either pk or unique index.. what will be syntax to enable the table?

    is it

    USE Db

    GO

    EXEC sys.sp_cdc_enable_table

    @source_schema = N'DBO',

    @source_name = N'tbl_Name',

    @role_name = NULL;

    GO

    Or we need to specify some value for @supports_net_changes

    USE Db

    GO

    EXEC sys.sp_cdc_enable_table

    @source_schema = N'DBO',

    @source_name = N'tbl_Name',

    @role_name = NULL

    @supports_net_changes = ??;

    GO

    Am attempting CDC for the first time.. please point out if am missing something.

    TIA

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply