Data Dictionary Table in Staging Database

  • We are trying to extract flat files into a staging database. The extracted data from the flat files should be "raw" in the Kimball sense. That means there shouldn't be any changes to the data as they are being extracted. The staging tables should have the exact format as the source flat files and there should be a table for each type of flat file.

    The boss wants an additional table in the staging database: a data dictionary. There should be data flow task that populates the data dictionary table the same time as the staging tables are populated. I have copied and pasted the script that creates the data dictionary table below.

    CREATE TABLE [dbo].[DataDictionary](

    [ExtractDate] [datetime] NULL,

    [TblName] [varchar](65) NOT NULL,

    [ColName] [varchar](50) NOT NULL,

    [MacroName] [varchar](30) NULL,

    [LongName] [varchar](65) NULL,

    [Description] [varchar](500) NULL,

    [PKPos] [smallint] NOT NULL,

    [Datatype] [varchar](50) NULL,

    [Length] [int] NULL,

    [Prec] [smallint] NULL,

    [Scale] [smallint] NULL,

    [IsNullable] [bit] NOT NULL,

    [IsIdentity] [bit] NOT NULL,

    [DefaultValue] [varchar](100) NULL,

    [Active] [bit] NOT NULL,

    [CreatedByObj] [varchar](50) NULL,

    [IgnoreOnCompare] [char](1) NOT NULL,

    [ChangeHashBytes] [varbinary](max) NULL

    ) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]

    GO

    Here are my questions: 1) do we need this data dictionary table in our staging database that is basically the flat files in tables? 2) Why or why not?

  • imani_technology (6/30/2014)


    We are trying to extract flat files into a staging database. The extracted data from the flat files should be "raw" in the Kimball sense. That means there shouldn't be any changes to the data as they are being extracted. The staging tables should have the exact format as the source flat files and there should be a table for each type of flat file.

    The boss wants an additional table in the staging database: a data dictionary. There should be data flow task that populates the data dictionary table the same time as the staging tables are populated. I have copied and pasted the script that creates the data dictionary table below.

    CREATE TABLE [dbo].[DataDictionary](

    [ExtractDate] [datetime] NULL,

    [TblName] [varchar](65) NOT NULL,

    [ColName] [varchar](50) NOT NULL,

    [MacroName] [varchar](30) NULL,

    [LongName] [varchar](65) NULL,

    [Description] [varchar](500) NULL,

    [PKPos] [smallint] NOT NULL,

    [Datatype] [varchar](50) NULL,

    [Length] [int] NULL,

    [Prec] [smallint] NULL,

    [Scale] [smallint] NULL,

    [IsNullable] [bit] NOT NULL,

    [IsIdentity] [bit] NOT NULL,

    [DefaultValue] [varchar](100) NULL,

    [Active] [bit] NOT NULL,

    [CreatedByObj] [varchar](50) NULL,

    [IgnoreOnCompare] [char](1) NOT NULL,

    [ChangeHashBytes] [varbinary](max) NULL

    ) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]

    GO

    Here are my questions: 1) do we need this data dictionary table in our staging database that is basically the flat files in tables? 2) Why or why not?

    It looks like a log table to me and I think it's a way to keep track of what files where loaded and how they were handled. Something like this is super important during audits and investigations.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • +1 Jeff,

    By the way, did your boss shared any information when he asked for this table. He might have shared something didn't he?

  • Yes there are uses for these files. It logs what has been delivered and lets others see what fields are available.

    It also gives you a way of checking if the file exists before you try to import it.

  • P Jones (7/1/2014)


    Yes there are uses for these files. It logs what has been delivered and lets others see what fields are available.

    It also gives you a way of checking if the file exists before you try to import it.

    Just to be clear, it's not a file. It's a table. And, sorry but I have to ask, how does the simple existence of a table help you with determining if the file exists before you try to import it?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (7/1/2014)


    Just to be clear, it's not a file. It's a table. And, sorry but I have to ask, how does the simple existence of a table help you with determining if the file exists before you try to import it?

    Presumably one could check the table to see whether a file of the same name had already been imported, but I agree, it won't do anything to determine the actual existence of the file. I'm not so sure that the purpose of the table is to track imported files though; it looks more like a table that holds data about other tables to me - a metadata table if you will.

    Regards

    Lempster

  • Sorry, it is a metadata table, not a logging table. It contains information about the structure of the source system. The question is whether a metadata table of the source system belong in a staging database. My boss (who has never done ETL) seems to think so. I'm not so sure.

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply