SSIS: Dynamically map metadata in a Data Flow Task

  • I am looking for some assistance. I am tasked with truncating and reloading tables from one server to another. Company policy prevents cross-server queries, but allows SSIS packages with cross-server connections. I am doing this for about 25 tables. I have the table names in a single table, and I have created a Foreach Loop (FEL) to execute tasks against each table one by one. It works fine to truncate all the tables. I run into issues, though, with the Data Flow Task. I'm able to tell it which server & table to dynamically connect from and to, but it doesn't know how to map the metadata. The columns and field names are exactly the same in both source & destination, so my hope was that they would auto-map without me having to do anything, but that's not happening. Any suggestions?

  • Quick question: are there frequent changes in the schema/tables? If not, you are better off using the Import/Export Wizard to create the packages and editing them as needed. To dynamically construct a pipeline, you will have to dynamically construct the package; that's not too much of a hassle, but I wouldn't bother with it for only a handful of tables unless there are frequent changes.

    😎

  • No, there are not frequent schema changes. In fact, schema changes are handled through source control deployment. The movement of the data itself is not, though. I've not used the wizard. Is it possible to create one master package with this wizard so that all the tables are loaded at one time? I don't want 25 separate SSIS packages; that's what I'm trying to avoid. I also want to do some package and error logging, and I had hoped to do it for each table separately. My goal was to have a Foreach Loop (FEL) that would loop through the 25 tables one at a time and do these steps: 1) truncate the destination table, 2) move data from source to destination, 3) log the number of rows inserted, start & end time, etc., and 4) write any errors to an error file for inspection later. I was hoping to do those steps for each table one at a time to help in troubleshooting/restarting the package if it ever failed. Also, I was hoping that by driving the FEL from a single table of table names, I could easily add future tables without having to pull the package down and edit it. Am I overcomplicating things?
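
    For illustration, something like this is what I have in mind for the logging in steps 3 and 4; the table and column names below are just placeholders I'm sketching, nothing that exists yet:

    -- Hypothetical log table: one row per table per load.
    CREATE TABLE dbo.TableLoadLog
    (
        TableLoadLogId INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
        TableName      NVARCHAR(100)  NOT NULL,
        RowsInserted   INT            NULL,
        StartTime      DATETIME       NOT NULL,
        EndTime        DATETIME       NULL,
        ErrorMessage   NVARCHAR(4000) NULL
    );

    -- Inside the loop, an Execute SQL Task after the Data Flow Task could write the row
    -- count captured by a Row Count transformation into the log; with an OLE DB connection
    -- the ? parameters would be mapped to package variables (e.g. User::TableName, User::RowCount).
    INSERT INTO dbo.TableLoadLog (TableName, RowsInserted, StartTime, EndTime)
    VALUES (?, ?, ?, ?);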

  • I take it you are doing a simple lift and shift of data from one source to the other?

    I had a similar issue, except I had to do this for around 500 tables from various sources, and there was no way I was going to write the packages manually. The packages weren't created for execution on the fly, but the builder did create them so they could be added to an SSIS project.

    I've attached the core of the package. The one issue I had was with the connections, but if you create them in a base package, it shouldn't be too much of a problem.

    The basic process is:

    1) Get the list of tables, along with the package name

    2) Loop through the list

    3) Get a base package (needed for the connections, to save creating them on the fly)

    4) Create a package using the name above and save it to an output location (c:\packages)

    If you open the script task you will see that I also create a row counter (lines 214-227) and a logging component after the pipeline that uses variables (lines 308-347) in the base package.

    I also use project-level connections, but you might be able to solve that with embedded connections in the base package.

    It's not the neatest code, as it was hacked together and I've not had time to go back through and make changes.

    Hopefully it can be used as a guide for the dynamic builder, and then all you need to do is execute the package that was created.
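
    To give an idea of step 1, the driving query is just a read of a control table. Assuming something like the ImportTableList table I describe further down, it might look like this (illustrative only, not lifted from the attached script):

    -- Return one row per active table, together with the package name to generate.
    SELECT itl.SchemaName,
           itl.TableName,
           itl.LoadPackageName
    FROM dbo.ImportTableList AS itl
    WHERE itl.IsTableActive = 1
    ORDER BY itl.ProcessStream, itl.TableName;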

    _________________________________________________________________________
    SSC Guide to Posting and Best Practices

  • lduvall (10/1/2014)


    No, there are not frequent schema changes. In fact, schema changes are handled through source control deployment. The movement of the data itself is not, though. I've not used the wizard. Is it possible to create one master package with this wizard so that all the tables are loaded at one time? I don't want 25 separate SSIS packages.

    The wizard creates one single huge package with 25 data flows. If you want to run them in parallel, you can simply delete the precedence constraints.

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • lduvall (10/1/2014)


    I am looking for some assistance. I am tasked with truncating and reloading tables from one server to another. Company policy prevents cross-server queries, but allows SSIS packages with cross-server connections. I am doing this for about 25 tables. I have the table names in a single table, and I have created a Foreach Loop (FEL) to execute tasks against each table one by one. It works fine to truncate all the tables. I run into issues, though, with the Data Flow Task. I'm able to tell it which server & table to dynamically connect from and to, but it doesn't know how to map the metadata. The columns and field names are exactly the same in both source & destination, so my hope was that they would auto-map without me having to do anything, but that's not happening. Any suggestions?

    If you allow 3rd-party components, CozyRoc has a data flow that can handle changing metadata.

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • Sorry, I missed the whole 'don't want 25 packages' part, but the script might still help others.

    _________________________________________________________________________
    SSC Guide to Posting and Best Practices

  • Thanks for all the suggestions. I think I'm going to use the Import/Export wizard to create the initial package & then go in and modify each step to include rowcount & error handling. Such a pain. I'm not sure I can figure out the script component idea. It sounds cool but I don't know if I can justify the time for the sprint board. My company won't allow CozyRoc or Pragmatic Works or any of the cool 3rd party transforms. Ugh. Great responses, though, and very much appreciated!

  • lduvall (10/2/2014)


    Thanks for all the suggestions. I think I'm going to use the Import/Export wizard to create the initial package & then go in and modify each step to include rowcount & error handling. Such a pain. I'm not sure I can figure out the script component idea. It sounds cool but I don't know if I can justify the time for the sprint board. My company won't allow CozyRoc or Pragmatic Works or any of the cool 3rd party transforms. Ugh. Great responses, though, and very much appreciated!

    No problem on the script task; if your back is against the wall for a delivery, the Import/Export Wizard is the quickest way.

    _________________________________________________________________________
    SSC Guide to Posting and Best Practices

  • Using the wizard ...

    Depending on the relationships between the tables, you can also group the individual Data Flows (DFs) into sequence containers and run those serially, with the DF tasks within each container running in parallel. That way you can roll back a transaction/container and avoid data inconsistency or foreign key constraint violations (if any exist).

    Also, is this a one-time task, or will it happen periodically?

    ----------------------------------------------------

  • Hi, Jason,

    Can you please provide the table structure with some data? Also, please explain why these two tables are used.

    Thanks

    Revan

  • revanappa.shivanagi (11/20/2014)


    Hi, Jason,

    Can you please provide the table structure with some data? Also, please explain why these two tables are used.

    Thanks

    Revan

    The table structure is very simple and it fulfils a number of roles:

    CREATE TABLE [dbo].[ImportTableList]
    (
        [ImportTableListId] INT IDENTITY(1,1) NOT FOR REPLICATION NOT NULL,
        [SourceSystem] NVARCHAR(100) NOT NULL,
        [ProcessStream] TINYINT NOT NULL,
        [SchemaName] NVARCHAR(50) NOT NULL,
        [TableName] NVARCHAR(100) NOT NULL,
        [LoadPackageName] NVARCHAR(100) NOT NULL,
        [IsTableActive] BIT NOT NULL CONSTRAINT [DF_ImportTableList_IsTableActive] DEFAULT 0,
        CONSTRAINT [PK_ImportTableList] PRIMARY KEY ([ImportTableListId])
    )

    I changed the SourceSystem to an NVARCHAR(100) for simplicity, but it could be split out into a table with an FK if required.

    The key columns for the builder are SchemaName, TableName, LoadPackageName and IsTableActive.

    SchemaName refers to the schema on the source, while TableName is used by both the source and the destination.

    LoadPackageName is the name the generated package is given; I use a generic format, so I don't use it for anything other than the ETL control process.

    IsTableActive is used both by the builder to identify the tables that need packages built, and by the ETL process to identify which tables are to be loaded.

    The builder has a few issues when creating the package, especially with non-standard characters (e.g. %, £, $, ?) in field names.
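
    One way to soften that (a suggestion only, not what the attached script currently does) is to wrap the column names in QUOTENAME when reading the source metadata, so characters such as % or £ survive in the generated SELECT list:

    -- Bracket-quote each source column name; dbo.aTable matches the example row below.
    SELECT QUOTENAME(c.name) AS QuotedColumnName
    FROM sys.columns AS c
    WHERE c.[object_id] = OBJECT_ID(N'dbo.aTable')
    ORDER BY c.column_id;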

    As regards data, I can't provide that as it's unique to every system, but a row may look like this:

    INSERT INTO ImportTableList
    (
        SourceSystem
        ,ProcessStream
        ,SchemaName
        ,TableName
        ,LoadPackageName
        ,IsTableActive
    )
    VALUES
    (
        'MySource'
        ,0
        ,'dbo'
        ,'aTable'
        ,'Staging.MySource.dbo_aTable.dtsx'
        ,1
    )

    NOTE: Both the source table and the destination table need to have exactly the same definition.
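
    Since the OP can't run cross-server queries, one way to sanity-check that is to run a column-definition query like the one below on both the source and the destination server and diff the output; any difference means the definitions don't match (dbo.aTable is just the example table from above):

    -- Run separately on each server and compare the results.
    SELECT c.ORDINAL_POSITION,
           c.COLUMN_NAME,
           c.DATA_TYPE,
           c.CHARACTER_MAXIMUM_LENGTH,
           c.NUMERIC_PRECISION,
           c.NUMERIC_SCALE,
           c.IS_NULLABLE
    FROM INFORMATION_SCHEMA.COLUMNS AS c
    WHERE c.TABLE_SCHEMA = 'dbo'
      AND c.TABLE_NAME = 'aTable'
    ORDER BY c.ORDINAL_POSITION;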

    _________________________________________________________________________
    SSC Guide to Posting and Best Practices

  • The reason the data flow does not work inside a Foreach Loop container, to my thinking, is that every column passed through the data flow buffers has a unique LineageID. Although the column names and metadata across inputs may be equal, these lineage IDs are not, so SSIS needs to keep track of them as different columns.

    ----------------------------------------------------

  • Hi,

    Thanks! In my project, the column names of the source and destination tables are different. So is there any way to map individual columns between the source and destination transformations?

  • revanappa.shivanagi (11/22/2014)


    Hi,

    Thanks! In my project, the column names of the source and destination tables are different. So is there any way to map individual columns between the source and destination transformations?

    I suggest starting a new thread so we can focus on your question. A quick answer: have you used the Mappings tab in the OLE DB destination (assuming that is the destination component you are using)?

    ----------------------------------------------------

