Loading data from fields with generic column names to another table with descriptive column names based on a mapping table.

  • I have recently been working on importing a file that my company receives from one of our partners which has a fixed number of columns that always appear in the file, but after these columns there are about 60-70 extra columns that vary file to file and aren't always named consistently. I have already talked to the partner extensively on the possibility of them sending us a regular file that is consistent in header structure, but they informed me it wouldn't be possible. Importing a file like this that varies in # of columns can be difficult using SSIS, but I have developed a solution that solves most of the issue but now my last hurdle is a T-SQL one.

    Mapping table:

    Staging table:

    My solution involves loading the data to a staging table that has generic names for the fields that change file to file, then loading a separate mapping table with the information specific to each file loaded which maps the 'real names' of these generic fields to their generic counterparts. I have a third table, not shown, which has the fixed columns which appear in every file, followed by a column for every possible field that could appear in the files we're receiving from our partner. My issue now is I need to create a stored procedure, or develop some other solution in SSIS, to be able to merge the data from my staging table to the production table and map the values correctly from columns with their temp names, to columns with their real names relative to the file that was most recently loaded.

    I haven't done anything in SQL yet which compares column names to  values in a table like I am attempting to do here, so I figured I would see if anybody else had ideas on how to best approach this.

    (please don't make fun of my ssms font)

  • Going to need a LOT more details.   Give us a test scenario with mocked up data, including table create scripts and the mocked up data as INSERTs, and then what the output SHOULD BE based on the sample data.

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • sgmunson - Thursday, February 14, 2019 7:43 AM

    Going to need a LOT more details.   Give us a test scenario with mocked up data, including table create scripts and the mocked up data as INSERTs, and then what the output SHOULD BE based on the sample data.

    Gotcha, I can provide all that and will start writing it up now. I was just mainly curious if anybody knew of a way to basically compare column names in a table to field values in another table. If you know of any way to do this I would appreciate it, but i'll provide more info on my specific situation if that helps to paint a better picture.

  • You primarily need information in the system tables, which you can generally get from INFORMATION_SCHEMA.

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • Okay, here are the table structures and some mock data (very simplified and condensed for the sake of the example) that should explain my situation.

    When mockdata1 is loaded to my staging table, the FAX POD values will be inserted into the 'x1' field. The Freight values will be inserted into the 'x2' field. The Deliver Confirmation values will be inserted into the 'x3' field. My mapping table will have 3 records with values like Temp Header: x1, Real Header: 013: FAX POD and so on and so forth for each of the fields that appear after the UpsReceiver field.

    The solution I am trying to implement is one which merges the data from this staging table to the production table, and correctly inserting the x1 values into the '013: FAX POD' field, as well as the x2 and x3 values into their respective fields

    Afterwards, my second file would be loaded and the values for the last 5 fields will be inserted into the x1-x5 fields in my staging table respectively. My mapping table will be truncated and re-populated with the appropriate values to map x1 to the 013: FAX POD field, and x2 to the 016: Shipping Charge Correction field, etc. As you can see, when loading the 2 files the number of fields after the UpsReceiver field there is a different number of fields, with some of the same names and some different names.

    In both files the x1 field ended up being mapped to the 013: FAX POD field, but x2 in the first file represents a different field in the production table than the x2 field in the second file. The solution I am trying to get to is one that can properly map the data from the staging table to the correct fields in the production table based on the mapping values that exist in the mapping table. So when mockdata1 is loaded and the records are merged into the production table, I need the x2 values to be inserted into the 400: Freight field. Then when the second file is loaded to the staging table and the records are merged into the production table I need the x2 values to be inserted into the 016: Shipping Charge Correction field.

    I'm not sure if I'm explaining this in the best way, so please let me know if you need me to clarify anything.

    My staging table:

    CREATE TABLE [dbo].[StagingUpsReturnsData](
        [UpsInvoice] [varchar](75) NULL,
        [UpsPackageTrackingNumber] [varchar](75) NOT NULL,
        [UpsService] [varchar](75) NULL,
        [UpsBillOption] [varchar](75) NULL,
        [UpsZone] [varchar](3) NULL,
        [UpsAccount] [varchar](25) NULL,
        [UpsHWT] [varchar](75) NULL,
        [UpsShippedDate] [varchar](75) NOT NULL,
        [UpsNetCharges] [varchar](75) NULL,
        [UpsApproved] [varchar](75) NULL,
        [UpsNumberPackages] [varchar](75) NULL,
        [UpsBilledWeight] [varchar](75) NULL,
        [UpsGLCode] [varchar](25) NULL,
        [UpsDeclValue] [varchar](75) NULL,
        [UpsRef1] [varchar](75) NULL,
        [UpsRef2] [varchar](75) NULL,
        [UpsSender] [varchar](255) NOT NULL,
        [UpsReceiver] [varchar](255) NULL,
        [x1] [varchar](50) NULL,
        [x2] [varchar](50) NULL,
        [x3] [varchar](50) NULL,
        [x4] [varchar](50) NULL,
        [x5] [varchar](50) NULL,
    ) ON [PRIMARY]

    My production table:


    CREATE TABLE [dbo].[UpsReturnsData](
        [UpsInvoice] [varchar](75) NULL,
        [UpsPackageTrackingNumber] [varchar](75) NOT NULL,
        [UpsService] [varchar](75) NULL,
        [UpsBillOption] [varchar](75) NULL,
        [UpsZone] [varchar](3) NULL,
        [UpsAccount] [varchar](25) NULL,
        [UpsHWT] [varchar](75) NULL,
        [UpsShippedDate] [varchar](75) NOT NULL,
        [UpsNetCharges] [varchar](75) NULL,
        [UpsApproved] [varchar](75) NULL,
        [UpsNumberPackages] [varchar](75) NULL,
        [UpsBilledWeight] [varchar](75) NULL,
        [UpsGLCode] [varchar](25) NULL,
        [UpsDeclValue] [varchar](75) NULL,
        [UpsRef1] [varchar](75) NULL,
        [UpsRef2] [varchar](75) NULL,
        [UpsSender] [varchar](255) NOT NULL,
        [UpsReceiver] [varchar](255) NULL,
        [013: FAX POD] [varchar](50) NULL,
        [016: Shipping Charge Correction] [varchar](50) NULL,
        [018: Delivery Confirmation] [varchar](50) NULL,
        [275: Incentive] [varchar](50) NULL,
        [400: Freight] [varchar](50) NULL,
    ) ON [PRIMARY]

    My mapping table:


    CREATE TABLE [dbo].[UPSReturnsMapping](
        [TempHeader] [varchar](5) NULL,
        [RealHeader] [varchar](255) NULL
    ) ON [PRIMARY]

  • I think dynamic SQL is the way to go with this, but if you were to do it with normal SQL you would end up with something like this:
    INSERT INTO UpsReturnsData
    (
      [UpsInvoice],
      [UpsPackageTrackingNumber],
      [UpsService],
      [UpsBillOption],
      [UpsZone],
      [UpsAccount],
      [UpsHWT],
      [UpsShippedDate],
      [UpsNetCharges],
      [UpsApproved],
      [UpsNumberPackages],
      [UpsBilledWeight],
      [UpsGLCode],
      [UpsDeclValue],
      [UpsRef1],
      [UpsRef2],
      [UpsSender],
      [UpsReceiver],
      [013: FAX POD],
      [016: Shipping Charge Correction],
      [018: Delivery Confirmation],
      [275: Incentive],
      [400: Freight]
    )
    SELECT [UpsInvoice],
       [UpsPackageTrackingNumber],
       [UpsService],
       [UpsBillOption],
       [UpsZone],
       [UpsAccount],
       [UpsHWT],
       [UpsShippedDate],
       [UpsNetCharges],
       [UpsApproved],
       [UpsNumberPackages],
       [UpsBilledWeight],
       [UpsGLCode],
       [UpsDeclValue],
       [UpsRef1],
       [UpsRef2],
       [UpsSender],
       [UpsReceiver],
       CASE WHEN x1map.RealHeader = '013: FAX POD' THEN s.[x1]
        WHEN x2map.RealHeader = '013: FAX POD' THEN s.[x2]
        WHEN x3map.RealHeader = '013: FAX POD' THEN s.[x3]
        WHEN x4map.RealHeader = '013: FAX POD' THEN s.[x4]
        ELSE NULL
       END,
       CASE WHEN x1map.RealHeader = '016: Shipping Charge Correction' THEN s.[x1]
        WHEN x2map.RealHeader = '016: Shipping Charge Correction' THEN s.[x2]
        WHEN x3map.RealHeader = '016: Shipping Charge Correction' THEN s.[x3]
        WHEN x4map.RealHeader = '016: Shipping Charge Correction' THEN s.[x4]
        ELSE NULL
       END,
       CASE WHEN x1map.RealHeader = '018: Delivery Confirmation' THEN s.[x1]
        WHEN x2map.RealHeader = '018: Delivery Confirmation' THEN s.[x2]
        WHEN x3map.RealHeader = '018: Delivery Confirmation' THEN s.[x3]
        WHEN x4map.RealHeader = '018: Delivery Confirmation' THEN s.[x4]
        ELSE NULL
       END,
       CASE WHEN x1map.RealHeader = '275: Incentive' THEN s.[x1]
        WHEN x2map.RealHeader = '275: Incentive' THEN s.[x2]
        WHEN x3map.RealHeader = '275: Incentive' THEN s.[x3]
        WHEN x4map.RealHeader = '275: Incentive' THEN s.[x4]
        ELSE NULL
       END,
       CASE WHEN x1map.RealHeader = '400: Freight' THEN s.[x1]
        WHEN x2map.RealHeader = '400: Freight' THEN s.[x2]
        WHEN x3map.RealHeader = '400: Freight' THEN s.[x3]
        WHEN x4map.RealHeader = '400: Freight' THEN s.[x4]
        ELSE NULL
       END
    FROM dbo.StagingUpsReturnsData s
    LEFT JOIN dbo.UPSReturnsMapping x1map ON TempHeader='x1'
    LEFT JOIN dbo.UPSReturnsMapping x2map ON TempHeader='x2'
    LEFT JOIN dbo.UPSReturnsMapping x3map ON TempHeader='x3'
    LEFT JOIN dbo.UPSReturnsMapping x4map ON TempHeader='x4'

  • Jonathan AC Roberts - Thursday, February 14, 2019 9:02 AM

    I think dynamic SQL is the way to go with this, but if you were to do it with normal SQL you would end up with something like this:
    INSERT INTO UpsReturnsData
    (
      [UpsInvoice],
      [UpsPackageTrackingNumber],
      [UpsService],
      [UpsBillOption],
      [UpsZone],
      [UpsAccount],
      [UpsHWT],
      [UpsShippedDate],
      [UpsNetCharges],
      [UpsApproved],
      [UpsNumberPackages],
      [UpsBilledWeight],
      [UpsGLCode],
      [UpsDeclValue],
      [UpsRef1],
      [UpsRef2],
      [UpsSender],
      [UpsReceiver],
      [013: FAX POD],
      [016: Shipping Charge Correction],
      [018: Delivery Confirmation],
      [275: Incentive],
      [400: Freight]
    )
    SELECT [UpsInvoice],
       [UpsPackageTrackingNumber],
       [UpsService],
       [UpsBillOption],
       [UpsZone],
       [UpsAccount],
       [UpsHWT],
       [UpsShippedDate],
       [UpsNetCharges],
       [UpsApproved],
       [UpsNumberPackages],
       [UpsBilledWeight],
       [UpsGLCode],
       [UpsDeclValue],
       [UpsRef1],
       [UpsRef2],
       [UpsSender],
       [UpsReceiver],
       CASE WHEN x1map.RealHeader = '013: FAX POD' THEN s.[x1]
        WHEN x2map.RealHeader = '013: FAX POD' THEN s.[x2]
        WHEN x3map.RealHeader = '013: FAX POD' THEN s.[x3]
        WHEN x4map.RealHeader = '013: FAX POD' THEN s.[x4]
        ELSE NULL
       END,
       CASE WHEN x1map.RealHeader = '016: Shipping Charge Correction' THEN s.[x1]
        WHEN x2map.RealHeader = '016: Shipping Charge Correction' THEN s.[x2]
        WHEN x3map.RealHeader = '016: Shipping Charge Correction' THEN s.[x3]
        WHEN x4map.RealHeader = '016: Shipping Charge Correction' THEN s.[x4]
        ELSE NULL
       END,
       CASE WHEN x1map.RealHeader = '018: Delivery Confirmation' THEN s.[x1]
        WHEN x2map.RealHeader = '018: Delivery Confirmation' THEN s.[x2]
        WHEN x3map.RealHeader = '018: Delivery Confirmation' THEN s.[x3]
        WHEN x4map.RealHeader = '018: Delivery Confirmation' THEN s.[x4]
        ELSE NULL
       END,
       CASE WHEN x1map.RealHeader = '275: Incentive' THEN s.[x1]
        WHEN x2map.RealHeader = '275: Incentive' THEN s.[x2]
        WHEN x3map.RealHeader = '275: Incentive' THEN s.[x3]
        WHEN x4map.RealHeader = '275: Incentive' THEN s.[x4]
        ELSE NULL
       END,
       CASE WHEN x1map.RealHeader = '400: Freight' THEN s.[x1]
        WHEN x2map.RealHeader = '400: Freight' THEN s.[x2]
        WHEN x3map.RealHeader = '400: Freight' THEN s.[x3]
        WHEN x4map.RealHeader = '400: Freight' THEN s.[x4]
        ELSE NULL
       END
    FROM dbo.StagingUpsReturnsData s
    LEFT JOIN dbo.UPSReturnsMapping x1map ON TempHeader='x1'
    LEFT JOIN dbo.UPSReturnsMapping x2map ON TempHeader='x2'
    LEFT JOIN dbo.UPSReturnsMapping x3map ON TempHeader='x3'
    LEFT JOIN dbo.UPSReturnsMapping x4map ON TempHeader='x4'

    Thank you for your response, your regular SQL solution looks promising but it was your mention of dynamic SQL that led to my solution. I ended up making two dynamic SQL statements that I used in conjunction with each other to create an INSERT INTO statement that finds which fields are loaded into the mapping table and only loads those. This turned out to be much simpler than the solution I was imagining in my head, so thanks for starting me down this path!

    Here's the SQL I wrote in case anyone is interested:


    declare @sql2 nvarchar(max)
    set @sql2 = 'select
                [UpsInvoice]
        ,[UpsPackageTrackingNumber]
        ,[UpsService]
        ,[UpsBillOption]
        ,[UpsZone]
        ,[UpsAccount]
        ,[UpsHWT]
        ,[UpsShippedDate]
        ,[UpsNetCharges]
        ,[UpsApproved]
        ,[UpsNumberPackages]
        ,[UpsBilledWeight]
        ,[UpsGLCode]
        ,[UpsDeclValue]
        ,[UpsRef1]
        ,[UpsRef2]
        ,[UpsSender]
        ,[UpsReceiver] '
    select @sql2 = @sql2 + (select ', [' + TempHeader + ']' from UPSReturnsMapping where TempHeader not like 'x-%' and TempHeader <> 'x0' for XML PATH (''))
    set @sql2 = @sql2 + ' from StagingUpsReturnsData'

    declare @sql nvarchar(max)
    set @sql = 'Insert into [dbo].[UpsReturnsData](
                [UpsInvoice]
        ,[UpsPackageTrackingNumber]
        ,[UpsService]
        ,[UpsBillOption]
        ,[UpsZone]
        ,[UpsAccount]
        ,[UpsHWT]
        ,[UpsShippedDate]
        ,[UpsNetCharges]
        ,[UpsApproved]
        ,[UpsNumberPackages]
        ,[UpsBilledWeight]
        ,[UpsGLCode]
        ,[UpsDeclValue]
        ,[UpsRef1]
        ,[UpsRef2]
        ,[UpsSender]
        ,[UpsReceiver] '
    select @sql = @sql + (select ', [' + RealHeader + ']' from UPSReturnsMapping where TempHeader not like 'x-%' and TempHeader <> 'x0' for XML PATH (''))
    set @sql = @sql + ')'
    set @sql = @sql + @sql2
    exec sp_executesql @sql

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply