Import many column CSV into 2-column sql table

  • Hi there,

    I have a source file that has many columns (25) and want to find the most efficient way to load it into a table with just two columns.

    I.e I always want be first column to appear so I would want columns 1+2, then 1+3, then 1+4, then 1+5..... Up to 1+25

    I was thinking of an insert command that pulls data from a staging table - however bit every row uses the full 25 columns- some are only 3 or 4 and I don't want rows in the final table that just have data in first column 1.

    Any help would be appreciated!!

  • Given that I'm not good at SSIS, I did this in Access, because it's what I know, and it works. (So it is good for some things!)

    In my source table "srcTable", the structure was like this:

    CREATE TABLE srcTable (

    ID INT IDENTITY(1,1) PRIMARY KEY,

    FieldA VARCHAR(10),

    FieldB VARCHAR(10),

    FieldC VARCHAR(10),

    FieldD VARCHAR(10)

    );

    Then my destination table was like this:

    CREATE TABLE destTable (

    ParentID INT NOT NULL,

    TheValue VARCHAR(10) NOT NULL);

    Then I created a module to do the dynamic SQL dirty work...

    Option Compare Database

    Option Explicit

    Public Sub NormalizeData()

    Dim tdf As DAO.TableDef

    Dim i As Integer

    Dim strSQL As String

    DBEngine(0)(0).TableDefs("srcTable").Fields.Refresh

    Set tdf = DBEngine(0)(0).TableDefs("srcTable")

    For i = 1 To tdf.Fields.Count - 1

    strSQL = "INSERT INTO destTable ( ParentID, TheValue ) SELECT srcTable.ID, srcTable.[" & tdf.Fields(i).Name & "] FROM srcTable WHERE srcTable.[" & tdf.Fields(i).Name & "] IS NOT NULL;"

    Debug.Print strSQL

    DBEngine(0)(0).Execute strSQL

    Next i

    Set tdf = Nothing

    End Sub

    Basically, it grabs the first column name (tdf.fields(0).Name) and the nth column (tdf.fields(i).Name) and inserts the non-null values into the destination table. then it processes the next column until there are no more columns to process.

    For the SQL Smarties, I would love to know how to do this in T-SQL... I know about the sys.columns stuff, but not sure how to use it... so I resorted to something I know and that works. =)

  • Being a visual oriented individual I have no idea even where to start. It would help if you provided sample and the expected results based on the sample data. The sampe data should be representative of your data, not actual production data.

  • elliottcoyne (12/14/2013)


    Hi there,

    I have a source file that has many columns (25) and want to find the most efficient way to load it into a table with just two columns.

    I.e I always want be first column to appear so I would want columns 1+2, then 1+3, then 1+4, then 1+5..... Up to 1+25

    I was thinking of an insert command that pulls data from a staging table - however bit every row uses the full 25 columns- some are only 3 or 4 and I don't want rows in the final table that just have data in first column 1.

    Any help would be appreciated!!

    It's real easy. Just load the data into a staging table table (as you've identified) and then do qualified CROSS APPLYs to unpivot the data which will also allow you to ignore any rows where the 2+ column have no data.

    If you'd like a coded example, please refer to the first link in my signature line below "Helpful Links" for the right was to post the example data.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Hi Jeff - thanks for the offer of help. I've followed your guide (as far as I was able to) and created the following to mimic my data source:

    --truncate table tstStreamlineCode

    --===== Create the test table with

    CREATE TABLE tstStreamlineCodes

    (ID INT IDENTITY(1,1) PRIMARY KEY CLUSTERED,

    NHScode varchar(7),

    sl1 varchar(5),

    sl2 varchar(5),

    sl3 varchar(5),

    sl4 varchar(5),

    sl5 varchar(5),

    sl6 varchar(5),

    sl7 varchar(5),

    sl8 varchar(5),

    sl9 varchar(5)

    )

    --===== All Inserts into the IDENTITY column

    SET IDENTITY_INSERT tstStreamlineCodes ON

    --===== Insert the test data into the test table

    INSERT INTO tstStreamlineCodes

    (ID, NHScode, sl1 ,sl2 ,sl3 ,sl4 ,sl5 ,sl6 ,sl7 ,sl8 ,sl9)

    SELECT '1','10002K','4068','4065','4086','4069','4096','4120','4121','4097','4353' UNION ALL

    SELECT '2','10006P','4068','4065','4086','4069','4096','4120','4121','4097','' UNION ALL

    SELECT '3','1003T', '3632','','','','','','','','' UNION ALL

    SELECT '4','1007B','3633','','','','','','','','' UNION ALL

    SELECT '5','1024X','1589','2044','','','','','','','' UNION ALL

    SELECT '6','1037N','1589','2044','','','','','','',''

    --===== Set the identity insert back to normal

    SET IDENTITY_INSERT tstStreamlineCodes OFF

  • elliottcoyne (12/17/2013)


    Hi Jeff - thanks for the offer of help. I've followed your guide (as far as I was able to) and created the following to mimic my data source:

    --truncate table tstStreamlineCode

    --===== Create the test table with

    CREATE TABLE tstStreamlineCodes

    (ID INT IDENTITY(1,1) PRIMARY KEY CLUSTERED,

    NHScode varchar(7),

    sl1 varchar(5),

    sl2 varchar(5),

    sl3 varchar(5),

    sl4 varchar(5),

    sl5 varchar(5),

    sl6 varchar(5),

    sl7 varchar(5),

    sl8 varchar(5),

    sl9 varchar(5)

    )

    --===== All Inserts into the IDENTITY column

    SET IDENTITY_INSERT tstStreamlineCodes ON

    --===== Insert the test data into the test table

    INSERT INTO tstStreamlineCodes

    (ID, NHScode, sl1 ,sl2 ,sl3 ,sl4 ,sl5 ,sl6 ,sl7 ,sl8 ,sl9)

    SELECT '1','10002K','4068','4065','4086','4069','4096','4120','4121','4097','4353' UNION ALL

    SELECT '2','10006P','4068','4065','4086','4069','4096','4120','4121','4097','' UNION ALL

    SELECT '3','1003T', '3632','','','','','','','','' UNION ALL

    SELECT '4','1007B','3633','','','','','','','','' UNION ALL

    SELECT '5','1024X','1589','2044','','','','','','','' UNION ALL

    SELECT '6','1037N','1589','2044','','','','','','',''

    --===== Set the identity insert back to normal

    SET IDENTITY_INSERT tstStreamlineCodes OFF

    The following should do it (assuming you don't want to include the ID column which could easily be added in):

    SELECT ca.NHSCode,ca.SLCode

    FROM dbo.tstStreamLineCodes

    CROSS APPLY

    (

    SELECT NHSCode,sl1 UNION ALL

    SELECT NHSCode,sl2 UNION ALL

    SELECT NHSCode,sl3 UNION ALL

    SELECT NHSCode,sl4 UNION ALL

    SELECT NHSCode,sl5 UNION ALL

    SELECT NHSCode,sl6 UNION ALL

    SELECT NHSCode,sl7 UNION ALL

    SELECT NHSCode,sl8 UNION ALL

    SELECT NHSCode,sl9

    )ca(NHSCode,SLCode)

    WHERE ca.SLCode > ''

    ;

    Results:

    NHSCode SLCode

    ------- ------

    10002K 4068

    10002K 4065

    10002K 4086

    10002K 4069

    10002K 4096

    10002K 4120

    10002K 4121

    10002K 4097

    10002K 4353

    10006P 4068

    10006P 4065

    10006P 4086

    10006P 4069

    10006P 4096

    10006P 4120

    10006P 4121

    10006P 4097

    1003T 3632

    1007B 3633

    1024X 1589

    1024X 2044

    1037N 1589

    1037N 2044

    (23 row(s) affected)

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply