Loading x number of CSV files into x number of unique tables of a database

  • Seems like you could create a step in your SSIS package that groups all the files that start or end with the same prefix/suffix together. Then just use a For Each file loop, and you're all set. Or did I miss something?

  • masterelaichi wrote:

    The filenames are named with the groups there are in - Filename1_Group1.csv, Filename2_Group1.CSV, Filename3_Group1.csv; Filename1_Group2.csv, Filename2_Group2.CSV, Filename3_Group2.csv, and so on

    When I create the dataflow task, I add the name of the file as the name of the OLEDB destination component. So when I go to e OLEDB destination->table load or fast view-> NEW, the SQL code is CREATE TABLE Filename1_Group1. I edit this step by adding an identifier column using IDENTITY (1,1)

    Once I do this, the resulting tables in SQL are named the same as the raw file mentioned above. To answer your question as to how I identify, it is manual as the names of the file is the name of the sql table that is meant to be created. I can do these in batches, i.e, load all of Group1, followed by Group2, and so on

    Hope that helps

    I urge you to consider refining this process such that you do not keep creating new tables.

    Instead, have a single table per group and add Filename (and any other useful attributes, eg, LoadedAt) as a column in the table.

    If you do this, you can process all of the files in a single group using a FOREACH container in SSIS. With seven FOREACH containers, you could import all of the files to your seven group tables.

    If you aren't happy single, you won't be happy in a relationship.

    Remember, happiness comes from guitars, not relationships.

  • That would have been my preferred method as well. But the project has been going on for a few years and this is what has been followed historically, even before I started.  I am afraid it might be too complicated to change it now and might cause more damage

     

     

     

     

     

     

  • masterelaichi wrote:

    The filenames are named with the groups there are in - Filename1_Group1.csv, Filename2_Group1.CSV, Filename3_Group1.csv; Filename1_Group2.csv, Filename2_Group2.CSV, Filename3_Group2.csv, and so on

    When I create the dataflow task, I add the name of the file as the name of the OLEDB destination component. So when I go to e OLEDB destination->table load or fast view-> NEW, the SQL code is CREATE TABLE Filename1_Group1. I edit this step by adding an identifier column using IDENTITY (1,1)

    Once I do this, the resulting tables in SQL are named the same as the raw file mentioned above. To answer your question as to how I identify, it is manual as the names of the file is the name of the sql table that is meant to be created. I can do these in batches, i.e, load all of Group1, followed by Group2, and so on

    Hope that helps

    Perfect.  We're almost there.  Can you attached the CREATE TABLE code that's used for each of the first 3 groups?

    Also, you mentioned the "text qualifier" in your Step 3.  Are these files comma separated and all columns are "double quoted" or just some of the "cells" are "double quoted"?  And which version of SQL Server are you using?  2016 or less or 2017 or greater?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • masterelaichi wrote:

    That would have been my preferred method as well. But the project has been going on for a few years and this is what has been followed historically, even before I started.  I am afraid it might be too complicated to change it now and might cause more damage

    Not to worry... Let's peel one potato at a time.  See what I think is my final question in my previous post above.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Perfect. We're almost there. Can you attached the CREATE TABLE code that's used for each of the first 3 groups?

    The create table sql code gets automatically generated when I go to the edit property of the OLEDB destination-> New(table load or fast view). The only thing I do for every new table is add an ID column as shown below

    CREATE TABLE FILENAME1_GROUP1 (

    id_num INT Identity (1,1),

    Column1 varchar (800)

    Column2 varchar (800)

    .

    .

    .

    ColumnN varchar (800)

    )

    Also, you mentioned the "text qualifier" in your Step 3. Are these files comma separated and all columns are "double quoted" or just some of the "cells" are "double quoted"? And which version of SQL Server are you using? 2016 or less or 2017 or greater?

    I believe some of the column could be double quoted which is why I specify the column delimiter. To be honest, I am not sure what the Column Delimiter field does and how adding ' " ' affects the table. I am using SQL Server 2019

    The other steps I perform is I set the column length for each column as varchar(800). I believe this is to prevent truncation of certain text columns; and, in the advanced property of the FLAT FILE Source, I add the FileNameColumn in  Custom Properties in the Component Properties tab

     

    Hope that helps

     

  • Edit: the sql version at work I believe is SQL Server 2016. As for the version on my personal machine it most likely is SQL Server 2019 as I only installed in the last month or so

  • Ok.  Understood on the column names and the table creation.

    Is the data in the files separated by commas or tabs?

    Also, is there any chance of attaching one of the files or is the data proprietary or contain PII, etc?  Totally understood if you cannot.

     

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • The CSVs were created from Excel. I just save as CSV. I am guessing they are tabbed (they do appear in a table format)

    The tables do contain confidential information so I cannot share them. However, I can dummy up some raw data, or provide a couple of  adventure works csv files to simulate the different filenames. Would that do ?

    Thanks

  • masterelaichi wrote:

    The CSVs were created from Excel. I just save as CSV. I am guessing they are tabbed (they do appear in a table format)

    The tables do contain confidential information so I cannot share them. However, I can dummy up some raw data, or provide a couple of  adventure works csv files to simulate the different filenames. Would that do ?

    Thanks

    If you open one of the CSVs in Notepad++ and then take the menu option View/Show Symbol/Show All Characters, you should be able to see exactly what the row and column delimiters are.

    If you aren't happy single, you won't be happy in a relationship.

    Remember, happiness comes from guitars, not relationships.

  • masterelaichi wrote:

    The CSVs were created from Excel. I just save as CSV. I am guessing they are tabbed (they do appear in a table format)

    The tables do contain confidential information so I cannot share them. However, I can dummy up some raw data, or provide a couple of  adventure works csv files to simulate the different filenames. Would that do ?

    Thanks

    No need on the latter and understood on the former.  Excel files that are exported as CSV files aren't actually true "Comma separated files".  Rather, they're "Comedy Separated Files" in that only those cells that contain the delimiter as part of the data will be double quoted in the file.  It makes for a bit of a train wreck for tools that use are capable of only using true CSV files.

    It's not a problem though.  It only takes a little, easy to do pre-proceessing.

    Since the data is taboo for posting, can you attache one of the actual column header lines from one of the files without adulteration and without any of the data?  I can probably do without that (I'll set the code up to figure out what is being used in actual files) but it would provide a confirmation for a test of that bit of functionality.

    I know it seems like there's a lot but there's really not.  I just have to make sure the "gazintas" are known. 😀

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • I think some of the columns may potentially have sensitive content or other identifying names but I will check that and, if necessary, dummy the columns or mask it before sending. Hope that will help

     

     

  • Understood there, as well.  No worries.  It's not difficult for me to cobble some example tables.  I was just going "for the real stuff" just to be sure.

     

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Thanks, Jeff. Really appreciate that!

     

  • Sorry  for not getting back sooner as I was away from work and didn't have access to files. I had a look and there aren't too many columns that I can share over here

    The common ones are Employee_ID, FirstName, Last Name, Org_Unit, SessionStartdatetime,SessionEndDateTime, WorkSessionCode

    I have masked some of the column names in the list. Hope it helps

     

     

     

Viewing 15 posts - 16 through 30 (of 49 total)

You must be logged in to reply to this topic. Login to reply