SSIS Package Help with , Delimiter File.

  • rocky_498

    SSCertifiable

    Points: 6321

    Hi Guys,

    Need Help in SSIS Package to Pull the data from .CSV and load into SQL DB.

    Here is sample data looks like.

    ADV                    ,A_DAT       ,A_TIM      ,P_tit            ,S_Tit   
    "SOCLEAN INC","5/16/2018","14:39:44",""Z-QUNICY,"DR K 60 V1"
    "SOCLEAN INC","5/16/2018","14:39:44","Z,QUNICY ,"DR K 60 V1"

    The .csv file is (,) delimiter and they put " quote to accommodate, in the data. However, In Some data, they are Putting "" double quotes and it is failing
    my package. In "Flat File Connection" Manager I can use ( Text Qualifier = " ). Is it possible I can use both Text Qualifier (, and ") to resolve this type of
    issue or any other suggestion? I know, someone will suggest, I will ask the client to fix the data or change the file to "|" delimiter but it is not easy to accomplish.

    Thanks for your help in advance.

  • frederico_fonseca

    SSChampion

    Points: 14187

    Is that really the content of the file? if so it is a piece of crap data really.

    ADV ,A_DAT ,A_TIM ,P_tit ,S_Tit
    "SOCLEAN INC","5/16/2018","14:39:44",""Z-QUNICY,"DR K 60 V1" -- this one would fail as it has a double quote not escapped - 
    "SOCLEAN INC","5/16/2018","14:39:44","Z,QUNICY ,"DR K 60 V1" -- > this one would fail as it has a missing quote

    "SOCLEAN INC","5/16/2018","14:39:44","""Z-QUNICY","DR K 60 V1" -- if indeed it has a quote on the content this is how it should be
    "SOCLEAN INC","5/16/2018","14:39:44","Z,QUNICY ","DR K 60 V1" -- > this required the missing quote to be added 

    your package should have " as text delimiter, and , (comma) as column delimiter - standard on CSV files (which the above content is not)

    so what you need to ask to whoever gave you the file is to generate it correctly - no need to change the delimiter to something else, just needs to be generated the right way.

  • rocky_498

    SSCertifiable

    Points: 6321

    Hi Frederico,
    Thank you for your time to comment. I am sure you already read "it is a SAMPLE data".  The client is sending the data from "SalesForce" system.
    I already mentioned in above comment. I know I can go back and ask Client to fix but it is not easy, that's why I need someone advice.
    The solution that I can come with and I will share with you if you want. 
    Open a source file in "Script Task" and Replace Column Delimiter ( ","  TO | ) and then Drive column transformation clean the " data . 

    Thank You.

  • Jeff Moden

    SSC Guru

    Points: 994937

    rocky_498 - Friday, May 18, 2018 12:10 PM

    Hi Guys,

    Need Help in SSIS Package to Pull the data from .CSV and load into SQL DB.

    Here is sample data looks like.

    ADV                    ,A_DAT       ,A_TIM      ,P_tit            ,S_Tit   
    "SOCLEAN INC","5/16/2018","14:39:44",""Z-QUNICY,"DR K 60 V1"
    "SOCLEAN INC","5/16/2018","14:39:44","Z,QUNICY ,"DR K 60 V1"

    The .csv file is (,) delimiter and they put " quote to accommodate, in the data. However, In Some data, they are Putting "" double quotes and it is failing
    my package. In "Flat File Connection" Manager I can use ( Text Qualifier = " ). Is it possible I can use both Text Qualifier (, and ") to resolve this type of
    issue or any other suggestion? I know, someone will suggest, I will ask the client to fix the data or change the file to "|" delimiter but it is not easy to accomplish.

    Thanks for your help in advance.

    You shouldn't have to do anything fancy other than setting the delimiter to a comma and the text qualifier to ".

    That's provided that any quotes embedded as a part of the name (or whatever) are properly escaped with double quotes, which is likely what your failure is.  If that type of poorly formed data is the problem, then import without the text qualifier and post-process the data to STUFF out the first and last quotes in the column or use a SUBSTRING in your final SELECT to select from character 2 to the length of the data minus 1.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
    "If you think its expensive to hire a professional to do the job, wait until you hire an amateur."--Red Adair
    "Change is inevitable... change for the better is not."
    When you put the right degree of spin on it, the number 3|8 is also a glyph that describes the nature of a DBAs job. 😉

    Helpful Links:
    How to post code problems
    Create a Tally Function (fnTally)

  • frederico_fonseca

    SSChampion

    Points: 14187

    I am sorry but not much you can do without them fixing the extraction of the data.

    Lets analyze the sample you gave us - which I trully hope is a exact match from the source file you got.
    Because if it is not them you are asking us to look at and comment on something that does not match the real issue.

    ADV ,A_DAT ,A_TIM ,P_tit ,S_Tit
    "SOCLEAN INC","5/16/2018","14:39:44",""Z-QUNICY,"DR K 60 V1"
    "SOCLEAN INC","5/16/2018","14:39:44","Z,QUNICY ,"DR K 60 V1"

    Ignore the " for time being - just assume they are part of the data
    Also assume that the delimiter is "," as would be logical to admit on the sample supplied

    Row 1 would then be split as follows
    col 1 = "SOCLEAN INC"
    col 2 = "5/16/2018"
    col 3 = "14:39:44"
    col 4 = ""Z-QUNICY
    col 5 = "DR K 60 V1"

    Row 2 would be split as
    col 1 = "SOCLEAN INC"
    col 2 = "5/16/2018"
    col 3 = "14:39:44"
    col 4 = "Z
    col 5 = QUNICY
    col 6 = "DR K 60 V1"

    Oopsss. second row contains 1 more column than what is expected.

    Should you manually replace "," with "|" the issue will be exactly the same so that is not a solution at all.
    if you change "," to | -- I assume you meant changing 3 chars to 1 on your last comment as opposed to globally change , to |

    data then becomes
    "SOCLEAN INC|5/16/2018|14:39:44|"Z-QUNICY,"DR K 60 V1"
    "SOCLEAN INC|5/16/2018|14:39:44|Z,QUNICY ,"DR K 60 V1"
    again splitting would give
    col 1 = "SOCLEAN INC
    col 2 = 5/16/2018
    col 3 = 14:39:44
    col 4 = "Z-QUNICY,"DR K 60 V1"
    Again oops -- 1 less column than required

    lets try another replacement change >,"< to >|<
    "SOCLEAN INC"|5/16/2018"|14:39:44"|"Z-QUNICY|DR K 60 V1"
    "SOCLEAN INC"|5/16/2018"|14:39:44"|Z,QUNICY |DR K 60 V1"
    split
    row 1
    col 1 = "SOCLEAN INC"
    col 2 = 5/16/2018"
    col 3 = 14:39:44"
    col 4 = "Z-QUNICY
    col 5 = DR K 60 V1"

    row 2
    col 1 = "SOCLEAN INC"
    col 2 = 5/16/2018"
    col 3 = 14:39:44"
    col 4 = Z,QUNICY
    col 5 = DR K 60 V1"

    this time it works - but if your what you get is not reliable it can stop working at any time one of the fields does not get a " at the start, or if a field contains within the real data a sequence of ," (which could easily happen)

    So if the sample is an exact match of the content of the file issue your solution will never be solved on SSIS with absolute assurance it will work on all situations - as Jeff hinted you need to load the record without processing the quotes - but on this case you need to go even further and load the record as a single field, and then process it on SQL (or on C# within SSIS) - won't be easy depending on what else you get on the file.
    And if for some reason one of the source columns contain a row delimiter (LF or CR+LF) that matches your current file row delimiter you have further issues with SSIS. Although you may not have any at the moment does not mean you will not get it on a future file.

Viewing 5 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply