import flat file through SSIS

  • the problem is I can't post any data for security purpose. But I do appreciate your help. I got it work, but just not the way I expected. So maybe I'll just live with it for now.

    Thanks!

  • Understood about the security thing... lemme see what I can dig up for an example...

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • If you have a text qualifier of double quotes and then you have nested double quotes within that are 2 in a row such as "data ""text data within field"" more data", then you can put a derived column task in your SSIS package to strip out the double-double quotes to get "data text data within field more data" . However if you only have one extra double quote then this won't work.

    REPLACE(import_text_field,"\"\"","")

    use the above in your Derived Column task and then use the newly created derived field to map to your destination

  • Hi Jeff- I realize this is an old post, but the problem's current. :hehe:

    The issue I have is similar to others. I received .txt files with the pipe or vertical line as the field delimiter, and double quotes as the text qualifier, and I need to upload these files in a new db in SQL 2008. Below is an example of my dilemma. I tried a few ways but none helped. In the example below, the second line will be a problem because of the present double quotes in the address field. The last examples has the pipes embedded in the field. When files are relatively small I open them, I correct the issue, and then I try importing again. But when we talk about millions of records (30 to 50 m), then I'm out of luck. Not to mention that some .txt files may contain a "Comments", "Description", or "Notes" fields with a max number of char holding long texts. Inevitably, you'll find quotes, commas, pipe lines (usually in email signatures from emails that have been copied/pasted), etc. At any rate, any help will be tremendously appreciated. Many Thanks! Marcelo

    “ID”| “Name”| “Address”

    ---------------------------------------------

    “123”| “John Smith”| “6050 N Washington Ave”

    “345”| “Susan Perez”| “4566 “West” Main Road”

    “677”| “Robert |Champ| Izzo” | “344 Thorn Street East”

  • one option which might work for you is to just import it as a raw file. in this case you won't have any delimiters, and it will just be one big set of text per line.

    then, you can use some data transformations to "fix" your file into a more usable format. for example, you could do something like replace all instances of "|" with some other random non-repeatable delimiter, like say <>. then, find all instances of |, and replace it with something else, like say **. then replace back the <> with "|", and then save it back to a text file.

    then you do another import, using the flat file manager, and the text qualifier of " and seperator of |.

    after the import, you replace the ** within each column, with the | that it should have.

    it's convoluted i know, but it probably will work, and will allow you to automate everything nicely.

  • thank you for the tip. the solution seems logical and doable. I'll give it a shot and see how it goes. my concern is that I might still run into issues if I don't pick a unique delimiter as a replacement. will see. I'll post back when I have something in concrete.

    thanks again!

  • Chelo (8/11/2011)


    Hi Jeff- I realize this is an old post, but the problem's current. :hehe:

    The issue I have is similar to others. I received .txt files with the pipe or vertical line as the field delimiter, and double quotes as the text qualifier, and I need to upload these files in a new db in SQL 2008. Below is an example of my dilemma. I tried a few ways but none helped. In the example below, the second line will be a problem because of the present double quotes in the address field. The last examples has the pipes embedded in the field. When files are relatively small I open them, I correct the issue, and then I try importing again. But when we talk about millions of records (30 to 50 m), then I'm out of luck. Not to mention that some .txt files may contain a "Comments", "Description", or "Notes" fields with a max number of char holding long texts. Inevitably, you'll find quotes, commas, pipe lines (usually in email signatures from emails that have been copied/pasted), etc. At any rate, any help will be tremendously appreciated. Many Thanks! Marcelo

    “ID”| “Name”| “Address”

    ---------------------------------------------

    “123”| “John Smith”| “6050 N Washington Ave”

    “345”| “Susan Perez”| “4566 “West” Main Road”

    “677”| “Robert |Champ| Izzo” | “344 Thorn Street East”

    Nope... if you use a BCP format file, this is not a problem. I create a demonstration tonight when I get home from work and after evening chores.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • That'd be extremely helpful! Thank you!

    I'm just reading more and getting acquainted with the BCP utility, as I believe I'll need to use it often.

    m@rcelo

  • hi! I tried your suggestion on a small tbl, and worked well. However, I got some issues on bigger tlbs (50+ million records). I'm trying other options for those biggie ones. thx!

  • hi Jeff- so I tried BCP using a format file in xml, but I keep getting the same truncation error, which I believe it is still due to the embedded delimiters within the data set. attached are the content of the xml and the error. oh, by the way, I'm running a bulk import from sql to load the data set an xml format file. any advice?! thank you!

  • I sure am sorry. :blush: I lost track of this post. Are you all set or do you still need help?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Hi there!

    Yes please, I would love some help!

    I've put together a format file to read and upload a sample txt file, but still have trouble with it.

    The issues I encountered aren't only with the delimiters, but also with some CR or LF that basically cut a row in 1/2, sending the second 1/2 down to the next line.

    anyhow. I'll send what I've got and will see what we can do...

    thanks much!!

    marcelo

  • I just took a look at your BCP format file... it seems wicked complicated for what we need to do according to the format you explained to look like the following...

    “ID” | “Name” | “Address”

    ---------------------------------------------

    “123” | “John Smith” | “6050 N Washington Ave”

    “345” | “Susan Perez” | “4566 “West” Main Road”

    “677” | “Robert |Champ| Izzo” | “344 Thorn Street East”

    I'll work out a "regular" format file for that (unless you wanted to attach a file with NO personal information, please).

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • It would appear that each of those example lines has a trailing space. Is that true or just a "paste" anomoly?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Crud... BCP format files won't accept "smart quotes" as valid delimiter characters. It'll take regular quotes when you "escape" them with a backslash, but it just won't take the "smart quotes" with or without the escape character.

    Who the heck named those things "smart" anyway??? :crazy:

    No problem. There is a fairly easy work around without all that XML "mumbo-jumbo". 😀

    Using the data that you provided (as below complete with a trailing space) as a file called Chelo.txt in the C:\Temp directory on the server, this code does the job. Notice that I always load data to a staging table where the data can be "gleaned'n'cleaned" (ie: validated) before it goes to the final table. As always, the commented details are in the code.

    "ID" | "Name" | "Address"

    "123" | "John Smith" | "6050 N Washington Ave"

    "345" | "Susan Perez" | "4566 "West" Main Road"

    "677" | "Robert |Champ| Izzo" | "344 Thorn Street East"

    --===== Conditionally drop the staging table to make

    -- reruns in SSMS easier.

    IF OBJECT_ID('tempdb..#Staging','U') IS NOT NULL

    DROP TABLE #Staging

    ;

    --===== Create the staging table.

    -- Note that all columns are

    -- VARCHAR so we can check for

    -- things like IsAllDigits where

    -- appropriate (ie: the ID column).

    CREATE TABLE #Staging

    (

    ID VARCHAR(10),

    Name VARCHAR(100),

    Address VARCHAR(100)

    )

    ;

    --===== Do the BULK INSERT

    -- This will leave a "smart quote" in the first column.

    -- Note the "error file". It will move MAXERRORS

    -- number of error rows to a separate file for "rework"

    -- if necessary.

    BULK INSERT #Staging

    FROM 'C:\Temp\Chelo.txt'

    WITH (

    FIELDTERMINATOR = '” | “',

    ROWTERMINATOR = '” ',

    BATCHSIZE = 2000000,

    CODEPAGE = 'RAW',

    DATAFILETYPE = 'char',

    FIRSTROW = 2,

    MAXERRORS = 5000,

    ERRORFILE = 'C:\Temp\Chelo.err'

    )

    ;

    --===== Get rid of the leading "smart quote"

    UPDATE #Staging

    SET ID = STUFF(ID,1,1,'')

    ;

    --===== Show what we have

    SELECT * FROM #Staging

    ;

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 16 through 30 (of 33 total)

You must be logged in to reply to this topic. Login to reply