How to efficiently import data in .rpt format into SQL Server 2008

  • Ok... here we go...

    The first thing that I did was to download your data import file and save it to my local machine at C:\Temp\Test.txt.

    The next thing I did was to create a BCP FORMAT FILE that looks like the following and saved it at C:\Temp\Test.fmt. It could be located anywhere but that's where I saved it.

    10.0
    12
    1  SQLCHAR 0  0 ""     0  ID      ""
    2  SQLCHAR 0 12 ""     2  Field1  ""
    3  SQLCHAR 0 12 ""     3  Field2  ""
    4  SQLCHAR 0 17 ""     4  Field3  ""
    5  SQLCHAR 0 12 ""     5  Field4  ""
    6  SQLCHAR 0 12 ""     6  Field5  ""
    7  SQLCHAR 0 21 ""     7  Field6  ""
    8  SQLCHAR 0 21 ""     8  Field7  ""
    9  SQLCHAR 0 21 ""     9  Field8  ""
    10 SQLCHAR 0 21 ""     10 Field9  ""
    11 SQLCHAR 0 21 ""     11 Field10 ""
    12 SQLCHAR 0 99 "\r\n" 12 Field11 ""

    There are a couple of things to notice here. The format line marked as "1" in the first column isn't actually needed. The length is "0" (4th column) and the target table column (the 6th column) is also marked as "0", which effectively skips the ID IDENTITY column. The reason I left it in is as a reminder to whoever reads it that there is an identity column.

    All of the fields in the data file, except the last one, need to be treated as fixed-width fields. That's why each of them has no delimiter between the quotes (5th column) and has the given field width in bytes (4th column).

    As you pointed out, the last column doesn't behave as if it's fixed width. A "0" in the data will have a different length than a "10", "100", or whatever. So we have to treat that field (as I previously suggested) as the variable-length field it is. To do that, we need to use a delimiter to mark the end of the field for importing. The only delimiter available is the end-of-row delimiter, which is "\r\n".
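    For reference, the columns of a non-XML BCP format file like the one above are, left to right: host field order, host data type, prefix length, data length, terminator, target column order, target column name, and collation. Annotating one line of the file above as a sketch:

    ```
    10.0                                    <- BCP/SQL Server version (10.0 = 2008)
    12                                      <- number of fields in each data row
    2  SQLCHAR  0  12  ""  2  Field1  ""
    ^  ^        ^  ^   ^   ^  ^       ^
    |  |        |  |   |   |  |       +-- collation ("" = default)
    |  |        |  |   |   |  +---------- target column name (documentation only)
    |  |        |  |   |   +------------- target column number (0 = skip the field)
    |  |        |  |   +----------------- field terminator ("" = none, fixed width)
    |  |        |  +--------------------- field length in bytes
    |  |        +------------------------ prefix length (0 for character data files)
    +  +--------------------------------- field order and host data type
    ```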

    After that, I ran the following code and it imported all 451 rows correctly (including that last field) before I could get my finger off the {f5} button.

    BULK INSERT dbo.LogDB
    FROM 'C:\Temp\Test.Txt' --<----<<<< LOOK! You'll likely need to change the path/filename of the data file here
    WITH (
         CODEPAGE     = 'RAW'
        ,DATAFILETYPE = 'char'
        ,FORMATFILE   = 'C:\Temp\Test.fmt' --<----<<<< LOOK! You'll likely need to change the path/filename of the format file here
        ,TABLOCK
    )
    ;

    If you need a way to do this without the format file, lemme know. It's a bit different but can still be done if really, really needed.
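    For what it's worth, one common shape for the "no format file" route is to bulk load each whole line into a one-column staging table and then carve out the fixed-width fields with SUBSTRING. A minimal sketch, assuming the column widths from the format file above (the staging table name is illustrative, and only the first three fields are shown):

    ```sql
    CREATE TABLE dbo.Staging (RawLine VARCHAR(8000));

    -- Each line of the file lands in RawLine as-is.
    BULK INSERT dbo.Staging
    FROM 'C:\Temp\Test.txt'
    WITH (ROWTERMINATOR = '\n', CODEPAGE = 'RAW', TABLOCK);

    -- Carve the fixed-width fields out by position.
    INSERT INTO dbo.LogDB (Field1, Field2, Field3)
    SELECT SUBSTRING(RawLine,  1, 12)  -- Field1: positions 1-12
          ,SUBSTRING(RawLine, 13, 12)  -- Field2: positions 13-24
          ,SUBSTRING(RawLine, 25, 17)  -- Field3: positions 25-41
      FROM dbo.Staging;
    ```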

    If I had a lot of differently formatted files to do this with, I'd write a little T-SQL to automatically determine the structure of the data file, create the necessary CREATE TABLE statement, create the BCP format file, etc. Most people don't like that approach because I use xp_CmdShell to pull off such minor miracles. You could also do it all without a format file if you don't mind using dynamic SQL, which also makes some people cringe. Yes, you have to be a little careful and protect against SQL injection, but it can work a real treat without having to write managed code or run something outside of a stored procedure, like PowerShell.

    As Graham pointed out, you could also do this pretty easily (well, kind of) through SSIS but I don't use it so someone else would have to help you there.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thank you so much Jeff,

    BUT, the first line of your script throws an error:

    Msg 9422, Level 16, State 48, Line 1

    XML parsing: line 2, character 0, incorrect document syntax

    I found the root cause of the error. There is nothing wrong with your code; in fact, when I saved the format file, UltraEdit automatically added a prefix (\uFEFF) to the beginning of the file, which caused the XML parsing to fail. After I noticed it and removed it, everything worked like a charm.

    Thank you again for your time, I really appreciate it.

    Make sure the format file is encoded as ANSI, not Unicode.

    You can do this using notepad.exe - just open the file, then go to Save As... and check the Encoding. If it doesn't say ANSI, change it to ANSI and save.
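    If you'd rather verify from T-SQL whether a file still carries a UTF-8 BOM, here is a minimal sketch using OPENROWSET ... SINGLE_BLOB (the path is the example format-file path used earlier in the thread; reading the file this way requires bulk-load permissions):

    ```sql
    -- Read the raw bytes of the format file and test for the UTF-8 BOM (0xEFBBBF).
    DECLARE @FirstBytes VARBINARY(3);

    SELECT @FirstBytes = SUBSTRING(f.BulkColumn, 1, 3)
      FROM OPENROWSET(BULK 'C:\Temp\Test.fmt', SINGLE_BLOB) AS f;

    IF @FirstBytes = 0xEFBBBF
        PRINT 'UTF-8 BOM found - re-save the file as ANSI.';
    ELSE
        PRINT 'No UTF-8 BOM found.';
    ```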

    MM




  • Forum Etiquette: How to post Reporting Services problems
  • Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
  • How to Post Performance Problems - by Gail Shaw

  • For the little test data file, the script and the format file work like a charm and finish in less than 1 second.

    I decided to go further, so I started to load the real data file, which is 400 MB (2,437,246 rows). Unfortunately it failed with not a single row imported.

    Here is the error message:

    Msg 4864, Level 16, State 1, Line 1

    Bulk load data conversion error (type mismatch or invalid character for the specified codepage) for row 1, column 1 (UserID).

    Msg 4864, Level 16, State 1, Line 1

    Bulk load data conversion error (type mismatch or invalid character for the specified codepage) for row 2437247, column 1 (UserID).

    Msg 4832, Level 16, State 1, Line 1

    Bulk load: An unexpected end of file was encountered in the data file.

    Msg 7399, Level 16, State 1, Line 1

    The OLE DB provider "BULK" for linked server "(null)" reported an error. The provider did not give any information about the error.

    Msg 7330, Level 16, State 2, Line 1

    Cannot fetch a row from OLE DB provider "BULK" for linked server "(null)".

    Since the script ends up importing no rows, I am wondering how BULK INSERT handles this: does it read all the data into memory in one pass? If not, the error messages seem to indicate it reached the last row (2437247). The error messages also lead me to think BULK INSERT didn't recognize the end of file in this 400 MB file, although there was no problem with the test file.

    Any thought?
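    On the "all or nothing" behavior: by default BULK INSERT loads the file as a single transaction, so one conversion error can roll back every row. The documented MAXERRORS, ERRORFILE, and BATCHSIZE options change that. A hedged sketch (the big-file and error-file paths are illustrative, not from the original post):

    ```sql
    BULK INSERT dbo.LogDB
    FROM 'C:\Temp\BigFile.txt'                -- illustrative path
    WITH (
         CODEPAGE     = 'RAW'
        ,DATAFILETYPE = 'char'
        ,FORMATFILE   = 'C:\Temp\Test.fmt'
        ,MAXERRORS    = 100                   -- keep going past up to 100 bad rows
        ,ERRORFILE    = 'C:\Temp\BigFile.err' -- rejected rows are written here
        ,BATCHSIZE    = 50000                 -- commit per batch; completed batches survive
        ,TABLOCK
    );
    ```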

  • Since I suspected the EOF might not be recognized by BULK INSERT, I compared the two data files (the larger file took a long time to load, and even longer to move to the end of the file to check the hex codes) and I don't see any difference in terms of the following findings:

    Each row ends with 0D 0A

    Each file ends with 0D 0A 0D 0A, followed by "(xxx row(s) affected)" and then 0D 0A 0D 0A

  • halifaxdal (7/21/2014)


    I found the root cause of the error. There is nothing wrong with your code; in fact, when I saved the format file, UltraEdit automatically added a prefix (\uFEFF) to the beginning of the file, which caused the XML parsing to fail. After I noticed it and removed it, everything worked like a charm.

    Thank you again for your time, I really appreciate it.

    You bet. Thanks for the feedback.

    By the way, that prefix doesn't actually have anything to do with XML. Think of it as a "Unicode file prefix". It actually means something else specifically and I can't think of the precise meaning but that's mostly what it boils down to.

    --Jeff Moden



  • halifaxdal (7/21/2014)


    Since I suspected the EOF might not be recognized by BULK INSERT, I compared the two data files (the larger file took a long time to load, and even longer to move to the end of the file to check the hex codes) and I don't see any difference in terms of the following findings:

    Each row ends with 0D 0A

    Each file ends with 0D 0A 0D 0A, followed by "(xxx row(s) affected)" and then 0D 0A 0D 0A

    BULK INSERT doesn't handle such a thing very well for some reason even if you have error capturing turned on. Are you allowed to use xp_CmdShell? I ask because the error handling of BCP does handle such things with the greatest of ease. They'll be reported as errors but the whole file will, in fact, still load.
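    As a sketch of what that BCP route could look like from T-SQL (assuming xp_CmdShell is enabled; the database name, error-file path, and -m value are illustrative):

    ```sql
    -- -T = trusted connection, -f = format file,
    -- -e = error file for rejected rows, -m = maximum errors tolerated.
    DECLARE @Cmd VARCHAR(8000);

    SELECT @Cmd = 'BCP YourDatabase.dbo.LogDB IN "C:\Temp\Test.txt"'
                + ' -f "C:\Temp\Test.fmt" -S "' + @@SERVERNAME + '" -T'
                + ' -e "C:\Temp\Test.err" -m 100';

    EXEC master.dbo.xp_CmdShell @Cmd;
    ```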

    --Jeff Moden



  • Jeff Moden (7/21/2014)


    halifaxdal (7/21/2014)


    Since I suspected the EOF might not be recognized by BULK INSERT, I compared the two data files (the larger file took a long time to load, and even longer to move to the end of the file to check the hex codes) and I don't see any difference in terms of the following findings:

    Each row ends with 0D 0A

    Each file ends with 0D 0A 0D 0A, followed by "(xxx row(s) affected)" and then 0D 0A 0D 0A

    BULK INSERT doesn't handle such a thing very well for some reason even if you have error capturing turned on. Are you allowed to use xp_CmdShell? I ask because the error handling of BCP does handle such things with the greatest of ease. They'll be reported as errors but the whole file will, in fact, still load.

    No, SPs like xp_CmdShell are not allowed.

    How come your BULK INSERT picks up the EOF in the first, smaller data file but gets stuck in the second, larger one? And I hate to see no rows at all loaded after it hits the exception on the very last row.

  • halifaxdal (7/21/2014)


    Jeff Moden (7/21/2014)


    halifaxdal (7/21/2014)


    Since I suspected the EOF might not be recognized by BULK INSERT, I compared the two data files (the larger file took a long time to load, and even longer to move to the end of the file to check the hex codes) and I don't see any difference in terms of the following findings:

    Each row ends with 0D 0A

    Each file ends with 0D 0A 0D 0A, followed by "(xxx row(s) affected)" and then 0D 0A 0D 0A

    BULK INSERT doesn't handle such a thing very well for some reason even if you have error capturing turned on. Are you allowed to use xp_CmdShell? I ask because the error handling of BCP does handle such things with the greatest of ease. They'll be reported as errors but the whole file will, in fact, still load.

    No, SPs like xp_CmdShell are not allowed.

    How come your BULK INSERT picks up the EOF in the first, smaller data file but gets stuck in the second, larger one? And I hate to see no rows at all loaded after it hits the exception on the very last row.

    The smaller Test.txt file you attached has no such anomalies. I'll try to recreate the problem and see what happens, and whether I can come up with a fix, because BULK INSERT used to be much more robust in areas like this.

    --Jeff Moden



  • Jeff Moden (7/21/2014)


    Think of it as a "Unicode file prefix". It actually means something else specifically and I can't think of the precise meaning but that's mostly what it boils down to.

    It is Unicode; it is the byte order mark (BOM), which indicates the byte order (endianness) and the UTF encoding.

    Far away is close at hand in the images of elsewhere.
    Anon.

  • David Burrows (7/22/2014)


    Jeff Moden (7/21/2014)


    Think of it as a "Unicode file prefix". It actually means something else specifically and I can't think of the precise meaning but that's mostly what it boils down to.

    It is Unicode; it is the byte order mark (BOM), which indicates the byte order (endianness) and the UTF encoding.

    Heh... I finally get something right and I still doubt myself. Thanks for the confirmation, David.

    --Jeff Moden



  • Jeff Moden (7/22/2014)


    David Burrows (7/22/2014)


    Jeff Moden (7/21/2014)


    Think of it as a "Unicode file prefix". It actually means something else specifically and I can't think of the precise meaning but that's mostly what it boils down to.

    It is Unicode; it is the byte order mark (BOM), which indicates the byte order (endianness) and the UTF encoding.

    Heh... I finally get something right and I still doubt myself. Thanks for the confirmation, David.

    Hi Jeff, do you have a new script/format for me to resolve the EOF issue in the big data file?

  • halifaxdal (7/22/2014)


    Jeff Moden (7/22/2014)


    David Burrows (7/22/2014)


    Jeff Moden (7/21/2014)


    Think of it as a "Unicode file prefix". It actually means something else specifically and I can't think of the precise meaning but that's mostly what it boils down to.

    It is Unicode; it is the byte order mark (BOM), which indicates the byte order (endianness) and the UTF encoding.

    Heh... I finally get something right and I still doubt myself. Thanks for the confirmation, David.

    Hi Jeff, do you have a new script/format for me to resolve the EOF issue in the big data file?

    Heh... actually, I'm on vacation. Is there any chance of you zipping up and attaching a complete file that I could work on instead of me generating one?

    --Jeff Moden



  • For the 400 MB file, I removed the first two rows (non-data). I also added BATCHSIZE = 200 to the BULK INSERT, and I was able to load all the data until it hit the last row, which is also not a data row.

    While I can use UltraEdit to open the 400 MB data file and remove the first two rows, I don't want to try to open the 4 GB file; I can imagine the system hanging there forever. I believe BCP or BULK INSERT must be able to handle situations like this.

    And sorry, I can't upload the data file anywhere: 1. it is against my company's policy; 2. it is too big.

    Thank you, any clue is appreciated.
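    For what it's worth, BULK INSERT can skip leading non-data rows without editing the file: the documented FIRSTROW option starts the load at a given row (rows being defined by the row terminator in the format file), and MAXERRORS lets the trailing "(xxx row(s) affected)" lines be tolerated rather than fatal. A hedged sketch (the big-file path is illustrative, not from the original post):

    ```sql
    BULK INSERT dbo.LogDB
    FROM 'C:\Temp\BigFile.txt'      -- illustrative path
    WITH (
         CODEPAGE     = 'RAW'
        ,DATAFILETYPE = 'char'
        ,FORMATFILE   = 'C:\Temp\Test.fmt'
        ,FIRSTROW     = 3           -- skip the two non-data header rows
        ,MAXERRORS    = 10          -- don't abort on the trailing non-data rows
        ,TABLOCK
    );
    ```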
