Bulk Insert format file but input file not always fixed

  • I am trying to do a bulk insert using a non-xml file format. When my input file exactly matches my format file the bulk insert works fine. I have data that is older and was not written in the exact same manner as the newer input data. In the case of this older data, the last field may either not be there at all or be shorter than the allotted characters. This causes an unexpected end of file error. Here is an example of what I mean:

    Format file:

    8.0

    3

    1 sqlchar 0 5 "" 1 recid ""

    2 sqlchar 0 5 "" 2 testfield ""

    3 sqlchar 0 0 "/r/n" 0 skipfield ""

    The data that works is

    1 test1

    2 test2

    The data that does not work is

    1 test

    2 tes

    another one that does not work (the first field is 5 characters with nothing for the second field)

    1

    2

    Without creating a format file for each case, is there a way to have the bulk insert work with the variable length in the last field?

  • jdixon-586320 (6/20/2013)


    I am trying to do a bulk insert using a non-xml file format. When my input file exactly matches my format file the bulk insert works fine. I have data that is older and was not written in the exact same manner as the newer input data. In the case of this older data, the last field may either not be there at all or be shorter than the allotted characters. This causes an unexpected end of file error. Here is an example of what I mean:

    Format file:

    8.0

    3

    1 sqlchar 0 5 "" 1 recid ""

    2 sqlchar 0 5 "" 2 testfield ""

    3 sqlchar 0 0 "/r/n" 0 skipfield ""

    The data that works is

    1 test1

    2 test2

    The data that does not work is

    1 test

    2 tes

    another one that does not work (the first field is 5 characters with nothing for the second field)

    1

    2

    Without creating a format file for each case, is there a way to have the bulk insert work with the variable length in the last field?

    Unfortunately, BCP was written to work with very well formed and consistent data formats. Even the "skip rows" part of BCP requires that the rows being skipped be properly formatted with the same number and types of delimiters. As you're finding out, it REALLY doesn't like ragged right formats especially when fixed field lengths are being used.

    The way I get around such a thing is to import the lines/rows as blobs (BULK INSERT works very well here but so does BCP without a format file) and then use substring/CAST/CONVERT to do the parsing. Using your test data, something like the following will handle the ragged right fixed-field problem quite well.

    SELECT RecID = CAST(SUBSTRING(d.String,1,5) AS INT)

    , TestField = RTRIM(SUBSTRING(d.String,6,5))

    FROM

    (--==== This would be your staging table instead of test data

    SELECT '1 test1' UNION ALL

    SELECT '2 test2' UNION ALL

    SELECT '3 test' UNION ALL

    SELECT '4 tes' UNION ALL

    SELECT '5' UNION ALL

    SELECT '6'

    )d(String)

    ;

    Here's the output:

    RecID TestField

    ----------- ---------

    1 test1

    2 test2

    3 test

    4 tes

    5

    6

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 2 posts - 1 through 1 (of 1 total)

You must be logged in to reply to this topic. Login to reply