Delimiters - Importing Data - SQL Server Import Wizard

  • Hello,

    I have an issue trying to import oil and gas data from a magnetic tape into SQL Server. At this point all I want to do is import into a staging table and then I will absorb the information accordingly. The whole reason for this is that just importing the data (at least for me) has become impossible. I have tried everything from OPENROWSET, BULK INSERT, and the SQL Server import wizard to just get the data into the staging table but I'm discovering it might not be possible.

    The data was converted into a ASCII text file and so I have been using the usual Flat File Source in the SQL Server Import Wizard. There are no headers in the file. The Column delimiter is {CR}{LF} and the Row delimiter is {CR}{LF}01. Yes, it is {CR}{LF}01 (a zero and a one - number). Now because this delimiter is not shown in the typical drop down I have tried to find a number of workarounds. I have tried everything in the import wizard under row delimiter, {CR}{LF}01, {CR}{LF}{01}, {CR}{LF}<01>, {CRLF01}, etc. Nothing works. So, I tried OPENROWSET and BULKINSERT, but each failed because of errors related to truncation and random errors. Interestingly, the closest I have gotten to a solution is with the import wizard, however because I cannot delimit by a row (because it does not fall in the drop down - {CR}{LF}01 cannot be recognized or any combination herein) I have put everything into a row with one column. This would work except SQL Server ORDERS THE DATA!!! So I can no longer tell which data is for which well! I have tried to find ways to turn this functionality off, but with no luck.

    So, all in all, I can't delimit by unique identifier nor can I turn off the sort function inherent to the import wizard. At this point I think it is impossible to even import a text file with unique delimiters into SQL Server which is a shame because the text file is more than 10 GBs in size so any ordinary program cannot handle it.

    However, I am pinning my hope on someone knowing a way to designate a row delimiter that deviates from the list in the SQL Server Import Wizard (right click on database, select tasks, import data, etc.). If there is a better way, I am more than happy to invesitgate.

    Any help would be much appreciated.

    Kevin

  • Despite the whacko delimiters, importing this will be a breeze (looks like the ol' AT&T junk I used to have to load from tape). You just can't use (IIRC) a bloody wizard in SQL Server to do it. 🙂

    You might even be able to do this without a BCP format file. You just need to take a look at how to designate multi-character ASCII character delimiters in "Books Online" for both the column and row.

    If you want me to take a shot at it and the file doesn't have anything proprietary, PCI, or PII in it, attach the file and we'll give it a go for you.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Further on Jeff's reply, here is a quick sample on how this can be done by importing the file as a single lob and parse it afterwards. Mind you that this is not a complete solution, rather a demonstration of a collection of techniques.

    😎

    USE tempdb;

    GO

    SET NOCOUNT ON;

    IF OBJECT_ID(N'dbo.TBL_SAMPLE_IMPORT') IS NOT NULL DROP TABLE dbo.TBL_SAMPLE_IMPORT;

    CREATE TABLE dbo.TBL_SAMPLE_IMPORT

    (

    SIMP_ID INT IDENTITY(1,1) NOT NULL CONSTRAINT PK_DBO_TBL_SAMPLE_IMPORT_SIMP_ID PRIMARY KEY CLUSTERED

    ,SIMP_TXT VARCHAR(MAX) NOT NULL

    ,CLEAN_TXT AS (REPLACE(REPLACE(SIMP_TXT,CHAR(13) + CHAR(10) + '01',CHAR(9)),CHAR(13) + CHAR(10),CHAR(124)))

    );

    /* Create sample data, this section would be replaced by a blob import

    */

    DECLARE @SAMPLE_SIZE INT = 100;

    DECLARE @COL_COUNT INT = 10;

    DECLARE @NL CHAR(2) = CHAR(13) + CHAR(10)

    DECLARE @RDLM CHAR(2) = '01';

    DECLARE @IMPORT_TXT VARCHAR(MAX) = '';

    ;WITH T(N) AS (SELECT N FROM (VALUES (NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL)) AS X(N))

    , NUMS(N) AS (SELECT TOP(@SAMPLE_SIZE) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS N FROM T T1,T T2,T T3,T T4,T T5,T T6,T T7,T T8,T T9)

    SELECT

    @IMPORT_TXT = (

    SELECT

    CONVERT(VARCHAR(20),ABS(CHECKSUM(NEWID())),0)

    + @NL

    + CASE

    WHEN NM.N % @COL_COUNT = 0 THEN @RDLM

    ELSE ''

    END

    FROM NUMS NM

    FOR XML PATH(''), TYPE).value('.[1]','VARCHAR(MAX)')

    INSERT INTO dbo.TBL_SAMPLE_IMPORT(SIMP_TXT) VALUES(@IMPORT_TXT);

    ;WITH IMPORT_ROWS AS

    (

    SELECT

    SI.SIMP_ID

    ,CR.ItemNumber

    ,CR.Item

    FROM dbo.TBL_SAMPLE_IMPORT SI

    CROSS APPLY dbo.DelimitedSplit8K(SI.CLEAN_TXT,CHAR(9)) AS CR

    )

    SELECT

    IR.SIMP_ID

    ,IR.ItemNumber

    ,MAX(CASE WHEN CX.ItemNumber = 1 THEN CX.Item END) AS COL_01

    ,MAX(CASE WHEN CX.ItemNumber = 2 THEN CX.Item END) AS COL_02

    ,MAX(CASE WHEN CX.ItemNumber = 3 THEN CX.Item END) AS COL_03

    ,MAX(CASE WHEN CX.ItemNumber = 4 THEN CX.Item END) AS COL_04

    ,MAX(CASE WHEN CX.ItemNumber = 5 THEN CX.Item END) AS COL_05

    ,MAX(CASE WHEN CX.ItemNumber = 6 THEN CX.Item END) AS COL_06

    ,MAX(CASE WHEN CX.ItemNumber = 7 THEN CX.Item END) AS COL_07

    ,MAX(CASE WHEN CX.ItemNumber = 8 THEN CX.Item END) AS COL_08

    ,MAX(CASE WHEN CX.ItemNumber = 9 THEN CX.Item END) AS COL_09

    ,MAX(CASE WHEN CX.ItemNumber = 10 THEN CX.Item END) AS COL_10

    ,MAX(CASE WHEN CX.ItemNumber = 11 THEN CX.Item END) AS COL_11

    ,MAX(CASE WHEN CX.ItemNumber = 12 THEN CX.Item END) AS COL_12

    FROM IMPORT_ROWS IR

    CROSS APPLY dbo.DelimitedSplit8K(IR.Item,CHAR(124)) AS CX

    GROUP BY IR.SIMP_ID,IR.ItemNumber;

    Sample result

    SIMP_ID ItemNumber COL_01 COL_02 COL_03 COL_04 COL_05 COL_06 COL_07 COL_08 COL_09 COL_10 COL_11 COL_12

    ----------- -------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------

    1 1 929921805 2020602585 1065847538 241221017 990534388 644120748 2104993075 286982861 1525503854 220985065 NULL NULL

    1 2 1133067752 432352254 1745207466 1282917181 1313391887 1206045469 1694470956 1756981457 2064316189 524082013 NULL NULL

    1 3 1829250780 436865283 406020161 2015567106 463316644 203813455 2061685011 1677858413 1921373429 644549343 NULL NULL

    1 4 2071331924 767648522 1100157458 904207050 1878553767 76627253 349316736 1834849514 914881040 1238774505 NULL NULL

    1 5 71230629 173708088 732584232 369731448 1345573226 375238821 830029406 556165778 1557643771 568968194 NULL NULL

    1 6 1411049181 1730144612 547081174 742178146 346916586 578214453 492699858 929319859 2107054202 950744183 NULL NULL

    1 7 1135197734 758561329 27889184 29010534 1599607931 532487089 542348291 494257327 858475319 327263803 NULL NULL

    1 8 1628056451 858110646 1085099211 2139886901 1803339571 328486886 864659129 1556563022 2002363098 18765239 NULL NULL

    1 9 2118877902 387958293 1251505054 181742052 522281616 1420955802 745290310 335967939 1747482566 1286423121 NULL NULL

    1 10 979458209 2123681835 842058993 572248005 804948703 1982543984 2066990261 2037719921 692576816 274410813 NULL NULL

    1 11 NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL

  • Jeff & Eirikur,

    Thanks for the suggestions. I was able to find a workaround. Though not a elegant solution, I used a program called HxD to open the text file and replaced all <CR><LF>01 hexadecimal codes with my own delimiter that I thought of (a combination of *|*|). Then I went into the import wizard and specified this as a delimiter and was able to get everything imported successfully into a staging table.

    Thanks again for all the help.

    Kevin

  • Is this a one-off exercise or something that will need to be done on a regular basis, Kevin?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • This is more of a one-off exercise.

    Kevin

  • You're probably good to go then. If it were something that needed to be done on a regular basis, we likely could have hammered it out with BULK INSERT using multi-character delimiters. If that turns out to be the case in the future (like what happens so often to one-offs), c'mon back and we'll take another shot at it together.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply