Query conversion error from phantom data

  • Running SQL Server 2014 SP1-GDR (KB4019091) with compatibility level 100, I am getting repeated conversion errors from data that just isn't there.

    Our data vendor sends data for 14 distinct tables in a single delimited text file, which I load into a single import table. The target tables have varying numbers of fields, from 11 up to 58, so I used Jeff Moden's DelimitedSplit8K to populate a generic split table that I use for loading all 14 tables.

    Import Table: CREATE TABLE [dbo].[ImportData](
        [InsertValues] [varchar](3500) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL, --Largest possible row is 3,500
        [ImportRow] [int] IDENTITY(1,1) NOT NULL
    ) ON [PRIMARY]

    Split Table: CREATE TABLE [dbo].[Split](
        [ImportRow] [int] NOT NULL,
        [FieldNumber] [int] NOT NULL,
        [FieldData] [varchar](1024) COLLATE SQL_Latin1_General_CP1_CI_AS NULL --Largest field is 1,024
    ) ON [PRIMARY]
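
    For context, DelimitedSplit8K returns one row per delimited item as (ItemNumber, Item) pairs, which map straight onto FieldNumber and FieldData above. A quick illustration (the pipe delimiter is just an example here, not necessarily what our vendor sends):

    SELECT ItemNumber, Item
      FROM dbo.DelimitedSplit8K('Smith|John|19850302','|');
    --ItemNumber  Item
    --         1  Smith
    --         2  John
    --         3  19850302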

    Splitting process: WHILE @processRow<=@processTotal
    BEGIN
        insert dbo.Split(ImportRow, FieldNumber, FieldData)
        SELECT ImpData.ImportRow, split.ItemNumber, nullif(left(split.Item,1024),'')
         FROM dbo.ImportData ImpData
         CROSS APPLY dbo.DelimitedSplit8K(ImpData.InsertValues,@fieldDelimiter) split
         where ImpData.ImportRow between @processRow and @processRow+@processIncrement-1;

        set @processRow=@processRow+@processIncrement;
    END;

    Example table load: insert dbo.Surveys(UniqueRowIdentifier, Domain, FirstName, LastName, DateOfBirth, ExternalPersonID, ExternalVendorID, SurveyName, SurveyQuestion, SurveyAnswer, DateOfAnswer, LoadDate)
    select UniqueRowIdentifier=Field1.FieldData
    , Domain=left(Field2.FieldData,18)
    , FirstName=left(Field3.FieldData,32)
    , LastName=left(Field4.FieldData,32)
    , DateOfBirth=case when convert(int,left(Field5.FieldData,2))<18 then '17530101' else Field5.FieldData end --Correcting an infrequently observed date issue
    , ExternalPersonID=left(Field6.FieldData,64)
    , ExternalVendorID=left(Field7.FieldData,64)
    , SurveyName=left(Field8.FieldData,64)
    , SurveyQuestion=left(Field9.FieldData,256)
    , SurveyAnswer=left(Field10.FieldData,128)
    , DateOfAnswer=case when convert(int,left(Field11.FieldData,2))<18 then null else Field11.FieldData end
    , LoadDate=@rundate
    from dbo.Split Field1 join dbo.Split Field2 on Field1.ImportRow=Field2.ImportRow and Field1.FieldNumber=1 and Field2.FieldNumber=2
    join dbo.Split Field3 on Field1.ImportRow=Field3.ImportRow and Field3.FieldNumber=3
    join dbo.Split Field4 on Field1.ImportRow=Field4.ImportRow and Field4.FieldNumber=4
    join dbo.Split Field5 on Field1.ImportRow=Field5.ImportRow and Field5.FieldNumber=5
    join dbo.Split Field6 on Field1.ImportRow=Field6.ImportRow and Field6.FieldNumber=6
    join dbo.Split Field7 on Field1.ImportRow=Field7.ImportRow and Field7.FieldNumber=7
    join dbo.Split Field8 on Field1.ImportRow=Field8.ImportRow and Field8.FieldNumber=8
    join dbo.Split Field9 on Field1.ImportRow=Field9.ImportRow and Field9.FieldNumber=9
    join dbo.Split Field10 on Field1.ImportRow=Field10.ImportRow and Field10.FieldNumber=10
    join dbo.Split Field11 on Field1.ImportRow=Field11.ImportRow and Field11.FieldNumber=11
    where Field2.FieldData='Survey' or Field2.FieldData='Surveys';

    The largest files I have received have been 75 million rows, though most are around 1.5 million. Experimentation with run time and database space led me to a batch size of 100,000 rows.
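
    For reference, the batching variables feeding the splitting loop are declared along these lines (a simplified sketch; only the 100,000 increment is the value I actually settled on, and the delimiter shown is illustrative):

    DECLARE @processRow int=1,
            @processIncrement int=100000, --Batch size chosen from run-time/space testing
            @processTotal int,
            @fieldDelimiter char(1)='|';  --Illustrative; set to the vendor's actual delimiter
    SELECT @processTotal=MAX(ImportRow) FROM dbo.ImportData;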

    The issue I have run into after turning the process into a stored procedure is that the table loads sometimes throw a data type conversion error for rows that aren't even in the filtered result set. The errors repeat when I run the same code in SSMS, regardless of whether I leave the conversions implicit or use convert() explicitly. Through experimentation I found I could get the load to succeed if I:
         A) added criteria to limit the number of rows selected, e.g. "ImportRow>x", or
         B) moved the specific table's data to a temp table in one batch and imported it from there in a second batch (see the sketch after this list); the error would still strike if the temp table was created and read in the same batch.
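
    A sketch of workaround B as I run it in SSMS, abbreviated to three fields (the remaining joins and columns follow the Surveys load above; the key point is the batch separator between the two steps):

    --Batch 1: stage the raw, unconverted field values
    select Field1.FieldData as UniqueRowIdentifier
    , Field5.FieldData as DateOfBirth
    , Field11.FieldData as DateOfAnswer
    --...remaining fields as in the Surveys load...
    into #SurveyStage
    from dbo.Split Field1 join dbo.Split Field2 on Field1.ImportRow=Field2.ImportRow and Field1.FieldNumber=1 and Field2.FieldNumber=2
    join dbo.Split Field5 on Field1.ImportRow=Field5.ImportRow and Field5.FieldNumber=5
    join dbo.Split Field11 on Field1.ImportRow=Field11.ImportRow and Field11.FieldNumber=11
    where Field2.FieldData='Survey' or Field2.FieldData='Surveys';
    GO --Second batch; the error still strikes if the next step runs in the same batch

    --Batch 2: convert and load only the rows that survived the filter
    insert dbo.Surveys(UniqueRowIdentifier, DateOfBirth, DateOfAnswer /*, ...*/)
    select UniqueRowIdentifier
    , case when convert(int,left(DateOfBirth,2))<18 then '17530101' else DateOfBirth end
    , case when convert(int,left(DateOfAnswer,2))<18 then null else DateOfAnswer end
    from #SurveyStage;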

    I have enough logging built into the stored procedure that I can go to SSMS and complete the load, but I would far prefer to prevent the error. How can I do that? And where is a conversion error on phantom data even coming from?


    Puto me cogitare, ergo puto me esse.
    I think that I think, therefore I think that I am.

  • If a field isn't there (e.g. Field4), then the entire row won't be returned.  You can use a LEFT JOIN instead of an inner join to keep rows with missing fields and then handle the NULLs with ISNULL().  The data you supplied isn't enough to write the query for you.
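
    A minimal sketch of that pattern using the tables from your post (assuming Field4 is the one that can be missing):

    select Field1.FieldData as UniqueRowIdentifier
    , isnull(Field4.FieldData,'') as LastName --Substitute a default when the field is absent
    from dbo.Split Field1
    left join dbo.Split Field4 on Field4.ImportRow=Field1.ImportRow and Field4.FieldNumber=4
    where Field1.FieldNumber=1;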

    For better, quicker answers, click on the following...
    http://www.sqlservercentral.com/articles/Best+Practices/61537/

    For better answers on performance questions, click on the following...
    http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • Is there a clear delineation of the data for each table in the single file?

  • This may work better in SSIS.  You can read the file one line at a time and direct each line to the proper table in a script component.

    A quick web search turned up this page on reading a delimited text file with multiple record formats and distributing the lines to separate outputs.
    https://www.timmitchell.net/post/2017/08/22/using-the-ssis-script-component-with-multiple-outputs/

    It handles three formats and you've got 14, so you have a lot more work to do.  It also uses three flat file destinations, where you should instead have 14 SQL destinations.  But I think a package with 14 separate destinations all doing Bulk Insert operations would go through 75 million input rows much faster than the way you are attempting to do it with T-SQL.

