Query conversion error from phantom data

  • Running 2014 SP1-GDR (KB4019091) in compatibility_level 100, I am getting repeated conversion errors from data that just isn't there.

    Our data vendor sends data for 14 distinct tables in a single delimited text file, which I load into a staging table (ImportData below), one row per line. These tables have varying numbers of fields, from 11 up to 58, so I use Jeff Moden's DelimitedSplit8K to break each line into its fields and stage them in a second table that feeds the 14 table loads.

    Import Table: CREATE TABLE [dbo].[ImportData](
        [InsertValues] [varchar](3500) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL, --Largest possible row is 3,500
        [ImportRow] [int] IDENTITY(1,1) NOT NULL
    ) ON [PRIMARY]

    Split Table: CREATE TABLE [dbo].[Split](
        [ImportRow] [int] NOT NULL,
        [FieldNumber] [int] NOT NULL,
        [FieldData] [varchar](1024) COLLATE SQL_Latin1_General_CP1_CI_AS NULL --Largest field is 1,024
    ) ON [PRIMARY]

    Splitting process: WHILE @processRow<=@processTotal
    BEGIN
        insert dbo.Split(ImportRow, FieldNumber, FieldData)
        SELECT ImpData.ImportRow, split.ItemNumber, nullif(left(split.Item,1024),'')
         FROM dbo.ImportData ImpData
         CROSS APPLY dbo.DelimitedSplit8K(ImpData.InsertValues,@fieldDelimiter) split
         where ImpData.ImportRow between @processRow and @processRow+@processIncrement-1;

        set @processRow=@processRow+@processIncrement;
    END

    Example table load: insert dbo.Surveys(UniqueRowIdentifier, Domain, FirstName, LastName, DateOfBirth, ExternalPersonID, ExternalVendorID, SurveyName, SurveyQuestion, SurveyAnswer, DateOfAnswer, LoadDate)
    select UniqueRowIdentifier=Field1.FieldData
    , Domain=left(Field2.FieldData,18)
    , FirstName=left(Field3.FieldData,32)
    , LastName=left(Field4.FieldData,32)
    , DateOfBirth=case when convert(int,left(Field5.FieldData,2))<18 then '17530101' else Field5.FieldData end --Correcting an infrequently observed date issue
    , ExternalPersonID=left(Field6.FieldData,64)
    , ExternalVendorID=left(Field7.FieldData,64)
    , SurveyName=left(Field8.FieldData,64)
    , SurveyQuestion=left(Field9.FieldData,256)
    , SurveyAnswer=left(Field10.FieldData,128)
    , DateOfAnswer=case when convert(int,left(Field11.FieldData,2))<18 then null else Field11.FieldData end
    , LoadDate=@rundate
    from dbo.Split Field1 join dbo.Split Field2 on Field1.ImportRow=Field2.ImportRow and Field1.FieldNumber=1 and Field2.FieldNumber=2
    join dbo.Split Field3 on Field1.ImportRow=Field3.ImportRow and Field3.FieldNumber=3
    join dbo.Split Field4 on Field1.ImportRow=Field4.ImportRow and Field4.FieldNumber=4
    join dbo.Split Field5 on Field1.ImportRow=Field5.ImportRow and Field5.FieldNumber=5
    join dbo.Split Field6 on Field1.ImportRow=Field6.ImportRow and Field6.FieldNumber=6
    join dbo.Split Field7 on Field1.ImportRow=Field7.ImportRow and Field7.FieldNumber=7
    join dbo.Split Field8 on Field1.ImportRow=Field8.ImportRow and Field8.FieldNumber=8
    join dbo.Split Field9 on Field1.ImportRow=Field9.ImportRow and Field9.FieldNumber=9
    join dbo.Split Field10 on Field1.ImportRow=Field10.ImportRow and Field10.FieldNumber=10
    join dbo.Split Field11 on Field1.ImportRow=Field11.ImportRow and Field11.FieldNumber=11
    where Field2.FieldData='Survey' or Field2.FieldData='Surveys';

    The largest files I have received have been 75M rows, with most being 1.5M. Experimentation on run time and database space led to me using batch sizes of 100,000.

    The issue I have run into after making the process into a stored procedure is that it sometimes throws a data type conversion error citing values that don't exist in any of the rows my WHERE clause selects. The errors repeat when I run the same code in SSMS, regardless of whether I leave the conversions implicit or use convert() explicitly. Through experimentation I found that I could get it to succeed if I:
         A) added criteria to limit the number of rows selected, e.g. "ImportRow>x" or
         B) if I moved the specific table's data to a temp table in one batch and imported it in a second batch, but the error would still strike if the temp table was created in the same batch.

    I have enough logging built into the stored procedure that I can go to SSMS and complete the load, but I would far prefer to prevent the error. How would I do that? And where is a conversion error on phantom data even coming from?
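    A plausible explanation, for what it's worth: SQL Server does not guarantee that the WHERE clause is evaluated before the expressions in the SELECT list. The plan may compute the CASE/CONVERT while scanning dbo.Split, i.e. against field values belonging to the other 13 record types, before the Field2.FieldData filter discards those rows; limiting the row count or splitting the work into two batches produces a different plan, which is why the error comes and goes. Since you are on 2014, TRY_CONVERT (available from 2012) can make the conversions safe by returning NULL instead of raising. A sketch of the two date expressions only, assuming NULL is acceptable when the leading characters aren't numeric:

    , DateOfBirth=case when try_convert(int,left(Field5.FieldData,2))<18 then '17530101' else Field5.FieldData end --try_convert never raises; non-numeric falls through to the raw value
    , DateOfAnswer=case when try_convert(int,left(Field11.FieldData,2))>=18 then Field11.FieldData end --NULL when non-numeric or <18, matching the original for numeric values

    This keeps the result identical for the rows you actually load while making the expression harmless against the rows the filter will remove.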


    Puto me cogitare, ergo puto me esse.
    I think that I think, therefore I think that I am.

  • If one of the fields isn't there (e.g. Field4), then the entire row won't be returned.  You can use a LEFT JOIN instead of an INNER JOIN to keep those rows and then handle the missing data with ISNULL.  The data you supplied isn't enough to write the query for you.
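
    To make that concrete, a hedged sketch of the same load shape (column list trimmed to three fields; assumes a missing field should load as blank rather than drop the row):

    select UniqueRowIdentifier=Field1.FieldData
    , Domain=left(Field2.FieldData,18)
    , FirstName=left(isnull(Field3.FieldData,''),32) --blank when field 3 is absent for the row
    from dbo.Split Field1
    join dbo.Split Field2 on Field2.ImportRow=Field1.ImportRow and Field2.FieldNumber=2
    left join dbo.Split Field3 on Field3.ImportRow=Field1.ImportRow and Field3.FieldNumber=3
    where Field1.FieldNumber=1
      and Field2.FieldData in ('Survey','Surveys');

    Note that with LEFT JOINs the FieldNumber test must stay in each ON clause; moving it to the WHERE clause would turn the join back into an inner join.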

    For better, quicker answers, click on the following...
    http://www.sqlservercentral.com/articles/Best+Practices/61537/

    For better answers on performance questions, click on the following...
    http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • Is there a clear delineation of the data for each table in the single file?

  • This may work better in SSIS.  You can read the file one line at a time and direct each line to the proper table in a script component.

    A quick web search turned up this page on reading a delimited text file with multiple record formats and distributing the lines to separate outputs.
    https://www.timmitchell.net/post/2017/08/22/using-the-ssis-script-component-with-multiple-outputs/

    It handles three formats and you've got 14, so you have a lot more work to do.  It also uses three flat file destinations, where you should instead have 14 SQL destinations.  But I think a package with 14 separate destinations all doing Bulk Insert operations would go through 75 million input rows much faster than the way you are attempting to do it with T-SQL.
