Null records being Inserted during Import of CSV file
Posted Friday, September 6, 2013 10:12 AM
Sqlraider (Ten Centuries)
I know very little about SSIS, so bear with me.

I have a SSIS package that inserts records into a 'staging' table from a CSV file (loading one CSV file per package execution).

The 1st production file loaded just fine - no issues. The 2nd file loaded 494,720 NULL rows (every column NULL in each row) in addition to the 18,272 records I wanted to import.

I have a flat file connection (CSV), Code Page: 20127 (US-ASCII), Format: Delimited, Text Qualifier: ", Header Row Delimiter: {CR}{LF}, Header Rows to Skip: 2, Column Delimiter: Comma{,}, Data Type: Unicode string [DT_WSTR] (for ALL columns).

I have an OLE DB connection to my staging table and I'm using 'fast load'.
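
For reference, a staging table fed this way typically declares every column as a nullable NVARCHAR to match the DT_WSTR columns coming from the flat file source, which is also why rows containing nothing but NULLs insert without any error. A minimal sketch, with hypothetical table and column names:

-- Hypothetical staging table: every column nullable NVARCHAR, so all-NULL rows load without complaint
CREATE TABLE dbo.Staging_MonthlyFeed
(
    AccountNumber   NVARCHAR(50)  NULL,
    TransactionDate NVARCHAR(50)  NULL,
    Amount          NVARCHAR(50)  NULL,
    FeedDescription NVARCHAR(255) NULL
);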

What could be different about the 2nd file to cause this? What 'file editor' would I need to use to see the differences? Is there a setting I could use to not load NULL rows?

Any ideas of what could be causing this?

Thanks,
Sqlraider
Post #1492323
Posted Saturday, September 7, 2013 1:17 AM
Phil Parkin (SSCarpal Tunnel)
Sqlraider (9/6/2013)
[...] The 2nd file loaded 494,720 NULL rows (every column NULL in each row) in addition to the 18,272 records I wanted to import. [...] What could be different about the 2nd file to cause this? What 'file editor' would I need to use to see the differences? Is there a setting I could use to not load NULL rows?


If you open the two files in Notepad++, you should get an idea of what the difference is.

As for avoiding the import of this dodgy data, I would use a Conditional Split component in the data flow to redirect all of the rubbish rows to an unused output - that will filter them out.

Or just filter it out when you process the data in staging.
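
If you take the staging route, the clean-up can be as simple as deleting the rows in which every column is NULL before processing. A rough sketch, assuming a hypothetical staging table and column names:

-- Remove rows where every imported column is NULL (table/column names are illustrative only)
DELETE FROM dbo.Staging_MonthlyFeed
WHERE AccountNumber   IS NULL
  AND TransactionDate IS NULL
  AND Amount          IS NULL
  AND FeedDescription IS NULL;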



Post #1492479
Posted Saturday, September 7, 2013 2:36 AM
SSC Rookie
Just a small hint - check whether the columns in your destination table allow NULLs and change that.
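
As an illustration (hypothetical table and column names), if one column must always be populated you could disallow NULLs on it so the empty rows are rejected instead of silently inserted; note that with 'fast load' a constraint violation may fail the whole batch rather than just the bad rows:

-- Hypothetical: force a key column to reject NULLs
-- (this will fail if rows with a NULL AccountNumber are already in the table)
ALTER TABLE dbo.Staging_MonthlyFeed
ALTER COLUMN AccountNumber NVARCHAR(50) NOT NULL;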
Post #1492484
Posted Monday, September 9, 2013 9:22 AM
Sqlraider (Ten Centuries)
Phil Parkin (9/7/2013)
If you open the two files in Notepad++, you should get an idea of what the difference is.

As for avoiding the import of this dodgy data, I would use a Conditional Split component in the data flow to redirect all of the rubbish rows to an unused output - that will filter them out.

Or just filter it out when you process the data in staging.

I was focused on the first row of data after the 2nd header record, thinking there was something not right, when in fact there are 494,720 NULL records (each row is all commas) after the last 'good' data record.

I'm going to use your suggestion of a Conditional Split for the records I don't load into the staging table. That way, if for some reason I don't load a record, I'll at least still have it.

Thanks,
Sqlraider
Post #1492832
Posted Monday, September 16, 2013 3:57 PM
aaron.reese (Old Hand)
I am going to hazard a guess that the CSV is generated from an Excel file and that there were 400K empty lines at the end of the CSV file. Any time Excel is involved anywhere near an SSIS package, SSIS seems to wander off in a huff.

Excel and SSIS do not play nicely together in any combination I have found.
Post #1495313
Posted Tuesday, September 17, 2013 8:32 AM
Sqlraider (Ten Centuries)
aaron.reese (9/16/2013)
I am going to hazard a guess that the CSV is generated from an Excel file and that there were 400K empty lines at the end of the CSV file. Any time Excel is involved anywhere near an SSIS package, SSIS seems to wander off in a huff.

Excel and SSIS do not play nicely together in any combination I have found.


A third party creates the file (how, I don't know) and sends it to us as a CSV. I didn't see the 400K null lines until I opened the file in Notepad++.
Post #1495529
Posted Tuesday, September 17, 2013 10:38 AM
aaron.reese (Old Hand)
So is the source file missing data (i.e. are the lines of empty fields supposed to have data), or are they just extra lines?

If you are happy that the data is complete, I would:

1) raise a defect against the file source to get the null rows removed
2) apply the conditional split as per Phil's suggestion to ignore them - the exact rules for the CS will be down to you and the nature of the data (I would find a field or combination of fields that CANNOT be null and validate against them)
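
As a rough illustration of how you might pick that field, you could profile the rows already loaded from a good file and look for columns that are never NULL (table and column names here are hypothetical):

-- Hypothetical: count NULLs per column in previously loaded data;
-- columns that return zero are candidates for the "cannot be NULL" rule in the Conditional Split
SELECT
    SUM(CASE WHEN AccountNumber   IS NULL THEN 1 ELSE 0 END) AS NullAccountNumbers,
    SUM(CASE WHEN TransactionDate IS NULL THEN 1 ELSE 0 END) AS NullTransactionDates,
    SUM(CASE WHEN Amount          IS NULL THEN 1 ELSE 0 END) AS NullAmounts
FROM dbo.Staging_MonthlyFeed;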
Post #1495582
Posted Tuesday, September 17, 2013 11:49 AM
Sqlraider (Ten Centuries)
aaron.reese (9/17/2013)
So is the source file missing data (i.e. are the lines of empty fields supposed to have data), or are they just extra lines?

If you are happy that the data is complete, I would:

1) raise a defect against the file source to get the null rows removed
2) apply the conditional split as per Phil's suggestion to ignore them - the exact rules for the CS will be down to you and the nature of the data (I would find a field or combination of fields that CANNOT be null and validate against them)


It's just extra lines (lines with only commas and no data between them, e.g. ,,,,,,). I did apply a Conditional Split off the ONE field that cannot be NULL. This is a monthly file, and only on occasion does it have extra lines.
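
A quick post-load check along these lines (hypothetical table and column names) can confirm that no empty rows slipped past the split in the months when the file does contain them:

-- Hypothetical sanity check: any rows whose key column is NULL or blank should not be here
SELECT COUNT(*) AS SuspectRows
FROM dbo.Staging_MonthlyFeed
WHERE AccountNumber IS NULL
   OR LTRIM(RTRIM(AccountNumber)) = '';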
Post #1495613