SQLServerCentral is supported by Redgate
 
SSIS Design Pattern - Staging Fixed Width Flat Files

Sam Vanga
Comments posted to this topic are about the item SSIS Design Pattern - Staging Fixed Width Flat Files

Sam Vanga
http://SamuelVanga.com
t.pinder
In almost all cases, parsing in the data flow pefroms (sic) better than parsing at the source, if not at the same performance level.


Sam, I'm sorry, this sentence doesn't make any sense to me. If something performs better, say so. But adding the qualifier "if not at the same performance level" negates any meaning.

I'm not saying your methodology is wrong; in fact, in terms of human effort, it's vastly superior IMO. But I can't work out whether you've added the qualifier because the total processing time increases (does it?) or for some other reason.
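To make the two options concrete: parsing in the data flow usually means the Flat File source reads each record as one wide column, and a Derived Column transformation then cuts out the fields with expressions such as SUBSTRING and TRIM. The Python sketch below imitates that pattern outside SSIS, as an illustration only. The layout (column names, start positions, lengths) and the helper names are hypothetical, and the sketch says nothing about which option is faster.

```python
# Sketch of "parse in the data flow": each record arrives as one wide string
# column, and fields are derived afterwards with SUBSTRING/TRIM-style logic.
# The layout below is hypothetical, for illustration only.

def substring(text: str, position: int, length: int) -> str:
    """Mimic a 1-based SUBSTRING(text, position, length) call."""
    return text[position - 1 : position - 1 + length]

# (column name, 1-based start position, length): hypothetical layout
LAYOUT = [
    ("CustomerID", 1, 5),
    ("Name", 6, 10),
    ("City", 16, 8),
]

def derive_columns(raw_line: str) -> dict:
    """Split one staged record into fields, stripping surrounding whitespace."""
    return {name: substring(raw_line, start, length).strip()
            for name, start, length in LAYOUT}

if __name__ == "__main__":
    record = "00042Jane Doe  Toronto "
    print(derive_columns(record))
```

Parsing at the source instead would move the same position and length information into the connection manager, so the record arrives already split.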
RonKyle
Moreover, if the value is unknown or unavailable, you’ll see a blank string, such as ‘ ‘. It is often better to convert these blanks to NULL. You can of course load the blank value into the database column without any transformation, but again, my preference is to convert it to NULL.


Just to give a different view: I would convert it to '' if the value indicates that there is no value, and to NULL if it indicates that the value is not known or is in some way uncertain. If that can't be determined, it would be a judgement call.
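RonKyle's rule can be written down as a per-column policy: a blank that means "there is no value" stays an empty string, while a blank that means "the value is unknown" becomes NULL. In the Python sketch below, `None` stands in for SQL NULL, and the column names and the `BLANK_POLICY` mapping are hypothetical; choosing the policy for each column is still the judgement call described above. Unlisted columns default to NULL, matching the article's stated preference.

```python
# Sketch of per-column blank handling: '' for "no value", None (NULL) for
# "unknown". The column names and policies are hypothetical examples.
from typing import Optional

NO_VALUE = "no_value"   # a blank means the attribute genuinely has no value
UNKNOWN = "unknown"     # a blank means the value is missing or uncertain

BLANK_POLICY = {
    "MiddleName": NO_VALUE,  # e.g. the person has no middle name
    "BirthDate": UNKNOWN,    # e.g. a date exists but was not supplied
}

def clean_blank(column: str, value: str) -> Optional[str]:
    """Return the trimmed value, or '' / None for blanks per the column policy."""
    trimmed = value.strip()
    if trimmed:
        return trimmed
    # Columns without an explicit policy default to NULL (None).
    policy = BLANK_POLICY.get(column, UNKNOWN)
    return "" if policy == NO_VALUE else None

if __name__ == "__main__":
    print(repr(clean_blank("MiddleName", "   ")))  # ''
    print(repr(clean_blank("BirthDate", "   ")))   # None
    print(repr(clean_blank("City", " Toronto ")))  # 'Toronto'
```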



Iwas Bornready
Thanks for the article. We do a lot of these kinds of imports and have many of the same issues, such as dealing with blanks. But we had already determined what to do with them when we created our destination files, by setting whether or not to allow NULLs.
Honny
My suggestion: using two Derived Column transformations and one Data Conversion between the source and the destination affects performance. When we know the length of each column, we can use the Fixed width format in the Flat File Connection Manager, add new columns on the Advanced tab as required and in the order they appear in the file, set each column's length and data type, and map them directly to the destination table.

This reduces the number of transformations used and gives high performance when loading billions of records from the source file.
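Honny's alternative moves the split to the source: the connection manager already knows each column's width and type, so no Derived Column or Data Conversion step is needed afterwards. The Python sketch below is only an analogy of that idea. The layout, the converters, and `parse_at_source` are hypothetical; because positions follow from column order and width, the layout has to list the columns in file order, as the Advanced tab does. It does not measure whether this is faster in SSIS.

```python
# Analogy of defining fixed-width columns at the source: each column has a
# width and a type converter, and positions follow from the column order.
# The layout and sample data are hypothetical.
from datetime import date, datetime
from decimal import Decimal

def to_date(text: str) -> date:
    """Convert a YYYYMMDD string to a date."""
    return datetime.strptime(text, "%Y%m%d").date()

# (name, width, converter) in the same order the columns appear in the file
LAYOUT = [
    ("OrderID", 6, int),
    ("OrderDate", 8, to_date),
    ("Amount", 9, Decimal),
]

def parse_at_source(line: str) -> dict:
    """Slice a record by cumulative widths and convert types in one pass."""
    row, offset = {}, 0
    for name, width, convert in LAYOUT:
        field = line[offset : offset + width].strip()
        row[name] = convert(field)
        offset += width
    return row

if __name__ == "__main__":
    print(parse_at_source("00012320151006001234.50"))
```

In this sketch a blank numeric field would raise an exception, so a real layout still needs a rule for blanks and bad values.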
Gerald Britton
This is a good start; however, I do not see any error handling. Do you just let the package fail? If so, how would you troubleshoot a failure?

Gerald Britton, MCSE-DP, MVP, Toronto PASS Chapter
Sam Vanga
t.pinder (10/6/2015)
In almost all cases, parsing in the data flow pefroms (sic) better than parsing at the source, if not at the same performance level.


Sam, I'm sorry, this sentence doesn't make any sense to me. If something performs better, say so. But adding the qualifier "if not at the same performance level" negates any meaning.

I'm not saying your methodology is wrong; in fact, in terms of human effort, it's vastly superior IMO. But I can't work out whether you've added the qualifier because the total processing time increases (does it?) or for some other reason.


t.pinder, thanks for this comment; I really like what you said. Definitely something for me to learn as I write more!

Sam Vanga
Honny (10/6/2015)
My suggestion: using two Derived Column transformations and one Data Conversion between the source and the destination affects performance. When we know the length of each column, we can use the Fixed width format in the Flat File Connection Manager, add new columns on the Advanced tab as required and in the order they appear in the file, set each column's length and data type, and map them directly to the destination table.

This reduces the number of transformations used and gives high performance when loading billions of records from the source file.


Both the Derived Column and Data Conversion transformations are synchronous, meaning they shouldn't be a drag on performance. Having said that, I do plan to do a performance test and blog about it. It could take more than a couple of weeks, though.
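The synchronous/blocking distinction is why these two transformations are usually considered cheap. In SSIS, synchronous transformations such as Derived Column and Data Conversion work on each row as buffers flow through, while blocking transformations such as Sort must receive all of their input before producing output. Python generators give a rough analogy; the sketch below is not a model of the SSIS engine, and all function names and sample rows are hypothetical.

```python
# Rough analogy only: generator stages stand in for SSIS transformations.
from typing import Iterable, Iterator, List

def derived_column(rows: Iterable[dict]) -> Iterator[dict]:
    """Row-by-row (synchronous-style): emits each row as soon as it arrives."""
    for row in rows:
        yield {**row, "Name": row["Name"].strip()}

def data_conversion(rows: Iterable[dict]) -> Iterator[dict]:
    """Also row-by-row: converts a type without waiting for other rows."""
    for row in rows:
        yield {**row, "ID": int(row["ID"])}

def sort_rows(rows: Iterable[dict]) -> Iterator[dict]:
    """Blocking (like a Sort): must consume every input row before emitting."""
    yield from sorted(rows, key=lambda r: r["ID"])

def source(log: List[str]) -> Iterator[dict]:
    """Hypothetical source that records each row it reads."""
    for raw_id, raw_name in [("3", " Cy "), ("1", " Ann "), ("2", " Bo ")]:
        log.append(f"read {raw_id}")
        yield {"ID": raw_id, "Name": raw_name}

if __name__ == "__main__":
    log: List[str] = []
    streaming = data_conversion(derived_column(source(log)))
    print(next(streaming), log)  # only one row has been read so far

    log.clear()
    blocking = sort_rows(data_conversion(derived_column(source(log))))
    print(next(blocking), log)   # all rows were read before the first came out
```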

Sam Vanga
g.britton (10/6/2015)
This is a good start; however, I do not see any error handling. Do you just let the package fail? If so, how would you troubleshoot a failure?


As I stated in the article, error handling is out of scope. I'm not suggesting that it isn't required; I wanted to focus on the pattern itself.

skeleton567
This is a very good point. To me it is unforgivable to allow process aborts. It is better to have a table for error rows and/or groups of rows, then send notifications to the data owner and even return the invalid data that was not processed. You must never create processes that require IT personnel to intervene. Bad data must be rejected and returned to its creator. Depending on the nature of the data, you may need to reject a whole file batch, complete documents, or whatever the application requires, but be sure YOU don't have to handle the problems.
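A minimal sketch of the approach described above: rows that fail validation go to an error collection together with the reason, instead of aborting the whole load, and the batch as a whole is rejected when too many rows fail. In SSIS this role is usually played by a component's error output set to redirect rows to an error table; here a Python list stands in for that table. The validation rule, the 10% default threshold, and `load_batch` are hypothetical, and notifying the data owner is left out.

```python
# Sketch: route bad rows to an error list instead of failing the load.
# The validation rule, threshold and names are hypothetical.
from typing import List, Tuple

def load_batch(lines: List[str], max_error_rate: float = 0.10
               ) -> Tuple[List[dict], List[dict], bool]:
    """Return (good_rows, error_rows, batch_accepted)."""
    good, errors = [], []
    for line_no, line in enumerate(lines, start=1):
        raw_id = line[0:5].strip()
        try:
            row = {"ID": int(raw_id), "Name": line[5:15].strip()}
        except ValueError as exc:
            # Keep the original text and the reason so the data owner can fix it.
            errors.append({"line": line_no, "raw": line, "reason": str(exc)})
            continue
        good.append(row)
    # Reject the whole batch when the share of failed rows is too high.
    accepted = not lines or len(errors) / len(lines) <= max_error_rate
    return good, errors, accepted

if __name__ == "__main__":
    batch = ["00001Ann", "0000XBo", "00003Cy"]
    good, errors, accepted = load_batch(batch, max_error_rate=0.5)
    print(len(good), errors[0]["line"], accepted)
```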