Escape from ETL Hell

Tony Davis
Comments posted to this topic are about the item Escape from ETL Hell
Jim P.
I worked for ten years at a company that had 27 stovepiped systems. I have lived in the ETL nightmare for a long time.

My current position was transformed into the ETL process when our company was bought out by a competitor. I don't actually have to do the Load portion but am responsible for getting the data out and transformed to their load specs. It has been interesting to find out how the end-users have abused the applications and databases.

The current system works so well that five employees can transform up to 80 companies per month. And that is hundreds of medical and financial records per facility.

And really, that has happened with every ETL process I have worked on.

But sometimes it is the programmer who causes the slowdown by using an RBAR process as opposed to an UPDATE statement. I ran into one where there was a flag for flood insurance (FI_Req) that was based on two other columns in the table. The programmer was going through 500K+ rows to set the flag one by one.

We changed it to a simple set of update statements, and cut 30 minutes out of the processing time.
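
For anyone who wants to see the difference side by side, here is a minimal sketch using Python's built-in sqlite3 module rather than SQL Server. The table name (loans), the two source columns (in_flood_zone, has_mortgage) and the rule that combines them are assumptions made up for illustration; only the FI_Req flag name comes from the post above.

```python
import sqlite3

# Hypothetical sample data: (id, in_flood_zone, has_mortgage)
SAMPLE_ROWS = [(1, 1, 1), (2, 1, 0), (3, 0, 1), (4, 0, 0)]


def make_table(conn, rows):
    """Create a small stand-in table; every name except FI_Req is assumed."""
    conn.execute("DROP TABLE IF EXISTS loans")
    conn.execute(
        "CREATE TABLE loans ("
        " id INTEGER PRIMARY KEY,"
        " in_flood_zone INTEGER NOT NULL,"
        " has_mortgage INTEGER NOT NULL,"
        " FI_Req INTEGER NOT NULL DEFAULT 0)"
    )
    conn.executemany(
        "INSERT INTO loans (id, in_flood_zone, has_mortgage) VALUES (?, ?, ?)",
        rows,
    )


def set_flag_rbar(conn):
    """RBAR: read every row, decide in application code, update one at a time."""
    rows = conn.execute(
        "SELECT id, in_flood_zone, has_mortgage FROM loans"
    ).fetchall()
    for row_id, zone, mortgage in rows:
        flag = 1 if zone == 1 and mortgage == 1 else 0
        conn.execute("UPDATE loans SET FI_Req = ? WHERE id = ?", (flag, row_id))


def set_flag_set_based(conn):
    """Set-based: two UPDATE statements touch the whole column at once."""
    conn.execute("UPDATE loans SET FI_Req = 0")
    conn.execute(
        "UPDATE loans SET FI_Req = 1"
        " WHERE in_flood_zone = 1 AND has_mortgage = 1"
    )


def flags(conn):
    """Return the FI_Req values in id order."""
    return [r[0] for r in conn.execute("SELECT FI_Req FROM loans ORDER BY id")]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    make_table(conn, SAMPLE_ROWS)
    set_flag_rbar(conn)
    rbar_result = flags(conn)

    make_table(conn, SAMPLE_ROWS)
    set_flag_set_based(conn)
    print(rbar_result == flags(conn))  # both approaches set the same flags
```

Both functions produce the same flags; the set-based version just lets the engine do the work in two statements instead of one round trip per row.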

So now I always look to see what portion of the process is taking the longest and see if I can cut down the processing time.
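
A rough sketch of that habit, assuming the process can be broken into named steps that run as plain Python callables. The helper name time_steps and the step names are hypothetical, not from any particular ETL tool.

```python
import time


def time_steps(steps):
    """Run each (name, callable) step; return [(name, seconds)], slowest first."""
    timings = []
    for name, step in steps:
        start = time.perf_counter()
        step()
        timings.append((name, time.perf_counter() - start))
    return sorted(timings, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Placeholder steps standing in for real extract/transform work.
    report = time_steps([
        ("extract", lambda: sum(range(100_000))),
        ("transform", lambda: sorted(range(100_000), reverse=True)),
    ])
    for name, seconds in report:
        print(f"{name}: {seconds:.4f}s")
```

The step at the top of the list is the first place to look for RBAR-style code.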



----------------
Jim P.

A little bit of this and a little byte of that can cause bloatware.
Jeff Moden
Sounds simple, right? That's its beauty. Much has been written about the intricacies and difficulties of ETL processes, delving deep into the crevices of tools such as SSIS, in order to wring maximum performance from each part of the process. Occasionally, however, someone presents an idea, a solution, of such elegance and simplicity that it immediately seems like common sense, and that everyone must be doing it this way already, though I suspect they aren't. Am I wrong?


Why, whatever could you mean, Tony? Doesn't everyone know and enjoy the simplicity and incredible efficiency of such wonders as EDI, XML, HIPAA, CSV and dozens of other standard formats, and the high-quality, easy-to-use, intuitive tools to exploit them?

{... wait for it...}



{... wait ... }





{... w-a-i-t ... }




{Ok... I can't hold it back any longer...}

BWWWWAAAAA-HAAAAA-HAAAAA-HAAAAA!

--Jeff Moden

RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
If you think it's expensive to hire a professional to do the job, wait until you hire an amateur. -- Red Adair

Jim P.
Jeff Moden (10/26/2013)
{Ok... I can't hold it back any longer...}

BWWWWAAAAA-HAAAAA-HAAAAA-HAAAAA!


Oh, I know. My company shut down a software line several years ago that was not profitable. They gave the customers a choice: go back to the older software that worked, or take your data in CSV/XML and walk away.

I built the CSV/XML export solution using a combo of MS Access/VBA/stored procs. The data was sensible, usable and appreciated by the former customers.

The SSIS packages built by the former developers to convert the data were fraught with issues, could only be run by them, and took days to process.

The way I look at it, a text file delimited with an off-character (pipe, tilde, backslash) is the best way to transfer data.
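
As a small illustration of why an off-character helps, here is a sketch using Python's standard csv module with a pipe delimiter. The function names, column names and sample rows are made up for the example; the point is only that a value containing a comma passes through untouched when the delimiter is a pipe.

```python
import csv
import io


def export_delimited(header, rows, delimiter="|"):
    """Write rows as delimited text; fields are quoted only when they must be."""
    buf = io.StringIO()
    writer = csv.writer(
        buf,
        delimiter=delimiter,
        quoting=csv.QUOTE_MINIMAL,
        lineterminator="\n",
    )
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


def import_delimited(text, delimiter="|"):
    """Read delimited text back into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


if __name__ == "__main__":
    # Hypothetical sample data; note the comma inside the name field.
    header = ["facility_id", "name", "balance"]
    rows = [(101, "Smith, John", "1250.00"), (102, "Jones, Mary", "0.00")]

    text = export_delimited(header, rows)
    print(text)
    print(import_delimited(text)[0]["name"])
```

If the delimiter itself ever shows up in the data, the csv module quotes that field on the way out and unquotes it on the way in, so the round trip still holds.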

And then storing XML data is like storing a binary file in a database. Never to be done and a waste of space.



Jeff Moden
Just to be sure, Jim, I wasn't laughing at you. It sounds like you made moves for the better. I was laughing at my own sarcasm on the subject of "improvements" in the world of data transfer.

Jim P.
Jeff Moden (10/26/2013)
Just to be sure, Jim, I wasn't laughing at you. It sounds like you made moves for the better. I was laughing at my own sarcasm on the subject of "improvements" in the world of data transfer.


I quite understood. I'm agreeing with you completely.

I know I have been through ETL hell and I think you have probably been there too. DTS sucked. SSIS is a total f---ing joke. Trying to troubleshoot either is a nightmare. The Oracle built-in tool sucks as well. And then I worked with the Great Plains import tool when we moved from PeopleSoft Accounting at my last job. It took an employee a month of 40-hour weeks to get three years of data into GP, because it had to be done day by day. And that was using three desktops, where the only input was the date to be imported.

Anytime someone says that the ETL process is too slow but a different department is at fault, I'm going to tell that department "let me in" or "this is the standard time you have to meet". I'm not going to mess around with it. I know what I have to do. The other people need to meet what I have to do.


