ETL or ELT

  • Something else that drives the approach is whether those in power think an Enterprise Data Warehouse is a server or an ecosystem.

  • "...looks to defer some of the transform work until the data is in the destination."

    Yeah, that.

  • I do most of my work as ELT so that I can leverage the servers and the flexibility of T-SQL as needed.

    There are times when that isn't practical (joining up data from multiple sources), so the job gets the slower but more effective ETL treatment.

    The answer is "the one that best suits the process and business needs".

  • I'll admit that I often prefer to transform data inside of a SQL Server target, whether an OLTP or DW database. The reason is that mass changes are often easier with T-SQL, working with sets of data rather than row by row.

    Amen to this. A large part of my job (what I'm taking a break from right now, actually) is making slow ETL faster. It is not uncommon that the culprit is a poorly designed transformation inside the ETL that could be done more efficiently in SQL Server. Data cleansing and de-duplication (when the data is coming from an RDBMS) are good examples of work that should be done outside of the ETL process, where you can leverage better indexing and the like (see the sketch at the end of this post). String manipulation and text analysis, too.
    The Kimball Toolkit (the MS SQL Server 2008 R2 edition) actually recommends doing as much work outside of your ETL process as possible. It's all too common that DBAs and SQL developers get lazy and let the ETL process do the stuff they should be doing.

    Someone said ELTL - that's a good (and funny) way to describe what I do these days.

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001
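
    To make the de-duplication point above concrete, here is a minimal set-based sketch in T-SQL. The staging table and columns (stg.Customer, Email, LoadDate) are hypothetical stand-ins:

        -- Rank duplicates by business key, keep only the newest row.
        -- One statement clears every duplicate: no row-by-row loop.
        WITH Ranked AS
        (
            SELECT Email, LoadDate,
                   ROW_NUMBER() OVER (
                       PARTITION BY Email        -- the key that defines a duplicate
                       ORDER BY LoadDate DESC    -- keep the most recently loaded row
                   ) AS rn
            FROM stg.Customer
        )
        DELETE FROM Ranked                       -- deleting through the CTE removes
        WHERE rn > 1;                            -- the underlying staging rows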

  • Stan Kulp-439977 - Tuesday, April 3, 2018 6:55 AM

    Our version of ELT is dumping the data to a staging table as VARCHARs with SSIS, creating a destination table with appropriate data types for each field, and then writing a stored procedure to cast and insert the data from the staging table into the destination table. Never thought of calling it ELT.

    Agree with both hands. It could be done even without SSIS: just populate a staging table and run stored procedures that hold all the transformation logic.
    I would prefer pre-processing the data in a single pass before it is inserted into the target rather than multiple post-processing update steps.
    I suspect the ELT concept was influenced by the introduction of DB refactoring tools: a large number of relatively simple steps rather than one BIG BANG step in a single stored procedure. Ironically, due to the limitations of SQL in Microsoft Access (no multiple SQL statements in a single query), I suspect that is what I did in data migration, before DTS/SSIS was introduced by Microsoft and the terms ETL/ELT were invented.
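
    For what it's worth, a minimal sketch of the pattern both posts describe, with hypothetical object names (TRY_CONVERT needs SQL Server 2012+; plain CONVERT works on earlier versions but fails hard on bad values):

        CREATE TABLE stg.Orders        -- everything lands as VARCHAR, exactly as extracted
        (
            OrderId   VARCHAR(50),
            OrderDate VARCHAR(50),
            Amount    VARCHAR(50)
        );

        CREATE TABLE dbo.Orders        -- the typed destination
        (
            OrderId   INT,
            OrderDate DATE,
            Amount    DECIMAL(18, 2)
        );
        GO

        CREATE PROCEDURE dbo.LoadOrders
        AS
        BEGIN
            SET NOCOUNT ON;

            -- Cast and insert in a single set-based pass;
            -- TRY_CONVERT returns NULL instead of failing on bad values.
            INSERT INTO dbo.Orders (OrderId, OrderDate, Amount)
            SELECT TRY_CONVERT(INT, OrderId),
                   TRY_CONVERT(DATE, OrderDate),
                   TRY_CONVERT(DECIMAL(18, 2), Amount)
            FROM stg.Orders;
        END;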

  • Personally I tend to do ELT as well, for the same reasons that many have listed. I tend to find that doing the transformation with T-SQL is far, far easier. We use SSIS at the office, and I really don't find it lends itself well to complex transformations. I find this especially true if the data needs to be normalised in some way. I get quite a few files where the data needs to be pivoted in places, and I've not found a graceful way to do that in SSIS without a Script Task; and why should I need to write a bunch of C#/VB.net (which isn't my forté) when you can quickly do the whole thing using T-SQL and a "cross pivot"? (See the sketch after this post.)

    There are a few times when I have done the transformation before the load, but they were earlier in my usage of SSIS; I feel I've "learned my lesson" since. 🙂

    Thom~

    Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
    Larnu.uk
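
    As a sketch of the kind of pivoting Thom describes (my guess at the technique: unpivoting wide file columns into rows with CROSS APPLY (VALUES ...) instead of an SSIS Script Task; all table and column names are hypothetical):

        -- Normalise a wide staging row (one column per month) into one row per month.
        SELECT s.RowId,
               v.MonthName,
               v.SalesAmount
        FROM stg.WideSales AS s
        CROSS APPLY (VALUES
            ('Jan', s.JanSales),
            ('Feb', s.FebSales),
            ('Mar', s.MarSales)
        ) AS v (MonthName, SalesAmount)
        WHERE v.SalesAmount IS NOT NULL;   -- drop the empty cells the unpivot produces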

  • It is not uncommon that the culprit is a poorly designed transformation inside the ETL that could be done more efficiently in SQL Server.

    I do not understand why there seems to be this conception that the "T" must take place outside of SQL Server. All my "T" is done in a SQL Server staging database, using set-based T-SQL queries to format the data as required before it's "L'd" into the target tables. T != RBAR. And T as the middle step does not mean no SQL Server database.

    I'd add that I am often assigned to speed up ETL processes. One common fix I've found is taking T's that happen after the L and moving them to a staging table so that the T can occur before the L (sketched below).
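
    A minimal before/after sketch of that move, with hypothetical table and column names:

        -- T after L: a post-load UPDATE that touches every row already in the target.
        UPDATE dbo.FactSales
        SET Revenue = Quantity * UnitPrice;

        -- T before L: compute the value while loading from staging instead,
        -- so the target is written exactly once.
        INSERT INTO dbo.FactSales (OrderId, Quantity, UnitPrice, Revenue)
        SELECT OrderId, Quantity, UnitPrice, Quantity * UnitPrice
        FROM stg.Sales;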

  • Really, whichever one works and makes more sense for the specific problem you're trying to solve and the tools that are available. In some cases loading all the data first and then doing the manipulation in the target makes sense; in other cases doing that manipulation in flight makes more sense, for example when the manipulation is based on data that has no real value being loaded into a staging table.

  • Stan Kulp-439977 - Tuesday, April 3, 2018 6:55 AM

    Our version of ELT is dumping the data to a staging table as VARCHARs with SSIS, creating a destination table with appropriate data types for each field, and then writing a stored procedure to cast and insert the data from the staging table into the destination table. Never thought of calling it ELT.

    That's what I often do, and I used to call it ETL. Trying to learn ELT.

  • Steve Jones - SSC Editor - Wednesday, April 4, 2018 8:44 AM

    Stan Kulp-439977 - Tuesday, April 3, 2018 6:55 AM

    Our version of ELT is dumping the data to a staging table as VARCHARs with SSIS, creating a destination table with appropriate data types for each field, and then writing a stored procedure to cast and insert the data from the staging table into the destination table. Never thought of calling it ELT.

    That's what I often do, and I used to call it ETL. Trying to learn ELT.

    "Eee Tee Elle" feels "easier" to pronounce than "Eee Elle Tee". ๐Ÿ˜›

    Thom~

    Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
    Larnu.uk

  • ELT was always my typo upon deploying a new ETL job. Now I can say "yep, I meant to do that".
  • I've always tended to do ELT, without even realizing it, because so many of the ETL tools make it difficult to get anything accomplished. For me, it's usually a question of what the path of least resistance is: get the data somewhere I can actually do something with it. Let's face it, while SSIS has a number of tasks that allow you to do transformation work, the vast majority of the time it's easier to handle in the database itself.

  • RonKyle - Wednesday, April 4, 2018 5:34 AM

    It is not uncommon that the culprit is a poorly designed transformation inside the ETL that could be done more efficiently in SQL Server.

    I do not understand why there seems to be this conception that the "T" must take place outside of SQL Server. All my "T" is done in a SQL Server staging database, using set-based T-SQL queries to format the data as required before it's "L'd" into the target tables. T != RBAR. And T as the middle step does not mean no SQL Server database.

    I'd add that I am often assigned to speed up ETL processes. One common fix I've found is taking T's that happen after the L and moving them to a staging table so that the T can occur before the L.

    It depends on what the transform is and what the data source was.

    TOPAS is a COBOL structure requiring quite a bit of string manipulation. Until SQL Server can handle ORC & Parquet files, I have to use code to read those files. It is almost a preprocessing activity to get the data into a form where set-based operations are practical and can be serviced properly in an RDBMS.
    It is always a trade-off between the complexity of a multi-tech system and the complexity that comes from trying to use one tool for everything.

  • TOPAS is a COBOL structure requiring quite a bit of string manipulation. Until SQL Server can handle ORC & Parquet files, I have to use code to read those files. It is almost a preprocessing activity to get the data into a form where set-based operations are practical and can be serviced properly in an RDBMS.

    I don't doubt that there are some transformations that SQL Server can't do well. I even have a couple of things I would use assemblies for if I were more skilled with them, but the time savings isn't worth the time investment. The intent of my comment was to address those who seemed to associate the T between E and L with only RBAR activities. I don't understand why set-based operations against the full target tables would somehow be more efficient than set-based operations against only the incremental extract.

  • RonKyle - Wednesday, April 4, 2018 11:30 AM

    TOPAS is a COBOL structure requiring quite a bit of string manipulation. Until SQL Server can handle ORC & Parquet files, I have to use code to read those files. It is almost a preprocessing activity to get the data into a form where set-based operations are practical and can be serviced properly in an RDBMS.

    I don't doubt that there are some transformations that SQL Server can't do well. I even have a couple of things I would use assemblies for if I were more skilled with them, but the time savings isn't worth the time investment. The intent of my comment was to address those who seemed to associate the T between E and L with only RBAR activities. I don't understand why set-based operations against the full target tables would somehow be more efficient than set-based operations against only the incremental extract.

    Some people seem to get too hung up on having a very rigid ETL-or-ELT structure, where either everything is a straight copy from source to staging and all transforms are done in the target, or everything is done in the transfer process and loaded into the target without anything else being done.

    Where I'm working now it's fairly common to do a mixture of both, as well as putting views on top of the source data and doing some work there, so I guess that's doing transforms as part of the extract? (Sketched below.)
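
    A minimal sketch of that "T during E" idea: a cleanup view over the source that the extract reads from instead of the base table (object names hypothetical):

        CREATE VIEW src.vCustomerExtract
        AS
        SELECT CustomerId,
               LTRIM(RTRIM(CustomerName)) AS CustomerName,  -- trim padding during extract
               NULLIF(Email, '')          AS Email          -- empty strings become NULL
        FROM src.Customer
        WHERE IsDeleted = 0;                                -- exclude soft-deleted rows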
