SSIS and Date Ranges

  • cmullican

    Old Hand

    Points: 350

    Let me preface this by saying I'm relatively new to SSIS, somewhat new to SQL Server, but not new to database work in general -- most of my experience is with PostgreSQL and MySQL. (And the only thing I ever want to do with MySQL again is migrate off it!)

    That said, I have inherited an ETL process that I'm updating to do incremental loading.  The destination is really a reporting server not a true data warehouse, so the transforms involved are not necessarily what would be used for a star or snowflake schema.  Ultimately, it will be fed from multiple data sources, but for now, I'm just working with one.

    In the step I'm currently working on, each source data row has (among other columns) a start date, an end date, and columns for week1-week14.  The goal is to unroll this so there is one row for each week from start to end (can be 1 week up to 14), with the value from the matching weekN column.

    The current, non-incremental, process makes extensive use of staging tables and hand-written SQL in Execute SQL tasks; the SQL Tasks are badly formatted, hard to read, hard to maintain, and just generally complex.  I've been switching to using Data Flow components, with Lookups for the incremental load, and up to this step, it's been as fast or faster -- we're not dealing with huge amounts of data here; a couple hundred K rows or less for the most part, so avoiding all the disk IO makes up for a lot.

    I've written a Script Task to do the date range expansion, but it's unacceptably slow.  Even though it's processing less than 200K rows, it's taking about 7 hours(!!) to complete.  I've had a coworker look over the code, and tried a few things to speed it up a bit (like switching from dynamically building the output row based on the columns in the input row to specifying each column individually), but with no luck so far. I've attached the two non-boilerplate functions as they currently exist, but I think the answer is more likely to be "Do [X] instead of the script task" -- I just have no idea what [X] might be, other than switching back to staging tables and Execute SQL tasks, which I'd hoped to avoid.

    Any other ideas, or is that my best bet?

    Attachments:
    You must be logged in to view attached files.
  • AlphaTangoWhiskey

    SSChampion

    Points: 10774

    from what you are saying it seems the main issue is the unpivoting of the data.

    Here is a post about using the unpivot component. This may help get you started.

    https://www.mssqltips.com/sqlservertip/1761/how-to-use-the-unpivot-data-flow-transform-in-sql-server-integration-services-ssis/

  • palandri

    SSC Enthusiast

    Points: 193

    I was resistant to using staging tables at first, but later realized they can be the right tool at times.  I have approached similar tasks by loading a staging table using a data flow, then doing the unpivoting in SQL.  We're on 2008 R2, so I unpivoted "manually" but it's still fairly fast,

  • Phil Parkin

    SSC Guru

    Points: 243474

    You might find that you can expand the data ranges very rapidly in T-SQL, by joining to an appropriate calendar table or numbers table (it's hard to be sure what will work best, without first seeing your source data and desired results).

    If the answer to your question can be found with a brief Google search, please perform the search yourself, rather than expecting one of the SSC members to do it for you.

  • cmullican

    Old Hand

    Points: 350

    @alphatangowhiskey -- Ah!  I missed that one, hiding down at the bottom of the Other Transforms list!  Thanks; I'll see if I can get what I need from it.

    @Phil Parkin -- I do have a calendar table available, but with no way i can find to specify a between condition for the join, I can't figure out how I can use it usefully, without putting the data in a staging table and writing an Execute SQL task.

    @palandri -- I'm not ruling out that option, but if I can work without it, I'd like to, mostly because I'm not using staging tables anywhere else in the process so far.

     

  • AlphaTangoWhiskey

    SSChampion

    Points: 10774

    Was also going to suggest the calendar table as a way to expand dates. I have a similar process that i need to get a date range down to 1 row per minute so i can do rolling aggs (deltas) by minute, and its super fast.

    But yeah forget any between join in SSIS do that as a separate DFT in the source component and stage it.

  • Phil Parkin

    SSC Guru

    Points: 243474

    The staging table / stored procedure route is the way I would go. T-SQL will likely blow away anything you can code in SSIS to do this.

    If the answer to your question can be found with a brief Google search, please perform the search yourself, rather than expecting one of the SSC members to do it for you.

  • AlphaTangoWhiskey

    SSChampion

    Points: 10774

    Totally agree, i mentioned it since they seemed to want to get away from the SQL tasks, but calling a sproc is minor.

    I always tell the devs who are trying to use a set pattern for consistency that the best way is to use what works the best.

  • cmullican

    Old Hand

    Points: 350

    Let's just say I'm using this as a learning experience and want to try different ways, even if I end up going back to staging tables in the end.

    Also, seriously, what I inherited is _ugly_.  The Execute SQL task in the original is using three staging tables, updating two destination tables, and using a cursor for part of the unpivot...all in a proportional font, because SSIS is dumb like that, and with inconsistent spacing and capitalization. The formatting is fixable, of course, but it's not something I want to try to maintain...or something I'd be proud to pass on to my successor. Is it any wonder I'm liking neat little boxes and arrows when possible?

  • palandri

    SSC Enthusiast

    Points: 193

    I use Aqua Data or SSMS to develop SQL code, and view the Enter SQL Query window in the SQL Task Editor strictly as a paste destination.   Unless the change is extremely trivial, I copy the code back to Aqua Data to make changes.

  • AlphaTangoWhiskey

    SSChampion

    Points: 10774

    Oh that does sound ugly and a cursor *looks around for Jeff* cursors aren't so bad 🙂

    Yeah i feel your pain, i rarely look at or edit TSQL in the stupid little SQL task editor, i always put it in an ssms query window. Why they couldn't used a fixed width font is beyond me.

    When a SQL statement outgrows the editor i usually consider a sproc instead. Generally use them for simple stuff like truncating tables or calling sprocs.

    edit: before i get roasted the cursor thing was a joke.

  • cmullican

    Old Hand

    Points: 350

    Yeah, when I have to work with the long, ugly queries, I usually pop them in to either gVim or SSMS, but it's an extra step, and doesn't change the fundamental flaws in the design (of the ETL, but also the SSIS query "editor").

Viewing 12 posts - 1 through 12 (of 12 total)

You must be logged in to reply to this topic. Login to reply