How do you handle SSIS package merging between two BI teams?

  • Hi!!

    Here's the thing - I have the following scenario:

    Base points

    - Multiple teams work on the same Data Warehouse (DW)

    - The SSIS structure is rather complex (1000+ dtsx packages) and organized in a tree-like structure

    - The leaves are 'unit of work' (i.e. 1 table = 1 flow)

    - Everything is tracked on GIT, with different branches

    Issue

    When we have to work on the intermediate nodes, we have to wait and synchronize with other teams to be certain that none of them is developing logic on these nodes, before we can start with our own development.

    I've been searching on the net and all I found was more people asking this same type of question. Unfortunately I didn't find a suitable answer, i.e. some kind of tool to successfully merge changes performed on the same .dtsx. The closest I could find was 'file comparison' approach but nothing that we could run automatically.

    How do you handle shared SSIS projects?
    Thanks

  • Data Cruncher - Tuesday, October 9, 2018 6:55 AM

    Hi!!

    Here's the thing - I have the following scenario:

    Base points

    - Multiple teams work on the same Data Warehouse (DW)

    - The SSIS structure is rather complex (1000+ dtsx packages) and organized in a tree-like structure

    - The leaves are 'unit of work' (i.e. 1 table = 1 flow)

    - Everything is tracked on GIT, with different branches

    Issue

    When we have to work on the intermediate nodes, we have to wait and synchronize with other teams to be certain that none of them is developing logic on these nodes, before we can start with our own development.

    I've been searching on the net and all I found was more people asking this same type of question. Unfortunately I didn't find a suitable answer, i.e. some kind of tool to successfully merge changes performed on the same .dtsx. The closest I could find was 'file comparison' approach but nothing that we could run automatically.

    How do you handle shared SSIS projects?
    Thanks

    You don't merge DTSX files. Treat them like executables.
    Shared SSIS projects are completely fine, however – but somehow you need to find a way of ensuring that the same package is not being worked on by more than one person at a time.

    If you haven't even tried to resolve your issue, please don't expect the hard-working volunteers here to waste their time providing links to answers which you could easily have found yourself.

  • Data Cruncher - Tuesday, October 9, 2018 6:55 AM

    Hi!!

    Here's the thing - I have the following scenario:

    Base points

    - Multiple teams work on the same Data Warehouse (DW)

    - The SSIS structure is rather complex (1000+ dtsx packages) and organized in a tree-like structure

    - The leaves are 'unit of work' (i.e. 1 table = 1 flow)

    - Everything is tracked on GIT, with different branches

    Issue

    When we have to work on the intermediate nodes, we have to wait and synchronize with other teams to be certain that none of them is developing logic on these nodes, before we can start with our own development.

    I've been searching on the net and all I found was more people asking this same type of question. Unfortunately I didn't find a suitable answer, i.e. some kind of tool to successfully merge changes performed on the same .dtsx. The closest I could find was 'file comparison' approach but nothing that we could run automatically.

    How do you handle shared SSIS projects?
    Thanks

    Easy, we have two data warehouses.  😀

    One data warehouse for the BI team and one for everything else. This is because the BI team typically needed more granular and more rigorous work done versus the other teams that were primarily using the data warehouse for general reporting. Essentially, separating analytics from operational reporting. This evolved into the BI team using more of a cloud based solution (Azure DW) versus the operation reporting (SQL Server) on top of the BI team using other tools like machine learning, data lakes, etc that are more suited for analytical processing than the other.

    The other go to is not having BI touch the data warehouse and just focus on the data marts. One team for data warehouse and one for data mart/OLAP creation. Anything the BI needs from the data warehouse is handled by ONE data warehouse team.

  • As has been mentioned, you don't merge SSIS packages.   There's no tool that's ever going to be able to accomplish such a task.   How could one possibly know what to do when pieces of two SSIS package edits conflict?   Plus, who would even want to make an attempt to do that?   You'd have to be able to manually choose what parts of the conflicting XML to save and thus you'd have to be able to know exactly what the rules are the exact same way that SSIS does, and you'd have to be able to recognize the problem in 100% of all such cases.   Not likely in our lifetimes.   The level of AI needed is well beyond what AI is likely to accomplish any time soon, and even then, might still be totally impractical.   You have to use a source control system that allows you to check out the file such that no one else is allowed to make changes until after you check the code back in.

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • Uhm, got it.. So we'll keep raising hands to get a priority on a development branch. Somehow I was hoping for a more computer-aided solution but your replies are more than reasonable.
    Thank you folks 🙂

  • Data Cruncher - Wednesday, October 17, 2018 4:42 AM

    Uhm, got it.. So we'll keep raising hands to get a priority on a development branch. Somehow I was hoping for a more computer-aided solution but your replies are more than reasonable.
    Thank you folks 🙂

    How big is your development team? If you're 'doing' Agile, it should be easy enough to know in advance what others are working on and thus avoid such collisions. I can't imagine any practical situations where you would want more than one person working on the same package, anyway.

    If you haven't even tried to resolve your issue, please don't expect the hard-working volunteers here to waste their time providing links to answers which you could easily have found yourself.

  • I agree with you @Phil Parkin but in this case the ETL has a hierarchical structure (4 levels). Each team has its own branch and when we work on leaf nodes (let's call them level 4 packages) everything is fine. However the issue arises when both teams work on intermediate nodes (ex. level 2 packages, that call level 3 packages, that call leaf packages).
    So merging the branches causes issues on level 2 packages becasue both teams modified it (added logic and references to level 3 packages).
    I hope I managed to explain myself, although I realize it can be a bit confusing at first (and yeah, I'm also a bit skeptical if all that hieararchy was really necessary).

  • Data Cruncher - Wednesday, October 17, 2018 6:59 AM

    I agree with you @Phil Parkin but in this case the ETL has a hierarchical structure (4 levels). Each team has its own branch and when we work on leaf nodes (let's call them level 4 packages) everything is fine. However the issue arises when both teams work on intermediate nodes (ex. level 2 packages, that call level 3 packages, that call leaf packages).
    So merging the branches causes issues on level 2 packages becasue both teams modified it (added logic and references to level 3 packages).
    I hope I managed to explain myself, although I realize it can be a bit confusing at first (and yeah, I'm also a bit skeptical if all that hieararchy was really necessary).

    OK, I think I understand.

    I avoid anything like this because I have built a framework which allows me to specify, in metadata, which packages should be called as part of a job, and in what order (also allowing for parallelism).

    Adopting this means that packages in jobs can be modified in isolation. We have a rule that packages in jobs do not reference any other packages.

    If you haven't even tried to resolve your issue, please don't expect the hard-working volunteers here to waste their time providing links to answers which you could easily have found yourself.

  • Data Cruncher - Tuesday, October 9, 2018 6:55 AM

    How do you handle shared SSIS projects?

    It's easy for me.  I write stored procedures to replace them. 😉  Only 2 more to go and then we can turn off SSIS.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden - Wednesday, October 17, 2018 7:48 AM

    Data Cruncher - Tuesday, October 9, 2018 6:55 AM

    How do you handle shared SSIS projects?

    It's easy for me.  I write stored procedures to replace them. 😉  Only 2 more to go and then we can turn off SSIS.

    Can we set your username to SSIS Troll😛

    If you haven't even tried to resolve your issue, please don't expect the hard-working volunteers here to waste their time providing links to answers which you could easily have found yourself.

  • Phil Parkin - Wednesday, October 17, 2018 8:15 AM

    Jeff Moden - Wednesday, October 17, 2018 7:48 AM

    Data Cruncher - Tuesday, October 9, 2018 6:55 AM

    How do you handle shared SSIS projects?

    It's easy for me.  I write stored procedures to replace them. 😉  Only 2 more to go and then we can turn off SSIS.

    Can we set your username to SSIS Troll😛

    😛

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden - Wednesday, October 17, 2018 9:37 AM

    Phil Parkin - Wednesday, October 17, 2018 8:15 AM

    Jeff Moden - Wednesday, October 17, 2018 7:48 AM

    Data Cruncher - Tuesday, October 9, 2018 6:55 AM

    How do you handle shared SSIS projects?

    It's easy for me.  I write stored procedures to replace them. 😉  Only 2 more to go and then we can turn off SSIS.

    Can we set your username to SSIS Troll😛

    😛

    I mean, that's not a bad suggestion. Anything you can do in SSIS, I can likely do the exact same with TSQL. 😀

  • xsevensinzx - Thursday, October 18, 2018 6:47 AM

    I mean, that's not a bad suggestion. Anything you can do in SSIS, I can likely do the exact same with TSQL. 😀

    OK, here's a nice starter for you. What T-SQL commands would you use to grab Google Analytics data (which requires a web service login using OAuth 2.0)?

    If you haven't even tried to resolve your issue, please don't expect the hard-working volunteers here to waste their time providing links to answers which you could easily have found yourself.

  • Phil Parkin - Thursday, October 18, 2018 7:13 AM

    xsevensinzx - Thursday, October 18, 2018 6:47 AM

    I mean, that's not a bad suggestion. Anything you can do in SSIS, I can likely do the exact same with TSQL. 😀

    OK, here's a nice starter for you. What T-SQL commands would you use to grab Google Analytics data (which requires a web service login using OAuth 2.0)?

    I don't know off the top of my head because I don't use Google Analytics data.  But, if you've written code that can do it, then it can be done without the use of SSIS.  I do have T-SQL-driven code that auto-magically downloads data from the NANPA website with the understanding that a special login isn't required.

    Now, that being said, if someone has an SSIS package that already does that for Google anything, there's no sense in redeveloping the wheel... unless it becomes a problem (and writing a thousand packages is a problem especially if they need to (and they should) deploy even simple maintenance through Dev, Staging, and Prod, IMHO, if it comes to that).

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden - Thursday, October 18, 2018 7:36 AM

    I don't know off the top of my head because I don't use Google Analytics data.  But, if you've written code that can do it, then it can be done without the use of SSIS.  I do have T-SQL-driven code that auto-magically downloads data from the NANPA website with the understanding that a special login isn't required.

    Now, that being said, if someone has an SSIS package that already does that for Google anything, there's no sense in redeveloping the wheel... unless it becomes a problem (and writing a thousand packages is a problem especially if they need to (and they should) deploy even simple maintenance through Dev, Staging, and Prod, IMHO, if it comes to that).

    I never intended to suggest that there was anything SSIS could do that no other product can do. Instead, I was responding to this assertion:

    Anything you can do in SSIS, I can likely do the exact same with TSQL

    If you haven't even tried to resolve your issue, please don't expect the hard-working volunteers here to waste their time providing links to answers which you could easily have found yourself.

Viewing 15 posts - 1 through 15 (of 18 total)

You must be logged in to reply to this topic. Login to reply