DWH generator tools - Pro's & Con's

  • Hi there

    Have you experience with DWH generator tools? We are designing a new DWH architecture and weighting pro's & con's about DWH generator tools against the traditional way of doing it (SSIS in this case).

    Con's:

    - Training the BI-developer for the new tool

    ○ But might be less than introduce to the complex manual process in the traditional way (my experience)

    - Licence fee

    - You are dependent on a certain architecture built trough your tool

    - Dependent on the tool, it's hard to get work without the tools on a later stage

    Pro's:

    - (Much) more agile: Short cycles from business requirements to the marts, cubes and even reports

    ○ Long cycle times and development bottlenecks lead to missed business opportunities. Therefore, increasing agility by eliminating those bottlenecks will increase the benefits of your reporting and analytics investment.

    ○ BICC (BI competence center) often becomes the bottleneck as decisions about technology and architecture are subsumed within demands for report development and configuration. Not with a generator.

    - Easy startup with less requirements and easy reengineering during your developing period rather than a big-bang implementation in the traditional way, where you have to know every requirements from the first time you start developing.

    ○ Automatic generation of Cubes, SCD2, delta-load, etc. as built in features

    - Fast and easy structure changes: Automating discovery and alignment of data source helps you to implement changes at one point and change it everywhere

    - Generator tools support building and managing data warehouses and reduces the IT staff’s burden for design and development. This provides more time for BI-developers to focus on helping business users evaluate their specific business problems

    - Automatic generated documentations (lineage, Loading process already included) and test cases

    - Less skilled people can easily add new features, changing structure

    The tool I know so far are:

    - biGenius, Trivadis: http://goo.gl/MrOmJ0

    - DWA, TimeXtender: http://goo.gl/QxXP1l

    Last year I almost only worked with a generator tool and I just loved it. As a BI-developer you can focus on the real challenges you've got and the manual recurring work is done by a tool, and that assures, that it's done the same way all the time.

    I tried to find some information on the internet, but I guess this is a pretty new way of doing it, because there are not that much information. One thing I found is the best practice from TDWI. It's sponsored by TimeXtender, but from my point of view, it tells pretty much the advantages of these tools:

    - TDWI Checklist Report // Seven Best Practices for Adopting Data Warehouse Automation, December 18, 2015: https://goo.gl/5wpAGj

    Although there are strong Con's, I would never choose the traditional way again, the advantages are too big.

    What tools do you use? Why and why not using one? What are your experiences? Pro's and Con's?

    Looking forward to an interesting discussion.

    Cheers Simon

    ------------------------------------------------------------------------------------------------------------------------------------------------
    Im working as a Business Intelligence Specialist and mainly with Microsoft BI and data warehousing but also with Oracle Databases. I like working with the data and it is my passion to get more out of the data.

  • I bought an ETL generation tool where I work at and love it. I have to create my tables and columns one by one, but after that, it builds the entire ETL job for me in T-SQL. From what I understand there are free tools that do the same thing for SSIS as well. Like you said, it allows the developer to focus on the value added activities and not worry about the nitty gritty details of getting data from source to dw. The tool I use also builds in data validation logic and moves rows that fail validation to a separate database for review.

  • JesseBizInt (2/24/2016)


    I bought an ETL generation tool where I work at and love it. I have to create my tables and columns one by one, but after that, it builds the entire ETL job for me in T-SQL. From what I understand there are free tools that do the same thing for SSIS as well. Like you said, it allows the developer to focus on the value added activities and not worry about the nitty gritty details of getting data from source to dw. The tool I use also builds in data validation logic and moves rows that fail validation to a separate database for review.

    Hi, thanks for your answer, what tool do you use? And is the similar free tool which does it in SSIS, I don't know that? Or do you mean BIDS-helper, but I think this is something different.

    ------------------------------------------------------------------------------------------------------------------------------------------------
    Im working as a Business Intelligence Specialist and mainly with Microsoft BI and data warehousing but also with Oracle Databases. I like working with the data and it is my passion to get more out of the data.

  • The tool was a custom tool that our consultant wrote. It basically is a Windows form application that prompts the user to fill in a bunch of values, and then generates the T-SQL scripts to build the data warehouse, data staging area, a data quality area for storing records that don't meet the business rules to be considered valid data, as well as generating all the T-SQL to move data from the source systems or a pre-staging database into the system.

    It doesn't generate any SSIS - I do that by hand to move data from my source systems into my pre-staging database. From there, the tool takes over and handles the rest.

  • I can speak to why I don't use one.

    Outside of not knowing they existed, it's hard to rely on a point-and-click solution to automatically craft a complex solution such as the data warehouse. This is not a small task. This is a very large task. It's also not a simple solution, it's a very critical solution that requires a lot of care. To put all that trust into a tool that automatically generates models, code, ETL packages and the works is actually pretty scary.

    Why is it scary to someone like me who typically builds the EDW from scratch? Because it's depending on assumptions made by limited and sometimes maybe even vague direction from the end user. The whole selling point of these products is to be up and running quicker rather than later.

    The unfortunate reality is that data warehousing does take a lot of careful planning. There is no escaping that. It's not a technology problem. It's a business process problem in almost all cases where delays happen. This tool is not going to solve that problem. It's just likely going to help you cycle through the iterations of those problems faster before you get to the finished product. Only then will you start to evaluate if you trust the automation of what the final result will be from the tool to see if the approach the tool took is the right one for you.

    One good example from the second link I saw was their selling point of automatically connecting to the data sources on day one and start evaluating the data. Possibly even starting to build out that model right there in front of the stakeholder. What if my data is not available yet? What if the stakeholder doesn't know what data they want? What if the data is so large it can't be loaded that easily? A lot of those road blocks happen and no tool is going to solve that for them, especially in my case where the data is so large, no tool is just going to load it up quick and easy for me.

    I could see tools like these be great for smaller warehouses. I don't think anyone worth their salt in the DW field would waste money on a auto-generate data warehouse much like few rely on auto-generate code for being a developer.

  • Woops hit quote other than edit. Ignore.

  • JesseBizInt (3/2/2016)


    The tool was a custom tool that our consultant wrote. It basically is a Windows form application that prompts the user to fill in a bunch of values, and then generates the T-SQL scripts to build the data warehouse, data staging area, a data quality area for storing records that don't meet the business rules to be considered valid data, as well as generating all the T-SQL to move data from the source systems or a pre-staging database into the system.

    It doesn't generate any SSIS - I do that by hand to move data from my source systems into my pre-staging database. From there, the tool takes over and handles the rest.

    That tool could save a lot of people a lot of time. Sounds like its production was paid for but it sounds like a great start for a CodePlex Project, if that were an option. I could see a lot of use cases and would probably become a user and code-contributor on Day 1.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • xsevensinzx (3/4/2016)


    Outside of not knowing they existed, it's hard to rely on a point-and-click solution to automatically craft a complex solution such as the data warehouse. This is not a small task. This is a very large task. It's also not a simple solution, it's a very critical solution that requires a lot of care. To put all that trust into a tool that automatically generates models, code, ETL packages and the works is actually pretty scary.

    There are at least two different generators. One as you said, where you have to put all your trust in the tool. But there is also another way, where you basically write the generated code which will be generated. So you only have to trust in your own scripting. And that is the whole big advantage, you write code and use it as a template for all objects you want. I really like this kind of programming.

    But sad most companies are afraid to go this step, they are afraid more than they see the advantage it will bring. I worked for one year with a very good tool, which I very much liked as a developer, it's a whole new experience and we were like 50-60% faster. Didn't regret the decision. 🙂

    But every requirements are different and all tools are different. Would be great to have a full overview of most of the tools and the see pro's and con's of each other. Are anyone know's some of them? I already began with two, but would like to add more 😀

    ------------------------------------------------------------------------------------------------------------------------------------------------
    Im working as a Business Intelligence Specialist and mainly with Microsoft BI and data warehousing but also with Oracle Databases. I like working with the data and it is my passion to get more out of the data.

  • I have been toying with the idea of writing my own tool and releasing it as an open source tool, but I am not the most knowledgeable developer or data warehouse architect. My company paid $15k for this tool, which basically just collects a list of values and uses tokens in SQL scripts to generate the SQL code.

    For example, let's say we are adding a table to the EDW. The CREATE TABLE statement in the template might look like this:

    CREATE TABLE [/*<<dw_schema>>/*].[/*<<table_name>>/*]

    In the program, you would have two text boxes labelled dw_schema and table_name that you fill in and hit the generate button. Let's say our table name was DimEmployees and our schema was DW. The program would generate the following CREATE TABLE statement:

    CREATE TABLE [DW].[DimEmployees]

    Of course the tool does a lot more than just this, but all of its functionality is template driven via T-SQL. Extracts, lookups, transforms, data validation, every bit of it is done in T-SQL using the staging database as a controller and uses stored procedures to hold the functionality.

    If I were to write my own tool, I'd probably code it to generate SSIS packages for extracting source data to a staging database, and instead of series of text boxes I'd probably use a wizard based interface to create a new fact or dimension. Like I said, it's all just an idea in my head right now but I might sit down and take a shot at coding something.

  • JesseBizInt (3/22/2016)


    I have been toying with the idea of writing my own tool and releasing it as an open source tool, but I am not the most knowledgeable developer or data warehouse architect. My company paid $15k for this tool, which basically just collects a list of values and uses tokens in SQL scripts to generate the SQL code.

    For example, let's say we are adding a table to the EDW. The CREATE TABLE statement in the template might look like this:

    CREATE TABLE [/*<<dw_schema>>/*].[/*<<table_name>>/*]

    In the program, you would have two text boxes labelled dw_schema and table_name that you fill in and hit the generate button. Let's say our table name was DimEmployees and our schema was DW. The program would generate the following CREATE TABLE statement:

    CREATE TABLE [DW].[DimEmployees]

    Of course the tool does a lot more than just this, but all of its functionality is template driven via T-SQL. Extracts, lookups, transforms, data validation, every bit of it is done in T-SQL using the staging database as a controller and uses stored procedures to hold the functionality.

    If I were to write my own tool, I'd probably code it to generate SSIS packages for extracting source data to a staging database, and instead of series of text boxes I'd probably use a wizard based interface to create a new fact or dimension. Like I said, it's all just an idea in my head right now but I might sit down and take a shot at coding something.

    That's correct what you write but in a veeeeeery very simple way. These tools are doing exactly that. So why start again and build up the same, when there are already tools like that. But if you are actually starting one, let me know, I'm always interested in new projects.

    Right now, I have an idea to build an open DWH where I would like to collect all kind of open Data like:

    http://www.opendata.dk/

    https://opendata.swiss/en

    and connection all kind of data report all differet reports that you even can't think of yet. And that I would of course do with a generator, but sadly, they are all not for free, or even the one I know.

    Does anybody knows good free generator tools?

    ------------------------------------------------------------------------------------------------------------------------------------------------
    Im working as a Business Intelligence Specialist and mainly with Microsoft BI and data warehousing but also with Oracle Databases. I like working with the data and it is my passion to get more out of the data.

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply