Data Warehouse automation tool suggestions?

  • I'm looking into the available tools in preparation to develop a first data warehouse, if you've used a Data Warehouse automation product before, what are your thoughts on it?

    Also, are the below statements correct, or have I picked up a flawed understanding of the different available products?

    The following Data Warehouse automation tools are coming up most on Google searches for DW automation products:

    • WhereScape RED
    • Attinuity Compose (Formerly BIReady)
    • timeXtender

    With the slower alternatives of hand-coding data warehouse schemas and performing indexing and using SSIS for ETL, or hand-coding the entire process. Both of these approaches require low or no initial investment, but requiring significantly more development resources and less flexible when business requirements change.

    I also understand WhereScape is significantly more expensive (at least in initial investment) but is a great product to use due to it's holistic and abstracted approach to DW and ETL development, and has a reasonable fan base.

    Both BIReady and timeXtender seem to cost upwards of USD$20,000 but provide an GUI that abstracts and reduces schema and ETL development time. These are not as powerful as RED, but cheaper and facilitate faster development than manual coding if used correctly.

  • I have been doing BI for about 8 years now and have never heard of these products. That does not mean they're not good. I looked over the website for each of those companies and it appears that they are pretty new and untested. A data warehouse/BI project is the kind of thing that can make or break a business and I would not be comfortable with an untested product.

    Regarding:

    With the slower alternatives of hand-coding data warehouse schemas and performing indexing and using SSIS for ETL, or hand-coding the entire process. Both of these approaches require low or no initial investment, but requiring significantly more development resources and less flexible when business requirements change.

    It depends on your goals and the scale of the project. Hand-coding a simple data mart for analyzing key metrics and slapping an Excel spreadsheet onto it is not an overly complex task. Indexing is not rocket science.I guess it all comes down to the amount of talent you have to work with and your budget. A data warehouse is a tool that you use to create a single version of your business truth formatted in a way that is easier to report against so that you can make better business decisions. Sometimes this requires an investment in experienced and educated people. My experience is that such an investment is worth the cost and yields a high rate of return if you can find the right people.

    That's my opinion.

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • TimeXtender has been on the market for nearly 10 years and our technology is being used daily by more than 2600 customers around the world. I would say that it is neither new nor untested.

    Andrew - let me know if you are interested in taking a closer look at TimeXtender - I will be happy to assist you.

    Kind regards,

    Thomas Duun

    Product Manager

    TimeXtender

  • Thanks Thomas - I appreciate the offer, but not at the moment. I've had a read through the TimeXtender 2014 user guide and have somewhat of an idea of what it can do. I will get in touch should the project grow to a position where it receives the management support required for financial investment in a Data Warehouse automation tool.

    The current scale is very small (a Data Mart for a specific department) and is an initial step into Data Warehousing - I expect ideal for learning how to develop and maintain a small-scale Data Warehouse manually before any automation, with "Microsoft Exam 70-463: Implementing a Data Warehouse with Microsoft SQL Server 2012" as a guide.

  • Andrew P (6/17/2015)


    Thanks Thomas - I appreciate the offer, but not at the moment. I've had a read through the TimeXtender 2014 user guide and have somewhat of an idea of what it can do. I will get in touch should the project grow to a position where it receives the management support required for financial investment in a Data Warehouse automation tool.

    The current scale is very small (a Data Mart for a specific department) and is an initial step into Data Warehousing - I expect ideal for learning how to develop and maintain a small-scale Data Warehouse manually before any automation, with "Microsoft Exam 70-463: Implementing a Data Warehouse with Microsoft SQL Server 2012" as a guide.

    That's a good book. I would also look at the Kimball Data Warehouse Toolkit (Microsoft version); it's excellent and gives you a great overview of the DW/BI development life cycle. It's worth noting that it includes how to provide constant deliverables so as to see a quick RIO.

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • If you're still interested in finding out more about WhereScape RED, Attunity and other data warehouse tools (it doesn't look like timeXtender has been added yet) I recommend that you visit IT Central Station and read the user reviews of these products, all of which have been written by people with personal experience using these solutions.
    For example, one BI Architect commented about WhereScape RED "It allows us to focus on the most important aspect of a data warehouse--business requirements--instead of focusing on the low-level details of the development platform." (if you want, you can read the rest of this review here.) 
    Hope this helps.

  • Regardless of what data warehousing tools you choose (if any); the majority of your human resource time will be spent gathering requirements, documenting requirements, team status meetings, troubleshooting source data quality issues, performance tuning, and attempting to overcome the inflexibility and limitations of your specific tool selection. Regardless of tool, you will at least be coding SQL against your source databases. It's a myth that some vendor's data warehousing "automation" tool will be a game changer in terms of project reducing project cost or implementation time. Really, the most deciding factor is that the team consist of experienced data warehouse analysts and developers.

    Kimball's Data Warehouse Toolkit book should be required reading for everyone on the IT side of the team.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • Eric, thanks for the reply - I get that, the tool isn't the deciding factor, or where most of the time should be spent. I chose to go with SSIS, because it's purpose-built for ETL, and (I hope)  the graphical design and sensible component labels will assist any other developers in understanding the ETL work flow, with the understanding that SSIS has it's own limitations that need time to learn workarounds for.

    This isn't on-topic to this question, but here's some background. There's just me working on this project, and part-time. The end users are the IT team (the primary focus of this DW is to create a standardized data model for building reporting solutions from), and coming from a reporting background in this company, I'm confident I understand the user requirements and data quality for the primary resources (I've worked with the source data-sets for 6+ years).

    I think I understand the schema design - at least I get the basics (star, snowflake, fact, dimension, measure attribute, surrogate key, accumulating snapshot) - Kimball's DW Toolkit has been a good help with this. I still need to study the last few chapters of that book -understand the subsystems, and modelling processes, those chapters seem more valuable than some of the earlier.

    Currently I'm continuing to learn SSIS in the evenings through the Microsoft training kit for SSIS 2012 and Pluralsight and PASS videos for SSIS, so I can get a few basic star schemas with SCDs up through SSIS so the business can see some results, and I can gain some experience from running/maintaining these in production while I continue to study.

    With myself being the most experienced in the IT team, and limited resource available for outside consultants, this forum (and you Eric and Alan B.) have been a great help in teaching me when I don't understand things or need advice - thanks.

  • If you know your source data, and you know requirements gathering (ie: BusMatrix) / star-schema table design / SCD / ETL using Kimball method, and you know T-SQL / SSIS, then you've got all the essential data warehousing tools you need.

    The SSIS task for implementing SCD does not scale well at all; for all practical purposes it creeps when loading more than about 10,000 rows.
    https://connect.microsoft.com/SQLServer/feedback/details/632052/ssis-slowly-changing-dimension-scd-component-performance-unusably-slow
    However, on CodePlex there is a 3rd party SCD task that you can plug into SSIS.
    https://dimensionmergescd.codeplex.com/

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • If you plan to chose a Data Vault approach you could use the Datavault Builder https://datavault-builder.com which is automating all aspects of a DWH lifecycle and is generating objects on SQL Server in real time while you are designing.

  • This was removed by the editor as SPAM

  • There are several tools available for automating the process of building and maintaining a data warehouse. Some of the more popular options include:

    Talend: This is an open-source ETL (extract, transform, load) tool that can be used to automate the process of extracting data from various sources, transforming it into a format that is suitable for analysis, and loading it into a data warehouse.

    Informatica PowerCenter: This is a commercial ETL tool that offers a wide range of features for automating the process of building and maintaining a data warehouse, including support for real-time data integration, data quality, and data governance.

    AWS Glue: This is a fully managed ETL service offered by Amazon Web Services (AWS) that can be used to extract data from various sources, transform it, and load it into a data warehouse.

    Talend Cloud: This is a cloud-based version of Talend that offers many of the same features as the on-premises version, but with the added convenience of being hosted in the cloud.

    Google Cloud Data Fusion: This is a fully managed, cloud-native ETL service offered by Google Cloud that can be used to extract data from various sources, transform it, and load it into a data warehouse.

    Microsoft Azure Synapse Analytics (formerly SQL Data Warehouse): This is a fully managed data warehouse service offered by Microsoft Azure that includes built-in ETL capabilities for extracting data from various sources, transforming it, and loading it into the data warehouse.

    It's important to evaluate the different tools available and choose the one that best meets the needs of your organization. Factors to consider might include the types of data sources you need to support, the volume of data you need to handle, and your budget.

    Read here complete guide on warehouse automation

     

  • This was removed by the editor as SPAM

  • This was removed by the editor as SPAM

  • If you can get a trial version of the data warehouse software, You should be able to see if it is the right tool.  You may consider looking at some case studies with these tools.   Prototyping tools could be useful in your decision making process.

    Prototyping is a feature of the automated agile data warehouse lifecycle that skips several manual steps. It allows the customer to visualize the design of the architecture they have requested, with full lineage, before the build takes place.

    Here is something I found:

    Statistical Analysis System (SAS) is one of the large data warehouse tools that are used to access information from various data sources. The benefit of SAS data management is that users can access the data from wherever it is stored (from cloud, to legacy systems, to data lakes, like Hadoop, etc.) without any hassle.

    DBASupport

Viewing 15 posts - 1 through 14 (of 14 total)

You must be logged in to reply to this topic. Login to reply