Data Source Schema Changes..

  • Hello All

    We're planning to migrate our current data warehouse from MySQL to MS SQL 2014 and re-architect the current design.

    A major problem we currently have is moving to a DW that supports multi-tenancy, where specific users can access data via SSRS/Tableau, but only specific to their own content, rather than have each client of ours have their own database and increase the cost of ownership of maintenance in the long run - we currently have over 100 DW's for each external client, and a global DW for internal users to support decision making.

    Another issue we're facing is from our source systems. It's dynamic (drupal/MySQL) in the sense that there is always new data fields or content appearing from time to time in the ETL, and we're thinking of how to best manage this process in our new design/platform in an automated fashion, when there are schema/data structure changes from the source system?

    Any ideas will be helpful before the project has kick-started.

  • How do you need to handle the schema changes? Assuming these changes are a fact of life, what do the people adding these fields expect from the DW solution? Dynamic importation?

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • What they expect from the new DW, is to have these new fields created imported and available for reporting.

  • Suppose a Weather table has a Temperature column and it's always measured in Fahrenheit. Now someone adds an option to enter Celsius temperature and adds a UnitOfMeasure column , FK to UnitsOfMesure (new table). What ETL process should do? Should it just import data "as is" or may be it should convert all temperatures to a single UOM? Or may be it depends on target DW?

    I mean the question sounds too general to get a specific resolution.

  • Well, basically we have a SaaS application, for our end users (over 100 in number and growing fast), and from time to time, their admins tend to upload new files on to the system that allows them to create new fields with content.

    The Saas application permits them to adjust fields using the "Custom Classification" area.

    So User Admin 1, may have fields A, B, C, D and E, User Admin 2 may have A,C,E,F,G and H, User Admin 3 could be different or a mixture of what User Admin 1 and 2 have. Not all our users have the same fields, and this can exist today,and change tomorrow depending on client business changes it's dynamic.

    This is handled by the Drupal Application. In our existing ETL process, there is a API call built in Java, that searches the application DB, for existing and dynamically new structures, pulls the data across into text files, which is then transmitted to a staging area.

    From here, the normal business rules are applied for reporting for both external and internal end users- and this took 23hrs two days ago, reason why we're going ahead with the whole re-design of the current platform.

    The structure of the current DW, is too expensive to maintain, in the sense that we have client mini-DW's for each client (100 and growing),and because of this we are leaning towards designing a multi-tenancy DW on SQL Server and have DB Schema's restrict client data.

  • b_boy (1/14/2016)


    What they expect from the new DW, is to have these new fields created imported and available for reporting.

    Are your custom fields stored in a EAV structure? In what form do you want to store this data in the warehouse, 3NF or are we talking a copy of the transactional schema? What are the data access patterns for the reporting environment? Too many questions.

    I guess what I am saying is there is no magical switch that can enable the exportation of data from a dynamically changing transactional schema into a data warehouse. I would recommend you spend some time interviewing your potential data warehouse users to find out what they need and then map the current and planned future state of your transactional system to the data warehouse requirements. You have a tough assignment ahead of you, no doubt.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • Can your requirements be stated in terms of entities, their attributes and relationships? It's still quite unclear, to which object admins can add fields and what is the field definition.

  • For the data source? I will get that across ASAP...thanks all, I really do appreciate your suggestions...

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply