Strategy Questions regarding Data Lake and multiple BI reporting tools with ETL

  • Hi,

    Please forgive me if this is wordy. I am working on a future strategy for our BI environments. We will have (among miscellaneous source systems) an ERP that provides its own data lake integration as well as a BI reporting tool with embedded transformation logic. We also have another BI reporting tool (Power BI). My prior experience (over 10 years) has been with on-prem traditional data warehouse/mart approaches. Both jobs were at large companies with complex ETL/integration, so I would expect transformation logic more complex than what is included in Power BI or our ERP BI reporting tool, although I am hoping that, since this is a smaller organization, it will not be as complex. My goal is to provide an environment (not for data scientists/exploring data) for reporting that minimizes redundancy and maximizes data governance and common metrics. My assumptions/questions are:

    Assumption: I am planning on using the ERP Data Lake as my ultimate source whenever possible. Non-ERP data can also be pushed into this Data Lake.

    Assumption: Minimize the impact of ERP releases, which also update the Data Lake and the out-of-the-box reports that are provided.

    Q1 - Would a best practice be to 'cleanse'/transform raw data into staging tables in the Data Lake, to be sourced into either Power BI or the ERP BI tool? These tools would then use their embedded ETL to build facts/dims in each tool for reporting. At least the common rules could be applied once, in the logic that builds the staging tables in the Data Lake (or would these need to be stored in another cloud DB)? A rough sketch of this shared staging approach follows the list of options below.

    Or - 'cleanse'/transform raw data into dims/facts using staging tables/stored procedures and store the dims/facts in the Data Lake?

    Or - 'cleanse'/transform raw data into dims/facts using staging tables/stored procedures and store dims/facts within another cloud database?

    Or - 'cleanse' raw data in the data lake into staging tables in the data lake (which can also be used for ad hoc reporting), and then build dims/facts (either inside or outside of the Data Lake) that both tools would use.

    Or - use the ETL provided in Power BI and the ERP BI tool independently, depending on the source of the data, and accept duplicate business rules/transformations and multiple dimensions/facts within each tool?
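
    To make the shared staging option concrete, here is a rough sketch of the kind of common cleansing step I have in mind, assuming a Spark-style engine running over the Data Lake (e.g. Databricks or Synapse Spark); the storage paths, table names, columns, and business rules below are entirely hypothetical:

    # Rough sketch only: assumes a Spark-style engine over the lake;
    # container names, folder paths, columns, and rules are hypothetical.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("stage_customers").getOrCreate()

    # Read raw data as landed by the ERP's data lake integration (hypothetical path)
    raw = spark.read.parquet("abfss://raw@mydatalake.dfs.core.windows.net/erp/customers/")

    # Apply the common cleansing / conforming rules once, in the lake
    staged = (
        raw
        .dropDuplicates(["customer_id"])                          # de-duplicate on the business key
        .withColumn("customer_name", F.trim(F.col("customer_name")))
        .withColumn("country_code", F.upper(F.col("country_code")))
        .withColumn("is_active", F.col("status") == F.lit("A"))   # example of a shared business rule
    )

    # Write a staging table that both Power BI and the ERP BI tool can source,
    # so the rules live in one place instead of in each tool's embedded ETL
    (staged.write
        .mode("overwrite")
        .parquet("abfss://staging@mydatalake.dfs.core.windows.net/dim_customer_staging/"))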

    Thank you all for your feedback.

  • Hello,

    I have worked on a similar strategic initiative and used the following approach to implement on-prem SQL Server to Data Lake to Azure SQL DW (Synapse) ETL using Azure Data Factory. Once the data is in the lake, data scientists can consume it from the Lake or from Synapse using Databricks or other tools of choice. Also, Power BI can connect to both the dimensional models in Synapse and the folders in the Data Lake (a short sketch of the Databricks consumption step follows the list below):

    1. Azure Data Factory Pipeline to fully Load all SQL Server Objects to ADLS Gen2, see: https://www.mssqltips.com/sqlservertip/6302/azure-data-factory-pipeline-to-fully-load-all-sql-server-objects-to-adls-gen2/
    2. Logging Azure Data Factory Pipeline Audit Data: https://www.mssqltips.com/sqlservertip/6320/logging-azure-data-factory-pipeline-audit-data/
    3. Load Data Lake files into Azure Synapse Analytics Using Azure Data Factory: https://www.mssqltips.com/sqlservertip/6350/load-data-lake-files-into-azure-synapse-analytics-using-azure-data-factory/
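
    As a rough illustration of the consumption side (a sketch only, with hypothetical storage account, container, folder, and column names; it also assumes the cluster already has credentials configured for the storage account), once the Data Factory pipeline has landed the files in ADLS Gen2, Databricks can read them directly with Spark while the tips above handle the load into Synapse:

    # Sketch only: storage account, container, folder, and column names are hypothetical.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("consume_lake").getOrCreate()

    # Read the files the Data Factory pipeline landed in ADLS Gen2 (parquet assumed)
    sales = spark.read.parquet("abfss://curated@mydatalake.dfs.core.windows.net/sales/")

    # Quick validation / exploration before (or alongside) the Synapse dimensional model
    sales.printSchema()
    sales.groupBy("order_year").count().show()   # "order_year" is a hypothetical column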

    Thanks
