SQL Clone
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


Good to have Multiple Data Warehouses?


Good to have Multiple Data Warehouses?

Author
Message
xsevensinzx
xsevensinzx
SSC-Insane
SSC-Insane (21K reputation)SSC-Insane (21K reputation)SSC-Insane (21K reputation)SSC-Insane (21K reputation)SSC-Insane (21K reputation)SSC-Insane (21K reputation)SSC-Insane (21K reputation)SSC-Insane (21K reputation)

Group: General Forum Members
Points: 21174 Visits: 5789
Interesting debate came up on data warehousing. Is it good to have multiple data warehouses or just one?

To add some context to avoid the ever, "it depends" answer. Let's say you have a data warehouse already built that supports a number of products that are both for reporting purposes and non-reporting purposes. Then another product comes up that is for more analytical purposes that requires data in a more granular fashion and slightly different set of data requirements than what the current data warehouse can support (i.e.: one is very normalized, the other wants to be very denormalized). Does it make sense to split and have two data warehouses supporting different products and use cases between multiple teams?

The arguments here are pretty valid. One data warehouse, one source of truth. They are both using the same data, just stored and used differently. But, there are a number of organizations that have released papers on multiple data warehouse scenarios that basically state, it's hard to satisfy multiple products and use cases with one single solution. So why restrict your data potential and or constantly have to rewrite your existing models when new things come up?

Personally, I see no problem with multiple data warehouses if there is justification and they are both not fighting for resources and causing contention on the target data streams. If they are supporting totally different data requirements, it seems logical to have the split and allow them to grow separately while still working together. Ideally, it would be great to stick to one data warehouse where if you needed separate data specific needs, it would result in a new data mart to be developed for those needs. Yet, that still requires the data warehouse to support those needs at the source.

Is there ever a point where you stop trying to meet all data requirements with your data warehouse and just split?
Eirikur Eiriksson
Eirikur Eiriksson
SSC Guru
SSC Guru (159K reputation)SSC Guru (159K reputation)SSC Guru (159K reputation)SSC Guru (159K reputation)SSC Guru (159K reputation)SSC Guru (159K reputation)SSC Guru (159K reputation)SSC Guru (159K reputation)

Group: General Forum Members
Points: 159119 Visits: 23308
Whenever sets of requirements (Subject/Business) are incompatible with other sets of requirements (Subject/Business), it is appropriate to subset the data into Subject/Business specific Data Marts. My preference is a single Enterprise Data Warehouse as the single source of the truth for the individual Data Marts, that is all Data Marts are fed from the same source.
Cool
Jeff Moden
Jeff Moden
SSC Guru
SSC Guru (870K reputation)SSC Guru (870K reputation)SSC Guru (870K reputation)SSC Guru (870K reputation)SSC Guru (870K reputation)SSC Guru (870K reputation)SSC Guru (870K reputation)SSC Guru (870K reputation)

Group: General Forum Members
Points: 870189 Visits: 47415
To wit, I frequently go with the "no data warehouse" option in favor of just writing some streamlined code to solve actual problems instead of a system to handle all problems that have yet to be realized. Sometimes it really is better to go with a dedicated solution rather than trying to catch all possibilities in a generic fashion.

--Jeff Moden

RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
If you think its expensive to hire a professional to do the job, wait until you hire an amateur. -- Red Adair

Helpful Links:
How to post code problems
How to post performance problems
Forum FAQs
xsevensinzx
xsevensinzx
SSC-Insane
SSC-Insane (21K reputation)SSC-Insane (21K reputation)SSC-Insane (21K reputation)SSC-Insane (21K reputation)SSC-Insane (21K reputation)SSC-Insane (21K reputation)SSC-Insane (21K reputation)SSC-Insane (21K reputation)

Group: General Forum Members
Points: 21174 Visits: 5789
Jeff Moden (11/12/2016)
To wit, I frequently go with the "no data warehouse" option in favor of just writing some streamlined code to solve actual problems instead of a system to handle all problems that have yet to be realized. Sometimes it really is better to go with a dedicated solution rather than trying to catch all possibilities in a generic fashion.


Yeah, that's what I was thinking too. It would still lead to multiple data management systems though, whether you called them a warehouse or not. But it's hard to create specialized data marts if the house does not support the requirements. You constantly have to update and conform in order to have one solution be the solution to all.
Eirikur Eiriksson
Eirikur Eiriksson
SSC Guru
SSC Guru (159K reputation)SSC Guru (159K reputation)SSC Guru (159K reputation)SSC Guru (159K reputation)SSC Guru (159K reputation)SSC Guru (159K reputation)SSC Guru (159K reputation)SSC Guru (159K reputation)

Group: General Forum Members
Points: 159119 Visits: 23308
xsevensinzx (11/13/2016)
Jeff Moden (11/12/2016)
To wit, I frequently go with the "no data warehouse" option in favor of just writing some streamlined code to solve actual problems instead of a system to handle all problems that have yet to be realized. Sometimes it really is better to go with a dedicated solution rather than trying to catch all possibilities in a generic fashion.


Yeah, that's what I was thinking too. It would still lead to multiple data management systems though, whether you called them a warehouse or not. But it's hard to create specialized data marts if the house does not support the requirements. You constantly have to update and conform in order to have one solution be the solution to all.



I agree with the "no data warehouse" if and when it fits the bill, obviously sometimes it does and sometimes it doesn't, correlation of different reporting from those systems is much more error prone and much harder than maintain a single data warehouse. A good example of the latter are situations when working with multiple diverse systems which all hold a slice of the cake but none has the full picture. On the flip side, regulatory and legislative requirements must also be met which often can complicate the picture, certain data cannot be stored, processed or analysed with other data. That situation can force the data the business to use multiple warehouses, i.e. telecom company cannot use Mobile or POTS data for Internet marketing etc.
Cool
MMartin1
MMartin1
SSC-Insane
SSC-Insane (24K reputation)SSC-Insane (24K reputation)SSC-Insane (24K reputation)SSC-Insane (24K reputation)SSC-Insane (24K reputation)SSC-Insane (24K reputation)SSC-Insane (24K reputation)SSC-Insane (24K reputation)

Group: General Forum Members
Points: 24644 Visits: 2200
I believe in the one source of truth, as the ideal. From many sources one place to store. It also keeps enforcement on data standards easy since you are working in only one location. I think you are describing reporting across different grains. Here I take the lowest grain possible since it is safe to assume that future requirements are not fully known. If need be you can have snapshots of the data at particular points in time (usually month ends for a business) that aggregate on the grain that you want. You can segment different snapshots to different business areas through using schemas.

----------------------------------------------------
How to post forum questions to get the best help
JesseBizInt
JesseBizInt
Old Hand
Old Hand (350 reputation)Old Hand (350 reputation)Old Hand (350 reputation)Old Hand (350 reputation)Old Hand (350 reputation)Old Hand (350 reputation)Old Hand (350 reputation)Old Hand (350 reputation)

Group: General Forum Members
Points: 350 Visits: 31
In my environment I have many different departments that need to be kept separate from each other, yet share some data elements such as calendars and employee profiles. I work for a tribal government, which means I have to deal with various regulations re: data confidentiality and access. I implement just one data warehouse (SQL Server database) to store all the data, and then I use other databases that just hold views to expose the data to various end user applications. For example, let's say the accounting department and the health clinic both need access to employee data and to the calendar, but each have their own fact tables.

In the data warehouse, you would see:
Core_Dim_Employee
Core_Dim_Date
Accounting_Fact_GrantHours
Health_Fact_PatientEncounters

There would also be two separate databases: HealthDM and AccountingDM (DM standing for data mart)

HealthDM would have these views:
Core_Dim_Employee (WHERE Division='Health' to only include Health division employees)
Core_Dim_Date
Health_Fact_PatientEncounters

AccountingDM would have these views:
Core_Dim_Employee
Core_Dim_Date
Accounting_Fact_GrantHours

For me, this allows me that single source of truth and place to enforce data standards, and for the end user, they can only see the data that they need to see and that interests them. They can point Excel or whatever other tool they want to use at their DM and get the data they need without seeing the other departments' data. If a end user needs a cube built, then I use the appropriate DM for the data source that feeds the cube. Because the DMs only hold views, the data is always matching the data in teh data warehouse.
Chris Harshman
Chris Harshman
SSC-Dedicated
SSC-Dedicated (36K reputation)SSC-Dedicated (36K reputation)SSC-Dedicated (36K reputation)SSC-Dedicated (36K reputation)SSC-Dedicated (36K reputation)SSC-Dedicated (36K reputation)SSC-Dedicated (36K reputation)SSC-Dedicated (36K reputation)

Group: General Forum Members
Points: 36890 Visits: 7036
xsevensinzx (11/8/2016)
Interesting debate came up on data warehousing. Is it good to have multiple data warehouses or just one?...
Personally, I see no problem with multiple data warehouses if there is justification and they are both not fighting for resources and causing contention on the target data streams. If they are supporting totally different data requirements, it seems logical to have the split and allow them to grow separately while still working together. Ideally, it would be great to stick to one data warehouse where if you needed separate data specific needs, it would result in a new data mart to be developed for those needs. Yet, that still requires the data warehouse to support those needs at the source.

Maybe we need some clarification on what you mean by multiple data warehouses? In general, a data warehouse can consist of multiple fact tables, each of which relates to multiple dimensions. A fact table is usually modeled based on some business process.

Offhand it sounds like you just need a different fact table within the same data warehouse. You probably would not want to change the grain of an existing fact table as that would affect all users who currently do reporting and analysis on it. The new fact table will likely be able to reuse some of your existing dimensions though.

There is lots of good information about designing data warehouses available from the Kimball group:
http://www.kimballgroup.com/data-warehouse-business-intelligence-resources/kimball-techniques/dimensional-modeling-techniques/
MMartin1
MMartin1
SSC-Insane
SSC-Insane (24K reputation)SSC-Insane (24K reputation)SSC-Insane (24K reputation)SSC-Insane (24K reputation)SSC-Insane (24K reputation)SSC-Insane (24K reputation)SSC-Insane (24K reputation)SSC-Insane (24K reputation)

Group: General Forum Members
Points: 24644 Visits: 2200
JesseBizInt (12/12/2016)
In my environment I have many different departments that need to be kept separate from each other, yet share some data elements such as calendars and employee profiles. I work for a tribal government, which means I have to deal with various regulations re: data confidentiality and access. I implement just one data warehouse (SQL Server database) to store all the data, and then I use other databases that just hold views to expose the data to various end user applications. For example, let's say the accounting department and the health clinic both need access to employee data and to the calendar, but each have their own fact tables.

In the data warehouse, you would see:
Core_Dim_Employee
Core_Dim_Date
Accounting_Fact_GrantHours
Health_Fact_PatientEncounters

There would also be two separate databases: HealthDM and AccountingDM (DM standing for data mart)

HealthDM would have these views:
Core_Dim_Employee (WHERE Division='Health' to only include Health division employees)
Core_Dim_Date
Health_Fact_PatientEncounters

AccountingDM would have these views:
Core_Dim_Employee
Core_Dim_Date
Accounting_Fact_GrantHours

For me, this allows me that single source of truth and place to enforce data standards, and for the end user, they can only see the data that they need to see and that interests them. They can point Excel or whatever other tool they want to use at their DM and get the data they need without seeing the other departments' data. If a end user needs a cube built, then I use the appropriate DM for the data source that feeds the cube. Because the DMs only hold views, the data is always matching the data in teh data warehouse.


You mention a separate database to hold your views yet you are referencing the accountingDM database as a location for views as well. To me when you need to create a join across databases you have to decide where you need to store that view if they are not all on a dedicated database. Here you could run into the situation where you would need to store two versions of the same view across two databases due to security restrictions if you wanted more than one user to access that view (if they each only had access to one of the respective databases in the view but both needed access to the information in the view).

I would take a look at schemas within a database, just my preference.

----------------------------------------------------
How to post forum questions to get the best help
JesseBizInt
JesseBizInt
Old Hand
Old Hand (350 reputation)Old Hand (350 reputation)Old Hand (350 reputation)Old Hand (350 reputation)Old Hand (350 reputation)Old Hand (350 reputation)Old Hand (350 reputation)Old Hand (350 reputation)

Group: General Forum Members
Points: 350 Visits: 31
Schemas would work just as well in this example. I opted for the separate databases because it was easier to explain to management in my organization rather than getting into how schemas keep things separate, and the permissions are a bit easier (not much, just a bit) to manage. I also forgot to mention that the enterprise data warehouse is named EnterpriseDW so it is seperate from AccountingDM and HealthDM. The DM databases live on the same server as the DW, and yes, some views are redundant so I just script the view as Create To, copy and paste the code, and adjust the filters to deploy.

If I ran into a situation where I needed to change the query definition, I could script that via T-SQL with CREATE VIEW statements to push that out. Unfortunately, politics plays a much larger role in some decisions than we would like.
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum








































































































































































SQLServerCentral


Search