Hi all! Your help would be appreciated.
- We have a data warehouse that is a single tenant design with multiple reports and widgets that query the data
- The underlying structure of the DW is modelling invoice data - so essentially header records and line items
- The widgets and reports show pretty much what you would expect:
- Sales by Month
- Sales by Supplier
- Sales by Customer
- For the purposes of this example let's assume there are 20 different reports all pointing at the DW Invoice structure
- When we load the DW in the ETL we have one set of scripts that is used by all the customers and we have tried very hard to stay away from customizing those scripts.
- Although we have approximately 100 customers for this example I will limit it to 3 customers as they all are asking for something different:
- Customer Number 1 wants all the reporting to go off all the invoice data in the DW
- Customer #2 wants the reporting to exclude 3 specific suppliers and 2 specific customers
- Customer #3 wants to exclude any invoice that has a purchase order number that starts with "TF" and exclude any invoice that has a product code of "Internal Surcharge" on it.
The fundamental issue is that we have 20 different reports that all point at the same structure but need to filter the data depending on the customer needs.
Here is the only caveat: I do not want to exclude loading any of the invoice data into the data warehouse during the ETL. Why?
- I do not want to maintain custom ETLs for each of my customers as it makes future upgrades to the process and subsequent testing much more difficult.
- Because I am convinced that eventually customer 2 and customer 3 will ask us to create new reports that operate on the data that they asked us to exclude. If that is the case I need to have access to that data in the DW at some point.
We have come up with a couple of solutions and am I curious if we are down the right track with any of them:
Change the queries
Alter all the queries for all the reports to exclude the required data. This is high maintenance and as soon as customer 4 comes along and asks to filter on something else, we have to rework all the queries. I am not thrilled with this idea.
Create a View on top of the tables
If we create a view that sits on top of every table and force the reports to query the view then we can control the filtering by customer without necessarily having to retest every single customer installation.
The default installed view over top the invoice table would be: select * from invoice
For customer 2 it would be: select * from invoice where supplier not in (a,b,c) and customer not in (1,2)
For customer 3 it would be: select * from invoice where ponumber not like 'tf%' and invoicekey not in (select invoicekey from invoicelineitem where productcode = "Internal Surcharge")
This would also mean that we could probably change 2 to 3 views and affect many of the reports. Additionally when we do upgrades we would only ever change the underlying structures of the tables and the views would go untouched as they would be customized for each customer.
Flagging the invoices
One of the developers came up with this and it has caught my eye --- I cannot seem to find any examples of this technique being used.
We would add a bit flag to both the invoice and invoice lineitem tables that would represent "include this data in the base reports". A value of 1 would mean include it and a value of 0 would mean exclude it.
We would then need to alter all the standard/base reports to filter the data to only include rows where "include this data in the base reports" is set to 1.
Since all the data is in the DW we can subsequently add more reports at a later time that make use of the excluded data. Those reports could just ignore the flag.
So how does the bit get set? During the ETL process we would have a stored procedure or function that would take in the invoice number and check the data against the constraints of the customer and decide whether to mark the data with a 1 or 0. The benefit of this method is that all the customer specific code is centralized to a very small number of stored procedures or functions. The downside is that if they ever retroactively change their criteria we have some work to do (not a huge deal)
Thoughts? Does anyone have any links they could pass along?
Thanks in advance