Filtering Customer Specific Data

  • Hi all! Your help would be appreciated.

    • We have a data warehouse that is a single tenant design with multiple reports and widgets that query the data
    • The underlying structure of the DW is modelling invoice data - so essentially header records and line items
    • The widgets and reports show pretty much what you would expect:

      • Sales by Month
      • Sales by Supplier
      • Sales by Customer
      • etc.

    • For the purposes of this example let's assume there are 20 different reports all pointing at the DW Invoice structure
    • When we load the DW in the ETL we have one set of scripts that is used by all the customers and we have tried very hard to stay away from customizing those scripts.
    • Although we have approximately 100 customers for this example I will limit it to 3 customers as they all are asking for something different:

      • Customer Number 1 wants all the reporting to go off all the invoice data in the DW
      • Customer #2 wants the reporting to exclude 3 specific suppliers and 2 specific customers
      • Customer #3 wants to exclude any invoice that has a purchase order number that starts with "TF" and exclude any invoice that has a product code of "Internal Surcharge" on it.

    The fundamental issue is that we have 20 different reports that all point at the same structure but need to filter the data depending on the customer needs.

    Here is the only caveat:  I do not want to exclude loading any of the invoice data into the data warehouse during the ETL.  Why?

    1. I do not want to maintain custom ETLs for each of my customers as it makes future upgrades to the process and subsequent testing much more difficult.
    2. Because I am convinced that eventually customer 2 and customer 3 will ask us to create new reports that operate on the data that they asked us to exclude.  If that is the case I need to have access to that data in the DW at some point.

    We have come up with a couple of solutions and am I curious if we are down the right track with any of them:

    Change the queries

    Alter all the queries for all the reports to exclude the required data.  This is high maintenance and as soon as customer 4 comes along and asks to filter on something else, we have to rework all the queries.  I am not thrilled with this idea.

    Create a View on top of the tables

    If we create a view that sits on top of every table and force the reports to query the view then we can control the filtering by customer without necessarily having to retest every single customer installation.

    The default installed view over top the invoice table would be: select * from invoice

    For customer 2 it would be: select * from invoice where supplier not in (a,b,c) and customer not in (1,2)

    For customer 3 it would be: select * from invoice where ponumber not like 'tf%' and invoicekey not in (select invoicekey from invoicelineitem where productcode = "Internal Surcharge")

    This would also mean that we could probably change 2 to 3 views and affect many of the reports.  Additionally when we do upgrades we would only ever change the underlying structures of the tables and the views would go untouched as they would be customized for each customer.

    Flagging the invoices

    One of the developers came up with this and it has caught my eye --- I cannot seem to find any examples of this technique being used.

    We would add a bit flag to both the invoice and invoice lineitem tables that would represent "include this data in the base reports".  A value of 1 would mean include it and a value of 0 would mean exclude it.

    We would then need to alter all the standard/base reports to filter the data to only include rows where "include this data in the base reports" is set to 1.

    Since all the data is in the DW we can subsequently add more reports at a later time that make use of the excluded data.  Those reports could just ignore the flag.

    So how does the bit get set?  During the ETL process we would have a stored procedure or function that would take in the invoice number and check the data against the constraints of the customer and decide whether to mark the data with a 1 or 0.  The benefit of this method is that all the customer specific code is centralized to a very small number of stored procedures or functions.  The downside is that if they ever retroactively change their criteria we have some work to do (not a huge deal)

    Thoughts?  Does anyone have any links they could pass along?

     

    Thanks in advance

    Tony

  • Are the customers able to export the data from the reports into something like Excel?  If so, then problem solved - push it back on them.

    If they CAN'T export to Excel, can they use your reports as a data source for Excel and pull data and filter it how they need?

    Alternately, if the customer has the ability to build reports based on your reports, they could build their own reports.

    And another option - add the filters into the reports for the fields they want filtered.  If the reports are SSRS, the filters can be inserted into the URL, so they could set the filters that they need directly into the report.

    Now, I would avoid  creating custom queries as it just makes more work for you.  I would also avoid making views on top of the tables as again, this makes more work for you.  I would be trying to push this off to the customer as much as possible.

    Flagging the invoices is a method we use and it is messy and gets complicated quickly.  If you want to go that route, our approach was to set the flag in the ETL process.  What makes it hard is we ended up with multiple flags.  For example, a consumer of the data wanted to see data where item A cost more than what it was sold for.  This isn't hard... but then they add in except when it is sold as a 'spare part"... ok,.. not too bad... but show it when it is sold as a spare part to customer C... and the requirements list grew with each iteration of the data we sent out.  I think we have over 90 different flags on that data set now.

    The other risk with all 3 of the methods you mentioned is it could be that user X at company Y wants to see the data that way.  But when User X quits the company and didn't document things well, their replacement MAY not have any clue how the filters were applied to the data and may actually expect to see things that are being filtered out.

     

    TL;DR - I would add the filters as options in the reporting framework.  This allows consumers of the data to dictate what they see and does not hide any business logic from the consumers.  Hidden business logic WILL be forgotten why it was put in place in the future and it will somehow be your fault that the report was hiding data they needed to see.

    The above is all just my opinion on what you should do. 
    As with all advice you find on a random internet forum - you shouldn't blindly follow it.  Always test on a test server to see if there is negative side effects before making changes to live!
    I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.

  • Just a thought ...

    What about an exclusion table (or set of tables) that you can populate per customer.  Then modify your queries to exclude data based on the config table.

    As each customer changes their filtering requirement,you simply INS/UPD/DEL the data in the exclusion table(s).

Viewing 3 posts - 1 through 2 (of 2 total)

You must be logged in to reply to this topic. Login to reply