Complex and weird requirement : multi-customers Datawarehouse

  • Hi,

    We have a project of having a kind of premade-standard-retailler-datawarehouse in the Cloud (hosted on a SQL Server somewhere, and with reports on that).

    We send a little software to different customers having the same source system, lets say SAP. Then, they download it, install it, enter the SAP connexion information, then click "EXTRACT MY DATA". The data is then extracted from the source system and sent over the web to my Cloud Datawarehouse.

    We may have 10...20..50...500 customers...depending on the strenght of the sales team...

    I want to have only ONE datawarehouse (ONE fact table and ONE version of each Dims). They will be partitionned over a key that uniquely identify each customers (ranging from 1 to ...).

    Note :

    I was thinking to create one database for each customers (lot of sense), or maybe on schema

    for each customers in the same database (less sense),... but then I'll have to maintain 40 versions of everything (fact / dinensions)...not good.

    So my fact table (and every dimensions) have this key (uniqueCustomerKey) to partition the data. One partition for each customer.

    My questions: Is that make sense?

    Each customers wont have a lot of data in the fact table (max 3-4 millions of records each MAX).

    Do anyone see a major caveat in my solution?

    The fact table will then be clustered on the partitionKey + the date key

    The dimension tables will be partition also the same way (partitionkey + dimKey)

    thanks for you lights on that!

    Simon

  • I think this is more or less the way I would do it as well. The one caveat that I see is customer level security will have to be designed carefully. The last thing you want is for customer B to see some data from Customer A on one of their reports.

  • Hi There, i would put it all in one fact table too, i'd consider clustering / partitioning it the other way though, i.e. DateKey, CustomerKey.

    If most of the data warehouse queries include the datekey as well as the CustomerKey then this will generally be a better way of doing it. I'm assuming your loading your data in on daily increments? If you have the CustomerKey first then the index will be constantly re-shuffling around on the disk, where as putting the datekey first will make it all nice and contiguous, you'll get better scan speeds from that too. You should have a quick read of this http://msdn.microsoft.com/en-us/library/dd425070(v=sql.100).aspx too it really does help with this sort of thing.

    What version / edition of SQL are you using?

    Bob.

Viewing 3 posts - 1 through 2 (of 2 total)

You must be logged in to reply to this topic. Login to reply