We have a project to build a kind of premade, standard retailer data warehouse in the cloud (hosted on a SQL Server instance somewhere, with reports on top of it).
We ship a small piece of software to different customers who all have the same source system, let's say SAP. They download it, install it, enter the SAP connection information, then click "EXTRACT MY DATA". The data is then extracted from the source system and sent over the web to my cloud data warehouse.
We may have 10, 20, 50, or 500 customers, depending on the strength of the sales team.
I want to have only ONE data warehouse (ONE fact table and ONE version of each dimension). They will be partitioned over a key that uniquely identifies each customer (ranging from 1 to ...).
I was thinking of creating one database for each customer (makes a lot of sense), or maybe one schema
per customer in the same database (makes less sense)... but then I'd have to maintain 40 versions of everything (facts / dimensions), which is not good.
So my fact table (and every dimension) has this key (uniqueCustomerKey) to partition the data: one partition per customer.
My questions: Does that make sense?
Each customer won't have a lot of data in the fact table (3-4 million records each at most).
Does anyone see a major caveat in my solution?
The fact table will then be clustered on the partition key + the date key.
The dimension tables will be partitioned the same way (partition key + dim key).
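For illustration, here is a rough T-SQL sketch of what that single partitioned fact table could look like (all object and column names here are hypothetical, and the partition function's boundary list would have to be extended every time a new customer signs up):

```sql
-- Hypothetical sketch: one partition per customer on uniqueCustomerKey.
-- With RANGE LEFT, each boundary value is the last value of its partition,
-- so customer 1 lands in partition 1, customer 2 in partition 2, etc.
CREATE PARTITION FUNCTION pfCustomer (int)
    AS RANGE LEFT FOR VALUES (1, 2, 3);  -- extend as customers are added

CREATE PARTITION SCHEME psCustomer
    AS PARTITION pfCustomer ALL TO ([PRIMARY]);

-- Fact table placed on the partition scheme, partitioned by customer.
CREATE TABLE dbo.FactSales (
    uniqueCustomerKey int   NOT NULL,  -- the tenant / partition key
    dateKey           int   NOT NULL,
    productKey        int   NOT NULL,
    salesAmount       money NOT NULL
) ON psCustomer (uniqueCustomerKey);

-- Clustered index on partition key + date key, aligned with the partitions.
CREATE CLUSTERED INDEX cixFactSales
    ON dbo.FactSales (uniqueCustomerKey, dateKey)
    ON psCustomer (uniqueCustomerKey);
```

Because the clustered index leads with the partition key, per-customer queries can benefit from partition elimination, and a whole customer can be removed or archived with a partition-level operation instead of row-by-row deletes.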
Thanks for your insights on this!