I have some question regarding DM design. I am currently working on a clinical DM. Ultimate goal is to build a SSAS cube and run report from our clinical DM.
Business Scenario is fairly simple: Patient can have multiple hospital visits, and for each visit, patient could have multiple transactions for lab, medicine and procedures.
So far these are the dimension we have:
- Flag Field Junk Dimension
We currently have one fact table. Its granularity is at transaction level. Our per visit level information is also in the fact table as repeating column value.
FK_Paitient / FK_Lab / FK_Medicine / FK_Procedure / 6 Lab measure attributes / 5 Medicine measure attributes (5) / 3 Procedure measure attributes (3) / 8 Visit Level Attributes
Each fact line contains lab result/applied medicines/performed procedures during patient visit to our hospital. So, if patient has 7 transactions during visit, we will have 7 lines in fact tables. Since all transaction's attributes are merged, not attribute columns have some values. Some will have null values depending on the data sources (if data source is from medicine, only the medicine related attribute columns will have data and other will be filled in with null value, if data source is from lab then lab attributes will be filled in and other attributes will be filled in with null).
However, since all transaction gets generated based on patient visits, all visit level attributes information are all populated and repeated 7 times. We were trying to stay in star schema, and we thought if we were to write a report from the current design DM, we would set the data scope based on patient visiting date, and show all tied transactions.
Since this was our first attempt, we realized our design is flawed, and this is where we are stuck and not sure what would be best practice or solution...
1. Is it okay to break fact table into 4? Instead of one giant fact table, create fact table for visit level, lab transactions, medicine transactions, procedure transactions. Each our data scope is determined based on patient visit date, so fact tables will have visit_date_key and some minimal visit level key values.
2. In our analysis scenario, all of analysis is done on visit level. Lab/medicine/procedure transactions are required for report to show some detailed information. (to show MDs what lab tests are done and results for lab tests etc.) We are bring these data for drill through/down purpose only. When we roll up these values, it really means nothing. We can only derive factless(?) fact type of measure from these transaction data (i.e. total number of medicine administered during patient visit). In this situation, is it better to create visit dimension rather than visit fact, and try to derive measure from visit dimension and combine three transaction (medicine, lab, procedure) into one fact table?
3. If we break the fact tables into 4. What would be the proper way to link visit fact to transaction fact tables? It's 1-M from visit to lab/medicine/procedures. Only common dimension for these 4 fact tables would be patient, and visit fact table will act like a dimension...
Any suggestion would be greatly appreciated!