November 9, 2009 at 10:10 pm
I want to increase the size of my data warehouse. My professor has suggested that I create a new dimension with 5 rows and join it tot he fact table (one-to-one relationship). that makes the DW 5 times bigger.
I'm new to SQL Server BI. I've created new dimension table in the data warehouse.
But how do I join it to the fact table?
I can only add the key column of the new dimension to the fact table by checking "Allow Nulls" - but that doesn't populate the fact table or result in a one-to-one relationship I'm desiring.
So, how do I join new dimension to the existing dimensional model with all the data intact and add additional rows and column to the fact table?
Please help!
I'm using the sample database: AdventureWorksDW
November 10, 2009 at 9:06 am
This sounds like a homework question!
What do you mean by "add a dimension with 5 rows"? Do you mean 5 columns? You will need create a relationship between your Fact table(s) by adding the new dimension's key column to the Fact table(s). There is nothing to prevent you from specifying that the new column in the Fact table(s) can allow NULLS, or you can specify a default value.
Regards
Lempster
November 10, 2009 at 11:05 am
I meant 5 "rows", not columns. There are only two columns: ID and Name
Steps I followed:
In the object explorer, Go to tables -> Right Click Fact Table -> Modify
You can add a column - ID of the new dimension table, There is a Allow Nulls check box for each column.
When you try to save the change, an error message appears - among other things it says that the column should be allowed to have nulls.
Remember that there is already data in the data warehouse
November 10, 2009 at 11:24 am
Presumably your Fact table is linked to several dimension tables? You will need to change your ETL process to populate the Fact table with data from the new dimension, or you can allow nulls (like the error message says), or you can manually add the data as you're only talking about 5 rows. How do you know what data should be added though?
Regards
Lempster
November 10, 2009 at 12:00 pm
Yes, the Fact Table is already linked to other dimensions and has data in it.
I'm new to ETL. Can you give me steps how to proceed?
I basically want to increase the size of the fact table. Any dummy data is fine(it's just for project work).
There are 5 rows in dimension table, that corresponds to vast number of rows in the fact table.
say fact table has three dimensions with 10 rows each, then the fact table can have 1000 rows. When I add another dimension with 5 rows, the size of the fact table can increase upto 5000 rows (provided there is a fact for each combination of the dimensions).
So, essentially I need to add 4000 rows in the fact table to accommodate the new dimension.
Do I have it right?
I'm not sure about how to add those rows though..
So, if there is any way to do this bulk loading easily, it would be very helpful.
November 10, 2009 at 12:54 pm
Can you tell me what effect joining an extra dimension table to the fact table (already connected to other dimensions and is loaded with data and without adding additional rows in the fact table) has?
How significant will be the increase in the size of the data warehouse as a result?
Say the reference key is 4bytes - then the size of the fact table will increase by 4000 bytes. isn't it? (which is not much) Did I get it right?
November 11, 2009 at 7:28 am
pulipatisireesha (11/10/2009)
I'm new to ETL. Can you give me steps how to proceed?
Phew!! That's a rather large topic and impossible to cover in a forum post. I suggest you get hold of a very good book called 'The Microsoft Data Warehouse Toolkit' by Joy Mundy and Warren Thornthwaite.
If I understand correctly what you want do, you have a Fact table with (say) 1000 rows containing surrogate keys to 4 dimension tables and you now want to add surrogate keys to a 5th dimension table, but only for 5 out of the 1000 rows in the Fact table. Is that right?
If so (and you don't want to have nulls in the Tact table - which is commendable!) then you can add the new column to the Fact table and specify a default value, e.g. -1
You'll now have all 1000 rows of thew Fact table populated with -1 in the new column.
The problem I forsee is how do you know which 5 rows in the Fact table correspond to the 5 rows in your new dimension table (assuming that the fact data is in already in the Fact table)? If the fact data corresponding to the 5 rows in your new dimension does not already exist in the Fact table then your source system data needs to be able to do a lookup against the new dimension table to retieve the appropriate surrogate key from the dimension table.....that's where the ETL process comes in.
Let's assume for a minute that you manage to sort of all of this out. You'll then have a Fact table populated with either 1000 or 1005 rows (depending on whether the fact data was already in the Fact table or still in the source system) and 995 or 1000 of those rows will have -1 in the column that you added. An advantage of this is that you will be able to see exactly when you started populating the fact table with meaningful data in the new column - assuming that one of your other dimensions is Date. The value of -1 is essentially referring to an 'unknown member' of the 5th dimension. (This sounds a bit like a sci-fi film doesn't it?! :-D)
As you'll no doubt have gathered from all of the above, it is far better to get the dimensional model right at the beginning rather than trying to add dimensions after you've started populating your Fact table(s).
In terms of your sizing questions, yes, those figures sound about right.
I hope you are not too confused by all of the above - it's not a simple topic!
Regards
Lempster
Viewing 7 posts - 1 through 7 (of 7 total)
You must be logged in to reply to this topic. Login to reply
This website stores cookies on your computer.
These cookies are used to improve your website experience and provide more personalized services to you, both on this website and through other media.
To find out more about the cookies we use, see our Privacy Policy