April 1, 2010 at 12:31 am
Hi everyone,
I m new to this forum.In fact my issues made me join the forum.some of my friend told me that normally all kind experts on Sql server are available in here.
I have a DSS(Decision Support System) on SQL server 2005.DSS is built on the star schema concept.I have 20 Different Fact table which contains Daily and monthly Facts.I have 15 Dimension Table.
The daily Facts are loaded in every 15 days.
Monthly Facts gets Loaded once in every month.
Few Dimension Table gets Loaded on incremental basis but Daily.
Few Dimension Table gets Loaded on incremental basis but monthly.
Fact Tables has at a average 300,000,000 records.
Dimension tables has at a average 50,000,000 records.
Now the issue is i have been assigned to find the discrepancies in the Database.I have found few discrepancies but not sure if they are right. The team is ready to implement the changes but the data is too crucial so i want to be 100 % sure about the performance elevation.The changes i m suggesting should not result in performance overhead.
The existing structure DSS has is like:
Few Fact Tables has
Non Clustered Index and Primary Key on Unique int column which is not used for joins
Clustered Index on composite key which is combination of 6 to 7 columns(includes int,varchar and datetime fields) to get a unique data.
and multiple Non Clustered Indexes on Fields on which are queried on daily basis.
Few Dimension Tables has
Non Clustered Index and Primary Key on Unique(but not neccesarily (int) it can be varchar also though other unique (int) field is available) column which is not used for joins
Clustered Index on composite key which is combination of 6 to 7 columns(includes int,varchar and datetime fields) to get a unique data.
And Rest of the Fact And Dimension Table are are not having any indexing data is in heap structure.
My Suggestion is.
All The Fact and Dimension Tables which are used should have index (most probably clustered index on a unique int field,non unique then also on the int field).As sql server itself assigns a int unique-ifier key along with non unique index key to make it unique. but here the size will be restricted to just 8 bytes per key.
Non clustered unique index with primary key on unique int field.
Non clustered index on fields which are queried heavily.
plus i m working on removing redundant data from the respective fact and dimension tables.
Now my questions are:
1. Are my suggestions correct as per the current situation, will this elevate the performance of the DSS???
2. How much time will it take to apply all the indexes what i m suggesting assuming that i have a descent hardware configuration ??
3. how much clustered index on a non unique int field is efficient than a clustered index on unique composite key comprising of all sorts of data ???
Regards,
Vel
April 7, 2010 at 7:51 am
Too many issues/questions in a single post, lets go to the basics.
Lets use a HHRR datamart as an example. In a star-schema like the one you describe you have a FACTual table like EmployeesTbl that includes both attributes like DepartmentId, TitleId, etc. and measures like CurrentSalary for each EmployeeId; it also includes DIMensional tables that describe each one of the attributes like DepartmentTbl, TitleTbl, etc.
In this scenario a unique index or PK should be built on EmployeeTbl.EmployeeId, DepartmentTbl.DepartmentId and TitleTbl.TitleId
Since in a DSS environment queries join DIMensional tables based on measures that are sitting on FACTual tables like... "gimme the employee #, department and title of employees making more than $100K" it is a common practice (*) to build non-unique indexes on the FACTual table for each attribute described in a DIMensional table, in this case EmployeeTbl.DepartmentId, EmployeeTbl.TitleId, etc. It is not uncommon (*) to define a FK on these indexes.
Hope this helps you to start moving.
(*) in the Oracle world where you want to take advantage of "star-transformation" and "query-rewrite" features it is a must to create such indexes (as bitmap indexes) and define those FKs (as disable-rely-novalidate).
_____________________________________
Pablo (Paul) Berzukov
Author of Understanding Database Administration available at Amazon and other bookstores.
Disclaimer: Advice is provided to the best of my knowledge but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions on a test non-production environment advisor should not held liable or responsible for any actions taken based on the given advice.Viewing 2 posts - 1 through 2 (of 2 total)
You must be logged in to reply to this topic. Login to reply