|
|
|
SSCrazy
      
Group: General Forum Members
Last Login: Tuesday, March 26, 2013 8:41 AM
Points: 2,562,
Visits: 3,451
|
|
Below is the schema of my FACT table
table : Fact.POLICY Size : 165 GB
Policy_transactional_ID (surrogate key ) Policy_ID Audit_Trail_ID (FK) Journalization_ID (FK) Program_ID (FK) Product_ID (FK) Producer_ID (FK) Surplus_Line_ID (FK) Policy_ID (FK) Account Number Acquisition Costs monthly cost total_gain_producer_amt Amount Attachment Point Base Amount ' ' ' ' ' ' starttime Endtime
approach 1 :
Policy_transactional_ID identity , not in use of any query , should i make it PK with non clu index clus index : policy_id + starttime and other non clus indexes on ( every ID column + starttime )
approach 2 :
Policy_transactional_ID : PK + clus index and other non clus indexes on ( every ID column + starttime )
people say we should go with case 1 , but why to aligned/sort the data physical on column which is not being used in queries. my take would be case 2
please guide
-------Bhuvnesh---------- While 1 = 1 (Learning SQL....) Click to get fast response of your post
|
|
|
|
|
SSC Eights!
      
Group: General Forum Members
Last Login: Yesterday @ 3:38 AM
Points: 803,
Visits: 2,123
|
|
It depends, what is the ETL process, is it a snapshot Fact (as is) or incremental with Insert new update existing, or Contra correction logic as the decision will have a big impact on which route you go down.
_________________________________________________________________________ SSC Guide to Posting and Best Practices
|
|
|
|
|
Mr or Mrs. 500
      
Group: General Forum Members
Last Login: Today @ 9:25 AM
Points: 595,
Visits: 2,137
|
|
Jason-299789 (2/15/2013) It depends, what is the ETL process, is it a snapshot Fact (as is) or incremental with Insert new update existing, or Contra correction logic as the decision will have a big impact on which route you go down.
Agreed +1
Putting the clustered index on the startdate column could speed up your select statements providing they use startdate but will also slow down your inserts. In addition I would look at the queries that are hitting this table and think do you really need all those non clustered indexes if yes then fair enough but worth doing the check just to see? 
If you are doing a nightly load rather than an incremental you could test your ETL's by dropping all of your indexes and re-creating and determining the impact so see which is more efficient
Cheers
Andy
========================================================================================================================== A computer lets you make more mistakes faster than any invention in human history - with the possible exceptions of handguns and tequila. Mitch Ratcliffe
|
|
|
|
|
SSCrazy
      
Group: General Forum Members
Last Login: Tuesday, March 26, 2013 8:41 AM
Points: 2,562,
Visits: 3,451
|
|
Jason-299789 (2/15/2013) It depends, what is the ETL process, is it a snapshot Fact (as is) or incremental with Insert new update existing, it will be "incremental with Insert new update existing".
And how much indentity column as surrogate key will help the other non clus index although SK will not be part of any query.
-------Bhuvnesh---------- While 1 = 1 (Learning SQL....) Click to get fast response of your post
|
|
|
|
|
SSCrazy
      
Group: General Forum Members
Last Login: Tuesday, March 26, 2013 8:41 AM
Points: 2,562,
Visits: 3,451
|
|
Andy Hyslop (2/15/2013) [quote][b]could test your ETL's by dropping all of your indexes and re-creating and determining the impact so see which is more efficient Does it cost intensive if we dropp the indexes and rebuild them every time we upload the data ?
-------Bhuvnesh---------- While 1 = 1 (Learning SQL....) Click to get fast response of your post
|
|
|
|
|
Mr or Mrs. 500
      
Group: General Forum Members
Last Login: Today @ 9:25 AM
Points: 595,
Visits: 2,137
|
|
Bhuvnesh (2/18/2013)
Andy Hyslop (2/15/2013) [quote][b]could test your ETL's by dropping all of your indexes and re-creating and determining the impact so see which is more efficient
Does it cost intensive if we dropp the indexes and rebuild them every time we upload the data ?
Sorry for this answer, but it does depend on the volume of data..
I assume you were referring to the nightly loads (as dropping indexes for an incremental will carry more overhead than its worth)..
For a nightly load yes dropping and re-creating the indexes will have an impact but you will need to try and assess both strategies and see which performs better i.e. perform two loads one with indexes and one dropping and re-creating the indexes and test..
In my experience dropping and recreating generally works faster and has the added benefit of generating a fresh index..
Andy
========================================================================================================================== A computer lets you make more mistakes faster than any invention in human history - with the possible exceptions of handguns and tequila. Mitch Ratcliffe
|
|
|
|
|
SSC Eights!
      
Group: General Forum Members
Last Login: Yesterday @ 3:38 AM
Points: 803,
Visits: 2,123
|
|
Andy Hyslop (2/18/2013)
Bhuvnesh (2/18/2013)
Andy Hyslop (2/15/2013) [quote][b]could test your ETL's by dropping all of your indexes and re-creating and determining the impact so see which is more efficient
Does it cost intensive if we dropp the indexes and rebuild them every time we upload the data ? Sorry for this answer, but it does depend on the volume of data.. I assume you were referring to the nightly loads (as dropping indexes for an incremental will carry more overhead than its worth).. For a nightly load yes dropping and re-creating the indexes will have an impact but you will need to try and assess both strategies and see which performs better i.e. perform two loads one with indexes and one dropping and re-creating the indexes and test.. In my experience dropping and recreating generally works faster and has the added benefit of generating a fresh index.. Andy
+1, but would add that there are only two indexes I wouldn't drop, the first is the clustered index, the other is the covering Business key index for the ETL process, especially on tables that are substantial.
But would agree its very much a suck it and see which works best for you, try it without the drop indexes then try the same load with drop indexes, see which one gives you the best solution.
_________________________________________________________________________ SSC Guide to Posting and Best Practices
|
|
|
|
|
SSCrazy
      
Group: General Forum Members
Last Login: Yesterday @ 1:08 PM
Points: 2,982,
Visits: 4,396
|
|
Bhuvnesh (2/18/2013)
Andy Hyslop (2/15/2013) [quote][b]could test your ETL's by dropping all of your indexes and re-creating and determining the impact so see which is more efficient Does it cost intensive if we dropp the indexes and rebuild them every time we upload the data ?
Dropping the indexes during ETL would mean misserable performance for users querying the table, also adding overhead wher building index back - former means potential performance impact for other queries.
In regards to indexing strategy, here are my two cents. Clustered unique index supporting your PK Non-clustered, non-unique indexes on every FK Other non-clustered, non-unique indexes depending on actual queries.
Hope this helps.
_____________________________________ Pablo (Paul) Berzukov
Author of Understanding Database Administration available at Amazon and other bookstores.
Disclaimer: Advice is provided to the best of my knowledge but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions on a test non-production environment advisor should not held liable or responsible for any actions taken based on the given advice.
|
|
|
|
|
Ten Centuries
      
Group: General Forum Members
Last Login: Today @ 2:55 PM
Points: 1,319,
Visits: 1,770
|
|
Under no circumstances should you cluster this table on Policy_transactional_ID.
Cluster the table by the most common WHERE conditions; make the clustered key unique by including the Policy_transactional_ID on the end of the clus key but only if necessary.
[You can add a separate unique nonclustered PK if you absolutely need it, although I don't think you should need it in this type of situation.]
Besides, the date added should be ascending anyway, other than corrections. Sort the new data being added into the clustering key sequence.
Keep in mind: a given row is INSERTed only once, but SELECTed thousands (or millions!) of times.
SQL DBA,SQL Server MVP('07, '08, '09) One man with courage makes a majority. Andrew Jackson
|
|
|
|
|
SSCrazy
      
Group: General Forum Members
Last Login: Tuesday, March 26, 2013 8:41 AM
Points: 2,562,
Visits: 3,451
|
|
ScottPletcher (2/18/2013) Under no circumstances should you cluster this table on Policy_transactional_ID.
Cluster the table by the most common WHERE conditions; make the clustered key unique by including the Policy_transactional_ID on the end of the clus key but only if necessary.
Lets take an scenario, i have PK on Policy_transactional_ID and most of the queries are using starttime and policy_number in WHERE clause
then which option will work better for clustered index.
approach 1 : starttime + policy number (sql server itself add RID to make the records unique) approach 2 : srarttime + policy number + Policy_transactional_ID.
and which apprach will have more overhead from space and resource perspective.
-------Bhuvnesh---------- While 1 = 1 (Learning SQL....) Click to get fast response of your post
|
|
|
|