Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase 12»»

DWH : Fact table : index creation. Expand / Collapse
Author
Message
Posted Friday, February 15, 2013 3:33 AM


SSCrazy

SSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazy

Group: General Forum Members
Last Login: Tuesday, March 26, 2013 8:41 AM
Points: 2,562, Visits: 3,451
Below is the schema of my FACT table

table : Fact.POLICY
Size : 165 GB


Policy_transactional_ID (surrogate key )
Policy_ID
Audit_Trail_ID (FK)
Journalization_ID (FK)
Program_ID (FK)
Product_ID (FK)
Producer_ID (FK)
Surplus_Line_ID (FK)
Policy_ID (FK)
Account Number
Acquisition Costs
monthly cost
total_gain_producer_amt
Amount
Attachment Point
Base Amount
'
'
'
'
'
'
starttime
Endtime


approach 1 :

Policy_transactional_ID identity , not in use of any query , should i make it PK with non clu index
clus index : policy_id + starttime
and other non clus indexes on ( every ID column + starttime )

approach 2 :

Policy_transactional_ID : PK + clus index
and other non clus indexes on ( every ID column + starttime )

people say we should go with case 1 , but why to aligned/sort the data physical on column which is not being used in queries. my take would be case 2


please guide











-------Bhuvnesh----------
While 1 = 1 (Learning SQL....)
Click to get fast response of your post
Post #1420463
Posted Friday, February 15, 2013 4:04 AM
SSC Eights!

SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!

Group: General Forum Members
Last Login: Yesterday @ 3:38 AM
Points: 803, Visits: 2,123
It depends, what is the ETL process, is it a snapshot Fact (as is) or incremental with Insert new update existing, or Contra correction logic as the decision will have a big impact on which route you go down.


_________________________________________________________________________
SSC Guide to Posting and Best Practices
Post #1420471
Posted Friday, February 15, 2013 5:01 AM


Mr or Mrs. 500

Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500

Group: General Forum Members
Last Login: Today @ 9:25 AM
Points: 595, Visits: 2,137
Jason-299789 (2/15/2013)
It depends, what is the ETL process, is it a snapshot Fact (as is) or incremental with Insert new update existing, or Contra correction logic as the decision will have a big impact on which route you go down.


Agreed +1

Putting the clustered index on the startdate column could speed up your select statements providing they use startdate but will also slow down your inserts. In addition I would look at the queries that are hitting this table and think do you really need all those non clustered indexes if yes then fair enough but worth doing the check just to see?

If you are doing a nightly load rather than an incremental you could test your ETL's by dropping all of your indexes and re-creating and determining the impact so see which is more efficient

Cheers

Andy


==========================================================================================================================
A computer lets you make more mistakes faster than any invention in human history - with the possible exceptions of handguns and tequila. Mitch Ratcliffe
Post #1420479
Posted Monday, February 18, 2013 1:09 AM


SSCrazy

SSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazy

Group: General Forum Members
Last Login: Tuesday, March 26, 2013 8:41 AM
Points: 2,562, Visits: 3,451
Jason-299789 (2/15/2013)
It depends, what is the ETL process, is it a snapshot Fact (as is) or incremental with Insert new update existing,
it will be "incremental with Insert new update existing".

And how much indentity column as surrogate key will help the other non clus index although SK will not be part of any query.


-------Bhuvnesh----------
While 1 = 1 (Learning SQL....)
Click to get fast response of your post
Post #1421054
Posted Monday, February 18, 2013 1:11 AM


SSCrazy

SSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazy

Group: General Forum Members
Last Login: Tuesday, March 26, 2013 8:41 AM
Points: 2,562, Visits: 3,451
Andy Hyslop (2/15/2013)
[quote][b]could test your ETL's by dropping all of your indexes and re-creating and determining the impact so see which is more efficient
Does it cost intensive if we dropp the indexes and rebuild them every time we upload the data ?


-------Bhuvnesh----------
While 1 = 1 (Learning SQL....)
Click to get fast response of your post
Post #1421055
Posted Monday, February 18, 2013 3:21 AM


Mr or Mrs. 500

Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500

Group: General Forum Members
Last Login: Today @ 9:25 AM
Points: 595, Visits: 2,137
Bhuvnesh (2/18/2013)
Andy Hyslop (2/15/2013)
[quote][b]could test your ETL's by dropping all of your indexes and re-creating and determining the impact so see which is more efficient

Does it cost intensive if we dropp the indexes and rebuild them every time we upload the data ?


Sorry for this answer, but it does depend on the volume of data..

I assume you were referring to the nightly loads (as dropping indexes for an incremental will carry more overhead than its worth)..

For a nightly load yes dropping and re-creating the indexes will have an impact but you will need to try and assess both strategies and see which performs better i.e. perform two loads one with indexes and one dropping and re-creating the indexes and test..

In my experience dropping and recreating generally works faster and has the added benefit of generating a fresh index..

Andy



==========================================================================================================================
A computer lets you make more mistakes faster than any invention in human history - with the possible exceptions of handguns and tequila. Mitch Ratcliffe
Post #1421123
Posted Monday, February 18, 2013 4:30 AM
SSC Eights!

SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!

Group: General Forum Members
Last Login: Yesterday @ 3:38 AM
Points: 803, Visits: 2,123
Andy Hyslop (2/18/2013)
Bhuvnesh (2/18/2013)
Andy Hyslop (2/15/2013)
[quote][b]could test your ETL's by dropping all of your indexes and re-creating and determining the impact so see which is more efficient

Does it cost intensive if we dropp the indexes and rebuild them every time we upload the data ?


Sorry for this answer, but it does depend on the volume of data..

I assume you were referring to the nightly loads (as dropping indexes for an incremental will carry more overhead than its worth)..

For a nightly load yes dropping and re-creating the indexes will have an impact but you will need to try and assess both strategies and see which performs better i.e. perform two loads one with indexes and one dropping and re-creating the indexes and test..

In my experience dropping and recreating generally works faster and has the added benefit of generating a fresh index..

Andy



+1, but would add that there are only two indexes I wouldn't drop, the first is the clustered index, the other is the covering Business key index for the ETL process, especially on tables that are substantial.

But would agree its very much a suck it and see which works best for you, try it without the drop indexes then try the same load with drop indexes, see which one gives you the best solution.


_________________________________________________________________________
SSC Guide to Posting and Best Practices
Post #1421134
Posted Monday, February 18, 2013 7:27 AM


SSCrazy

SSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazy

Group: General Forum Members
Last Login: Yesterday @ 1:08 PM
Points: 2,982, Visits: 4,396
Bhuvnesh (2/18/2013)
Andy Hyslop (2/15/2013)
[quote][b]could test your ETL's by dropping all of your indexes and re-creating and determining the impact so see which is more efficient
Does it cost intensive if we dropp the indexes and rebuild them every time we upload the data ?


Dropping the indexes during ETL would mean misserable performance for users querying the table, also adding overhead wher building index back - former means potential performance impact for other queries.

In regards to indexing strategy, here are my two cents.
Clustered unique index supporting your PK
Non-clustered, non-unique indexes on every FK
Other non-clustered, non-unique indexes depending on actual queries.

Hope this helps.


_____________________________________
Pablo (Paul) Berzukov

Author of Understanding Database Administration available at Amazon and other bookstores.

Disclaimer: Advice is provided to the best of my knowledge but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions on a test non-production environment advisor should not held liable or responsible for any actions taken based on the given advice.
Post #1421214
Posted Monday, February 18, 2013 2:14 PM
Ten Centuries

Ten CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen Centuries

Group: General Forum Members
Last Login: Today @ 2:55 PM
Points: 1,319, Visits: 1,770
Under no circumstances should you cluster this table on Policy_transactional_ID.

Cluster the table by the most common WHERE conditions; make the clustered key unique by including the Policy_transactional_ID on the end of the clus key but only if necessary.

[You can add a separate unique nonclustered PK if you absolutely need it, although I don't think you should need it in this type of situation.]

Besides, the date added should be ascending anyway, other than corrections. Sort the new data being added into the clustering key sequence.

Keep in mind: a given row is INSERTed only once, but SELECTed thousands (or millions!) of times.


SQL DBA,SQL Server MVP('07, '08, '09)
One man with courage makes a majority. Andrew Jackson
Post #1421350
Posted Monday, February 18, 2013 11:25 PM


SSCrazy

SSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazy

Group: General Forum Members
Last Login: Tuesday, March 26, 2013 8:41 AM
Points: 2,562, Visits: 3,451
ScottPletcher (2/18/2013)
Under no circumstances should you cluster this table on Policy_transactional_ID.

Cluster the table by the most common WHERE conditions; make the clustered key unique by including the Policy_transactional_ID on the end of the clus key but only if necessary.


Lets take an scenario,
i have PK on Policy_transactional_ID and most of the queries are using starttime and policy_number in WHERE clause

then which option will work better for clustered index.

approach 1 : starttime + policy number (sql server itself add RID to make the records unique)
approach 2 : srarttime + policy number + Policy_transactional_ID.

and which apprach will have more overhead from space and resource perspective.



-------Bhuvnesh----------
While 1 = 1 (Learning SQL....)
Click to get fast response of your post
Post #1421441
« Prev Topic | Next Topic »

Add to briefcase 12»»

Permissions Expand / Collapse