• An interesting start, but the discussion doesn't go far enough. There needs to be a distinction: what is the target? Is the target OLTP systems, or is it a data warehouse or an OLAP cube? The intended function will determine part of (not all of) the need.

    There is not enough data here to make formal statements. The optimizer will change its path (and the results will change with it) based on the following points (a quick sketch for checking a few of these settings follows the list):

    The optimizer will change...

    * at 500,000 rows, then 1M, 5M, 10M, and 100M rows.

    * what it's doing when joins are introduced across two tables, each at 100M rows.

    * based on the type of indexes or compound indexes used on the table.

    * based on statistics being up to date (and on the type and depth of the statistics run).

    * based on the DOP (degree of parallelism).

    * based on the amount of RAM.

    * based on a 64-bit or 32-bit core.

    * based on the version of SQL Server.

    * based on the size of the database block.

    * based on the size of the surrogate key (not all surrogates can be integer driven).

    * based on the width of the rows in the table.

    * based on the logging.

    * based on the settings of the database.

    * based on the partitioning across I/O channels.
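
    As a starting point, several of these can be checked straight from the DMVs before any benchmark number is trusted. This is only a sketch: it assumes roughly SQL Server 2012 or later (DMV columns shift between versions), and dbo.Fact_Sales is a placeholder table name.

    ```sql
    -- Quick environment checks before trusting any benchmark numbers.
    -- Assumes SQL Server 2012+; some DMV columns differ in older versions.

    -- Version / edition / bitness
    SELECT @@VERSION AS sql_server_version;

    -- Server-wide max degree of parallelism
    SELECT name, value_in_use
    FROM sys.configurations
    WHERE name = N'max degree of parallelism';

    -- Physical memory visible to the instance
    SELECT physical_memory_kb / 1024 AS physical_memory_mb
    FROM sys.dm_os_sys_info;

    -- Statistics freshness and sampling depth for a table of interest
    -- (dbo.Fact_Sales is a placeholder name)
    SELECT s.name AS stats_name,
           sp.last_updated,
           sp.rows,
           sp.rows_sampled,          -- sampled vs. fullscan changes the plan
           sp.modification_counter   -- changes since the last statistics update
    FROM sys.stats AS s
    CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
    WHERE s.object_id = OBJECT_ID(N'dbo.Fact_Sales');
    ```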

    I have seen too many tests run at "low volume" (76k rows, or even 245 rows) where someone stands up and says: Eureka! I have performance.

    Then when the queries make it to production (even a year later), they die a miserable death because of the lack of foresight to test at high volume levels.

    I think a test like this should be done, but it should be done to scale (80M rows in 1 table, then 2 tables joined, then 4 tables, then 8 tables joined).

    Then it should be done again with the different settings indicated above; a rough sketch of the test mechanics follows.
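
    Purely as an illustration of the mechanics, something along these lines builds the volume and measures the joins. Every object name here (dbo.Fact_A, dbo.Fact_B, surrogate_key) is a placeholder, and the row counts should be tuned to the volume being targeted.

    ```sql
    -- Hypothetical scale test; all table/column names are placeholders.
    -- 1) Generate a large table from a tally CTE.
    --    10,000 x 10,000 = 100M rows; shrink the TOP () values for a first pass,
    --    since this takes real time, disk, and log space.
    ;WITH n AS
    (
        SELECT TOP (10000)
               ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS i
        FROM sys.all_objects AS a
        CROSS JOIN sys.all_objects AS b
    )
    SELECT a.i * 10000 + b.i        AS surrogate_key,
           CAST(a.i AS varchar(20)) AS some_attribute
    INTO dbo.Fact_A
    FROM n AS a
    CROSS JOIN n AS b;

    -- 2) Copy it so the join runs across two large tables.
    SELECT surrogate_key, some_attribute
    INTO dbo.Fact_B
    FROM dbo.Fact_A;

    -- 3) Measure each step the same way: single-table scan, then the 2-way join,
    --    then 4-way, then 8-way, re-running after each settings change.
    SET STATISTICS IO, TIME ON;

    SELECT COUNT_BIG(*)
    FROM dbo.Fact_A AS a
    JOIN dbo.Fact_B AS b
      ON b.surrogate_key = a.surrogate_key;

    SET STATISTICS IO, TIME OFF;
    ```

    Capture the actual execution plan and the STATISTICS IO/TIME output at each volume; the plan shape (hash vs. merge vs. loop joins, degree of parallelism) is what changes and what should be compared, not just the elapsed time.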

    Great start, keep up the good work, just carry it further.

    Cheers,

    Dan Linstedt

    http://danLinstedt.com