data design considerations
Michael Gerholdt
SSC Rookie
Group: General Forum Members
Points: 48 Visits: 228
Scenario is that I want to have a primary key to identify a new entry in a main table. Call it "record_id." This will also be a foreign key in many other tables.

The most important of those tables will hold a complete history of all changes to the main record. Each record in the history table will have both the record_id as well as a unique history_id.

So I'm envisioning a main table that generates the record_id, and a history table which will FK to the main table as well as generate its own unique internal PK as history_id.

The thing I'm struggling with is that the "main" table seems like it only has one purpose, which is to generate the PK. All the other data would be stored in the history table because each piece of it might potentially change (this is storing data that is altered by users in something like a workflow application). So the main table is reduced to functioning the way a sequence does in Oracle. It holds no data and there's no real reason even to use it beyond its ability to create the record_id.

This isn't really a problem, but it seems odd. Every time I think of other fields that might belong in the main table, such as create-date and such, that argument disappears, because that date will be in the first record of the history table as well as in another audit table.

I get the feeling that something obvious is slipping right past my thought process. A composite key may fit here somehow.

Though maybe I should just relax and allow this "main" table to hold only the record_id and recognize that its usefulness is limited to that - a SQL Server version of an Oracle sequence.
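A minimal sketch of that two-table shape (all table and column names here are illustrative, not from any real schema):

```sql
-- "Main" table: exists mainly to mint record_id, much like an Oracle sequence.
CREATE TABLE dbo.main_record (
    record_id INT IDENTITY(1,1) NOT NULL
        CONSTRAINT PK_main_record PRIMARY KEY
);

-- History table: a full copy of the record for every change.
CREATE TABLE dbo.record_history (
    history_id INT IDENTITY(1,1) NOT NULL
        CONSTRAINT PK_record_history PRIMARY KEY,
    record_id  INT NOT NULL
        CONSTRAINT FK_record_history_main
        REFERENCES dbo.main_record (record_id),
    changed_at DATETIME NOT NULL,
    -- ...all the mutable attributes would live here, e.g.:
    status VARCHAR(50) NOT NULL
);
```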

Your thoughts are welcome!
ScottPletcher
Hall of Fame
Group: General Forum Members
Points: 3957 Visits: 6698
The main table would still hold the entire current record. Even if the history table contains the current record, you don't want to get it from there because of the additional overhead. That table could have dozens or hundreds (or thousands, or more?) of copies of each row, one for each change. When you need the current version of a row, which is presumably the most common read, it's vastly less overhead to read it from a table that contains only the current version of each row rather than the entire history of all rows.

SQL DBA,SQL Server MVP('07, '08, '09)

Prosecutor James Blackburn, in closing argument in the "Fatal Vision" murders trial: "If in the future, you should cry a tear, cry one for them [the murder victims]. If in the future, you should say a prayer, say one for them. And if in the future, you should light a candle, light one for them."
Michael Gerholdt
SSC Rookie
Group: General Forum Members
Points: 48 Visits: 228
Thanks for your thoughts. I'm not sure I agree - with proper indexes and queries, I don't think that getting the current record data from the history table has to be a performance hit. But it's helpful to consider it from multiple perspectives.

In the meantime, I have identified some attributes that would not change along with other updates, so they could reside in the main table to keep the record_id company.
Evil Kraig F
SSCertifiable
Group: General Forum Members
Points: 5715 Visits: 7660
Getting the current record HAS to be a performance hit. What you're talking about is a logging table. I use them all the time and I prefer them for audit tables, but you only go into them when you need point-in-time data. Why?

Because of the way indexes work and the mandatory sub-computation to find the most recent record.

Whenever you use a logging (audit, whatever) table, to find the most recent record you have to go in on the key + Max(DateTimeStamp). So, every time you go in, you have to first aggregate, then pull the current record.

This gets worse when you want to, say, pull all items with information in a secondary field - say, products in category Z. There are two approaches, and both, at best, abuse indexing in ways it really isn't designed for.

First, you go in, find all recent records, then scan the results.

Second, you go in and find all records with said category, then go in a second time for most recent record evaluation, then match up the two sets.

You'll want to keep your histories out of the primary data points if you intend to do anything but work with a single 'record_id' at a time.
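The "aggregate first, then pull the current record" pattern described above looks roughly like this, assuming a history table named dbo.record_history keyed by record_id with a changed_at timestamp (hypothetical names for illustration):

```sql
-- Step 1 (the mandatory sub-computation): find the latest timestamp per record.
-- Step 2: join back to the history table to pull the full current row.
SELECT h.*
FROM dbo.record_history AS h
JOIN (
    SELECT record_id,
           MAX(changed_at) AS max_changed_at
    FROM dbo.record_history
    GROUP BY record_id
) AS latest
  ON  latest.record_id      = h.record_id
  AND latest.max_changed_at = h.changed_at;
```

The aggregate has to run before a single current row can be returned, which is the extra cost being described.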


- Craig Farrell

Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

For better assistance in answering your questions | Forum Netiquette
For index/tuning help, follow these directions. |Tally Tables

Twitter: @AnyWayDBA
Michael Gerholdt
SSC Rookie
Group: General Forum Members
Points: 48 Visits: 228
Thanks, Evil Kraig F.

I see your points.

I had thought to avoid some of that by having a column in the main table which would hold the current (i.e. max) history_id for that record_id and then join the tables on history_id.

Rather than storing all the data for the current record twice, just get the data out of the history table where the history_id = the one that's in the main table. So every time the history table is updated, so is the main table - but not with the entire row of data, just the history_id. With an index on the history_id in the history table, pulling the current record in this way should not be expensive; and that seems like a correct way to use an index.

Wrong?

We will also be doing some flattening/denormalization for reporting purposes, so there won't be a need to crawl the history table on a regular basis.
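The pointer idea above could be sketched like this (column and table names are hypothetical):

```sql
-- Main table carries a pointer to the current history row.
ALTER TABLE dbo.main_record
    ADD current_history_id INT NULL;

-- Getting the current record is then a single PK lookup per record_id,
-- with no MAX() aggregate over the history.
SELECT h.*
FROM dbo.main_record AS m
JOIN dbo.record_history AS h
  ON h.history_id = m.current_history_id
WHERE m.record_id = @record_id;
```

The trade-off is that every insert into the history table must also update the pointer in the main table, ideally in the same transaction.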
Grant Fritchey
SSCoach
Group: General Forum Members
Points: 17653 Visits: 32273
I worked with a design for several years that allowed for versioned data and queries that ran extremely fast. You just have to make sure that you index the tables appropriately and write the queries in an optimal fashion. I've got two articles on Simple-Talk about these mechanisms. The first talks about the index structure. The second talks about the various methods of querying the versioned data.

Short answer, instead of a main table and a history table, just have one table that stores everything. The methods outlined above have been in production, successfully, for years.

----------------------------------------------------
The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood...
Theodore Roosevelt

The Scary DBA
Author of: SQL Server Query Performance Tuning and SQL Server Execution Plans
Product Evangelist for Red Gate Software
ScottPletcher
Hall of Fame
Group: General Forum Members
Points: 3957 Visits: 6698
There must be more fragmentation in a history table than a current-data-only table.

For single row lookups, this is not really a performance hit.

But for table scans, even partial ones, it will be significant. Whether or not it will actually be noticeable depends on your environment: if you have extra capacity in your system to handle the added overhead, you may not notice the performance degradation. You could reduce fillfactor to reduce fragmentation, but that causes potential "dead space" issues, so it's not a panacea.
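Lowering the fill factor happens when the index is built or rebuilt, for example (index and table names are illustrative):

```sql
-- Leave 20% free space on each leaf page to absorb new versions
-- before page splits occur - at the cost of the "dead space" noted above.
ALTER INDEX PK_record_history ON dbo.record_history
REBUILD WITH (FILLFACTOR = 80);
```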

Grant Fritchey
SSCoach
Group: General Forum Members
Points: 17653 Visits: 32273
ScottPletcher (3/28/2013)
There must be more fragmentation in a history table than a current-data-only table.

For single row lookups, this is not really a performance hit.

But for table scans, even partial ones, it will be significant. Whether or not it will actually be noticeable depends on your environment: if you have extra capacity in your system to handle the added overhead, you may not notice the performance degradation. You could reduce fillfactor to reduce fragmentation, but that causes potential "dead space" issues, so it's not a panacea.


Some, but what we found was that if the clustered index was built right, then the storage was pretty consistent. The key structure had two values: the identity of the root object and the version of that object. That distributed the data very evenly in a known pattern. It really did work well and it's been going for about eight years now. I haven't seen it personally in the last two, but it works.
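As I read that description, the key structure would look something like this (table and column names are made up for illustration):

```sql
-- Single versioned table: clustered key = (root object id, version number).
-- All versions of one object sit together on disk, in version order.
CREATE TABLE dbo.customer_version (
    customer_id   INT          NOT NULL,  -- identity of the root object
    version_no    INT          NOT NULL,  -- increments with each change
    customer_name VARCHAR(100) NOT NULL,
    CONSTRAINT PK_customer_version
        PRIMARY KEY CLUSTERED (customer_id, version_no)
);

-- Current version of one object: a single ordered seek, no aggregate.
SELECT TOP (1) *
FROM dbo.customer_version
WHERE customer_id = @customer_id
ORDER BY version_no DESC;
```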

ScottPletcher
Hall of Fame
Group: General Forum Members
Points: 3957 Visits: 6698
Grant Fritchey (3/28/2013)
ScottPletcher (3/28/2013)
There must be more fragmentation in a history table than a current-data-only table.

For single row lookups, this is not really a performance hit.

But for table scans, even partial ones, it will be significant. Whether or not it will actually be noticeable depends on your environment: if you have extra capacity in your system to handle the added overhead, you may not notice the performance degradation. You could reduce fillfactor to reduce fragmentation, but that causes potential "dead space" issues, so it's not a panacea.


Some, but what we found was that if the clustered index was built right, then the storage was pretty consistent. The key structure had two values: the identity of the root object and the version of that object. That distributed the data very evenly in a known pattern. It really did work well and it's been going for about eight years now. I haven't seen it personally in the last two, but it works.



But some root objects will change a lot, and other root objects not at all. It's essentially impossible to handle that without fragmentation in the table. You could thus have extreme fragmentation in some areas of the table and virtually none in others. For example, newer rows would typically get updated much more frequently than very old rows.

Grant Fritchey
SSCoach
Group: General Forum Members
Points: 17653 Visits: 32273
ScottPletcher (3/28/2013)
Grant Fritchey (3/28/2013)
ScottPletcher (3/28/2013)
There must be more fragmentation in a history table than a current-data-only table.

For single row lookups, this is not really a performance hit.

But for table scans, even partial ones, it will be significant. Whether or not it will actually be noticeable depends on your environment: if you have extra capacity in your system to handle the added overhead, you may not notice the performance degradation. You could reduce fillfactor to reduce fragmentation, but that causes potential "dead space" issues, so it's not a panacea.


Some, but what we found was that if the clustered index was built right, then the storage was pretty consistent. The key structure had two values: the identity of the root object and the version of that object. That distributed the data very evenly in a known pattern. It really did work well and it's been going for about eight years now. I haven't seen it personally in the last two, but it works.



But some root objects will change a lot, and other root objects not at all. It's essentially impossible to handle that without fragmentation in the table. You could thus have extreme fragmentation in some areas of the table and virtually none in others. For example, newer rows would typically get updated much more frequently than very old rows.



Yep. You will get some fragmentation, no question. You get fragmentation with almost any structure. But in this instance the fragmentation was never so horrible that it was noticeable, let alone that it brought the system down. The worst problem we had to deal with in the whole thing was keeping statistics up to date, but that was no different from the current issues with an identity column.
