
Denormalization Strategies
Posted Monday, March 15, 2010 1:24 PM

SSC Journeyman

Thanks, Lynn... good to know. Still, alligators and all that...

Dr. Les Cardwell, DCS-DSS
Enterprise Data Architect
Central Lincoln PUD
Post #883297
Posted Monday, March 15, 2010 2:57 PM

Mr or Mrs. 500

I'm not a big fan of de-normalization in OLTP databases. If I need to run queries, I'll run them against a structure that is optimized for queries, such as a dimensional model or a heavily indexed Operational Data Store. I think an OLTP database exists for one reason: to capture (and validate) transactional data as quickly as possible. The emphasis is as much on write performance as it is on read performance. Anything that gets in the way of that mission, such as the locking associated with read-only reporting queries, should be consciously reduced if not completely eliminated.

I've actually found that it's easier and faster to develop the OLTP model and the ODS in parallel. It also allows you to build a data source for your downstream data warehouse that can be accessed without impacting the OLTP database. This is, after all, why so many companies use transactional replication: to populate their ODS for querying/reporting purposes.

One of the reasons I don't like de-normalization in the OLTP database is that the business intent of the logical data model can become obfuscated by the perceived need for shortcuts in the physical data model. There is a reason why each entity exists in the normalized data model, and referential integrity demonstrates how those entities are related to each other. When those relationships are eliminated and attributes are scattered redundantly around the model, the reasoning behind those relationships is no longer clear. A normalized data model is very easy to "read"; once de-normalization occurs, a level of unnecessary complexity is added.

There was a time when the cost of physical referential integrity was quite high and locking algorithms were relatively primitive, so de-normalization was a means of overcoming the shortcomings of the engine. That time is past.


"Beliefs" get in the way of learning.
Post #883365
Posted Monday, March 15, 2010 3:10 PM

Grasshopper

Robert Frasca (3/15/2010)

I've actually found that it's easier and faster to develop the OLTP model and the ODS in parallel. It also allows you to build a data source for your downstream data warehouse that can be accessed without impacting the OLTP database. This is, after all, why so many companies use transactional replication: to populate their ODS for querying/reporting purposes.


I appreciate your purist view of this, and I agree that a transfer from OLTP to an OLAP/warehouse environment is optimal and best practice for reporting and other analyses; where the resources are available, I totally agree with that notion. However, I have a number of customers and environments where operating that way is quite a challenge because of limited financial resources and support and maintenance concerns (i.e., not having a DBA to support it).
Post #883373
Posted Monday, March 15, 2010 3:20 PM

Mr or Mrs. 500

I feel your pain. I should probably have prefaced my remarks with "when it's possible". It's much easier to take the approach I recommended at the beginning of a project; it's much more difficult to retrofit. However, one of the things I'm working on now is to create that ODS and gradually move queries off of the production OLTP database and onto the ODS, starting with the longest-running ones, to reduce stress. It'll probably take a couple of years, and only then will we be able to think about tweaking the underlying OLTP physical model. It's a cool project, however, with a number of interesting facets to it.

"Beliefs" get in the way of learning.
Post #883384
Posted Monday, March 15, 2010 8:04 PM

Valued Member

Sometimes, when you identify a one-to-many relationship, there is a tendency toward dogmatic decision-making.

I wrote code not long ago to help users categorize records. Instead of using a lookup table, I used a varchar field and an int field holding a hash of the text, populated by a trigger on insert or update. As users enter data, previous entries are selected DISTINCT on that hash to offer a drop-down of suggestions based on what they're typing. It seemed easier to me than creating a lookup table and writing code to insert or update the lookup table.
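A minimal sketch of that approach, with all object names made up for illustration and CHECKSUM standing in for whatever hash the original code used:

CREATE TABLE dbo.PetRecord
(
    PetRecordID  int IDENTITY(1,1) PRIMARY KEY,
    Category     varchar(100) NOT NULL,
    CategoryHash int NOT NULL DEFAULT (0)
);
GO

-- Keep the int hash column in sync with the text column on insert or update.
CREATE TRIGGER dbo.trg_PetRecord_CategoryHash
ON dbo.PetRecord
AFTER INSERT, UPDATE
AS
BEGIN
    SET NOCOUNT ON;
    UPDATE pr
    SET    CategoryHash = CHECKSUM(UPPER(LTRIM(RTRIM(i.Category))))
    FROM   dbo.PetRecord pr
    JOIN   inserted i ON i.PetRecordID = pr.PetRecordID;
END;
GO

-- Distinct prior entries, e.g. to feed a drop-down as the user types.
SELECT DISTINCT Category
FROM   dbo.PetRecord
WHERE  Category LIKE 'Vac%';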

I would only denormalize in rare cases though.
Post #883491
Posted Monday, March 15, 2010 8:57 PM

Grasshopper

I don't want to hit this too hard, but I see the example more as a case for improved design versus denormalization. Just a few observations, if I may:

1) The Pet_Medication table should include the date the med was administered and the dosage, whether it's a one-time event or part of a series. The Pet_MedicationSeries table can be eliminated and the Pet ID and Med ID used as a composite key. That change would seem to me to satisfy a rather obvious business requirement.

2) I would speculate this hypothetical business would want some measure of inventory, and there is no better place to put that than the Medication table. But that table should also include any requirement for re-administering the medication. That would eliminate the need to write a separate select statement for each med and allow for changes without requiring a developer to write more SQL. (A rough sketch of both points follows below.)
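Roughly what I have in mind is below; every object and column name is hypothetical, just to make the two points concrete:

CREATE TABLE dbo.Medication
(
    MedID            int          PRIMARY KEY,
    MedName          varchar(100) NOT NULL,
    QtyOnHand        int          NOT NULL DEFAULT (0), -- simple inventory measure (point 2)
    ReadministerDays int          NULL                  -- re-administration interval; NULL = one-time
);

CREATE TABLE dbo.Pet_Medication
(
    PetID            int         NOT NULL,
    MedID            int         NOT NULL REFERENCES dbo.Medication (MedID),
    DateAdministered datetime    NOT NULL,              -- point 1
    Dosage           varchar(50) NOT NULL,              -- point 1
    PRIMARY KEY (PetID, MedID)                          -- composite key as suggested in point 1
);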

I know the example is hypothetical, but this cat can be skinned a much better way. I think the changes I have suggested would eliminate any need for denormalization and provide a far more flexible application that is less dependent on a DB developer.
Post #883502
Posted Monday, March 15, 2010 9:42 PM

SSC Rookie

I agree more with trubolotta than with the contents of the article. I do not see why there needs to be a separate table for one-time meds vs. recurring meds. In fact, Pet_Medication could contain a bit column to indicate that the medication is recurring. The overhead of one bit is much less than an entire table along with the foreign keys to maintain it. Possibly another tinyint column could indicate the recurring frequency (e.g. 1 = daily, 2 = weekly, 3 = monthly...).
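As a rough sketch of that idea (the column names are made up, and the tinyint codes are just the ones suggested above):

ALTER TABLE dbo.Pet_Medication
    ADD IsRecurring        bit     NOT NULL DEFAULT (0),
        RecurringFrequency tinyint NULL; -- 1 = daily, 2 = weekly, 3 = monthly, NULL = not recurring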

A properly designed structure with proper indexes will perform amazingly fast, especially in this environment, unless they are a veterinary clinic with 50 doctors seeing 30 patients each a day, six days a week, with 20 years of medical history and each patient averaging 2 medications per visit.

Even then, you have to realize that a large percentage of patients may not even require medication. But even so, this is very doable with good response time in SQL Server.

I have designed HR systems that join up to 20 tables in certain queries with sub-second response time reading the data.
Post #883524
Posted Monday, March 15, 2010 9:59 PM

SSCrazy Eights

Les Cardwell (3/15/2010)
Since 'getdate()' is a non-deterministic function, like all non-deterministic functions, we've always assigned them to a scalar variable to ensure the dbms won't perform a table-scan... although admittedly, these days they seem to be more implementation-dependent.

Non-determinism isn't the issue. SQL Server can evaluate the expression once and seek on the result.
Assigning to a variable might cause a poor plan since SQL Server cannot 'look inside' the variable in many cases.
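A small illustration of the two patterns being discussed; the table and column names here are invented, and the actual plans should be checked rather than assumed:

-- GETDATE() directly in the predicate: the expression is evaluated at run time
-- and the optimizer can still seek on an index over NextDueDate.
SELECT PetID, MedID
FROM   dbo.Pet_Medication
WHERE  NextDueDate <= GETDATE();

-- Assigning to a variable first: the value is unknown at compile time,
-- so the optimizer may fall back to generic estimates rather than the actual value.
DECLARE @Today datetime;
SET @Today = GETDATE();

SELECT PetID, MedID
FROM   dbo.Pet_Medication
WHERE  NextDueDate <= @Today;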




Paul White
SQL Server MVP
SQLblog.com
@SQL_Kiwi
Post #883533
Posted Tuesday, March 16, 2010 9:01 AM

Grasshopper

While I find the discussion of how to best use the getdate() function interesting, I also find it distressing that the pursuit of code efficiency brushes aside the more important issue of business practicality. As I mentioned in a previous post, the sample presented to demonstrate a "need for denormalization" more aptly demonstrates the need for better design, and that includes how the application might be used in the real world.

The reason I would run such a report is to notify customers that their pets are due for a particular medication, probably a month in advance. The way the code was written to produce that list requires that it be run every day so as not to miss any pet due to date gaps. More likely, the business will run that report once a month or once a week, but not always on some developer-conceived target due date. People do things early or late, and those gaps must be covered. Hence getdate() is totally inappropriate for use in that report.
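In other words, the report should take an explicit date window rather than keying off the day it happens to be run; something along these lines, with the parameter names and the due-date column invented for illustration:

DECLARE @WindowStart datetime, @WindowEnd datetime;
SET @WindowStart = '20100401';
SET @WindowEnd   = '20100501';

-- Everything coming due in the supplied window, regardless of the day the report runs.
SELECT PetID, MedID, NextDueDate
FROM   dbo.Pet_Medication
WHERE  NextDueDate >= @WindowStart
AND    NextDueDate <  @WindowEnd;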

I know my comment digresses from the topic, but I think it is important to use examples that illustrate very practical business considerations, especially those covering fundamentals of design that n00bs wish to learn. Someone reading the article or the comments regarding the use of getdate() is going to walk away with the wrong lesson. They may write more efficient code as a result, but they will also write an application that puts an undue burden on the business that uses it.
Post #883869
Posted Friday, March 19, 2010 11:45 AM

SSC-Enthusiastic

First: Great article, interesting replies.

My question:
This query does not work in MsSql 2000:
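-- Note: this relies on sys.dm_exec_query_stats and sys.dm_exec_sql_text,
-- dynamic management objects introduced in SQL Server 2005 (hence the failure on 2000).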
SELECT text, total_worker_time/execution_count AS [Avg CPU Time], execution_count
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY execution_count DESC

Is there some kind of alternative for MsSql2000?

(Sorry if this has been asked before, I have read the replies rather quickly)
Post #886610