Indexed Views

By Christoffer Hedgate

I guess more or less every reader has heard of denormalization and how it is used to improve performance. It is said that a query that joins several tables will perform badly, and that to remedy this you should denormalize your database design. This 'advice' is, however, wrong, and it is wrong in several ways. In this article I will show you why it is wrong and what you can do instead of following it.

What is denormalization?

The fact that there is no single and clear answer to this question should in itself be a warning flag that this is something you should be wary of. Unlike normalization, there is no definition of what a denormalized table is. Note that I don't use the term relation, because an R-table, i.e. a relation in table form as stated in the relational model, is in 1NF by definition and therefore always normalized. Instead, what most proponents of denormalization mean when they say that you should denormalize a database design is that you should 'lower the level of normalization', i.e. not have your tables fully normalized. In other words, you should have more than one entity in a table. The reason for doing this, they say, is that queries will otherwise have to join several tables to return complete information, and this is supposedly bad for performance. Some like to do this when the performance of some specific query or operation is not as good as they want, but I have even seen projects where an entire phase of the project cycle was dedicated to denormalizing the system!
 
Let's look at a simple example of denormalization. Say we have an application that uses the pubs database and executes queries like the one below:
SELECT t.title_id, t.title, t.price, p.pub_id, p.pub_name
FROM dbo.titles t
INNER JOIN dbo.publishers p ON t.pub_id = p.pub_id
This example uses only a single join and might therefore not be one that would be targeted for denormalization, but it will do as an example. Besides, just as there is no definition of denormalization, there is no answer to the question "how many joins are too many?". In order to avoid the join between the tables titles and publishers, a new table, tp, is created by running the following statement:
SELECT t.title_id, t.title, t.price, p.pub_id, p.pub_name
INTO dbo.tp
FROM dbo.titles t
INNER JOIN dbo.publishers p ON t.pub_id = p.pub_id
You can now retrieve the same result as the initial query by running a SELECT query without the join(s), using just the new table tp. But, as you hopefully see, this also means you have created a situation with redundant data that exists both in the original tables titles and publishers and in the denormalized table tp. Sure, one could argue that you don't need to keep the original tables (at least not if tp contained all columns of both titles and publishers), but that still means you have redundant data within the table tp itself. That is what normalization does: it removes these redundancies by making sure you have only one entity in each table. In either case, you have just made the process of keeping the integrity of your data much more complicated than it is in a fully normalized design, and that is the most important thing that proponents of denormalization forget about. The following quote is from an article called "The Dangerous Illusion: Denormalization, Performance and Integrity, Part 1" by Fabian Pascal:
 
"The only reason you may sometimes achieve better performance when you denormalize your database is because you ignore the integrity implications. If and when those are taken into account, you will lose the performance gain, if any, from denormalization and may even end up with a net performance loss relative to the fully normalized alternative."
 
Even though the integrity implications are the main reason you should never denormalize, this article will not discuss those issues, because Pascal and others already do so in the above-mentioned article and elsewhere (see references below). I should, however, emphasize the parts "may sometimes achieve better performance" and "performance gain, if any". Because a database by definition should not be biased towards a specific application but should instead be designed to perform well for all applications, denormalizing a table is not even always a good thing for performance. Even if one application might benefit (performance-wise) from the denormalization, others, including future applications that have not even been conceived yet, will not. Note that I still say "might benefit", even when talking about the specific application that is the reason you want to denormalize a table, because other factors (such as extra I/O) might negate the performance gain proponents expect from removing a join.
 
To complete this little debunking of denormalization I must also state that it should not be possible to improve performance by denormalizing a database design, since normalization (and denormalization, whatever the term actually means) is a purely logical operation, while it is the physical layer that determines performance. The reason you might sometimes achieve better performance from a logical design choice (such as denormalization) is that the DBMSs we use do not adequately separate the logical layer from the physical layer, and because of this a logical design choice might still affect the physical layer in a way that sometimes gives you better performance.

Indexed views in SQL Server 2000

What the DBMS should be doing is completely separating the logical and physical layers, thereby opening up an infinite number of possibilities for performance-enhancing work 'under the hood'. That way you as a database designer would only need to consider the logical design, and thereby help the DBMS control the integrity of the database, and your logical design decisions would not affect the performance of the database. The DBMS could handle logical joins with any physical structures it wants, as long as it makes sure the integrity is maintained the way the logical design specifies. SQL Server actually gives us, the DBAs, a way to 'help' it do this, more or less (see the summary as well). Using indexed views we can keep our fully normalized logical design with all its constraints intact, yet still have a physical structure that lets us execute our logical join without reducing performance.
 
An indexed view is created like a standard SQL Server view, i.e. a stored query definition that you use as a table source in queries, by using the CREATE VIEW DDL statement. You then create a unique clustered index on this view. Because the leaf-level pages of a clustered index contain the actual data rows (see my earlier articles about clustered indexes), this physically stores the result set of the query that defines the view on disk, in the same way as a clustered index on a table is stored. There are several conditions that a view must meet before you can create a clustered index on it. Some of them are SET options that must be set in a specific way, others are rules for what the view definition can contain; I will leave it to you, the reader, to check in Books Online exactly what they are.
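As a rough illustration only (Books Online has the authoritative list), the session typically needs to look something like this when you create the view and its index:
-- Sketch of the session settings commonly required for creating
-- indexed views in SQL Server 2000; verify the exact list in Books Online.
SET ANSI_NULLS ON
SET ANSI_PADDING ON
SET ANSI_WARNINGS ON
SET ARITHABORT ON
SET CONCAT_NULL_YIELDS_NULL ON
SET QUOTED_IDENTIFIER ON
SET NUMERIC_ROUNDABORT OFF
GO
In addition, the view itself must be created WITH SCHEMABINDING and reference its base tables with two-part names, as the test script further down does.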
 
Now you might be thinking that this sounds like the 'solution' I described above, with the denormalized table tp. But even though the data is duplicated, there are almost no similarities between the two solutions. In particular, there are two important differences:
  • First of all, SQL Server automatically transfers modifications made to the data in the base tables to the indexed view, the same way it does for an index created on a base table. This means that you don't have to create any kind of complex constraints or triggers to maintain the integrity of the data, as you would need to do for a denormalized table like tp (a small sketch of this follows after the list).
  • Second, SQL Server will automatically use the indexed view even in queries that don't reference it, if it decides that the query will benefit from it. This means that you don't need to rewrite existing applications and queries to benefit from the indexed view, as you would with the denormalized table tp.
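To illustrate the first point, here is a small sketch that assumes the view dbo.v_tp created in the test script below; the update is wrapped in a transaction and rolled back so the pubs sample data is left untouched:
BEGIN TRAN
UPDATE dbo.publishers SET pub_name = 'New Moon Books Ltd' WHERE pub_id = '0736'
-- The changed name is immediately visible through the indexed view,
-- without any triggers or other maintenance code.
SELECT pub_id, pub_name FROM dbo.v_tp WHERE pub_id = '0736'
ROLLBACK TRAN
GO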
The version of Books Online that comes with SQL Server 2000 (no service pack) incorrectly states that "You can create indexed views only if you install Microsoft SQL Server 2000 Enterprise Edition or Microsoft SQL Server 2000 Developer Edition.", but this is a documentation error. What it should say is that only SQL Server 2000 Enterprise Edition (and Developer Edition) will use the indexed view as a stored table on its own. Other editions of SQL Server will expand the view just as they do a standard view, even if it is directly referenced in the query, unless you specify the query hint NOEXPAND. The updated version of BOL (SP3) says exactly this: "Indexed views can be created in any edition of SQL Server 2000. In SQL Server 2000 Enterprise Edition, the query optimizer will automatically consider the indexed view. To use an indexed view in all other editions, the NOEXPAND hint must be used."
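On those other editions you can still read from the stored result set by referencing the view directly with the hint, for example (again assuming the view dbo.v_tp from the script below):
-- Forces the use of the indexed view's clustered index on editions
-- that would otherwise expand the view definition.
SELECT title_id, title, price, pub_id, pub_name
FROM dbo.v_tp WITH (NOEXPAND)
GO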

Testing indexed views

The small script below shows you how to use an indexed view instead of denormalization. You must of course be using Enterprise Edition or Developer Edition for the optimizer to automatically use the view in the final query.
USE pubs
GO

SELECT t.title_id, t.title, t.price, p.pub_id, p.pub_name
FROM dbo.titles t
INNER JOIN dbo.publishers p ON t.pub_id = p.pub_id
GO

IF EXISTS(SELECT * FROM sysobjects WHERE name = 'v_tp')
  DROP VIEW v_tp
GO

CREATE VIEW dbo.v_tp
WITH SCHEMABINDING
AS
SELECT t.title_id, t.title, t.price, p.pub_id, p.pub_name
FROM dbo.titles t
INNER JOIN dbo.publishers p ON t.pub_id = p.pub_id
GO

CREATE UNIQUE CLUSTERED INDEX ixcv_tp ON dbo.v_tp (title_id, pub_id)
GO

-- Note in execution plan that even though the query doesn't specify
-- the view, SQL Server will automatically use it instead of the base
-- tables that are specified in the query.
SELECT t.title_id, t.title, t.price, p.pub_id, p.pub_name
FROM dbo.titles t
INNER JOIN dbo.publishers p ON t.pub_id = p.pub_id
GO
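If you prefer a textual plan to the graphical one, something like the following should show the clustered index on v_tp being used instead of the base tables (a sketch; the exact plan output varies with edition and service pack):
SET SHOWPLAN_TEXT ON
GO
SELECT t.title_id, t.title, t.price, p.pub_id, p.pub_name
FROM dbo.titles t
INNER JOIN dbo.publishers p ON t.pub_id = p.pub_id
GO
SET SHOWPLAN_TEXT OFF
GO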

Summary

Now that you've seen how you can use indexed views instead of denormalizing your tables, I hope I won't need to hear anyone propose denormalization as advice for improving performance. Well, seriously, I never accept anyone advising a user to denormalize, because of the integrity implications, but I hope this article might help in some situations. However, you should still remember that an indexed view is just like any other index: it has a negative effect on modifications, and you must of course weigh this against the improvement you will get for some queries. As you might notice, this is not an optimal solution, as it can still bias the database towards some applications. It also mixes the logical and physical layers, but this is unavoidable, as an indexed view 'by definition' is a mix of the logical and the physical.

References
SQL Server Books Online - Creating an Indexed View
The Dangerous Illusion: Denormalization, Performance and Integrity, Part 1
The Dangerous Illusion: Denormalization, Performance and Integrity, Part 2
On Normalization, Performance and Integrity
On The Relational Model and Physical Implementations
