Blog Post

Indexes are best utilized in WHERE, not JOIN

,

A pervasive oversight in database indexing by newer database users is relying solely on indexes used for joining tables together (i.e. foreign key indexes.) For example, say you have a Puppies table and a Kid table. Most likely, there is a Kid_ID column in the Puppies table pointing back to Kid. Since this is a foreign key, there is usually an index on Kid_ID. This index allows the SQL Engine to quickly ensure no referential rules are being broken on inserts and updates and is used when the engine needs to leap from the Kid table to related records in the Puppies table. This is good and necessary. The problem is when the creator of the database assumes that this will be sufficient for efficient database queries. In many, many cases, these indices will never be utilized. When searching for the kids who own a small Labrador with white spots on its back who’s owner’s last name is ‘Vegiard’, this index is completely ignored and the database simply reads every record in the puppies table, known as a table scan, which is, of course, much slower than using an appropriate index.

Why?

The Kid_ID index is merely too simple for SQL Server to use effectively. Think back to “The” Analogy. If you’re the super secret agent, do you want to drive out to the warehouse, grab the index book that lists all dogs on the planet, then grab the reference for all kids on the planet and start cross-referencing them? No!!! You’d have to cross reference several billion (if not more) entries before starting to weed out entries based on their type. In other words, you’d be cross-referencing hound dogs, sheppard’s, etc. which is a lot of work, especially since in the end you don’t care about those breeds right now.

It makes more sense to grab the index reference on dog characteristics and turn directly to the section for Labradors. If you’re lucky, it will also have the labs sorted by those who have white spots on their backs to allow you to only worry about the specific dogs you are after. SQL wants to work in this same way. It wants an index on the characteristics fields so it can work with the least number of records possible. An index on Kid_ID is not helpful here and it is ignored in this case.

So, back to our analogy. The agent uses the Characteristics cross reference to look for all small Labs w/white spots. He finds a mere 36,000. Now he runs into the warehouse to look up all those puppy files and see who owns them. As he finds the owners names, he then goes and pulls the kid’s file and checks to see if their last name is “Vegiard.” Good news… this approach will only take about 32 days….

What the super secret agent REALLY needs is a cross-refference that has the puppies characteristics AND the Kid_ID. Then, if he had a reference that had Last_Name and Kid_ID he could cross-reference the two of them without ever leaving the reference room. The super-secret agent just LOVES saving time by not going into the warehouse. If I may be bold as to personify; SQL Server also loves working just with indexes and skipping the tables.

So, the moral of the story…

When creating an index, create a compound key (an index with multiple fields in it). The first field(s) should be those commonly used in WHERE clauses (in this case, Characteristics). Next, place fields that commonly participate in Join clauses (such as Kid_ID in this example.) Now you’re DB is ready to Rock! If you’re really slick, you can then add a field for sorting or ordering, but be aware that the more fields you add the slower your database inserts/updates take, so don’t go crazy unless it is pretty critical. The main focus is to cut down the number of records that need to be considered for the majority of the queries.

Some of you may be asking: “How do I know which columns are being used in WHERE clauses the most?” Well, three’s two choices there: (1) Get tools that profile your database and give you quantifiable information about that very topic. These are rare and expensive. If you’re a smaller shop, then (2) Start Analyzing! Ask the project lead, ask the users, use common sense, etc. Creating indexes can be an art in this respect. Go buy a smock!!

Find more performance tuning tips here.

Rate

You rated this post out of 5. Change rating

Share

Share

Rate

You rated this post out of 5. Change rating