Index for a column with a lot of NULLs

  • I am not sure how I should set the fill factor for an index on a column with a lot of NULL values. For example, I rebuilt the index with a 90% fill factor, and the logical scan fragmentation became 12.57% while the extent scan fragmentation became 37.36% - worse than before the rebuild. Please help. Thanks!

  • What is the percentage of values in the table that are null for that column? It may not be advantageous to have an index at all...

  • I have 3 columns with the same situation. The percent null for the 3 columns is around 40%, 65% and 92%. The table has 2.5 million records and grows by about 80,000 records per week. Thanks!

  • I'll tell you right now that there's no use having an index on the ones with 65% and 92% NULL saturation. To make an index worthwhile, there must be decent selectivity on the data in the column. That means there should be a wide range of distinct values occurring in the field. For instance, if you have a total of 3 distinct values in your field, an index is probably not going to be useful, and will add more time to INSERTs and UPDATEs than it will save on SELECTs. Likewise, if you have 80 distinct values in the field, with a table of 1M rows, and 999,000 of those rows have 1 of those 80 values in the field, the index will be similarly less than useful.
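    A quick way to gauge that selectivity before committing to an index (a sketch; `MyTable` and `MyColumn` are placeholder names):

    ```sql
    -- Distinct values vs. total rows: a ratio near 1.0 means high
    -- selectivity; near 0 means the index is unlikely to pay off.
    -- COUNT(MyColumn) skips NULLs, so PctNonNull falls out for free.
    SELECT
        COUNT(DISTINCT MyColumn)                   AS DistinctValues,
        COUNT(*)                                   AS TotalRows,
        COUNT(MyColumn) * 100.0 / COUNT(*)         AS PctNonNull,
        COUNT(DISTINCT MyColumn) * 1.0 / COUNT(*)  AS Selectivity
    FROM MyTable
    ```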

    --

    So, the point is, if either of these situations describes your own experience, you may be better off without the index altogether, or only creating it for intensive queries if it seems to help at all. As for the fill factor, it is not particularly affected by NULL values; fill factor mainly helps avoid page splits in the interval between index rebuilds. So, my guess is that you have placed an index on a wide varchar or nvarchar field with values of varying widths, which has caused unusual fragmentation in the index. You might consider indexing a computed column of the CHECKSUM value, or even the LEFT X characters.
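    A sketch of the CHECKSUM idea (table and column names are hypothetical; note that CHECKSUM can collide, so queries should re-test the real column):

    ```sql
    -- Add a computed checksum column and index it instead of the
    -- wide varchar itself.
    ALTER TABLE MyTable ADD WideColChk AS CHECKSUM(WideCol)
    CREATE INDEX IX_MyTable_WideColChk ON MyTable (WideColChk)

    -- Filter on the narrow checksum first, then verify the real
    -- value to weed out hash collisions.
    SELECT *
    FROM MyTable
    WHERE WideColChk = CHECKSUM('some value')
      AND WideCol = 'some value'
    ```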

    --

    Am I on the right track here?

  • Actually, all 3 indexes are on columns of the int data type.

    Let's say I do need the indexes, disregarding the selectivity. What can I do to minimize the fragmentation? With a different fill factor, the logical scan may get better, but the extent scan is horrible.

  • I am not trying to offend anyone, but if the NULL percentage for a certain column is 65% or higher, I suggest moving those fields out of the table.

    Why waste space on disk (and performance) if they are mostly NULL?

    Move the columns with the NULL values to a separate table. This is actually a database design flaw....
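    A sketch of that vertical split (all names hypothetical; only rows that actually have values get a row in the child table):

    ```sql
    CREATE TABLE MyTableExtra (
        MyTableID int NOT NULL PRIMARY KEY
            REFERENCES MyTable (MyTableID),
        BKey int NULL,
        CKey int NULL
    )

    -- Carry over only the rows that have something to store.
    INSERT INTO MyTableExtra (MyTableID, BKey, CKey)
    SELECT MyTableID, BKey, CKey
    FROM MyTable
    WHERE BKey IS NOT NULL OR CKey IS NOT NULL
    ```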

  • I agree that there is probably a design flaw/oversight if there are that many NULLs.

    However, if a column in a table has values that are REALLY lopsided, like a bit column with 99% of values set to one value, then an index is helpful in some cases if you are only querying for the 1% with the other value. This has been useful to me on some megamillion-record tables.
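    A minimal sketch of that case (hypothetical table and column names; assumes 99% of rows share one value):

    ```sql
    -- Nearly all rows have IsProcessed = 1, so the index only pays
    -- off when hunting for the rare value.
    CREATE INDEX IX_Orders_IsProcessed ON Orders (IsProcessed)

    -- Highly selective in practice: touches only the ~1% of rows.
    SELECT OrderID FROM Orders WHERE IsProcessed = 0
    ```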

  • Yes, the index does help me a lot when there are values in the field.

    This is a very important table, and changing the table structure would mean extensive changes to the application code, which is not possible right now. Therefore, I would like to minimize the fragmentation.

    By the way, thank you very much to all of you for your participation!!

  • You may be better off using a composite index on all 3 fields instead of 3 indexes.

    I would agree with jpipes that indexes are not going to help you much without high selectivity. But since you're going to create one anyway, a composite index would cost less for inserts and updates and may be just as good for selects.

    If this is a very important table you could be in for some issues in the near future...good luck!
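    The suggestion above sketched in T-SQL (index name hypothetical; column names from later in the thread):

    ```sql
    -- One composite index instead of three single-column ones.
    -- Column order matters: only the leading column supports seeks.
    CREATE INDEX IX_MyTable_Keys ON MyTable (AKey, BKey, CKey)
    ```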

    Signature is NULL

  • I agree with Calvin; be sure that the order of the columns in the composite key is the same as that in your queries.

    --

    But, back to the fragmentation question, tikus, when you are "rebuilding" the index, are you dropping and re-creating the index, or are you doing something different?

  • BTW, tikus, this table does have clustered index on it, right? If not, the logical and extent scan frag numbers on SHOWCONTIG should be ignored. Just checking the basics, I guess...
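    For reference, the numbers under discussion come from something like this (SQL Server 2000 syntax; the table name is a placeholder):

    ```sql
    -- Reports Logical Scan Fragmentation and Extent Scan
    -- Fragmentation for every index on the table. The extent figure
    -- is only meaningful when the table has a clustered index.
    DBCC SHOWCONTIG ('MyTable') WITH ALL_INDEXES
    ```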

  • One more point on the fragmentation issue:

    Simply issuing a DBCC DBREINDEX on the table will not affect logical and extent scan fragmentation numbers unless you are reindexing the clustered index. You will only improve fragmentation on the indexes if you drop and recreate the clustered index on the table (which will in turn update the non-clustered indexes on the table which use the clustered index's clustering key as their bookmark lookup in the leaf page). So, if you a) do not have a clustered index on this table or b) your 3 INT fields are not part of the clustered index, you might want to include the fields in a clustered index. If, however, you have not yet dropped and recreated an existing clustered index on the table, try that first.

    --

    HTH,

    Jay
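    The rebuild order described above, as a sketch (table and index names are hypothetical):

    ```sql
    -- Rebuild the clustered index, with a 90 fill factor...
    DBCC DBREINDEX ('MyTable', 'CIX_MyTable', 90)

    -- ...or pass an empty index name to rebuild every index on the
    -- table in one pass.
    DBCC DBREINDEX ('MyTable', '', 90)
    ```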

  • Yes, I have a clustered index on the table. And yes, I use DBCC DBREINDEX to rebuild the clustered index first, and then the rest of the nonclustered indexes.

    Creating a composite index will not work in my situation. This is because those three columns are in the WHERE clauses of three different SPs. To be clear:

    1) SP1: Where AKey = @AKey

    2) SP2: Where BKey = @BKey

    3) SP3: Where CKey = @CKey

    What really confuses me is that after rebuilding the indexes, the logical scan may improve or stay the same, but the extent scan can go up to almost 36%. For example, I have one index:

    logical scan = 0.52%

    Extent scan = 25.05%

    By the way, the Clustered index itself:

    logical scan = 12.33%

    extent scan = 1.55%

    Don't quite understand how come they are still fragmented ........ 🙁

    Thanks for all your help, guys!!

  • It is possible to have more than a single table's data in an extent, so the index fragmentation of other indexes housed in the same extents as this index may be to blame. Not quite sure how to address this; moving this table to its own disk, perhaps, to avoid conflict with other tables. Is this a highly transactional table?
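    A sketch of isolating the table on its own filegroup (database, filegroup, and file names are hypothetical; assumes the new file sits on its own disk):

    ```sql
    ALTER DATABASE MyDb ADD FILEGROUP FG_HotTable
    ALTER DATABASE MyDb ADD FILE
        (NAME = 'MyDb_Hot', FILENAME = 'E:\Data\MyDb_Hot.ndf')
        TO FILEGROUP FG_HotTable

    -- Recreating the clustered index on the new filegroup moves the
    -- table's data there, away from other tables' extents.
    CREATE UNIQUE CLUSTERED INDEX CIX_MyTable
        ON MyTable (MyTableID)
        WITH DROP_EXISTING
        ON FG_HotTable
    ```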

  • Yes, this is a very highly transactional table. There are about 10 processes querying and updating this table every few seconds...
