Use of Key Column

  • Nice question.....

  • Given that some of us might consider this a problem to be solved...

    Can you index a computed column that is: left( [varcharmaxfield], 900 )

    I understand there could be an ambiguous string in those 900 characters. A unique index might not be possible.

    I am more curious about the potential for hidden disaster when performance is acceptable with the first 10k worth of small/short records, then degrades when someone starts using that varchar(max) for complete html pages or long journal entries, etc.

    Also, I'm interested to hear from the gurus how/why this solution is suboptimal. (of course, "it depends") In general, a large free-form text field isn't a sensible "index" candidate. If you are trying to find the record based on content inside this field we should use full-text search.

    QotD are great for testing what we keep in our heads, but honestly I'm not sure there's too much value in knowing some of this stuff. If an error is generated immediately upon trying to run some code, I read the error message and fix the code.

    The real hard-won skills come from live code that works but doesn't work well that you rewrite for the benefit of users (wait time) and resource utilization (cpu/disk). Eventually we learn to write code that won't need a revisit and a rewrite. I'm not sure QotD can test for that though 🙂

  • Mike Dougherty-384281 (5/29/2013)


    Given that some of us might consider this a problem to be solved...

    Can you index a computed column that is: left( [varcharmaxfield], 900 )

    I didn't test it, but I'm pretty sure you can (you probably have to cast it as varchar(900), otherwise the data type will still be varchar(max).). This will persist the computed column, so you take a storage hit - those maximum 900 bytes are added to the table's main pages, *and* they are (obviously) stored in the index.

    Also, I'm interested to hear from the gurus how/why this solution is suboptimal. (of course, "it depends") In general, a large free-form text field isn't a sensible "index" candidate. If you are trying to find the record based on content inside this field we should use full-text search.

    I would normally be hesitant to index large character columns. It's not like they will ever be searched anyway!

    That being said, if there is a business requirement to store string values in excess of 8000 bytes, *and* there is a business requirement that disallows duplicates, the inability to just create a UNIQUE constraint becomes a nuisance. If you want to enforce this rule in a way that does not report false violation when the first 900 characters happen to be the same, you'd need a trigger. For performance reasons, I would not compare every inserted or updated value to all others in the table. I might first do a quick test for duplicates on a substring - for instance that 900-character substring, maybe even a shorter one. That quick test would definitely benefot from a (nonunique) index as described above.

    However, I would definitely also explore the option of using a checksum instead of a substring as the first quick test. And if I were a gambler, I would put my money on the checksum giving a better overall performance.

    QotD are great for testing what we keep in our heads, but honestly I'm not sure there's too much value in knowing some of this stuff.

    True, but that goes for a lot of QotD's. Some are good tests that I could see used in a job interview, but lots are just bits of trivia. Nice to know, but hardly relevant to the job.


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • Hugo Kornelis (5/28/2013)


    L' Eomot Inversé (5/28/2013)


    There is indeed a rule that no column with type nvarchar(max), varchar(max) or varbinary(max) can be used in index keys. However, it isn't documented on that BOL page (or anywhere else that I can find, for that matter).

    This is documented in the page for CREATE INDEX: http://msdn.microsoft.com/en-us/library/ms188783.aspx:

    "Columns that are of the large object (LOB) data types ntext, text, varchar(max), nvarchar(max), varbinary(max), xml, or image cannot be specified as key columns for an index".

    I don't know how I missed that.

    Tom

  • Hugo Kornelis (5/29/2013)[hrHowever, I would definitely also explore the option of using a checksum instead of a substring as the first quick test. And if I were a gambler, I would put my money on the checksum giving a better overall performance.

    So would I. I would probably have a persisted computed column with the checksum in it and an index on that, to speed things up some more.

    It's a pity that hashbytes is still restricted to 8000 bytes arguments, as it would probably provide a big performance improvement if it could be used for this.

    Tom

  • I didn't even notice the varchar(max), I figured it would fail because there is no comma between it and the constraint. Still got the right answer.

    Tony
    ------------------------------------
    Are you suggesting coconuts migrate?

  • Excellent Question. Had everybody looking in the wrong place ! Well thought out.

  • Good question. It needed attention rather than sheer talent. 🙂

Viewing 8 posts - 16 through 22 (of 22 total)

You must be logged in to reply to this topic. Login to reply