Integer vs nchar(6) primary keys

  • Hello.

    We have a database which has the following structure:

    - a central table for core data and several satellite tables, with 1-1 relationships to the main one. This is because of the sheer number of data items we needed to store, and the 8K page limit.

    - each table stores current versions only, and has a corresponding 'history' table where we keep old versions, and a 'delta' table where we keep new and changed rows which have still to be authorised.

    The primary key on the 'current' tables is an nchar(6). It is internally generated, but we avoided an integer because the business wanted a meaningful key.

    The PK on the history and delta tables is an integer surrogate key which represents the specific version of the entity. This is included as an FK on the 'current' tables so it can be linked back to the 'delta' from which it came.

    Now the problem. We have a view which looks at the current tables only, but joins 10 of them using the nchar(6). And it takes ages - 40 seconds to bring back thousands of records. So one of the developers has redesigned the current tables to use the integer version key instead, and the view to join on that instead. With a unique index on the nchar(6) to prevent duplicates. The performance gain is significant - the screens are filled almost instantly.
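    To make the redesign concrete, here is a sketch of the new shape of a 'current' table (all table and column names here are made up for illustration; the real schema differs):

```sql
-- Hypothetical illustration of the redesigned 'current' table:
-- the integer version key becomes the PK, and the old nchar(6)
-- business key is kept unique via a separate constraint.
CREATE TABLE dbo.CoreData (
    VersionID  int      NOT NULL PRIMARY KEY,  -- surrogate version key (links back to the delta table)
    EntityCode nchar(6) NOT NULL,              -- the 'meaningful' business key
    CONSTRAINT UQ_CoreData_EntityCode UNIQUE (EntityCode)
);

-- The view's joins then run over the narrow int key instead of the nchar(6):
-- SELECT ... FROM dbo.CoreData c
-- JOIN dbo.SatelliteA a ON a.VersionID = c.VersionID
-- JOIN dbo.SatelliteB b ON b.VersionID = c.VersionID ...
```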

    My question is: can the change from nchar(6) to integer PKs really make such a huge difference, or should we be looking for another cause? The redesign is logically correct, but I am reluctant to move away from the true primary keys.

    Has anyone done any research on join efficiency using different data types?

  • Remember that any datatype starting with the letter "n" is a Unicode datatype and carries extra processing overhead. An nchar or nvarchar column stores two bytes per character instead of one, so an nchar(6) key is 12 bytes wide where a char(6) would be 6 bytes and an int only 4. Wider keys mean wider indexes, fewer rows per page, and more work on every join comparison, so this could be causing some of your problems. The question you should ask yourself is: "Do I really need to account for foreign language accents / characters?"

    If not, I suggest you remove every instance of a Unicode datatype from your database. Not only will it save you space, it will save you processing power too. How much you save depends on how large and wide your tables are. If, however, you think you might need to store accented vowels or German umlauts, definitely keep the Unicode datatypes, but only for the specific columns that need them.
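    The storage difference is easy to check for yourself with DATALENGTH (the sample value here is made up):

```sql
-- nchar stores 2 bytes per character; char stores 1; int is a fixed 4 bytes.
SELECT DATALENGTH(N'ABC123')        AS nchar_bytes,  -- 12
       DATALENGTH('ABC123')         AS char_bytes,   -- 6
       DATALENGTH(CAST(42 AS int))  AS int_bytes;    -- 4
```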

    Back to your primary key issue... You can't just change a character value to an int unless the character values contain only digits. And if this key is so meaningful, why would it be all digits? Can you give us a couple of examples of your PK values?

    You could also put in a couple of identity columns that don't have to be your primary key and use that to join on, assuming you've got similar values in the other tables you're searching for data on.
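    A minimal sketch of that idea, assuming a hypothetical table name (the identity column is not the PK, just a narrow join column):

```sql
-- Hypothetical: add an identity column alongside the existing nchar(6) key
-- and index it so the other tables can join on it cheaply.
ALTER TABLE dbo.CoreData ADD RowID int IDENTITY(1, 1) NOT NULL;
CREATE UNIQUE INDEX IX_CoreData_RowID ON dbo.CoreData (RowID);
```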

    Brandie Tarvin, MCITP Database Administrator
    LiveJournal Blog: http://brandietarvin.livejournal.com/
    On LinkedIn!, Google+, and Twitter.
    Freelance Writer: Shadowrun. Latchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

  • Thanks for the advice, I think that explains it.

    We do need to cope with foreign language which is why we've got Unicode, but we had decided on a policy of using it for all character fields. We weren't aware of the overhead and it's obviously inappropriate for this particular column.

    Re your comment about changing the key, we haven't changed the datatype of an existing column, we are using a different column entirely - the 'versioned' key instead of the natural key of the entity.

  • I would advise testing both ways: with an int PK, and with the nchar(6) changed to char(6). Run Profiler during the tests so you can see what it's doing. Maybe even System Monitor, to see how RAM and CPU are handling the original and the changed designs.

    This will give you a good idea of what SQL Server is doing during the query, and will tell you whether it's just a datatype issue or whether something else is going on that the datatype issue just happens to be aggravating.
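    Alongside Profiler, a quick way to compare the two designs is SET STATISTICS IO / TIME; run the query once per key design and compare the logical reads and CPU time in the messages output (the view name below is hypothetical):

```sql
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Run the same query against each design and compare
-- logical reads and CPU time from the messages tab.
SELECT COUNT(*) FROM dbo.CurrentDataView;

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```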

    Brandie Tarvin, MCITP Database Administrator
