Working around the maximum key length for a nonclustered index in SQL Server

  • Hi, hope everyone is well.

I need to create a unique nonclustered index, but I am running into the 1700-byte maximum key length.

I know the workaround is to create the index with included columns.

My question is: would that unique index still enforce uniqueness? Will an insert fail if it violates the logic of the unique nonclustered index with included columns?

     

    Thanks

    Astrid

The maximum key length for a nonclustered index in SQL Server 2017 is 1700 bytes. Since you have not provided enough information, it is hard to say what your problem may be. Perhaps if you provided the full text of the error message and/or the DDL for the table and the index you are trying to create, we could provide a better answer.

Edit: This limit is for the actual index key columns. You can add additional columns that are not part of the key as included columns.
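For illustration, the pattern looks something like this (the table and column names here are made up, not taken from your post):

CREATE UNIQUE NONCLUSTERED INDEX UX_SomeTable_Keys
    ON dbo.SomeTable (KeyCol1, KeyCol2)   -- only the key columns count toward the 1700-byte limit
    INCLUDE (WideCol1, WideCol2);         -- carried at the leaf level only, not part of the key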

     

     

  • astrid 69000 wrote:

My question is: would that unique index still enforce uniqueness? Will an insert fail if it violates the logic of the unique nonclustered index with included columns?

No - included columns are not part of the index key or the unique constraint.  I am curious about how you ended up with a table where the only way to identify a unique row is more than 1700 bytes long.  How many columns are part of this index, and what are the data types of each column?

    What does this table represent?
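To make the point concrete, here is a small throwaway sketch (demo objects, not your table) - the uniqueness check only looks at the key columns:

CREATE TABLE dbo.UniqueDemo (A int NOT NULL, B int NOT NULL, C int NOT NULL);
CREATE UNIQUE NONCLUSTERED INDEX UX_UniqueDemo ON dbo.UniqueDemo (A, B) INCLUDE (C);

INSERT dbo.UniqueDemo VALUES (1, 1, 10);
INSERT dbo.UniqueDemo VALUES (1, 2, 10);  -- succeeds: C = 10 repeats, but the key (A, B) is unique
INSERT dbo.UniqueDemo VALUES (1, 1, 99);  -- fails: duplicate key (A, B) = (1, 1)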

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

Thanks for the reply. This is the table I need to build - we follow the Data Vault methodology.

    I am able to build the table but I get a warning:

    Warning! The maximum key length for a nonclustered index is 1700 bytes. The index 'UC_HubHistoryDataNew' has maximum length of 1802 bytes. For some combination of large values, the insert/update operation will fail.

    This is the table code.

CREATE TABLE [Raw].[HubHistoryDataNew](
    [HistoryDataHashKey] [char](40) NOT NULL,
    [LoadDate] [datetime] NULL,
    [RecordSource] [nvarchar](200) NULL,
    [LastSeenDate] [datetime] NULL,
    [CreateDate] [datetime2](7) NULL,
    [UserID] [nvarchar](20) NULL,
    [TableName] [nvarchar](50) NULL,
    [ColumnName] [nvarchar](50) NULL,
    [PrimaryKey] [nvarchar](254) NULL,
    [Action] [nvarchar](15) NULL,
    [OldValue] [nvarchar](254) NULL,
    [NewValue] [nvarchar](254) NULL,
    CONSTRAINT [HubHistoryDataNew_PK] PRIMARY KEY NONCLUSTERED
    (
        [HistoryDataHashKey] ASC
    ) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 80) ON [INDEX],
    CONSTRAINT [UC_HubHistoryDataNew] UNIQUE NONCLUSTERED
    (
        [CreateDate] ASC,
        [UserID] ASC,
        [TableName] ASC,
        [ColumnName] ASC,
        [PrimaryKey] ASC,
        [Action] ASC,
        [OldValue] ASC,
        [NewValue] ASC
    ) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 80) ON [INDEX]
) ON [DATA]
    GO
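For reference, the 1802 bytes in the warning appears to be just the sum of the maximum sizes of the key columns (nvarchar is 2 bytes per character, datetime2(7) is 8 bytes). A quick check, assuming the table above exists:

-- CreateDate 8 + UserID 40 + TableName 100 + ColumnName 100
--   + PrimaryKey 508 + Action 30 + OldValue 508 + NewValue 508 = 1802 bytes
SELECT c.name, c.max_length
FROM sys.columns AS c
WHERE c.object_id = OBJECT_ID(N'Raw.HubHistoryDataNew')
  AND c.name IN (N'CreateDate', N'UserID', N'TableName', N'ColumnName',
                 N'PrimaryKey', N'Action', N'OldValue', N'NewValue');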
If you are following Data Vault design then you have too many columns on that hub. Most of the columns there look like attributes of your business key and are not, as far as I can tell, a business key on their own.

     

Also, if you can, don't define the columns as nvarchar - unless the data is Unicode-based, stick with varchar types, as it will save space.

Edit: looking at the columns, this looks like it should be a link table (split into link and satellite) rather than a hub.

This is a special table: the columns are the composition of that primary hash key. That is why they are all there.

I can look into nvarchar versus varchar.

    Thanks

  • Speaking of hashes, you could create a hash column for all the columns and make that the Clustered PK.  That would certainly be within the limits of a CI.

     

     

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

I did not choose the type of table; I don't even know what the table is for. I was just asked to create a hub.

  • Jeff Moden wrote:

    Speaking of hashes, you could create a hash column for all the columns and make that the Clustered PK.  That would certainly be within the limits of a CI.

What do you mean? We have already hashed all the columns to create a hash key.

  • astrid 69000 wrote:

    Speaking of hashes, you could create a hash column for all the columns and make that the Clustered PK.  That would certainly be within the limits of a CI.

What do you mean? We have already hashed all the columns to create a hash key.

    Ah... I see.  I missed the fact that you posted the code for the table.

There's no way to make an index with 1700 bytes of keys without a possibility of future failure, period, because everyone forgets about the row headers within the indexes. Even on 2017, it's likely to come up a bit short because of that (although I've not tested it specifically, because a 1700-byte-keyed index isn't practical, IMHO). You'll need to identify the minimum set of columns that makes the index unique and use INCLUDE for the rest if you want to come even close. Certainly the old and new value columns do not need to be in the keys.

The real key here (no pun intended) is that this is a column-based AUDIT/HISTORY table and, ostensibly, it's being fed only by triggers (if you're using CLR triggers, that's a HUGE performance mistake... obviously, I've been there and done that, and it needs to be fixed if you have those or anything dynamic within your triggers - and there is a way around that). That being said, you really shouldn't need a UNIQUE index on all the columns that you're trying to use as keys. At best, you only need TableName, ColumnName, PrimaryKey, and CreateDate for keys and, with the understanding that it's a huge and wasteful duplication of data, use INCLUDE for the other columns - something like the sketch below.
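A possible shape (untested sketch - the index name is made up, and you'd drop or replace the existing UC_HubHistoryDataNew constraint first):

CREATE UNIQUE NONCLUSTERED INDEX UX_HubHistoryDataNew_Change
    ON [Raw].[HubHistoryDataNew] ([TableName], [ColumnName], [PrimaryKey], [CreateDate])
    INCLUDE ([UserID], [Action], [OldValue], [NewValue]);
-- Key size: 100 + 100 + 508 + 8 = 716 bytes, comfortably under the 1700-byte limit.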

Since we also have this form of auditing at work, I can tell you that your trigger is also being bogged down by converting old and new values to NVARCHAR(). This is one of the very few places in the world where the sql_variant datatype works a treat (there are some minor caveats, but it works perfectly for everything I need such an audit table for).
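A tiny illustration of the sql_variant idea (a throwaway table variable, nothing from your schema) - each value keeps its original base type:

DECLARE @audit TABLE (OldValue sql_variant NULL, NewValue sql_variant NULL);

INSERT @audit VALUES (CONVERT(int, 42), CONVERT(int, 43)),
                     (CONVERT(datetime2(7), SYSDATETIME()), NULL);

SELECT OldValue,
       SQL_VARIANT_PROPERTY(OldValue, 'BaseType') AS OldBaseType
FROM   @audit;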

    --Jeff Moden



Looking at the table you are creating, I have to agree: this is not a hub table in the Data Vault methodology. A hub table will have the natural business key; if using Data Vault 2.0, it will also have an MD5 hash of the business key, which will be used as the primary key in the hub, satellite, and link tables; the source system; the date first seen; and the date deleted or last seen. It will have no other data. All other data will be in one or more satellite tables and possibly in link tables.

Also, using the Data Vault methodology there would be no reason to have columns to hold old and new values. The Data Vault methodology is insert-only into the raw data vault, even when a physical delete occurs in any of the source systems.

If you need to ensure uniqueness across all the columns you have listed as index columns, you can use CHECKSUM, BINARY_CHECKSUM, or HASHBYTES over all those columns, store that as a persisted computed column, and index on that column. Just remember that you have to consider the data, especially the character data. For example, HASHBYTES will treat the character data as case sensitive when it hashes it. This means "SEPTEMBER", "September", and "september" will generate different hashes.
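A rough sketch of that idea (untested; CONCAT_WS needs SQL Server 2017, the delimiter and NULL handling would need thought because CONCAT_WS skips NULLs, and the case-sensitivity point above still applies):

ALTER TABLE [Raw].[HubHistoryDataNew]
    ADD RowHash AS CAST(HASHBYTES('SHA2_256',
            CONCAT_WS(N'|', CONVERT(varchar(27), CreateDate, 121), UserID, TableName,
                      ColumnName, [PrimaryKey], [Action], OldValue, NewValue)) AS binary(32)) PERSISTED;

CREATE UNIQUE NONCLUSTERED INDEX UX_HubHistoryDataNew_RowHash
    ON [Raw].[HubHistoryDataNew] (RowHash);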

     

  • astrid 69000 wrote:

I did not choose the type of table; I don't even know what the table is for. I was just asked to create a hub.

Then go back to whoever asked you to create it and get a full and correct definition. Under the Data Vault methodology you do need to follow the rules - otherwise you are just creating a mess.

Without doing a full and proper analysis, and based just on the table definition you supplied, the following would be a possible representation of the data under the DV method.

And I'm still not happy with the link representation - the TableEntry could most likely also be split into further tables (a hub for table, a hub for column, a link between them - potentially another hub for PrimaryKey and yet another link to join them all together).

    Hub - User
    UserHash
    BusinessKey
    -- UserID

    Hub - TableEntry
    TableEntryHash
    BusinessKey
    --TableName
    --ColumnName
    --PrimaryKey

    Link - TableEntryChanges
    LinkHash
    UserHash
    TableEntryHash

    Link Sat
    LinkHash
    LinkAttributes
    --CreateDate
    --Action
    --OldValue
    --NewValue
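Purely as an illustration of that outline (all table names, types, and the standard DV columns below are assumptions, not a reviewed model):

CREATE TABLE [Raw].[HubUser] (
    UserHash     char(40)      NOT NULL PRIMARY KEY,  -- hash of the business key
    UserID       nvarchar(20)  NOT NULL,              -- business key
    LoadDate     datetime2(7)  NOT NULL,
    RecordSource nvarchar(200) NOT NULL
);

CREATE TABLE [Raw].[HubTableEntry] (
    TableEntryHash char(40)      NOT NULL PRIMARY KEY, -- hash of TableName + ColumnName + PrimaryKey
    TableName      nvarchar(50)  NOT NULL,
    ColumnName     nvarchar(50)  NOT NULL,
    [PrimaryKey]   nvarchar(254) NOT NULL,
    LoadDate       datetime2(7)  NOT NULL,
    RecordSource   nvarchar(200) NOT NULL
);

CREATE TABLE [Raw].[LinkTableEntryChanges] (
    LinkHash       char(40)      NOT NULL PRIMARY KEY, -- hash of UserHash + TableEntryHash
    UserHash       char(40)      NOT NULL,
    TableEntryHash char(40)      NOT NULL,
    LoadDate       datetime2(7)  NOT NULL,
    RecordSource   nvarchar(200) NOT NULL
);

CREATE TABLE [Raw].[SatTableEntryChanges] (
    LinkHash   char(40)      NOT NULL,
    LoadDate   datetime2(7)  NOT NULL,
    CreateDate datetime2(7)  NULL,
    [Action]   nvarchar(15)  NULL,
    OldValue   nvarchar(254) NULL,
    NewValue   nvarchar(254) NULL,
    CONSTRAINT SatTableEntryChanges_PK PRIMARY KEY (LinkHash, LoadDate)
);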
And on a different note - there is no need for that extra unique index. If the hash is done correctly and it is based on those columns, then the row is already unique by definition, since you set the hash as the primary key.

And when accessing the table to see if a record already exists, the lookup should ALWAYS use the hash, never the individual columns, so that index would not really be that useful.
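For example, the existence check in the load would look roughly like this (the variable name is made up; the hash itself is computed upstream in the ETL/staging layer):

DECLARE @HistoryDataHashKey char(40) = '...';  -- hash produced upstream

IF NOT EXISTS (SELECT 1
               FROM [Raw].[HubHistoryDataNew]
               WHERE HistoryDataHashKey = @HistoryDataHashKey)
BEGIN
    -- insert the new hub row here
    PRINT 'New hash key - insert the row.';
END;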

The other constraint on the size of unique keys is the number of key columns, which is 32 in SQL Server 2016 and later (16 in earlier versions). Bigint is only 8 bytes. So to the extent all of the OP's nvarchar columns could be substituted by surrogate keys... it's nowhere near the limit of what SQL Server can handle.

No one shall be able to expel us from the paradise that Cantor has created for us.
