Non-Unique Clustered index impact

  • SQL 2019 Standard, 32 GB, 4Core, Win 2019 vm.

    We have a table with an INT Identity as PK (non-clustered)

    A clustered index on a date (non-unique) about 300 rows per day. One column, that is Nvarchar(max) usually contains about 60K per record.

    We need a covering index that contains the NVARCHAR(MAX) as records are accessed by a single date or small date range (2 days)

    Using the clustered index on the date gives us what we need for a covering index that includes the blob column.

    The indexes are maintained weekly using Ola Hallengren's script.

    Outside of the NVARCHAR(MAX) the record is fairly small. The highest daily record count could eventually in years from now get up to 1000.

    I know that SQL needs to maintain the uniquifyer value, forwarded records could be an issue, but I'm not sure since blob is stored separate from the data pages. My concern is decreasing performance over time. This is tied to an app and response time is important.

    What are the issues if any with this sort of setup?

     

     

  • The only thing I can think of is to make sure the nvarchar(max) column is stored off-row.  By default it will be stored in-row and that is where you would see problems.

    Use sp_tableoption on that table to set the option to store large value types out of row.

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

  • I thought the default was off row when the value exceeded the page size.

    There can be updates to these documents (the column contains a text file that is about 6-8 pages) It will be updated if there is a new version of the file.

    I'll look at the sp_tableoption, thank you.

  • The default for the MAX columns is in-row...not sure why that choice was made.

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

  • It might be the default but doesn't apply in this situation.

    0 = varchar(max), nvarchar(max), varbinary(max), xml and large UDT values are stored directly in the data row, up to a limit of 8000 bytes and as long as the value can fit in the record. If the value does not fit in the record, a pointer is stored in-row and the rest is stored out of row in the LOB storage space. 0 is the default value.

    Since the smallest value is well above 8k they will all be out of row with pointers.

  • TangoVictor wrote:

    It might be the default but doesn't apply in this situation.

    0 = varchar(max), nvarchar(max), varbinary(max), xml and large UDT values are stored directly in the data row, up to a limit of 8000 bytes and as long as the value can fit in the record. If the value does not fit in the record, a pointer is stored in-row and the rest is stored out of row in the LOB storage space. 0 is the default value.

    Since the smallest value is well above 8k they will all be out of row with pointers.

    Is the NVARCHAR(MAX) column ALWAYS populated during the INSERT of the row or do you do an INSERT and later follow that with an UPDATE?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Just checked with the developer and rows are created all at once with no updates.

  • TangoVictor wrote:

    Just checked with the developer and rows are created all at once with no updates.

    TangoVictor wrote:

    Just checked with the developer and rows are created all at once with no updates.

    Ok... so no "ExpAnsive Updates" from NULL to something on the BLOB column.  That's a good thing.

    The clustered index you made on the date column is fine but you also seem to know that SQL Server really likes a UNIQUE clustered index.  What you may want to do is to explicitly include the IDENTITY column as the 2nd column of the clustered index and redefine the clustered index as UNIQUE.  To be sure, though, sometimes it helps and sometimes you're just taking up a bit of extra space in the BTREE of the index.

    As far as the BLOB goes itself, can you do a count of how many BLOB entries have less than 8K bytes?  The reason I ask is just to make sure that there are no "Short Trapped Pages" in the In-Row part of the table, which can waste relatively huge amounts of space.  Another way to check would be to rebuild the clustered index and then use sys.dm_db_index_physical_stats to check the page density.  If it's less than the Fill Factor for the clustered index, you may have such a problem.  It's also relatively easy to fix but no sense in even describing that fix if the problem doesn't exist.

    Yes... it's understood that if the Fill Factor = 0 or 100, the Page Density almost always be something less.  How much less can be a strong indication of the "Short Trapped Pages" problem.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Correct no updates, when a new file is generated it's read and combined with the meta data then inserted.

    There are only 2 rows with less than 8K and these would be an exception, this is pretty new and I think there was an issue with the source files being generated by the third party system.

    I rebuilt the index and checked sys dm index there are 2 rows, one for IN_ROW and the other for LOB_DATA. See screenshot.

    I don't know why space used and record counts are NULL. Table has about 30k rows right now.

    Attachments:
    You must be logged in to view attached files.

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply