Storage and performance issues of using NVARCHAR(MAX)

  • I have a legacy database whose storage and access behaviour I understand reasonably well, and in the past we split its Notes columns across 3 tables:

    1. 250 chars

    2. 1000 chars

    3. 4000 chars (the maximum number of characters that database could handle)

    This was done because that database pre-allocated space to the declared size of the text block, so defining everything as a single block of up to 4k characters would have wasted a great deal of space on all the entries that only had a few characters stored in them.

    Several issues with this methodology:

    1. needing to track which of the 3 tables a specific set of notes resides in, and maintaining a pointer to that table from every table that needs to join to the notes

    2. if a note grows past its current limit it has to be moved into the next table that can accommodate the new length, which adds processing overhead and complicates things

    In SQL Server 2008 it would appear that I can have a single notes table with a column which would be defined as nvarchar(max) to overcome the above limitations.

    However, the fact that you have the option to specify the number of characters nn in nvarchar(nn) implies to me that there is some reason for it; otherwise why not let all nvarchars default to max?

    So my question is whether this is a reasonable thing to do, or am I going to be hit with a storage or performance overhead by using (max)?

    My current server needs to support some 40-50 users. Each note can be from 0 characters up to the maximum (but on average around 2000 characters) and is planned to hold RTF data as well as plain text.

    So, for example (in the case of notes linked to an address table), the column could hold anything from 'TBA' or 'Proforma' right up to an image map of the address location and directions on how to get there.

    Currently the notes data amounts to 40 MB in total, with expected weekly growth of 2-4 KB. I will also want to apply FULLTEXT indexing to the column.

    So what are the performance and storage hits (if any) if I place all notes from my former 3 Notes Tables into the 1 new NVarChar(Max) table?

  • This was done because that database pre-allocated space to the declared size of the text block, so defining everything as a single block of up to 4k characters would have wasted a great deal of space on all the entries that only had a few characters stored in them.

    There's a fundamental misunderstanding there. Defining something as an NVARCHAR(4000) does NOT preallocate storage at 4k bytes. An NVARCHAR(10) with 5 characters stored takes exactly the same amount of room as an NVARCHAR(4000) with 5 characters stored. The only thing you need to be careful of is to try not to exceed a total of 8060 bytes of actual storage per row. It can be done but it will cause a performance hit. Please read on...
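    If you want to check this for yourself, here's a minimal sketch (the table and column names are made up) that compares the stored byte counts with DATALENGTH:

    create table dbo.SpaceTest
    (
        Small nvarchar(10),
        Wide  nvarchar(4000)
    );

    insert into dbo.SpaceTest values (N'Hello', N'Hello');

    -- Both report 10 bytes: 5 characters x 2 bytes per character,
    -- regardless of the declared column size.
    select datalength(Small) as SmallBytes,
           datalength(Wide)  as WideBytes
    from dbo.SpaceTest;

    drop table dbo.SpaceTest;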

    For NVARCHAR(MAX), the same would also hold true, but you have to be aware of a possible performance hit depending on what you're doing. Any time the actual byte count of a given row exceeds 8060 bytes, some of those bytes (columns) will be stored "out of row", which takes extra time to access, just like the old NTEXT data type did. Further, even if your NVARCHAR(MAX) data does happen to fit within 8060 bytes, the MAX datatypes can instantly cause a 2X to 10X performance hit if you try to join to it.
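    One way to see where the engine actually put a table's data (a sketch; dbo.Notes is a placeholder name) is to look at the allocation unit types it reports:

    -- IN_ROW_DATA pages hold values that fit in the row; ROW_OVERFLOW_DATA
    -- and LOB_DATA pages hold values that were pushed out of the row.
    select alloc_unit_type_desc, page_count, record_count
    from sys.dm_db_index_physical_stats(
             db_id(), object_id(N'dbo.Notes'), null, null, 'DETAILED');

    -- Optionally, MAX values can be forced out of row across the board:
    -- exec sp_tableoption 'dbo.Notes', 'large value types out of row', 1;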

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thanks for the reply

    But I am not clear on this, as my choice comes down to whether I continue to segregate my Notes into 4 tables depending on the size of their content, or bring them all into 1.

    90% of my data will fit in under 8 KB, but approximately 5-10% won't.

    While 2 tables may be an improvement on 4, the same basic question still remains:

    is there an advantage to having 2 tables, 1 for the sub-8 KB notes and another for the rest, that outweighs the clear disadvantage of the overhead of:

    1. checking each row and moving it from 1 table to the other when it goes over or under the 8 KB limit, and

    2. maintaining the pointers to which of the 2 tables I should read from?

    Thanks in advance

  • Let's look at it another way. When would you NOT check either table of Notes? I suspect the answer is that you'd always have to check both, which means an extra join. Having 5 to 10% in one table isn't going to buy you anything unless you actually need to join on the Notes column itself, which you probably wouldn't do anyway.

    I don't know of or foresee any advantage to maintaining the Notes in two separate tables based on the size of the Note unless there's a join on the Note column itself.

    You could easily make a couple of million row test tables and try it yourself.
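    For example, something along these lines would do it (a rough sketch; the names and sizes are made up, and it uses an inline tally in place of fnTally):

    -- Build a 1,000,000-row test table with notes of varying length;
    -- anything over roughly 4000 characters will be stored off-row.
    with Tally (N) as
    (
        select top (1000000) row_number() over (order by (select null))
        from sys.all_columns a cross join sys.all_columns b
    )
    select N as Id,
           replicate(cast(N'x' as nvarchar(max)), N % 6000) as Note
    into dbo.NotesTest
    from Tally;

    -- Then compare the reads reported with and without off-row values in play.
    set statistics io on;
    select count(*) from dbo.NotesTest where Id % 100 = 0;
    select max(datalength(Note)) from dbo.NotesTest;
    set statistics io off;

    drop table dbo.NotesTest;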

    --Jeff Moden



  • All else being equal, then, it would make sense to have all notes in 1 table with NVarchar(Max) and, for the 5-10% of notes that run longer than 8 KB in a row, accept that they will take a fraction longer to load - which would probably be acceptable from the users' point of view and fine from the DBA's and app programmer's point of view as well - so everyone is happy.

    But I can't help thinking that maybe all else is not equal.

    Why define Nvarchar(1 to Max) when Max is fine every time - given that there is no hit on storage and no hit on performance?

    Is this just a convention kept up to maintain old code on legacy versions, or have I misunderstood the answers here?

  • But there IS a hit on performance... try adding an index to an NVARCHAR(MAX) column, or to anything that involves over 900 bytes.

    --Jeff Moden



  • RichardC-1023789 (2/27/2011)


    Why define Nvarchar(1 to Max) when Max is fine every time - given that there is no hit on storage and no hit on performance?

    There can be a performance hit, as Jeff points out. Check out the following code.

    create table Notes
    (
        Id int,
        Note nvarchar(max)
    );

    create table Notes2
    (
        Id int,
        Note nvarchar(30)
    );

    create table Notes3
    (
        Id int,
        Note nvarchar(max)
    );

    create table Notes4
    (
        Id int,
        Note nvarchar(4000)
    );

    insert into Notes values(1, 'Some note.');
    insert into Notes values(2, 'Some longer note.');
    -- REPLICATE truncates non-MAX input at 8000 bytes, so this actually stores
    -- 8000 characters; at 16000 bytes as nvarchar, that is still more than a
    -- row can hold, which pushes the value off-row.
    insert into Notes values(3, replicate('Some note.', 8000));

    insert into Notes2 values(1, 'Some note.');
    insert into Notes2 values(2, 'Some longer note.');
    insert into Notes2 values(3, 'Some even longer note.');

    insert into Notes3 values(1, 'Some note.');
    insert into Notes3 values(2, 'Some longer note.');
    insert into Notes3 values(3, 'Some even longer note.');

    set statistics io on;

    select * from Notes;
    select * from Notes where Id in (1, 2);
    select * from Notes2;
    select * from Notes3;

    set statistics io off;

    /* You cannot create an index on an nvarchar(max) column.
       If you create an index on an nvarchar(4000) column, inserts
       or updates may fail if the length is greater than 900 bytes. */
    --create index IX_Notes3 on dbo.Notes3(Note);
    --create index IX_Notes4 on dbo.Notes4(Note);

    drop table Notes;
    drop table Notes2;
    drop table Notes3;
    drop table Notes4;
    When you run this, you'll see that all four queries have 1 scan on the table, but the first query also has 5 LOB (Large Object) logical reads. With a large enough table, those LOB reads can be a big performance drag, because they can quickly become physical reads.

    Also, if you put an index on the Note column, row insertions or updates will fail if the set of index values for a row is greater than 900 bytes (as Jeff points out), while for an nvarchar(max) column creating the index fails outright. So even if you could guarantee that the column will never be more than thirty characters long, say, you could never put an index on an nvarchar(max) column, whereas you could on an nvarchar(4000) column. You might actually consider this a benefit, though: with an index on an nvarchar(4000) Note column, a later attempt to insert a row with a 4000-character Note would fail, whereas nvarchar(max) refuses the index up front, so there is no nasty surprise later. I have commented out the create index statements above; you can uncomment them and try them out if you'd like.

    For your case, I would personally recommend a single notes table with an nvarchar(max) note column on it. If you are doing index seeks to get to the relevant note(s) instead of doing a table scan or clustered index scan, that is even better. If not, I would recommend looking into an index on your Note table's foreign key column (or however you are retrieving the correct set of notes for your queries).
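    As a concrete sketch of that layout (all the names here are made up), the notes table might look something like this, with a nonclustered index on the foreign key so that fetching a parent row's notes is an index seek:

    create table dbo.AddressNotes
    (
        NoteId    int identity(1,1) not null primary key,
        AddressId int not null,          -- FK to the parent (address) table
        Note      nvarchar(max) null
    );

    -- Seeks on AddressId avoid scanning the whole notes table.
    create index IX_AddressNotes_AddressId on dbo.AddressNotes (AddressId);

    -- Typical retrieval pattern:
    -- select Note from dbo.AddressNotes where AddressId = @AddressId;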

    But this doesn't mean that n/varchar(max) is a good fit for all situations. For example, if you want a unique key on a name--like for a lookup column--you can't do it with a varchar or nvarchar(max) field. As always, in the SQL world, sometimes you need a rubber mallet and sometimes you need a sledgehammer...
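    A quick sketch of that limitation (hypothetical names again):

    create table dbo.LookupDemo
    (
        Name nvarchar(200) not null unique  -- works: 400 bytes fits the 900-byte key limit
        -- Name nvarchar(max) not null unique  -- fails: MAX columns can't be index key columns
    );

    drop table dbo.LookupDemo;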

  • Thanks for your answers - food for thought and plenty to test when I get back to work later today

    My Notes table will only have a handful of columns:

    - Id,

    - Notes and

    - Type (for those 5-10% occasions that the users will want to store an RTF, DOC etc., rather than plain text)

    and I do not plan to index the Notes column for the reasons you have given, EXCEPT that I will want FULLTEXT indexing
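    For reference, a minimal sketch of what that full-text setup could look like (names are placeholders; it assumes a unique index, here the primary key PK_Notes, on Id):

    -- Full-text indexing works on nvarchar(max) even though B-tree indexes don't.
    create fulltext catalog NotesCatalog as default;

    create fulltext index on dbo.Notes (Notes)
        key index PK_Notes
        on NotesCatalog;

    -- Example search:
    -- select Id from dbo.Notes where contains(Notes, N'directions');

    -- Note: documents stored as binary (DOC etc.) would need a varbinary(max)
    -- column plus a TYPE COLUMN holding the file extension for the filters to use.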

    Unless I see a real performance hit, fingers crossed this will be the way to go for this particular table.

    Many thanks for all the useful (and fast!) responses
