Redesign of old Table

  • Hi,

    I am working on an old database table which stores emails, including From, To, CC, etc. It also stores the body of the email, and the person who designed the table used the datatype NVARCHAR(MAX) to store the body.

    At the moment the table is relatively small with only 97,500 records, but the data size for the table exceeds 2.5 GB, with the body column taking up almost all of this space.

    Is there an easy way to change how this data is stored so that I can reduce its size for future rows?

    Also, is there any way of reducing the size of the current data by storing it in a different format?

    The body of the email is searched by a program written for the user, so it is indexed too.

    Any help would be much appreciated

    Many thanks

    Bob :o)

    One thought: in the event you don't need Unicode data in the body, you could convert it to VARCHAR(MAX).

    Alternatively, if you select the max length of the body and it is 4000 or under, you could change the column from NVARCHAR(MAX) to NVARCHAR(4000) (or lower).  It won't be future-proof that way, mind you.

    If possible, combining both of those approaches would save you space (a sketch of the check-and-convert is below).
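
    A minimal sketch of that check-and-convert, assuming a hypothetical dbo.EmailTable with a Body column; the round-trip CONVERT flags rows that genuinely need Unicode:

    -- How long do the bodies get, and do any actually need Unicode?
    SELECT MAX(LEN(Body)) AS MaxLen
    , SUM(CASE WHEN Body <> CONVERT(NVARCHAR(MAX), CONVERT(VARCHAR(MAX), Body))
               THEN 1 ELSE 0 END) AS NeedsUnicode
    FROM dbo.EmailTable

    -- If MaxLen fits and NeedsUnicode = 0, both changes are safe
    ALTER TABLE dbo.EmailTable ALTER COLUMN Body VARCHAR(8000) NULL  -- match the column's current nullability

    -- Space is not reclaimed until the table is rebuilt
    ALTER TABLE dbo.EmailTable REBUILD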

    If you don't need all of the emails, you could remove old ones.

    You could use your email client (Outlook 365, for example, has a pretty quick search) to store emails instead of SQL.

    If there are any duplicate email bodies, you could pull the bodies out to a separate table and have an INT that links the two tables (i.e. basically have a lookup table for the body; see the sketch below).
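
    A minimal sketch of that lookup-table idea; all object names here are hypothetical, and the hash column is just an assumption to make duplicate matching cheap:

    -- Each distinct body is stored once
    CREATE TABLE dbo.EmailBody
    (
        BodyId   INT IDENTITY(1, 1) PRIMARY KEY,
        BodyHash BINARY(32) NOT NULL,  -- HASHBYTES('SHA2_256', Body), for fast duplicate checks
        Body     NVARCHAR(MAX) NOT NULL
    )

    CREATE INDEX IX_EmailBody_Hash ON dbo.EmailBody (BodyHash)

    -- The email table then stores only the key
    ALTER TABLE dbo.EmailTable ADD BodyId INT NULL REFERENCES dbo.EmailBody (BodyId)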

     

    As a random thought, is the index on the body actually being used? I'd check the execution plan to make sure that index is actually being used appropriately (one way to check is sketched below). My thought here is that if I were searching an email body, it is very unlikely I'd remember the exact text, so I'd actually be doing a LIKE '%something%' search, which isn't going to play well with the index.  I would be curious if dropping the index on the body resulted in any performance hit on your queries AND if it helped with the space used.
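
    One way to check that, as a sketch (the table name is hypothetical): sys.dm_db_index_usage_stats shows whether each index has been read since the last restart.

    SELECT i.name AS IndexName
    , s.user_seeks, s.user_scans, s.user_lookups, s.user_updates
    FROM sys.indexes AS i
    LEFT JOIN sys.dm_db_index_usage_stats AS s
           ON s.object_id = i.object_id
          AND s.index_id = i.index_id
          AND s.database_id = DB_ID()
    WHERE i.object_id = OBJECT_ID(N'dbo.EmailTable')
    -- NULLs or all-zero read counts suggest the index is only being maintained, never used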

    The above is all just my opinion on what you should do. 
    As with all advice you find on a random internet forum - you shouldn't blindly follow it.  Always test on a test server to see if there are any negative side effects before making changes to live!
    I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.

  • To save space, COMPRESS the body (and other very long columns).  This process is rather slow, but it typically does save a lot of space.
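
    For reference, COMPRESS and DECOMPRESS are the built-in GZIP functions (SQL Server 2016 and later); a minimal sketch, with hypothetical names:

    -- COMPRESS returns VARBINARY(MAX), so the compressed body needs a binary column
    ALTER TABLE dbo.EmailTable ADD BodyCompressed VARBINARY(MAX) NULL

    UPDATE dbo.EmailTable
    SET BodyCompressed = COMPRESS(Body)

    -- DECOMPRESS also returns VARBINARY(MAX); cast back to the original type on read
    SELECT CAST(DECOMPRESS(BodyCompressed) AS NVARCHAR(MAX)) AS Body
    FROM dbo.EmailTable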

    And, using triggers, you can make the COMPRESSion and DECOMPRESSion transparent to the developers and the users, except that a $IDENTITY column for the table can no longer be returned directly to code after an INSERT.  That is, SCOPE_IDENTITY() (or any of the other standard methods) no longer works to return the assigned $IDENTITY value, because the $id assignment happens in a trigger and thus in a separate scope.

    There are numerous other ways to return the $id to the code -- such as sp_set_session_context and the SESSION_CONTEXT() function -- but the INSERTing code must be changed to do that, of course.
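
    A minimal sketch of that hand-off (the key name is hypothetical, and as written it only covers single-row INSERTs):

    -- Trigger side: inside the INSTEAD OF INSERT trigger, after the real INSERT
    DECLARE @NewId INT = SCOPE_IDENTITY()  -- visible here, inside the trigger's own scope
    EXEC sp_set_session_context @key = N'last_email_id', @value = @NewId

    -- Caller side: same session, after the INSERT statement returns
    SELECT CAST(SESSION_CONTEXT(N'last_email_id') AS INT) AS NewEmailId  -- returns sql_variant, hence the CAST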

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • ScottPletcher wrote:

    To save space, ENCRYPT the body (and other very long columns).  This process is rather slow, but it typically does save a lot of space.

    And, using triggers, you can make this transparent to the developers and the users except that a $IDENTITY column for the table can no longer be returned directly to code after an INSERT.  That is, SCOPE_IDENTITY() (or any of the other standard methods) no longer work to return the assigned $IDENTITY value, because the $id assignment is a trigger and thus is in a separate scope.

    There are numerous other ways to return the $id to the code -- such as sp_set_session_context and sp_get_session_context -- but the INSERTing code must be changed to do that, of course.

    Did you mean COMPRESS instead of  ENCRYPT?

  • Jonathan AC Roberts wrote:

    ScottPletcher wrote:

    To save space, ENCRYPT the body (and other very long columns).  This process is rather slow, but it typically does save a lot of space.

    And, using triggers, you can make this transparent to the developers and the users except that a $IDENTITY column for the table can no longer be returned directly to code after an INSERT.  That is, SCOPE_IDENTITY() (or any of the other standard methods) no longer work to return the assigned $IDENTITY value, because the $id assignment is a trigger and thus is in a separate scope.

    There are numerous other ways to return the $id to the code -- such as sp_set_session_context and sp_get_session_context -- but the INSERTing code must be changed to do that, of course.

    Did you mean COMPRESS instead of  ENCRYPT?

    D'OH, yeah, COMPRESS / DECOMPRESS, not ENCRYPT.  Had ENCRYPT on the brain from what I was doing here.

    Edit: I've corrected the original post.

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

    Thank you so much for your replies Mr. Brian Gale, ScottPletcher and Jonathan AC Roberts. I will now try these out and see how I get on with a trial DB first and, if they work, implement them in the live DB.

    Many thanks

    Bob

  • bobski wrote:

    the table is relatively small with only 97,500 records but the data size for the table exceeds 2.5 GB

    That's an average of 3.5 pages of text (assuming NVARCHAR) PER message. Seems a lot to me ... if the messages are being stored with HTML and more besides - maybe images embedded rather than linked.

    My first aim would be to only store the "meat" from the messages, e.g. plain text, because long term that will have the greatest impact on savings (a rough check for HTML is sketched below).
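
    A rough check along those lines, using the same hypothetical MyEmailTable name as the queries in the next post: how many bodies look like HTML or carry embedded images?

    SELECT SUM(CASE WHEN Body LIKE N'%<html%' THEN 1 ELSE 0 END) AS HtmlBodies
    , SUM(CASE WHEN Body LIKE N'%<img%' OR Body LIKE N'%base64%' THEN 1 ELSE 0 END) AS LikelyEmbeddedImages
    , COUNT(*) AS TotalRows
    FROM MyEmailTable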

    Maybe some analysis of what proportion of the email bodies are small, medium and big in size. Something like this:

    -- Bucket the email bodies by length; TBiggest is the longest body in the table
    SELECT SUM(CASE WHEN LEN(Body) <= 1000 THEN 1 ELSE 0 END) AS T1000
    , SUM(CASE WHEN LEN(Body) > 1000 AND LEN(Body) <= 2000 THEN 1 ELSE 0 END) AS T2000
    , SUM(CASE WHEN LEN(Body) > 2000 AND LEN(Body) <= 5000 THEN 1 ELSE 0 END) AS T5000
    , SUM(CASE WHEN LEN(Body) > 5000 AND LEN(Body) <= 10000 THEN 1 ELSE 0 END) AS T10000
    , SUM(CASE WHEN LEN(Body) > 10000 THEN 1 ELSE 0 END) AS T10001
    , SUM(CASE WHEN LEN(Body) > TBiggest * 0.9 THEN 1 ELSE 0 END) AS T90Percent
    , [TBiggest] = MAX(TBiggest)
    FROM MyEmailTable
    CROSS APPLY
    (
        SELECT MAX(LEN(Body)) AS TBiggest
        FROM MyEmailTable
    ) AS X

    Then if a relatively large number of records are significantly larger than average, it's worth having a look at why that is.

    Check the biggest ones anyway

    SELECT TOP 100 Body
    FROM MyEmailTable
    ORDER BY LEN(Body) DESC
  • ScottPletcher wrote:

    And, using triggers, you can make the COMPRESSion and DECOMPRESSion transparent to the developers

    Interesting... How do you make DECOMPRESSion transparent to SELECTs (for example) using triggers?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

    I agree with the suggestions from Brian. Also, I was wondering whether table compression could help to save space. I've noticed up to 80-85% storage space savings with table compression. It can be given a try.
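
    A minimal sketch of trying that (the table name is hypothetical). One caveat worth testing on this particular table: ROW/PAGE compression applies to in-row data, so NVARCHAR(MAX) values stored off-row are not compressed by it.

    -- Estimate first, then apply
    EXEC sp_estimate_data_compression_savings
         @schema_name      = N'dbo',
         @object_name      = N'EmailTable',
         @index_id         = NULL,
         @partition_number = NULL,
         @data_compression = N'PAGE'

    ALTER TABLE dbo.EmailTable REBUILD WITH (DATA_COMPRESSION = PAGE)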

  • Jeff Moden wrote:

    ScottPletcher wrote:

    And, using triggers, you can make the COMPRESSion and DECOMPRESSion transparent to the developers

    Interesting... How do you make DECOMPRESSion transparent to SELECTs (for example) using triggers?

    The DECOMPRESS is transparent using a view.  Sorry, I was in a hurry and conflated the trigger and view.

    The original table is renamed.  You create a view that matches the original table name.  That view handles the DECOMPRESS and an INSTEAD OF INSERT trigger on that view transparently handles the COMPRESS on the INSERTs.

    The downside, as I noted, is that after the INSERT, any $IDENTITY value is no longer directly available to the code that did the INSERT, because the INSERT now happens down in a trigger, which has its own scope.  If there's no $IDENTITY column, or the INSERTing code doesn't need to reference it directly after the INSERT, no problem.  If the INSERTing code does need the ident, you'd have to adjust that code to not rely on SCOPE_IDENTITY(), etc.
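
    A minimal sketch of that arrangement; every object and column name here is hypothetical:

    -- Rename the real table out of the way
    EXEC sp_rename N'dbo.EmailTable', N'EmailTable_Base'
    GO
    -- A view under the original name handles the DECOMPRESS on reads
    CREATE VIEW dbo.EmailTable
    AS
    SELECT EmailId, Sender, Recipient,
           CAST(DECOMPRESS(BodyCompressed) AS NVARCHAR(MAX)) AS Body
    FROM dbo.EmailTable_Base
    GO
    -- An INSTEAD OF INSERT trigger on the view handles the COMPRESS on writes
    CREATE TRIGGER dbo.trg_EmailTable_Insert
    ON dbo.EmailTable
    INSTEAD OF INSERT
    AS
    BEGIN
        SET NOCOUNT ON
        INSERT INTO dbo.EmailTable_Base (Sender, Recipient, BodyCompressed)
        SELECT Sender, Recipient, COMPRESS(Body)
        FROM inserted
    END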

     

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • ScottPletcher wrote:

    Jeff Moden wrote:

    ScottPletcher wrote:

    And, using triggers, you can make the COMPRESSion and DECOMPRESSion transparent to the developers

    Interesting... How do you make DECOMPRESSion transparent to SELECTs (for example) using triggers?

    The DECOMPRESS is transparent using a view.  Sorry, I was in a hurry and conflated the trigger and view.

    The original table is renamed.  You create a view that matches the original table name.  That view handles the DECOMPRESS and an INSTEAD OF INSERT trigger on that view transparently handles the COMPRESS on the INSERTs.

    The downside, as I noted, is that after the INSERT, any $IDENTITY value is no longer directly available to the code that did the INSERT, because the INSERT now happens down in a trigger, which has its own scope.  If there's no $IDENTITY column, or the INSERTing code doesn't need to reference it directly after the INSERT, no problem.  If the INSERTing code does need the ident, you'd have to adjust that code to not rely on SCOPE_IDENTITY(), etc.

    Thanks, Scott.  I've done that type of thing in the past and, you're correct... it's totally transparent to the user except for the $IDENTITY thing you mentioned (I haven't tried SCOPE_IDENTITY() in an INSERT trigger, but that would likely work as a hack).  I was just confused for the reason you stated... no view was mentioned, and I thought you'd found something different. 😀

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
