Large objects data type issues

  • Hi All,

We have some questions about LOB data. One of our tables, with the schema below, has reached 500 GB.

The application team is trying to delete some data from this table, and it is taking a long time.

We aren't sure what is really happening. The deletes succeed, but the row count is not going down and no space is being released.

We are on SQL Server 2017 RTM. The application team is deleting data based on a date-range condition.

Another thing we noticed: the database is part of an AG and uses the FULL recovery model. Also, READ_COMMITTED_SNAPSHOT is ON for this database.

```sql
create table test
(
    c1  bigint not null,
    c2  int not null,
    c3  ntext,
    c4  int,
    c5  int,
    c6  int,
    c7  int,
    c8  int,
    c9  int,
    c10 nvarchar(255),
    c11 ntext,
    c12 int,
    c13 bigint,
    c14 ntext,
    c15 int
);
```

    Questions

    ==========

1) What are the options available to clean up LOB data?

2) Is LOB data stored in a single 8 KB page, or somewhere else?

3) When I select data from disk, into what type of pages is the LOB data read in SQL Server memory?

4) Why are LOB row deletions slow?

5) How can I calculate the row size of the above table?

6) Even though we are deleting the data, why are the row count and space not reclaimed?

    Thanks,

    Bob

  • 7) Why are you still on 2017 RTM

  • Jo Pattyn wrote:

    7) Why are you still on 2017 RTM

    Same question from me.  As well as why are you still using an ntext data type?  That was deprecated with SQL 2005.

    ntext is stored in the row, which is one reason that it's slow.

    You are not going to get any space back until you re-build the clustered index.

    Depending upon what you are deleting, you may be better off selecting what you want to keep into a new table, drop the old table, and rename the new table.
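A minimal sketch of that keep-copy-swap pattern, assuming a hypothetical date column and cutoff (the OP's actual date column isn't named in the schema), with indexes and constraints recreated before the swap:

```sql
-- Copy only the rows you want to KEEP (c2 as a date-key column is an assumption):
SELECT c1, c2, c3, c4, c5, c6, c7, c8, c9, c10, c11, c12, c13, c14, c15
INTO dbo.test_new
FROM dbo.test
WHERE c2 >= 20200101;

-- Recreate indexes/constraints on dbo.test_new here, then swap the tables:
BEGIN TRANSACTION;
    DROP TABLE dbo.test;
    EXEC sp_rename 'dbo.test_new', 'test';
COMMIT;
```

For a 500 GB table this is often far cheaper than deleting most of the rows, since SELECT...INTO is minimally logged under the right conditions and the old LOB allocations go away with the DROP.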


    Michael L John
    If you assassinate a DBA, would you pull a trigger?
    To properly post on a forum:
    http://www.sqlservercentral.com/articles/61537/

It's a POC server for the 2017 upgrade. We were waiting for a service pack to be released; I think only cumulative updates are available so far.

This is an old database, and now they want to upgrade it from SQL 2012 to SQL 2017 as-is.


There are no more service packs since SQL 2017, as part of the cloud-first strategy. It is currently at Cumulative Update 22. https://sqlserverbuilds.blogspot.com/

*edit* You might consider SQL Server 2019 if your project is still a POC.

  • Michael L John wrote:

    Jo Pattyn wrote:

    7) Why are you still on 2017 RTM

    Same question from me.  As well as why are you still using an ntext data type?  That was deprecated with SQL 2005.

    ntext is stored in the row, which is one reason that it's slow.

    You are not going to get any space back until you re-build the clustered index.

    Depending upon what you are deleting, you may be better off selecting what you want to keep into a new table, drop the old table, and rename the new table.

Sorry but that's not correct.  NText, Text, and Image are all stored out-of-row by default.  The newer MAX and XML LOBs are (very unfortunately) stored in-row by default and, of course, only if they fit in-row.

    https://docs.microsoft.com/en-us/sql/relational-databases/system-stored-procedures/sp-tableoption-transact-sql?view=sql-server-ver15

    I totally agree on your last two points.
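For reference, the setting sp_tableoption controls can be checked from sys.tables, which exposes it as `large_value_types_out_of_row` (a sketch against the OP's table; zero means MAX/XML values are stored in-row when they fit):

```sql
-- Check the current storage options for the table:
SELECT name, large_value_types_out_of_row, text_in_row_limit
FROM sys.tables
WHERE name = 'test';

-- Force MAX/XML values out of row for future writes:
EXEC sp_tableoption 'dbo.test', 'large value types out of row', 1;
```

Note this option only applies to the MAX and XML types, not to the legacy text/ntext/image columns, which are out-of-row by default unless 'text in row' was enabled.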

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

As per my understanding, LOB data will also get stored inside the page; however, if the row data exceeds 8,000 bytes, the data is stored off the page. Now I have a question. All data inside SQL Server is stored in 8 KB pages. How is LOB data stored in memory? Is there a different page type in the buffer pool for holding LOB data? And how does a LOB column update work? Does it bring the data from disk into 8 KB pages in the buffer pool, make the update, and then write those dirty 8 KB pages back to disk, or does I/O work differently for LOB tables?

Updating a LOB does not remove the old value from storage; it creates a new one and then changes the pointer in the table row.

Cleanup happens at some later stage, when the server finds sufficient spare time and resources.
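That deferred cleanup is visible as "ghost" records, which would also explain the OP's observation that deletes finish but space isn't released; with READ_COMMITTED_SNAPSHOT on and an AG readable secondary, version ghost records can linger even longer. A sketch of how to check, assuming the OP's table name:

```sql
-- Deleted rows/LOBs linger as ghost records until the background
-- ghost-cleanup task (and row versioning) lets them be removed:
SELECT OBJECT_NAME(object_id) AS table_name,
       index_id,
       alloc_unit_type_desc,
       ghost_record_count,
       version_ghost_record_count,
       page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.test'),
                                    NULL, NULL, 'DETAILED');
```

Large ghost counts shortly after a big delete are normal; if they never drop, long-running snapshot transactions or the AG secondary are usually holding the versions alive.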

    _____________
    Code for TallyGenerator

I must say: storing big documents in SQL Server tables is a bad idea. Storing them inline with transactional, updatable data is an extremely bad idea.

If you cannot store documents outside of the database in a file storage system (storing only links in the table), at least move them into a "LOB only" table; when you need to update a LOB, just add a new row to the LOB table and update the LOB_id in the relevant table(s).
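A minimal sketch of that "LOB only" table idea; all names here are illustrative, not from the OP's schema:

```sql
-- Separate store for the documents themselves:
CREATE TABLE dbo.lob_store
(
    lob_id  bigint IDENTITY(1,1) NOT NULL PRIMARY KEY,
    payload nvarchar(max) NOT NULL
);

-- The transactional table holds only a reference:
-- ALTER TABLE dbo.test ADD lob_id bigint NULL
--     REFERENCES dbo.lob_store (lob_id);

-- "Updating" a document = insert a new version, then repoint the row:
INSERT INTO dbo.lob_store (payload) VALUES (N'...new version...');
UPDATE dbo.test SET lob_id = SCOPE_IDENTITY() WHERE c1 = 42;
```

This keeps the hot transactional rows narrow, and old LOB versions can be purged from dbo.lob_store on their own schedule.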

    _____________
    Code for TallyGenerator

  • bobrooney.81 wrote:

As per my understanding, LOB data will also get stored inside the page; however, if the row data exceeds 8,000 bytes, the data is stored off the page. Now I have a question. All data inside SQL Server is stored in 8 KB pages. How is LOB data stored in memory? Is there a different page type in the buffer pool for holding LOB data? And how does a LOB column update work? Does it bring the data from disk into 8 KB pages in the buffer pool, make the update, and then write those dirty 8 KB pages back to disk, or does I/O work differently for LOB tables?

    The key is that MS kind of screwed everyone that works with LOBs way back in 2005.  The MAX datatypes are great compared to the predecessors but the predecessors had a huge advantage... they did and still do DEFAULT to being stored out-of-row, which is a bit like some (pardon the term) document storage whereas the MAX datatypes default to in-row, if they fit.  That leads to all sorts of clustered index woes like trapped short rows and slow queries because they have to load all that in-row junk when you're just trying to read normal data.

Whether you have in-row or out-of-row LOBs, the only way you can "compress" deletions is to REORGANIZE the indexes that contain them (Clustered Indexes and any Non-Clustered indexes that you've made the mistake of INCLUDEing them in).

The best thing to do is to set the table option to force them out of row and then do an "in-place" update to actually move the existing LOBs out of row.  I strongly recommend that any new table you build that will contain MAX or XML datatypes have the out-of-row table option set on its birthday.

The out-of-row stuff actually works quite well.  It will reuse space and do a bunch of other cool things automatically.  The bad part is that if you do deletes because you need space, you have to run REORGANIZE, and that nasty bit of code may have to be executed many times before it does the job completely.  I did a test with some not-so-big manufactured LOBs and I had to run REORGANIZE 10 times to achieve the final result.  REBUILDing won't do much at all for you for either form of LOB.

    And, yeah... the Buffer Pool takes a hell of a beating if defaulting to in-row LOBs and a lot of them actually fit in row.
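The sequence above might look like this; a hedged sketch only, since the OP's ntext columns would first need converting to nvarchar(max) before the table option applies to them, and a single-statement update of a 500 GB table should really be batched:

```sql
-- ntext does not honor this option; convert first (example for one column):
-- ALTER TABLE dbo.test ALTER COLUMN c3 nvarchar(max) NULL;

-- 1) Force MAX/XML values out of row for future writes:
EXEC sp_tableoption 'dbo.test', 'large value types out of row', 1;

-- 2) The option only affects NEW writes; an "in-place" update pushes
--    the existing values out of row (batch this in production):
UPDATE dbo.test SET c3 = c3, c11 = c11, c14 = c14;

-- 3) Reclaim space from deleted LOBs; may need several passes:
ALTER INDEX ALL ON dbo.test REORGANIZE WITH (LOB_COMPACTION = ON);
```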

    --Jeff Moden



  • Jeff Moden wrote:

    Michael L John wrote:

    Jo Pattyn wrote:

    7) Why are you still on 2017 RTM

    Same question from me.  As well as why are you still using an ntext data type?  That was deprecated with SQL 2005.

    ntext is stored in the row, which is one reason that it's slow.

    You are not going to get any space back until you re-build the clustered index.

    Depending upon what you are deleting, you may be better off selecting what you want to keep into a new table, drop the old table, and rename the new table.

Sorry but that's not correct.  NText, Text, and Image are all stored out-of-row by default.  The newer MAX and XML LOBs are (very unfortunately) stored in-row by default and, of course, only if they fit in-row.

    https://docs.microsoft.com/en-us/sql/relational-databases/system-stored-procedures/sp-tableoption-transact-sql?view=sql-server-ver15

    I totally agree on your last two points.

    Yeah.  My bad.  My brain and my fingers were not in sync.  Thanks for correcting me.

    Michael L John

  • Michael L John wrote:

    Yeah.  My bad.  My brain and my fingers were not in sync.  Thanks for correcting me.

    Heh... NP.  At least it wasn't me for a change. 😀 😀 😀

    --Jeff Moden



  • Hi Jeff,

I still don't understand in what type of pages the LOB data gets stored. Apart from the 8 KB data page, are there any other page types for LOB data in the buffer pool?

    Thanks,

    Bob

  • bobrooney.81 wrote:

    Hi Jeff,

I still don't understand in what type of pages the LOB data gets stored. Apart from the 8 KB data page, are there any other page types for LOB data in the buffer pool?

    Thanks,

    Bob

They are still stored in 8K pages.  By default for the MAX datatypes, if they fit "in-row" they will be (very unfortunately) stored as part of the clustered index, in the same 8K pages, which can trap "short rows" (page density can become remarkably low) and inflates the average row size; that causes a whole lot more pages to be scanned if a clustered index scan occurs (sometimes hundreds or thousands more).

    If they go or are forced out of row (the latter being the best option, IMHO), they still live on 8K pages but the storage structure is quite different and it doesn't follow the same rules as a clustered index.  For example, the data is not compelled to live on pages designated by the clustered index keys.  It's more like a non-structured heterogeneous storage area and it can be quite efficient.  It requires virtually no maintenance unless you do a shedload of deletes but the deleted space will be reused.  It also (again) keeps all that stuff from actually being stored in either the Clustered Index or Heap, depending on what you have.
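You can see this split for yourself: in-row and out-of-row storage show up as separate allocation units for the same table, and both are made of ordinary 8 KB pages. A sketch against the OP's table (note LOB allocation units key on partition_id as their container):

```sql
-- One row per allocation unit: IN_ROW_DATA, LOB_DATA, ROW_OVERFLOW_DATA
SELECT au.type_desc,
       au.total_pages,
       au.used_pages
FROM sys.allocation_units AS au
JOIN sys.partitions AS p
    ON p.partition_id = au.container_id
WHERE p.object_id = OBJECT_ID('dbo.test');
```

A large LOB_DATA page count next to a small IN_ROW_DATA count is the signature of the out-of-row layout described above.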

    --Jeff Moden



Thank you, Jeff and everyone.
