Find space used by image columns

  • By default, Image data (and some other types) are written to PRIMARY.  Is there any way to find out how much space is consumed by these images?

  • Tom John-342103 - Tuesday, August 14, 2018 10:59 AM

    By default, Image data (and some other types) are written to PRIMARY.  Is there any way to find out how much space is consumed by these images?

    You can use DATALENGTH
    SELECT SUM(DATALENGTH(myImageCol))
    FROM myTable

    https://docs.microsoft.com/en-us/sql/t-sql/functions/datalength-transact-sql?view=sql-server-2017

  • Jonathan AC Roberts - Tuesday, August 14, 2018 11:03 AM

    Tom John-342103 - Tuesday, August 14, 2018 10:59 AM

    By default, Image data (and some other types) are written to PRIMARY.  Is there any way to find out how much space is consumed by these images?

    You can use DATALENGTH
    SELECT SUM(DATALENGTH(myImageCol))
    FROM myTable

    https://docs.microsoft.com/en-us/sql/t-sql/functions/datalength-transact-sql?view=sql-server-2017

    Thanks.  That worked with one modification as shown below:
    SELECT SUM(cast(DATALENGTH(BinaryImageData) as bigint)) FROM IMAGEINDEX

    I had to do this because the value returned was >2^31.  The value was 31,116,152,328

  • Tom John-342103 - Tuesday, August 14, 2018 11:27 AM

    Jonathan AC Roberts - Tuesday, August 14, 2018 11:03 AM

    Tom John-342103 - Tuesday, August 14, 2018 10:59 AM

    By default, Image data (and some other types) are written to PRIMARY.  Is there any way to find out how much space is consumed by these images?

    You can use DATALENGTH
    SELECT SUM(DATALENGTH(myImageCol))
    FROM myTable

    https://docs.microsoft.com/en-us/sql/t-sql/functions/datalength-transact-sql?view=sql-server-2017

    Thanks.  That worked with one modification as shown below:
    SELECT SUM(cast(DATALENGTH(BinaryImageData) as bigint)) FROM IMAGEINDEX

    I had to do this because the value returned was >2^31.  The value was 31,116,152,328

    That's actually going to provide a bit of an inaccurate count.  Yes, you're counting the bytes in the image but you're not counting the number of bytes it takes to store the data.  You need to count the pages in the LOB and divide that by 128.0 to get the accurate count of the bytes consumed not just by the images but by the pages they are stored on.

    The hard part is if you have more than one LOB column in the same table.  I've not had to split them out before and so don't know how to do such a thing off the top of my head.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden - Tuesday, August 14, 2018 8:18 PM

    Tom John-342103 - Tuesday, August 14, 2018 11:27 AM

    Jonathan AC Roberts - Tuesday, August 14, 2018 11:03 AM

    Tom John-342103 - Tuesday, August 14, 2018 10:59 AM

    By default, Image data (and some other types) are written to PRIMARY.  Is there any way to find out how much space is consumed by these images?

    You can use DATALENGTH
    SELECT SUM(DATALENGTH(myImageCol))
    FROM myTable

    https://docs.microsoft.com/en-us/sql/t-sql/functions/datalength-transact-sql?view=sql-server-2017

    Thanks.  That worked with one modification as shown below:
    SELECT SUM(cast(DATALENGTH(BinaryImageData) as bigint)) FROM IMAGEINDEX

    I had to do this because the value returned was >2^31.  The value was 31,116,152,328

    That's actually going to provide a bit of an inaccurate count.  Yes, you're counting the bytes in the image but you're not counting the number of bytes it takes to store the data.  You need to count the pages in the LOB and divide that by 128.0 to get the accurate count of the bytes consumed not just by the images but by the pages they are stored on.

    The hard part is if you have more than one LOB column in the same table.  I've not had to split them out before and so don't know how to do such a thing off the top of my head.

    Thanks Jeff.  I actually moved the image data to it's own file group.  The file has a size of 47,626,321,920 bytes with no wasted space.  The sum of the data lengths is 31,116,152,328.

    How do I get the page count from the LOB.

  • Tom John-342103 - Wednesday, August 15, 2018 5:47 AM

    Jeff Moden - Tuesday, August 14, 2018 8:18 PM

    Tom John-342103 - Tuesday, August 14, 2018 11:27 AM

    Jonathan AC Roberts - Tuesday, August 14, 2018 11:03 AM

    Tom John-342103 - Tuesday, August 14, 2018 10:59 AM

    By default, Image data (and some other types) are written to PRIMARY.  Is there any way to find out how much space is consumed by these images?

    You can use DATALENGTH
    SELECT SUM(DATALENGTH(myImageCol))
    FROM myTable

    https://docs.microsoft.com/en-us/sql/t-sql/functions/datalength-transact-sql?view=sql-server-2017

    Thanks.  That worked with one modification as shown below:
    SELECT SUM(cast(DATALENGTH(BinaryImageData) as bigint)) FROM IMAGEINDEX

    I had to do this because the value returned was >2^31.  The value was 31,116,152,328

    That's actually going to provide a bit of an inaccurate count.  Yes, you're counting the bytes in the image but you're not counting the number of bytes it takes to store the data.  You need to count the pages in the LOB and divide that by 128.0 to get the accurate count of the bytes consumed not just by the images but by the pages they are stored on.

    The hard part is if you have more than one LOB column in the same table.  I've not had to split them out before and so don't know how to do such a thing off the top of my head.

    Thanks Jeff.  I actually moved the image data to it's own file group.  The file has a size of 47,626,321,920 bytes with no wasted space.  The sum of the data lengths is 31,116,152,328.

    How do I get the page count from the LOB.

    One way is to execute sys.dm_db_index_physical stats on the related table's Clustered Index and then look for the LOB_DATA marker for the row.  Note that it will include not only the column in question but anything else that qualifies as LOB_DATA.  There may also be an out of row marker. I don't remember off the top of my head but it's marked something like ROW_OVERFLOW or something like that.  That's for any normal VARCHARS/NVARCHARS that overflowed the ~8060 / 8094 byte mark for the row.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden - Wednesday, August 15, 2018 7:10 AM

    Tom John-342103 - Wednesday, August 15, 2018 5:47 AM

    Jeff Moden - Tuesday, August 14, 2018 8:18 PM

    Tom John-342103 - Tuesday, August 14, 2018 11:27 AM

    Jonathan AC Roberts - Tuesday, August 14, 2018 11:03 AM

    Tom John-342103 - Tuesday, August 14, 2018 10:59 AM

    By default, Image data (and some other types) are written to PRIMARY.  Is there any way to find out how much space is consumed by these images?

    You can use DATALENGTH
    SELECT SUM(DATALENGTH(myImageCol))
    FROM myTable

    https://docs.microsoft.com/en-us/sql/t-sql/functions/datalength-transact-sql?view=sql-server-2017

    Thanks.  That worked with one modification as shown below:
    SELECT SUM(cast(DATALENGTH(BinaryImageData) as bigint)) FROM IMAGEINDEX

    I had to do this because the value returned was >2^31.  The value was 31,116,152,328

    That's actually going to provide a bit of an inaccurate count.  Yes, you're counting the bytes in the image but you're not counting the number of bytes it takes to store the data.  You need to count the pages in the LOB and divide that by 128.0 to get the accurate count of the bytes consumed not just by the images but by the pages they are stored on.

    The hard part is if you have more than one LOB column in the same table.  I've not had to split them out before and so don't know how to do such a thing off the top of my head.

    Thanks Jeff.  I actually moved the image data to it's own file group.  The file has a size of 47,626,321,920 bytes with no wasted space.  The sum of the data lengths is 31,116,152,328.

    How do I get the page count from the LOB.

    One way is to execute sys.dm_db_index_physical stats on the related table's Clustered Index and then look for the LOB_DATA marker for the row.  Note that it will include not only the column in question but anything else that qualifies as LOB_DATA.  There may also be an out of row marker. I don't remember off the top of my head but it's marked something like ROW_OVERFLOW or something like that.  That's for any normal VARCHARS/NVARCHARS that overflowed the ~8060 / 8094 byte mark for the row.

    p.s.  And there's always wasted space in pages because the data just won't fit right.  Even LOB_DATA lives on 8K pages.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden - Wednesday, August 15, 2018 7:11 AM

    Jeff Moden - Wednesday, August 15, 2018 7:10 AM

    Tom John-342103 - Wednesday, August 15, 2018 5:47 AM

    Jeff Moden - Tuesday, August 14, 2018 8:18 PM

    Tom John-342103 - Tuesday, August 14, 2018 11:27 AM

    Jonathan AC Roberts - Tuesday, August 14, 2018 11:03 AM

    Tom John-342103 - Tuesday, August 14, 2018 10:59 AM

    By default, Image data (and some other types) are written to PRIMARY.  Is there any way to find out how much space is consumed by these images?

    You can use DATALENGTH
    SELECT SUM(DATALENGTH(myImageCol))
    FROM myTable

    https://docs.microsoft.com/en-us/sql/t-sql/functions/datalength-transact-sql?view=sql-server-2017

    Thanks.  That worked with one modification as shown below:
    SELECT SUM(cast(DATALENGTH(BinaryImageData) as bigint)) FROM IMAGEINDEX

    I had to do this because the value returned was >2^31.  The value was 31,116,152,328

    That's actually going to provide a bit of an inaccurate count.  Yes, you're counting the bytes in the image but you're not counting the number of bytes it takes to store the data.  You need to count the pages in the LOB and divide that by 128.0 to get the accurate count of the bytes consumed not just by the images but by the pages they are stored on.

    The hard part is if you have more than one LOB column in the same table.  I've not had to split them out before and so don't know how to do such a thing off the top of my head.

    Thanks Jeff.  I actually moved the image data to it's own file group.  The file has a size of 47,626,321,920 bytes with no wasted space.  The sum of the data lengths is 31,116,152,328.

    How do I get the page count from the LOB.

    One way is to execute sys.dm_db_index_physical stats on the related table's Clustered Index and then look for the LOB_DATA marker for the row.  Note that it will include not only the column in question but anything else that qualifies as LOB_DATA.  There may also be an out of row marker. I don't remember off the top of my head but it's marked something like ROW_OVERFLOW or something like that.  That's for any normal VARCHARS/NVARCHARS that overflowed the ~8060 / 8094 byte mark for the row.

    p.s.  And there's always wasted space in pages because the data just won't fit right.  Even LOB_DATA lives on 8K pages.

    Tom John now has a rule of thumb for this type of data:

    File size = Results from DATALENGTH * 1.5

  • Jeff Moden - Wednesday, August 15, 2018 7:10 AM

    Tom John-342103 - Wednesday, August 15, 2018 5:47 AM

    Jeff Moden - Tuesday, August 14, 2018 8:18 PM

    Tom John-342103 - Tuesday, August 14, 2018 11:27 AM

    Jonathan AC Roberts - Tuesday, August 14, 2018 11:03 AM

    Tom John-342103 - Tuesday, August 14, 2018 10:59 AM

    By default, Image data (and some other types) are written to PRIMARY.  Is there any way to find out how much space is consumed by these images?

    You can use DATALENGTH
    SELECT SUM(DATALENGTH(myImageCol))
    FROM myTable

    https://docs.microsoft.com/en-us/sql/t-sql/functions/datalength-transact-sql?view=sql-server-2017

    Thanks.  That worked with one modification as shown below:
    SELECT SUM(cast(DATALENGTH(BinaryImageData) as bigint)) FROM IMAGEINDEX

    I had to do this because the value returned was >2^31.  The value was 31,116,152,328

    That's actually going to provide a bit of an inaccurate count.  Yes, you're counting the bytes in the image but you're not counting the number of bytes it takes to store the data.  You need to count the pages in the LOB and divide that by 128.0 to get the accurate count of the bytes consumed not just by the images but by the pages they are stored on.

    The hard part is if you have more than one LOB column in the same table.  I've not had to split them out before and so don't know how to do such a thing off the top of my head.

    Thanks Jeff.  I actually moved the image data to it's own file group.  The file has a size of 47,626,321,920 bytes with no wasted space.  The sum of the data lengths is 31,116,152,328.

    How do I get the page count from the LOB.

    One way is to execute sys.dm_db_index_physical stats on the related table's Clustered Index and then look for the LOB_DATA marker for the row.  Note that it will include not only the column in question but anything else that qualifies as LOB_DATA.  There may also be an out of row marker. I don't remember off the top of my head but it's marked something like ROW_OVERFLOW or something like that.  That's for any normal VARCHARS/NVARCHARS that overflowed the ~8060 / 8094 byte mark for the row.

    Tom: it is incredibly difficult to get an accurate accounting of the actual footprint of a particular row, column, or value. There are various types of overhead to account for, various structures for storing large data, external factors, etc.  For example:

    • TEXT / NTEXT / IMAGE datatypes typically take up just 18 bytes per row in the main data page (16 bytes for the pointer to the LOB data page, plus 2 bytes of other overhead -- NULL BITMAP, etc).
    • TEXT / NTEXT / IMAGE datatypes can be stored in-row, but that requires setting a Table Option that few people are aware of. However, if set, then if the value can fit in row, it will be stored there and not on a LOB page.
    • If the LOB data is small enough, it will be pointed to directly by the 16-byte pointer stored in the row.
    • If the LOB data is large enough, it will be mapped by one or more TEXT_TREE pages
    • There is overhead / metadata for all page types, so an 8k page does not give you 8192 bytes of YOUR data. I just filled up some pages in a temp table using an IMAGE column (side note: IMAGE was deprecated with the release of SQL Server 2005, and if at all possible, you should be using VARBINARY(MAX) for a variety of reasons). I think the most I was able to store on a full page (value was 56,000 bytes) was 8150 bytes, meaning 42 bytes of overhead minimum.
    • All pages, whether DATA or LOB or ROW_OVERFLOW (and I would guess TEXT_TREE), are mapped in IAM pages.
    • If you have multiple LOB columns in a table, they can share space on the same LOB page (if the data fits, obviously)

    Still, if you have just a single LOB column, then you can get somewhat close by using the following query, executed in the DB containing the table (be sure to put the correct table name into the OBJECT_ID function):

    SELECT [allocation_unit_type_desc], [allocated_page_file_id], [allocated_page_page_id], [page_type_desc],
       '---' AS [---], *
    FROM sys.dm_db_database_page_allocations(DB_ID(), OBJECT_ID(N'SchemaName.TableName'), NULL, NULL, N'DETAILED')
    WHERE [allocation_unit_type_desc] = N'LOB_DATA';

    You can then add in the number of rows * 18 for the number of bytes required in-row (NULL rows take up 0 bytes, at least for this column).

    ----
    P.S. Jeff, re: types that can overflow, don't forget about VARBINARY , and I believe also XML :).

    P.P.S. Tom, re: your statement of "By default, Image data (and some other types) are written to PRIMARY", I do not believe that is correct. I am fairly certain that LOB data is stored in the same file group that the table / clustered index is in. I think you are referring to the TEXT_IMAGE_ON option that allows you to specify a different file group, but I don't think that implies that without it being specified that the data is in PRIMARY. If anything, I would expect the "default" to be the default file group, and that is not necessarily PRIMARY (I always create a different file group for user data and mark it as "default" so that hopefully nothing but system meta-data goes into PRIMARY).

    SQL#https://SQLsharp.com/ ( SQLCLR library ofover 340 Functions and Procedures)
    Sql Quantum Lifthttps://SqlQuantumLift.com/ ( company )
    Sql Quantum Leaphttps://SqlQuantumLeap.com/ ( blog )
    Info sitesCollations     •     Module Signing     •     SQLCLR

  • Image data (or successor types) is a bit of a pain.   I am satisfied, at this point, with the option of putting image data into it's own file group.  What I found interesting is that the process of getting the data into the new structure was time consuming.  Image data doesn't seem to copy very fast.  To do this, I renamed the original table and all of its constraints and indexes.  Recreated the table with the TextImage in it's own file group and then copied from the renamed table to the recreated table. To copy 10,000,000 or so rows with around 50GB of image data took more than 12 hours.

  • Tom John-342103 - Thursday, August 16, 2018 12:38 PM

    Image data (or successor types) is a bit of a pain.   I am satisfied, at this point, with the option of putting image data into it's own file group.  What I found interesting is that the process of getting the data into the new structure was time consuming.  Image data doesn't seem to copy very fast.  To do this, I renamed the original table and all of its constraints and indexes.  Recreated the table with the TextImage in it's own file group and then copied from the renamed table to the recreated table. To copy 10,000,000 or so rows with around 50GB of image data took more than 12 hours.

    Did you pre-grow the data files (data and log)? Did you copy it over in batches of less than 3000 rows? Of course, you have 29 GB of data, taking up 3,817,933 LOB pages at bare minimum (assuming they all fit into 3150 byte pages), and likely thousands more. So I can imagine that it takes time to break up 200k values into the proper structure, allocate that structure, store the data, and write it again (everything is written to both log and data files). So I wouldn't expect it to happen super quick-like 😉 .

    Take care, Solomon..

    SQL#https://SQLsharp.com/ ( SQLCLR library ofover 340 Functions and Procedures)
    Sql Quantum Lifthttps://SqlQuantumLift.com/ ( company )
    Sql Quantum Leaphttps://SqlQuantumLeap.com/ ( blog )
    Info sitesCollations     •     Module Signing     •     SQLCLR

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply