When does SQL decide to load all of an index into memory?

  • We're having a bit of a performance issue with one of our tables, which seems to be behaving in a slightly unexpected way.

    The table has about 13.5m rows. It has an integer, identity, primary key column (DeliveryTimetableItemResolvedCostID) and various other columns, one of which is another integer column (DeliveryTimetableItemID); this has about 11.5m distinct values in it.

    CREATE TABLE [dbo].[T_DeliveryTimetableItemResolvedCost](
     [DeliveryTimetableItemResolvedCostID] [int] IDENTITY(1,1) NOT NULL,
     [DeliveryTimetableItemID] [int] NOT NULL,
     [Quantity] [decimal](10, 4) NOT NULL,
     [Cost] [money] NULL,
 -- etc. (further columns elided)
     CONSTRAINT [PK_T_DeliveryTimetableItemResolvedCost] PRIMARY KEY CLUSTERED
    (
     [DeliveryTimetableItemResolvedCostID] ASC
    ) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 90) ON [PRIMARY]
    ) ON [PRIMARY]
    GO

    This other integer column has a non-unique index on it that includes a couple of the columns (Quantity and Cost):

    CREATE NONCLUSTERED INDEX [IX_T_DeliveryTimetableItemResolvedCost_DeliveryTimetableItemID] ON [dbo].[T_DeliveryTimetableItemResolvedCost]
    (
     [DeliveryTimetableItemID] ASC
    )
    INCLUDE (  [Quantity],
     [Cost]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 80) ON [PRIMARY]
    GO

The problem we're having is that the entirety of the non-unique index (IX_T_DeliveryTimetableItemResolvedCost_DeliveryTimetableItemID) is being loaded into memory even when we only want the Quantity and Cost of 1% of the total number of rows in the table, as selected via the DeliveryTimetableItemID. That ends up using quite a lot of memory.

My question is: does SQL have some threshold over which it just says "S*d it - let's load up the whole thing!"? (Presumably a dynamically-determined threshold.) I know "scans are generally faster than seeks" - is this simply a result of that?

Post the query that causes this problem. Aggregates could be involved. Alternatively, some kind of function could be involved that would have to touch every row, just to then filter most of them out.

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • julian.fletcher - Monday, February 4, 2019 10:22 AM

I'm thinking SQL never says "let's load up the whole thing"; it loads pages into the buffer pool when they aren't already there, and it unloads pages that haven't been accessed in a while. So if you ran a select that referenced all the key and included columns in a way that forced a full index scan, I'm thinking that's got the best chance of pulling the whole index in; referencing only the keys and included columns should also mean SQL doesn't need to read the base table. But that's just me thinking out loud 🙂

Of course, making that happen is sort of an artificial thing, as you would then not really be "using" the index in your query except to access the included columns - though I'm thinking SQL MIGHT assign a lower cost to reading the index instead of the table.
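
As a sketch of that idea (table and index names taken from the original post; the INDEX hint simply pins the access path, and a warm-up query like this is an artificial cache-priming trick, not production code):

    -- Read every leaf page of the nonclustered index without touching the base table.
    Select COUNT_BIG(*)
     From dbo.T_DeliveryTimetableItemResolvedCost WITH
      (INDEX (IX_T_DeliveryTimetableItemResolvedCost_DeliveryTimetableItemID))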

  • sgmunson - Monday, February 4, 2019 11:55 AM

Post the query that causes this problem. Aggregates could be involved. Alternatively, some kind of function could be involved that would have to touch every row, just to then filter most of them out.

    The actual query relies on other stuff so I've extracted its essence. This also shows the problem.


    -- Get a small, random selection of IDs.
    Select Top 1 Percent DeliveryTimetableItemID
     Into #Subset
     From dbo.T_DeliveryTimetableItemResolvedCost DTIRC
 Order By NEWID()

    -- Empty the buffer and display how much of the index is loaded. (It is 0%, as expected.)
    DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS
    Select dbo.IndexPercentInMemory ('IX_T_DeliveryTimetableItemResolvedCost_DeliveryTimetableItemID', Null)

    -- Use the index.
    Select Quantity, Cost
     Into #Temp
     From dbo.T_DeliveryTimetableItemResolvedCost DTIRC
     Join #Subset S On (S.DeliveryTimetableItemID = DTIRC.DeliveryTimetableItemID)

    -- This shows 100%.
    Select dbo.IndexPercentInMemory ('IX_T_DeliveryTimetableItemResolvedCost_DeliveryTimetableItemID', Null)

    Drop Table #Temp
    Drop Table #Subset

If you change the percentage in the first select statement to 0.001, the percentage loaded into memory falls to 1.85.

    Entirely reproducible. It just seems very odd that it would load up pretty much the whole index just to get 0.1% of the data.
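
(The dbo.IndexPercentInMemory function above is the poster's own helper and its definition isn't shown. A minimal sketch of how such a measurement is usually taken, via sys.dm_os_buffer_descriptors, assuming the index name from the thread:)

    -- Count this index's pages currently in the buffer pool.
    Select i.name As index_name,
        Count(*) As pages_in_buffer_pool,
        Count(*) * 8 / 1024.0 As buffered_mb
     From sys.dm_os_buffer_descriptors bd
     Join sys.allocation_units au On au.allocation_unit_id = bd.allocation_unit_id
     Join sys.partitions p On p.hobt_id = au.container_id And au.type In (1, 3) -- in-row and row-overflow data
     Join sys.indexes i On i.object_id = p.object_id And i.index_id = p.index_id
     Where bd.database_id = DB_ID()
       And i.name = 'IX_T_DeliveryTimetableItemResolvedCost_DeliveryTimetableItemID'
     Group By i.name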

What is the size of the index?
How many rows, on average, are in each index page?

For clustered indexes there are some 'methods' to obtain the physical position of each row, so you can calculate exactly which pages are needed. Maybe there are equivalent 'methods' for index pages. (See below.)

If there are a hundred index rows in each page and you read 1 percent of them, most index pages will be touched, so the optimizer might choose to do a full index scan. With more than a hundred index rows per page, the likelihood of touching any given page increases. A full index scan is faster than fetching most pages individually, but consumes more cache.
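
To put an illustrative number on that: with 100 index rows per leaf page and a uniformly random 1 percent sample, the probability that a given page contains none of the sampled rows is 0.99^100 ≈ 0.37, so roughly 63% of leaf pages hold at least one wanted row - and read-ahead will typically pull in many of the remainder.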

    Ben

Maybe there is something similar for indexes?
    ---------------------------------------------------------------------------------
-- RID physical row locator: file / page / slot
    ---------------------------------------------------------------------------------
    -- http://www.sqlskills.com/blogs/paul/post/sql-server-2008-new-%28undocumented%29-physical-row-locator-function.aspx

-- example: decode %%physloc%% (8 bytes: 4-byte page ID, 2-byte file ID, 2-byte slot)
select top 10
 convert(bigint, convert(binary(4), reverse(substring(v.%%physloc%%, 1, 4)))) page
,convert(bigint, convert(binary(4), reverse(substring(v.%%physloc%%, 1, 4)))) / 8 segment -- 8 pages per extent
,convert(int, convert(binary(2), reverse(substring(v.%%physloc%%, 5, 2)))) file_id
,convert(int, convert(binary(2), reverse(substring(v.%%physloc%%, 7, 2)))) slot
from A_table v
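
(The linked post also describes an undocumented formatter function that packages the same decoding; a one-line equivalent, again not for production use:)

    -- Returns a '(file:page:slot)' string per row.
    select top 10 sys.fn_PhysLocFormatter(%%physloc%%) file_page_slot
    from A_table v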

  • julian.fletcher - Tuesday, February 5, 2019 2:24 AM

Quick thought: what SQL Server loads into memory is one of the things that is not directly configurable, hence a better way of addressing this kind of performance problem would be to fully understand where the bottlenecks are (read: where it hurts)
    😎
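
(As a sketch of that 'find where it hurts' step, the usual starting point is the wait-statistics DMV; the excluded wait types below are an illustrative, not exhaustive, list of benign ones:)

    -- Top waits accumulated since the last service restart or stats clear.
    Select Top 10 wait_type, wait_time_ms, waiting_tasks_count
     From sys.dm_os_wait_stats
     Where wait_type Not In ('SLEEP_TASK', 'LAZYWRITER_SLEEP', 'BROKER_TASK_STOP')
     Order By wait_time_ms Desc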

Another thing that could affect this is stats that need to be updated. Distribution of data within the index could also be a factor, and could easily have a tipping point as to whether SQL Server will do individual SEEKs, a SEEK/RANGE SCAN, or just a full-up index scan.
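
(A minimal sketch of those checks, using the table and index names from the thread:)

    -- Inspect the statistics header and histogram for the index.
    DBCC SHOW_STATISTICS ('dbo.T_DeliveryTimetableItemResolvedCost',
     'IX_T_DeliveryTimetableItemResolvedCost_DeliveryTimetableItemID')

    -- Refresh them with a full scan if they look stale.
    UPDATE STATISTICS dbo.T_DeliveryTimetableItemResolvedCost
     IX_T_DeliveryTimetableItemResolvedCost_DeliveryTimetableItemID WITH FULLSCAN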

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • julian.fletcher - Tuesday, February 5, 2019 2:24 AM

    Okay, no aggregates or functions, but what does the index look like?  Are Quantity and Cost any part of the index?  Please post the index definition.

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • sgmunson - Tuesday, February 5, 2019 1:46 PM

    Okay, no aggregates or functions, but what does the index look like?  Are Quantity and Cost any part of the index?  Please post the index definition.

He did post the index: it's on DeliveryTimetableItemID, with an INCLUDE of Quantity and Cost.

With regard to the above example: 130k-odd rows in #Subset, with a possibly very wide distribution, is most likely reason enough for SQL to decide it's better to do an index scan than multiple lookups. And we don't even know whether that is in fact what it's doing, as we don't have an actual execution plan - but it's likely.
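
(One way to confirm it, re-using the #Subset temp table from the repro script; SET STATISTICS XML returns the actual execution plan alongside the results:)

    SET STATISTICS XML ON

    Select Quantity, Cost
     From dbo.T_DeliveryTimetableItemResolvedCost DTIRC
     Join #Subset S On (S.DeliveryTimetableItemID = DTIRC.DeliveryTimetableItemID)

    SET STATISTICS XML OFF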

  • frederico_fonseca - Tuesday, February 5, 2019 2:13 PM

Yes, I think that must be it - although we're only accessing a small percentage of the rows in the index, those rows are spread throughout the whole index. It's not something we see in the rest of the database, but that's probably because access there generally tends to be more focused.

The original 'hurt' arose from the index not having the "etc." columns, which resulted in a key lookup and the whole table being loaded into memory. Putting those columns on the index meant we just loaded 100% of the index instead of 100% of the table; at about 700MB instead of 2GB, that was a step in the right direction.
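
(For anyone reproducing that comparison, a sketch of the usual size check - page counts per index from sys.dm_db_partition_stats, names from the thread:)

    Select i.name, Sum(ps.used_page_count) * 8 / 1024 As size_mb
     From sys.dm_db_partition_stats ps
     Join sys.indexes i On i.object_id = ps.object_id And i.index_id = ps.index_id
     Where ps.object_id = Object_ID('dbo.T_DeliveryTimetableItemResolvedCost')
     Group By i.name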

    Thanks everybody!

  • julian.fletcher - Wednesday, February 6, 2019 7:09 AM

Yep. Missed that the index def was already posted. The problem is clearly the use of NEWID() for your sample, as that can distribute the rows all across the index at even the smallest percentages. Simulating normal use requires a different way to get your sample data: use whatever date constraints might usually apply first, and then sample within a date range.
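
(A sketch of that kind of sampling; the range boundaries here are invented for illustration, since the real workload's constraints aren't shown:)

    -- Sample a contiguous ~1% slice of the clustered key instead of NEWID(),
    -- which better mimics the focused access the rest of the database sees.
    Select DeliveryTimetableItemID
     Into #Subset
     From dbo.T_DeliveryTimetableItemResolvedCost
     Where DeliveryTimetableItemResolvedCostID Between 5000000 And 5135000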

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)
