Understanding execution plan query

  • All,

    Firstly to put this into context - I'm trying to improve my knowledge on reading execution plans, there isn't a particular business issue that I need to solve.

    Secondly I think I've provided all the information required. Apologises if I've missed anything?

    DDL:

    /****** Object:  Table [dbo].[Posts]    Script Date: 19/10/2020 14:25:24 ******/
    SET ANSI_NULLS ON
    GO

    SET QUOTED_IDENTIFIER ON
    GO

    CREATE TABLE [dbo].[Posts](
    [Id] [int] IDENTITY(1,1) NOT NULL,
    [AcceptedAnswerId] [int] NULL,
    [AnswerCount] [int] NULL,
    [Body] [nvarchar](max) NOT NULL,
    [ClosedDate] [datetime] NULL,
    [CommentCount] [int] NULL,
    [CommunityOwnedDate] [datetime] NULL,
    [CreationDate] [datetime] NOT NULL,
    [FavoriteCount] [int] NULL,
    [LastActivityDate] [datetime] NOT NULL,
    [LastEditDate] [datetime] NULL,
    [LastEditorDisplayName] [nvarchar](40) NULL,
    [LastEditorUserId] [int] NULL,
    [OwnerUserId] [int] NULL,
    [ParentId] [int] NULL,
    [PostTypeId] [int] NOT NULL,
    [Score] [int] NOT NULL,
    [Tags] [nvarchar](150) NULL,
    [Title] [nvarchar](250) NULL,
    [ViewCount] [int] NOT NULL,
    [CreationYear] [int] NULL,
    [CreationMonth] [int] NULL
    ) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
    GO


    /****** Object: Index [idxPostCDID] Script Date: 19/10/2020 14:26:03 ******/
    CREATE UNIQUE CLUSTERED INDEX [idxPostCDID] ON [dbo].[Posts]
    (
    [CreationDate] DESC,
    [Id] ASC
    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
    GO
    USE [StackOverflow]
    GO


    /****** Object: Index [idxPostID] Script Date: 19/10/2020 14:26:15 ******/
    CREATE UNIQUE NONCLUSTERED INDEX [idxPostID] ON [dbo].[Posts]
    (
    [Id] ASC
    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
    GO


    My reading of the attached execution plan is as follows:

    1. Node id 4 scans the idxPostCDID clustered index and outputs the OwnerUserID

    2. Node id 9 scans the same index with the predicate 'ownerID is null'.

    3. The outputs from these nodes get refined by other operators then and then joined in node ID 1.

    4. Node ID 4 and 9 use 47% of the query each.

    I would appreciate any advice/help on the following:

    The index is far too large to fit into memory. Will the system ensure that when a page is loaded into memory it is used by both node 4 and 9 before being released from memory or could it load a page, remove it from the buffer cache and then have to load it again? Could it go a step further by reading a page from the buffer cache and using it for both nodes?

    The reason for my question is as follows:

    Initially I was surprised to see two operators reading through the same index. I think the optimiser is trying to take advantage of parallelism? However the I/O system is slow (it's just a training setup) so reading the pages into memory twice would be far slower than any advantage gained by parallelism. Statistics are up to date and I noticed it shows 'optimisation level: full' for the plan.

    If it does co-ordinate the pages then is the plan working out the percentage for nodes 4 and 9 as follows?:

    Reading the index will take 94% of the plan, it's used by two operators so show it as 47% each.

    I appreciate that a post execution plan (or to use a, perhaps, newer phrasing - execution plan with run time metrics) is better for performance tuning, I'm more focusing on understanding plans rather than performance tuning in this instance.

    Thanks

    Editor 2  - Adding a line here

    Editor: Testing the edits

     

    • This topic was modified 3 years, 6 months ago by  as_1234. Reason: Small clarification and renaming the attachment to .txt so it wil upload
    • This topic was modified 3 years, 6 months ago by  Steve Jones - SSC Editor. Reason: testing
    • This topic was modified 3 years, 6 months ago by  Steve Jones - SSC Editor. Reason: Including a line
    Attachments:
    You must be logged in to view attached files.
  • When using NOT IN, you should make sure a NULL value does not appear in the results.  For example, like below.  Try that first and see if it cleans up the query plan.

    select id from users

    where id not in (

    select posts.OwnerUserId from posts where posts.OwnerUserId is not null )

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • Thank you for your help. That did improve the plan.

    I would also be interested to know if my reading of the original plan was correct.

  • Yes, I believe you reading of the original plan was correct.

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply