Eliminating a Distinct SORT operation

  • I have an UPDATE statement that joins two table by SendId. One table, I'll call it T1, has a clustered index on SendId ASC. The other table I will call T2 also has a clustered index on SendID ASC. All the columns from T2 are used to update T1. The execution plan shows a Clustered index scan on T2 and a Clustered Index Seek on T1 going into a Nested Loops inner join. Immediately following is a Distinct Sort that is done on SendId ASC. Why the Distinct SORT if the tables are already ordered by SendID?

  • lmarkum (6/19/2015)


    I have an UPDATE statement that joins two table by SendId. One table, I'll call it T1, has a clustered index on SendId ASC. The other table I will call T2 also has a clustered index on SendID ASC. All the columns from T2 are used to update T1. The execution plan shows a Clustered index scan on T2 and a Clustered Index Seek on T1 going into a Nested Loops inner join. Immediately following is a Distinct Sort that is done on SendId ASC. Why the Distinct SORT if the tables are already ordered by SendID?

    Without the query at a bare minimum there is nothing we can do help here. It would probably also help to know the table structures and indexes.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • And also post the actual execution plan as a *.sqlplan file.

  • lmarkum (6/19/2015)


    I have an UPDATE statement that joins two table by SendId. One table, I'll call it T1, has a clustered index on SendId ASC. The other table I will call T2 also has a clustered index on SendID ASC. All the columns from T2 are used to update T1. The execution plan shows a Clustered index scan on T2 and a Clustered Index Seek on T1 going into a Nested Loops inner join. Immediately following is a Distinct Sort that is done on SendId ASC. Why the Distinct SORT if the tables are already ordered by SendID?

    Are the two clustered indexes both unique?


    [font="Arial"]Low-hanging fruit picker and defender of the moggies[/font]

    For better assistance in answering your questions, please read this[/url].


    Understanding and using APPLY, (I)[/url] and (II)[/url] Paul White[/url]

    Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]

  • Here is the SQL and the attached sql plan

    DECLARE @SendId INT

    SET @SendId = 1

    UPDATEEmailTrackingSummary

    SETEmailName = tets.EmailName,

    SentDate = tets.SentDate,

    SendDate = tets.SendDate,

    NumberSent = tets.NumberSent,

    NumberDelivered = tets.NumberDelivered,

    Duplicates = tets.Duplicates,

    UniqueClicks = tets.UniqueClicks,

    UniqueOpens = tets.UniqueOpens,

    HardBounces = tets.HardBounces,

    Unsubscribes = tets.Unsubscribes,

    ExistingUndeliverables = tets.ExistingUndeliverables,

    ExistingUnsubscribes = tets.ExistingUnsubscribes,

    InvalidAddresses = tets.InvalidAddresses,

    NumberExcluded = tets.NumberExcluded,

    NumberTargeted = tets.NumberTargeted,

    OtherBounces = tets.OtherBounces,

    SoftBounces = tets.SoftBounces,

    PreviewURL = tets.PreviewURL,

    ts = GetDate(),

    Reconciled = 0

    FROMEmailTrackingSummary ets

    INNER JOIN TempEmailTrackingSummary tets ON tets.SendId = ets.SendId

    WHERE@SendId IS NULL OR TETS.SendId = @SendId

  • lmarkum (6/19/2015)


    Here is the SQL and the attached sql plan

    DECLARE @SendId INT

    SET @SendId = 1

    UPDATEEmailTrackingSummary

    SETEmailName = tets.EmailName,

    SentDate = tets.SentDate,

    SendDate = tets.SendDate,

    NumberSent = tets.NumberSent,

    NumberDelivered = tets.NumberDelivered,

    Duplicates = tets.Duplicates,

    UniqueClicks = tets.UniqueClicks,

    UniqueOpens = tets.UniqueOpens,

    HardBounces = tets.HardBounces,

    Unsubscribes = tets.Unsubscribes,

    ExistingUndeliverables = tets.ExistingUndeliverables,

    ExistingUnsubscribes = tets.ExistingUnsubscribes,

    InvalidAddresses = tets.InvalidAddresses,

    NumberExcluded = tets.NumberExcluded,

    NumberTargeted = tets.NumberTargeted,

    OtherBounces = tets.OtherBounces,

    SoftBounces = tets.SoftBounces,

    PreviewURL = tets.PreviewURL,

    ts = GetDate(),

    Reconciled = 0

    FROMEmailTrackingSummary ets

    INNER JOIN TempEmailTrackingSummary tets ON tets.SendId = ets.SendId

    WHERE@SendId IS NULL OR TETS.SendId = @SendId

    Can you post the query plan and DDL for EmailTrackingSummary and TempEmailTrackingSummary?

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • Answer removed, bad advice.

    I think I got to the bottom of this... Give me a few minutes...

    Ok I'm back. Again, DDL would really be helpful. Note the link in my signature line (or Sean's or Lynn's) for best practices on getting help/asking questions. What I'm really curious about is if there's a one-to-many relationship between EmailTrackingSummary and TempEmailTrackingSummary. If there is then there is a problem with your logic. Take the query below where I'm using a join to set the val of dbo.T_one/col1 with the value of dbo.T_many/col1.

    USE tempdb

    GO

    IF OBJECT_ID('tempdb.dbo.T_One') IS NOT NULL DROP TABLE dbo.T_One;

    IF OBJECT_ID('tempdb.dbo.T_Many') IS NOT NULL DROP TABLE dbo.T_Many;

    GO

    CREATE TABLE dbo.T_One

    (

    to_id int identity primary key,

    col1 varchar(10)

    );

    CREATE TABLE dbo.T_Many

    (

    tm_id int identity primary key,

    to_id int,

    col1 varchar(10)

    );

    INSERT INTO dbo.T_One (col1)

    VALUES ('xxx'),('xxx'),('xxx'),('xxx'),('xxx'),('xxx'),('xxx');

    INSERT INTO dbo.T_Many (to_id, col1)

    VALUES (1,'asd'),(1,'fgb'),(2,'ddd'),(3,'ddd'),(3,'ppp'),(3,'ooo'),

    (4,'ddd'),(4,'zzz'),(5,'123'),(6,'vvv'),(7,'yyy');

    UPDATE dbo.T_One

    SET col1 = m.col1

    FROM dbo.T_One o

    JOIN dbo.T_Many m ON o.to_id = m.to_id;

    SELECT * FROM T_One;

    Because I'm doing an update based on a one-to-many join the optimizer is selecting the first match between each id and performing the update based on that.

    Hopefully this helps you understand the problem with your update statement and why you are getting a DISTINCT sort. See attached query plan.

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • Alan,

    In your example the SORT is happening right after the table is read and prior to the Nested Loops join, which makes sense given your example. In my situation the TempEmailTrackingSummary table and the EmailTrackingSummary table go into a Nested Loops inner join and then the distinct sort occurs after the tables are joined. The clustered index on TempEmailTrackingSummary is not unique. I will have to learn more about the data and how this table is used in order to know if it can be made a unique index.

    Here is the DDL:

    /****** Object: Table [dbo].[TempEmailTrackingSummary] Script Date: 6/21/2015 10:32:44 PM ******/

    SET ANSI_NULLS ON

    GO

    SET QUOTED_IDENTIFIER ON

    GO

    SET ANSI_PADDING ON

    GO

    IF NOT EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[TempEmailTrackingSummary]') AND type in (N'U'))

    BEGIN

    CREATE TABLE [dbo].[TempEmailTrackingSummary](

    [SendID] [int] NOT NULL,

    [EmailName] [varchar](150) NULL,

    [SentDate] [datetime] NULL,

    [SendDate] [datetime] NULL,

    [NumberSent] [int] NULL,

    [NumberDelivered] [int] NULL,

    [Duplicates] [int] NULL,

    [UniqueClicks] [int] NULL,

    [UniqueOpens] [int] NULL,

    [HardBounces] [int] NULL,

    [Unsubscribes] [int] NULL,

    [ExistingUndeliverables] [int] NULL,

    [ExistingUnsubscribes] [int] NULL,

    [InvalidAddresses] [int] NULL,

    [NumberExcluded] [int] NULL,

    [NumberTargeted] [int] NULL,

    [OtherBounces] [int] NULL,

    [SoftBounces] [int] NULL,

    [PreviewURL] [varchar](150) NULL

    ) ON [PRIMARY]

    END

    GO

    SET ANSI_PADDING OFF

    GO

    CREATE CLUSTERED INDEX [ClusteredIndex-SendId] ON [dbo].[TempEmailTrackingSummary]

    (

    [SendID] ASC

    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)

    GO

    SET ANSI_NULLS ON

    GO

    SET QUOTED_IDENTIFIER ON

    GO

    SET ANSI_PADDING ON

    GO

    IF NOT EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[EmailTrackingSummary]') AND type in (N'U'))

    BEGIN

    CREATE TABLE [dbo].[EmailTrackingSummary](

    [SendID] [int] NOT NULL,

    [EmailName] [varchar](150) NULL,

    [SentDate] [datetime] NULL,

    [SendDate] [datetime] NULL,

    [NumberSent] [int] NULL,

    [NumberDelivered] [int] NULL,

    [Duplicates] [int] NULL,

    [UniqueClicks] [int] NULL,

    [UniqueOpens] [int] NULL,

    [HardBounces] [int] NULL,

    [Unsubscribes] [int] NULL,

    [ExistingUndeliverables] [int] NULL,

    [ExistingUnsubscribes] [int] NULL,

    [InvalidAddresses] [int] NULL,

    [NumberExcluded] [int] NULL,

    [NumberTargeted] [int] NULL,

    [OtherBounces] [int] NULL,

    [SoftBounces] [int] NULL,

    [ts] [datetime] NULL,

    [PreviewURL] [varchar](150) NULL,

    [Reconciled] [bit] NULL DEFAULT ((0)),

    [ReconciledCount] [int] NULL DEFAULT ((0)),

    CONSTRAINT [PK_EmailTrackingSummary] PRIMARY KEY CLUSTERED

    (

    [SendID] ASC

    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 80) ON [PRIMARY]

    ) ON [PRIMARY]

    END

    GO

    SET ANSI_PADDING OFF

    GO

    SET ANSI_PADDING ON

    GO

    /****** Object: Index [IX_EmailTrackingSummary_EmailName] Script Date: 6/21/2015 10:01:02 PM ******/

    IF NOT EXISTS (SELECT * FROM sys.indexes WHERE object_id = OBJECT_ID(N'[dbo].[EmailTrackingSummary]') AND name = N'IX_EmailTrackingSummary_EmailName')

    CREATE NONCLUSTERED INDEX [IX_EmailTrackingSummary_EmailName] ON [dbo].[EmailTrackingSummary]

    (

    [EmailName] ASC

    )

    INCLUDE ( [NumberSent],

    [UniqueClicks],

    [UniqueOpens],

    [HardBounces],

    [Unsubscribes]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 80) ON [PRIMARY]

    GO

    /****** Object: Index [IX_EmailTrackingSummary_Reconciled_ReconciledCount] Script Date: 6/21/2015 10:01:02 PM ******/

    IF NOT EXISTS (SELECT * FROM sys.indexes WHERE object_id = OBJECT_ID(N'[dbo].[EmailTrackingSummary]') AND name = N'IX_EmailTrackingSummary_Reconciled_ReconciledCount')

    CREATE NONCLUSTERED INDEX [IX_EmailTrackingSummary_Reconciled_ReconciledCount] ON [dbo].[EmailTrackingSummary]

    (

    [Reconciled] ASC,

    [ReconciledCount] ASC

    )

    INCLUDE ( [SendID]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 80) ON [PRIMARY]

    GO

  • lmarkum (6/19/2015)


    I have an UPDATE statement that joins two table by SendId. One table, I'll call it T1, has a clustered index on SendId ASC. The other table I will call T2 also has a clustered index on SendID ASC. All the columns from T2 are used to update T1. The execution plan shows a Clustered index scan on T2 and a Clustered Index Seek on T1 going into a Nested Loops inner join. Immediately following is a Distinct Sort that is done on SendId ASC. Why the Distinct SORT if the tables are already ordered by SendID?

    I think you've more or less answered your question with some encouragement from other folks. Whether you have or not, here's a demo which clearly shows the point - without a unique index on the source set, SQL Server can't determine if it contains dupes or not, so dedupes with a DISTINCT SORT.

    DROP TABLE #a; DROP TABLE #b;

    WITH

    e1 AS (SELECT d.n FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0)) d (n)),

    e2 AS (SELECT e1.n FROM e1 CROSS JOIN e1 b),

    e4 AS (SELECT e2.n FROM e2 CROSS JOIN e2 b),

    iT AS (SELECT n = ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) FROM e4 CROSS JOIN e2 CROSS JOIN e1)

    SELECT

    n,

    c1 = NEWID(),

    c2 = DATEADD(millisecond,CHECKSUM(NEWID()),GETDATE()),

    c3 = ABS(CHECKSUM(NEWID()))

    INTO #a

    FROM iT

    CREATE CLUSTERED INDEX cx_Thing ON #a (n)

    SELECT TOP 24 *

    INTO #b

    FROM #a

    CREATE CLUSTERED INDEX cx_NotUnique ON #b (n)

    -- Check the execution plan: DISTINCT SORT

    UPDATE a SET

    c1 = b.c1,

    c2 = b.c2,

    c3 = b.c3

    FROM #a a

    INNER JOIN #b b ON b.n = a.n

    DROP INDEX cx_NotUnique ON #b

    CREATE UNIQUE CLUSTERED INDEX cx_Unique ON #b (n)

    -- Check the execution plan: NO SORT

    UPDATE a SET

    c1 = b.c1,

    c2 = b.c2,

    c3 = b.c3

    FROM #a a

    INNER JOIN #b b ON b.n = a.n

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply