Please Help - trying to eliminate "used tempdb to spill data"

  • Hi All,

    I've been trying to make the following query more performant by breaking it up into smaller pieces.

    SELECT MT.A3+MT.A4 AS A34,MT.A3

    -- ,M.*

    FROM

    Master_TAB M

    JOIN (SELECT M.A1,t3.A3,t3.A4,M.A6,M.A2,ROW_NUMBER() OVER (PARTITION BY A1,A6,A3,A4 ORDER BY A5 DESC) AS rownum

    FROM Master_TAB M

    JOIN TABle2 t2 ON M.A2=t2.A2

    JOIN Table3 t3 ON t2.A2=t3.A2

    ) MT ON M.A1=MT.A1 AND M.A2=MT.A2 AND MT.rownum=1

    ORDER BY MT.A3+MT.A4,MT.A1;

    I know that the Spill is caused by the Sort but I can't remove the sort (sort can't be done in front end).

    my master table had 1.7 million rows and almost 200 columns (bad design? I know but can't be changed as there's too much that would be affected)

    every row is little over 1KB

    here's my attempt...

    -- MASTER_TAB has 1.7 million rows and 50 columns

    CREATE TABLE [dbo].[tmp_ABC](

    [A1] [varchar](13) NOT NULL,

    [A2] [varchar](5) NOT NULL,

    [A3] [varchar](4) NOT NULL,

    [A4] [varchar](4) NOT NULL,

    [A5] [int] NULL

    ) ON [PRIMARY]

    -- inserted data into tmp_ABC from master Table MASTER_TAB (PK= A1,A2,A3)

    create index IDX_tmp_ABC on [tmp_all_price_Target] (A1, A3, A4, A5) INCLUDE (A2)

    CREATE TABLE [dbo].[tmp_DEF](

    [A1] [varchar](13) NOT NULL,

    [A2] [varchar](5) NOT NULL,

    [A3] [varchar](4) NOT NULL,

    [A4] [varchar](4) NOT NULL,

    [A5] [int] NULL

    ) ON [PRIMARY]

    insert into tmp_DEF

    Select p.A1, t.A2, p.A3, p.A4,p.A5

    from [tmp_ABC] t

    join (SELECT A1, A3, A4, max(A5) as A5

    FROM tmp_ABC

    group by A1, A3, A4) pon t.A1 = p.A1 and t.A3 = p.A3 and t.A4=p.A4 and t.A5 = p.A5

    create index IDX_tmp_DEF on [tmp_all_UPC_Sto_Loc_for_pri] (A1,A2, A3, A4, A5)

    create index IDX_tmp_DEF_sort on [tmp_all_UPC_Sto_Loc_for_pri] (A3,A4) INCLUDE (A1,A2, A5)

    -- this is the Query that is causing the Spill (in reality I'm supposed to bring back all 200 columns fro the master table but for debug purposes I limited the columns)

    Select c.A3+c.A4 as A34, c.A3, c.A1

    -- M.*

    from tmp_DEF c

    join MASTER_TAB M on M.A1 = c.A1 and M.A2 = c.A2

    order by c.A3, C.A4

    if I just run the following I get no spill:

    Select c.A3+c.A4 as A34, c.A3, c.A1

    from tmp_DEF c

    order by c.A3, C.A4

    as soon as I add the Master table as a Join I get the Spill...

    I read many articles, tried many suggested things (creating indexes... clustered, non-clustered) without success.

    Maybe I'm totally in Left Field and should enhance the performance going another route?

    I've been breaking my head for the past 3 days and can't seem to figure it out... I had to turn for help here...

    Thank you in advanced

    JG

  • Just curious as to whether you might be allowed to add the ROW_NUMBER() column to the table as a computed, persisted column... Might simplify things quite a bit, although you might pay for it on INSERTs. Just a thought...

    Steve (aka sgmunson) πŸ™‚ πŸ™‚ πŸ™‚
    Rent Servers for Income (picks and shovels strategy)

  • Hi sgmunson,

    I can't touch the Master Table...

    but a computed persistent column is virtual and calculated so it wouldn't affect the inserts or updates which might be a good idea...

    but how will it affect the performance?

    row numbers can change with every insert... not sure how this will work...

    maybe explain more what and how you see it...

    I appreciate this very much.

    thanks

    JG

  • I think you can optimise this by using a CROSS APPLY and TOP(1). Doing it with this method you should be able to remove the master table from the subquery and join to the outermost Master_TAB.

    SELECT MT.A3+MT.A4 AS A34,

    MT.A3

    -- ,M.*

    FROM Master_TAB M

    CROSS APPLY (SELECT TOP(1)

    t3.A3,

    t3.A4

    FROM TABle2 t2

    INNER JOIN Table3 t3

    ON t2.A2=t3.A2

    WHERE t2.A2 = M.A2

    ORDER BY A5 DESC

    ) AS MT

    ORDER BY MT.A3 + MT.A4, M.A1;

  • -- The outer query makes little sense, it only adds duplicate rows to a set which you're eliminating rows from

    SELECT MT.A3+MT.A4 AS A34, MT.A3

    -- ,M.*

    FROM Master_TAB M

    JOIN (

    SELECT M.A1, t3.A3, t3.A4, M.A6, M.A2, ROW_NUMBER() OVER (PARTITION BY A1, A6, A3, A4 ORDER BY A5 DESC) AS rownum

    FROM Master_TAB M

    JOIN TABle2 t2

    ON M.A2 = t2.A2

    JOIN Table3 t3

    ON t2.A2 = t3.A2

    ) MT

    ON M.A1=MT.A1 AND M.A2=MT.A2 AND MT.rownum=1

    ORDER BY MT.A3+MT.A4,MT.A1;

    -- Try variations on this

    SELECT MT.A3 + MT.A4 AS A34, MT.A3

    -- ,M.*

    FROM (

    SELECT

    --M.A1,

    t3.A3,

    t3.A4,

    --M.A6, M.A2,

    ROW_NUMBER() OVER (PARTITION BY A1, A6, A3, A4 ORDER BY A5 DESC) AS rownum

    FROM Master_TAB M

    JOIN TABle2 t2

    ON M.A2=t2.A2

    JOIN Table3 t3

    ON t2.A2=t3.A2

    ) MT

    WHERE MT.rownum=1

    β€œWrite the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • normally it would work, but the t2 table is a table that contains locations and groups of locations and the master table is the item master which contains pricing for groups of locations and single locations... depending on the priority which is A5 the price goes to a group or a location... each location can only get one item price.

    the t3 table is the links between the locations and groups of locations...

    the priority is set at the location and group level...

    if Location X is part of 5 groups + itself and an item is only defined for 3 groups, the query should return the highest group of the 3 groups (not 6)

    I hope this clarifies things...

  • I should add that the final result has to be the item, the location and the price

    example if Loc1, Loc2, and Loc3 are part of Group1 and the item price is for Group1,

    although the master table contains

    Item X, Group1

    the final query should return:

    item X, Loc1, priceY

    item X, Loc2, priceY

    item X, Loc3, priceY

    and the sort is to sort by location.

  • Couple of things. First, it would help if you could post the DDL for the table(s) involved, some sample data for the table(s) that is representative of the problem domain (again, sample data not production data), and the expected results of the query.

    All of this should be readily consumable (in other words all we need to do to is cut/paste/execute in SSMS).

    Regarding the DDL for the table(s), you only need to provide the columns needed for the query, not all 200 or how ever many there are if only 50 to 10 are actually used. Also need the indexes on the tables, and of course this may mean adding additional columns to the DDL for the table(s).

    Second, how often is this query run and is there any filtering done to reduce the data returned?

  • I will try to work on sample data and DDL info.... and post it tonight.

    thanks

    JG

  • Here it is... This script creates and populates the necessary tables to optimize the final query.

    The final Query spills data to Tempdb which I'm trying to avoid.

    CREATE TABLE [dbo].[Master_TAB](

    [UPC] [varchar](13) NOT NULL,

    [LocGroup] [varchar](5) NOT NULL,

    [priceLvl] [int] NOT NULL,

    [Price] [money] NULL,

    CONSTRAINT [Master_TAB_P01] PRIMARY KEY CLUSTERED

    ([UPC] ASC,

    [LocGroup] ASC,

    [PriceLvl] ASC

    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

    ) ON [PRIMARY]

    CREATE NONCLUSTERED INDEX [MASTER_TAB_LocGroup] ON [dbo].[Master_TAB]

    ([LocGroup] ASC

    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)

    CREATE NONCLUSTERED INDEX [MASTER_TAB_PriceLvl] ON [dbo].[Master_TAB]

    ([PriceLvl] ASC

    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)

    GO

    insert into [Master_TAB] values ('ABC','ALL',1,10.25)

    insert into [Master_TAB] values ('ABC','00002',1,10.55)

    insert into [Master_TAB] values ('ABC','Grp1',1,10.75)

    insert into [Master_TAB] values ('DEF','ALL',1,10.25)

    insert into [Master_TAB] values ('DEF','00001',1,10.55)

    insert into [Master_TAB] values ('DEF','Grp2',1,10.75)

    -- insert 10000 skus

    Declare @Cnt int

    Set @Cnt = 0

    while @cnt <10000

    begin

    insert into [Master_TAB]

    Select 'ABC' + convert(varchar(10),@cnt), 'ALL',1,10.25

    set @cnt = @cnt + 1

    end

    CREATE TABLE [dbo].[GroupLoc](

    [GroupLoc] [varchar](5) NOT NULL,

    [Priority] [int] NULL,

    CONSTRAINT [GroupLoc_GroupLoc] PRIMARY KEY CLUSTERED

    ([GroupLoc] ASC

    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

    ) ON [PRIMARY]

    insert into GroupLoc values ('00001', 10)

    insert into GroupLoc values ('00002', 10)

    insert into GroupLoc values ('00003', 10)

    insert into GroupLoc values ('00004', 10)

    insert into GroupLoc values ('All', 5)

    insert into GroupLoc values ('grp1', 15)

    insert into GroupLoc values ('grp2', 11)

    insert into GroupLoc values ('grp3', 13)

    CREATE TABLE [dbo].GroupLocLink(

    [GroupLoc] [varchar](5) NOT NULL,

    [Loc] [varchar](4) NOT NULL,

    [Terminal] [varchar](4) NOT NULL,

    CONSTRAINT [GroupLocLink_GroupLoc] PRIMARY KEY CLUSTERED

    ([GroupLoc] ASC,

    [Loc] ASC,

    [Terminal] ASC

    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

    ) ON [PRIMARY]

    CREATE NONCLUSTERED INDEX [GroupLocLink_Loc] ON [dbo].GroupLocLink

    ([Loc] ASC

    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)

    CREATE NONCLUSTERED INDEX [GroupLocLink_Term] ON [dbo].GroupLocLink

    ([Terminal] ASC

    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)

    insert into GroupLocLink values ('00001','0001','0001')

    insert into GroupLocLink values ('00001','0001','0002')

    insert into GroupLocLink values ('00002','0002','0001')

    insert into GroupLocLink values ('00003','0003','0001')

    insert into GroupLocLink values ('00004','0004','0001')

    insert into GroupLocLink values ('grp1','0001','0001')

    insert into GroupLocLink values ('grp1','0001','0002')

    insert into GroupLocLink values ('grp1','0003','0001')

    insert into GroupLocLink values ('grp2','0002','0001')

    insert into GroupLocLink values ('grp2','0004','0001')

    insert into GroupLocLink values ('grp3','0002','0001')

    insert into GroupLocLink values ('grp3','0003','0001')

    insert into GroupLocLink values ('grp3','0004','0001')

    insert into GroupLocLink values ('All','0001','0001')

    insert into GroupLocLink values ('All','0001','0002')

    insert into GroupLocLink values ('All','0002','0001')

    insert into GroupLocLink values ('All','0003','0001')

    insert into GroupLocLink values ('All','0004','0001')

    SELECT MT.loc+MT.terminal AS LocTerm,MT.Loc ,[priority], M.*

    FROM

    Master_TAB M JOIN

    (SELECT M.UPC,L.Loc,L.terminal,M.PriceLvl,L.GroupLoc, [priority], ROW_NUMBER() OVER (PARTITION BY UPC,PriceLvl,L.LOC,terminal ORDER BY [Priority] DESC) AS rownum

    FROM Master_TAB M

    JOIN GroupLoc G ON M.LocGroup=G.GroupLoc

    JOIN GroupLoclink L ON G.GroupLoc = L.GroupLoc

    ) MT ON M.UPC=MT.UPC AND M.PriceLvl=MT.PriceLvl AND M.LocGroup=MT.GroupLoc AND MT.rownum=1

    ORDER BY MT.loc,MT.terminal,M.UPC;

    /* CLean Up

    drop table [Master_TAB]

    drop table [GroupLoc]

    drop table GroupLocLink

    */

  • sgmunson (7/10/2015)


    Just curious as to whether you might be allowed to add the ROW_NUMBER() column to the table as a computed, persisted column... Might simplify things quite a bit, although you might pay for it on INSERTs. Just a thought...

    I don't believe that windowed functions can be used in computed columns, although I can't find anything specifically stating that.

    When I tried to create one, I got an error stating that windowed functions can only be used in a SELECT or an ORDER BY clause.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • drew.allen (7/10/2015)


    sgmunson (7/10/2015)


    Just curious as to whether you might be allowed to add the ROW_NUMBER() column to the table as a computed, persisted column... Might simplify things quite a bit, although you might pay for it on INSERTs. Just a thought...

    I don't believe that windowed functions can be used in computed columns, although I can't find anything specifically stating that.

    When I tried to create one, I got an error stating that windowed functions can only be used in a SELECT or an ORDER BY clause.

    Drew

    Can your column = (SELECT ROW_NUMBER() OVER(PARTITION BY somefield ORDER BY someotherfield) FROM thistablename) ?

    Steve (aka sgmunson) πŸ™‚ πŸ™‚ πŸ™‚
    Rent Servers for Income (picks and shovels strategy)

  • If my memory is correct, it will complain about not being able to use subqueries in that context, and requiring a scalar expression. It makes sense, since a query like that could (and likely would) return many rows.

    It is a bit strange that it isn't just like other places where subqueries can be used with expressions that expect one value, and fail only if they return more than 1 value. Probably a choice they made for simplicity's sake.

    Cheers!

  • Firstly, ROW_NUMBER() doesn't necessarily invoke a sort. Compare the execution plan of these two queries:

    SELECT *, rn = ROW_NUMBER() OVER(PARTITION BY [UPC], [LocGroup] ORDER BY PriceLvl) FROM #Master_TAB

    SELECT *, rn = ROW_NUMBER() OVER(PARTITION BY [LocGroup], [UPC] ORDER BY PriceLvl) FROM #Master_TAB

    In your case the parameters passed to ROW_NUMBER are not sufficiently aligned with an existing

    index sort order for SQL Server to avoid a sort.

    Secondly, the sort order of the ROW_NUMBER is different to the sort order of the outer query. Even

    if the sort order of ROW_NUMBER() was aligned with an index to make a sort operation unnecessary,

    you would still have the sort on the outer query. This outer query sort cost is reported in the plan to

    be about 2%. If you hardcode rownum as 1, the sort to support ROW_NUMBER is eliminated from the plan

    but the cost of the sort on the outer query rises to about 94%, so the gain in performance is way less

    than you would expect.

    Thirdly, a spill occurs when the actual number of rows being sorted exceeds the estimate. SQL Server

    relies on statistics to derive an excution plan. Try updating statistics on the tables used by the query

    to see of the plan changes. You could also consider alternative shapes for the query which might assist

    the optimiser choose a better plan. Here's an example:

    SELECT *

    FROM (

    SELECT

    l.loc + l.terminal AS LocTerm,

    l.Loc,

    l.[priority],

    l.terminal,

    m.*,

    rownum = ROW_NUMBER() OVER (PARTITION BY l.LOC, l.terminal, m.UPC, m.PriceLvl ORDER BY l.[Priority] DESC)

    FROM #Master_TAB M

    CROSS APPLY (

    SELECT l.loc, l.terminal, g.[priority]

    FROM #GroupLoc G

    INNER JOIN #GroupLoclink L

    ON M.LocGroup = L.GroupLoc

    WHERE M.LocGroup = G.GroupLoc

    ) l

    ) d

    WHERE rownum = 1

    ORDER BY d.loc, d.terminal, d.UPC

    β€œWrite the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Thanks for your response...

    1. the execution plans of the Row_number for:

    SELECT *, rn = ROW_NUMBER() OVER(PARTITION BY [UPC], [LocGroup] ORDER BY PriceLvl) FROM #Master_TAB

    SELECT *, rn = ROW_NUMBER() OVER(PARTITION BY [LocGroup], [UPC] ORDER BY PriceLvl) FROM #Master_TAB

    Tells me that the sort is invoked in the second query due to the fact that there's no index that covers [LocGroup], [UPC]...

    The first of the 2 queries doesn't invoke a sort due to the fact that [UPC], [LocGroup] is covered by the PK .

    2. correct me if I'm wrong or if I misunderstood, Although the execution plans of the queries are relative, For a query cost, 2 sort spills will have 2 lower costs overall, compared to 1 spill that will have 1 higher %.

    conclusion is that sometimes you can't avoid the spills but you need to manage them. right?

    3. RE:"Thirdly, a spill occurs when the actual number of rows being sorted exceeds the estimate."

    That's one case... In my case the statistics are up to date and the estimated rows match the actual rows.

    Based on the articles I read and this thread, We have to try to avoid Spills but sometimes it's just not feasible and you need to minimize the number of spills...

    Is this a good conclusion?

    thank you

    JG

Viewing 15 posts - 1 through 15 (of 16 total)

You must be logged in to reply to this topic. Login to reply