Slow running query

  • I am currently working on project where we are extracting data from some AX tables. The data from these extracts is transformed in vary ways to be consumed by various applications. One of the of the transformations requires many column values to be concatenated into one column values. The query that is currently used to do this was written by a contractor who has now left. It uses the PATH mode FOR XML Path to concatenate values. The query currently takes 24 minutes to run. We are currently in development but ideally in production this data needs to be extracted every 7 minutes. I would just like to know if there is a way I can tune the query or rewrite it in another way that could speed up the retrieval. The source tables contain about 4- 5 million records.

    The query is shown below

    SELECT i.ITEMID AS CRItemID, i.PSTPRODUCTID AS CRProductID, i.NAMEALIAS AS CRName, i.CREATIONPERFORMER AS CRPerformer,

    i.CREATIONDURATION AS CRDuration, i.CREATIONSTATUS AS CRStatus, i.[MODIFIEDDATETIME] AS CRModifiedDateTime,

    STUFF

    ((SELECT ' ; ' + n.CRNUMBERTYPEID + '|#|' + n.CRNUMBER + CASE n.CRSOCIETYCODE WHEN '0' THEN '' ELSE '|#|' + n.CRSOCIETYCODE END

    FROM CRALTNUMBERS n

    WHERE n.CREATIONID = I.Itemid FOR XML PATH('')), 1, 3, '') AS CRNumbers,

    STUFF

    ((SELECT ' ; ' + t .CRTITLE

    FROM CRALTTITLES t

    WHERE t .CREATIONID = I.Itemid FOR XML PATH('')), 1, 3, '') AS CRTitles,

    STUFF

    ((SELECT ' ; ' + LTRIM(m.FIRSTNAME + ' ' + m.NAME)

    FROM MATCHNAME m

    WHERE m.CREATIONID = I.Itemid FOR XML PATH('')), 1, 3, '') AS CRNames,

    STUFF

    ((SELECT ' ; ' + b.BOMID

    FROM BOM b

    WHERE b.ITEMID = I.Itemid FOR XML PATH('')), 1, 3, '') AS CRParents,

    STUFF

    ((SELECT ' ; ' + b.ITEMID

    FROM BOM b

    WHERE b.BOMID = I.Itemid FOR XML PATH('')), 1, 3, '') AS CRChildren,

    STUFF

    ((SELECT ' ; ' + p.DATA

    FROM PSTPRODTMPLDATA p

    WHERE p.TEMPLATEID = I.Itemid AND p.SPECID = 'Episode title' FOR XML PATH('')), 1, 3, '') AS CREpisodeTitle,

    STUFF

    ((SELECT ' ; ' + p.DATA

    FROM PSTPRODTMPLDATA p

    WHERE p.TEMPLATEID = I.Itemid AND p.SPECID = 'Commercial brand' FOR XML PATH('')), 1, 3, '') AS CRComBrand,

    STUFF

    ((SELECT ' ; ' + p.DATA

    FROM PSTPRODTMPLDATA p

    WHERE p.TEMPLATEID = I.Itemid AND p.SPECID = 'Commercial product' FOR XML PATH('')), 1, 3, '') AS CRComProduct,

    STUFF

    ((SELECT ' ; ' + p.DATA

    FROM PSTPRODTMPLDATA p

    WHERE p.TEMPLATEID = I.Itemid AND p.SPECID = 'Commercial type' FOR XML PATH('')), 1, 3, '') AS CRComType,

    STUFF

    ((SELECT ' ; ' + p.DATA

    FROM PSTPRODTMPLDATA p

    WHERE p.TEMPLATEID = I.Itemid AND p.SPECID = 'Country of origin' FOR XML PATH('')), 1, 3, '') AS CRCountry,

    STUFF

    ((SELECT ' ; ' + p.DATA

    FROM PSTPRODTMPLDATA p

    WHERE p.TEMPLATEID = I.Itemid AND p.SPECID = 'Year of production' FOR XML PATH('')), 1, 3, '') AS CRYear

    FROMINVENTTABLE i

    I have also attached a copy of the execution plan of the latest query run. Any help would be much appreciated.

  • I would start with looking at what indexes exist on tables

    [CRALTNUMBERS]

    [MATCHNAME]

    [BOM]

    [CRALTTITLES]

    [font="Courier New"]

    TableColumnOperationIndexActual RowsEst Rowsdiff

    [BOM]BOMIDIndex Spool (Eager Spool)311,69790,394,96090,083,263

    [CRALTNUMBERS]CREATIONIDIndex Spool (Eager Spool)147,98940,412,55840,264,569

    [MATCHNAME]CREATIONIDIndex Spool (Eager Spool)811,51639,993,03639,181,520

    [BOM]ITEMIDIndex Spool (Eager Spool)185,47019,798,58519,613,115

    [CRALTTITLES]CREATIONIDIndex Spool (Eager Spool)155,4971,639,8011,484,304

    [/font]

    ________________________________________________________________
    you can lead a user to data....but you cannot make them think
    and remember....every day is a school day

  • The way you have done the SELECT (SELECT...), (SELECT..) ... FROM forces those massive nested loop joins on 900K rows and the logical IO from those is stunningly high.

    First, make sure you have a covering index on each of the correlated subqueries. But even that may not help you with all those 900K iterations.

    If it isn't fast enough, I would switch to a CLR object to do the concatenation. That will probably be the most efficient. It also avoids not one but TWO scenarios currently where you can get the WRONG OUTPUT!! A) you have no order by in the correlated subqueries meaning they can put the concatenation in any order on output and B) XML "special characters" can actually BREAK the FOR XML processing (think <, >, etc).

    You could even try a cursor-based solution to build the output of each of those correlated subqueries.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Hi guys,

    thanks for the responses. I will try that alternative approach and let you know what my results are. CLR came up when I was discussing this with a colleague of mine. I will also check this out and let you know what my findings are.

  • eseosaoregie (4/13/2013)


    Hi guys,

    thanks for the responses. I will try that alternative approach and let you know what my results are. CLR came up when I was discussing this with a colleague of mine. I will also check this out and let you know what my findings are.

    I think Adam Machanic has done some blogging on the CLR side of things that could be useful.

    You could also see here for some (mostly bad) options for string concat: https://www.simple-talk.com/sql/t-sql-programming/concatenating-row-values-in-transact-sql/

    And this post shows you why the COALESCE/ISNULL method isn't viable (another ordering issue just like the FOR XML problem): http://msmvps.com/blogs/robfarley/archive/2007/04/08/coalesce-is-not-the-answer-to-string-concatentation-in-t-sql.aspx

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • eseosaoregie (4/13/2013)


    I am currently working on project where we are extracting data from some AX tables. The data from these extracts is transformed in vary ways to be consumed by various applications. One of the of the transformations requires many column values to be concatenated into one column values. The query that is currently used to do this was written by a contractor who has now left. It uses the PATH mode FOR XML Path to concatenate values. The query currently takes 24 minutes to run. We are currently in development but ideally in production this data needs to be extracted every 7 minutes. I would just like to know if there is a way I can tune the query or rewrite it in another way that could speed up the retrieval. The source tables contain about 4- 5 million records.

    The query is shown below

    SELECT i.ITEMID AS CRItemID, i.PSTPRODUCTID AS CRProductID, i.NAMEALIAS AS CRName, i.CREATIONPERFORMER AS CRPerformer,

    i.CREATIONDURATION AS CRDuration, i.CREATIONSTATUS AS CRStatus, i.[MODIFIEDDATETIME] AS CRModifiedDateTime,

    STUFF

    ((SELECT ' ; ' + n.CRNUMBERTYPEID + '|#|' + n.CRNUMBER + CASE n.CRSOCIETYCODE WHEN '0' THEN '' ELSE '|#|' + n.CRSOCIETYCODE END

    FROM CRALTNUMBERS n

    WHERE n.CREATIONID = I.Itemid FOR XML PATH('')), 1, 3, '') AS CRNumbers,

    STUFF

    ((SELECT ' ; ' + t .CRTITLE

    FROM CRALTTITLES t

    WHERE t .CREATIONID = I.Itemid FOR XML PATH('')), 1, 3, '') AS CRTitles,

    STUFF

    ((SELECT ' ; ' + LTRIM(m.FIRSTNAME + ' ' + m.NAME)

    FROM MATCHNAME m

    WHERE m.CREATIONID = I.Itemid FOR XML PATH('')), 1, 3, '') AS CRNames,

    STUFF

    ((SELECT ' ; ' + b.BOMID

    FROM BOM b

    WHERE b.ITEMID = I.Itemid FOR XML PATH('')), 1, 3, '') AS CRParents,

    STUFF

    ((SELECT ' ; ' + b.ITEMID

    FROM BOM b

    WHERE b.BOMID = I.Itemid FOR XML PATH('')), 1, 3, '') AS CRChildren,

    STUFF

    ((SELECT ' ; ' + p.DATA

    FROM PSTPRODTMPLDATA p

    WHERE p.TEMPLATEID = I.Itemid AND p.SPECID = 'Episode title' FOR XML PATH('')), 1, 3, '') AS CREpisodeTitle,

    STUFF

    ((SELECT ' ; ' + p.DATA

    FROM PSTPRODTMPLDATA p

    WHERE p.TEMPLATEID = I.Itemid AND p.SPECID = 'Commercial brand' FOR XML PATH('')), 1, 3, '') AS CRComBrand,

    STUFF

    ((SELECT ' ; ' + p.DATA

    FROM PSTPRODTMPLDATA p

    WHERE p.TEMPLATEID = I.Itemid AND p.SPECID = 'Commercial product' FOR XML PATH('')), 1, 3, '') AS CRComProduct,

    STUFF

    ((SELECT ' ; ' + p.DATA

    FROM PSTPRODTMPLDATA p

    WHERE p.TEMPLATEID = I.Itemid AND p.SPECID = 'Commercial type' FOR XML PATH('')), 1, 3, '') AS CRComType,

    STUFF

    ((SELECT ' ; ' + p.DATA

    FROM PSTPRODTMPLDATA p

    WHERE p.TEMPLATEID = I.Itemid AND p.SPECID = 'Country of origin' FOR XML PATH('')), 1, 3, '') AS CRCountry,

    STUFF

    ((SELECT ' ; ' + p.DATA

    FROM PSTPRODTMPLDATA p

    WHERE p.TEMPLATEID = I.Itemid AND p.SPECID = 'Year of production' FOR XML PATH('')), 1, 3, '') AS CRYear

    FROMINVENTTABLE i

    I have also attached a copy of the execution plan of the latest query run. Any help would be much appreciated.

    The major problem with that query is that it recalculates the colon separate list for like rows which is a huge waste of resources. Using "Divide'n'Conquer" methods, a separate table should be calculated to hold single instances of each concatenation grouped by I.ItemID and p.SPECID using a single query with a GROUP BY to build such a thing.

    The same is also true of the other lookup tables (like the BOM table). The same information is concatenated over and over and over for each row.

    The key to performance on this problem will be to correctly "pre-aggregate" the concatenations in separate tables and then join to those tables.

    To wit, even the introduction of a CLR to do the concatenation might not be as performant as it could be because it would still have to do the concatentation of identical data using the current structure of the current query.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Good points Jeff. I would definitely look at pre-populating the concatenated objects as temp tables. I had that in my notes I took while looking at alternatives for this and just missed putting it in the reply! :hehe:

    I looked at prepopulating those temp objects tho and I think in order to get the table key and the concatenated string is a double hit on the table for the methods I checked (except for cursor, which has is it's own issues obviously). SQL CLR into a temp object could still be best.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Here's an example of what I'm talking about. Of course, I don't have access to the data so the code is completely untested but this will calculate the delimited aggregations just once for each TemplateID/SpecID combination in the PstProdTmplData table instead of 6 identical recalculations for each and every row of the InventTable.

    While the following looks like a lot of code, it will allow your code to run with comparatively blazing performance as to the way it is currently structured. It's a very common method (pre-aggregate and pivot using a Cross Tab) to solve the problem of using an EAV table effectively.

    WITH

    ctePreAgg AS

    (

    --===== Preaggregate the semi-colon delimited "Data" for each TemplateID/SpecID combination

    -- for performance. We'll pivot the data later

    SELECT p1.TemplateID

    , p1.SpecID --To be used as a join filter in another query.

    , SpecIDs =

    STUFF(

    (

    SELECT ' ; ' + p2.Data

    FROM dbo.PstProdTmplData p2

    WHERE p2.TemplateID = p1.TemplateID

    FOR XML PATH('')

    )

    ,1,3,'')

    FROM dbo.PstProdTmplData p1

    WHERE p1.SpecID IN

    (

    'Episode title'

    ,'Commercial brand'

    ,'Commercial product'

    ,'Commercial type'

    ,'Country of origin'

    ,'Year of production'

    )

    AND p.TemplateID IN (SELECT ItemID FROM dbo.InventTable) --Implicitly DISTINCT and as fast as a join.

    GROUP BY p1.TemplateID, p1.SpecID

    ) --=== Now, pivot the data so that it's normalized instead of being an EAV-style result set and store it

    -- all in a temporary lookup table for easy and very high performance joining in the final query.

    SELECT TemplateID = ISNULL(TemplateID,0) --Makes the column in the lookup table NOT NULL.

    , CREpisodeTitle = MAX(CASE WHEN SpecID = 'Episode title' THEN SpecIDs ELSE '' END)

    , CRComBrand] = MAX(CASE WHEN SpecID = 'Commercial brand' THEN SpecIDs ELSE '' END)

    , CRComProduct = MAX(CASE WHEN SpecID = 'Commercial product' THEN SpecIDs ELSE '' END)

    , CRComType = MAX(CASE WHEN SpecID = 'Commercial type' THEN SpecIDs ELSE '' END)

    , CRCountry = MAX(CASE WHEN SpecID = 'Country of origin' THEN SpecIDs ELSE '' END)

    , CRYear = MAX(CASE WHEN SpecID = 'Year of production' THEN SpecIDs ELSE '' END)

    INTO #Lookup_PstProdTmplData

    FROM ctePreAgg

    GROUP BY TemplateID

    ;

    --===== Add a Primary Key for extra join performance. We let the system name the PK on Temp Tables

    -- because such constraints must be uniquely named in the database and we don't want to destroy

    -- the ability of more than one instance of the code to run conncurrently.

    ALTER TABLE #Lookup_PstProdTmplData

    ADD PRIMARY KEY CLUSTERED (TemplateID)

    ;

    Once that's inplace, the final query becomes a blazing-performance cake walk. (Note that I didn't pre-aggregate/pivot all the tables that should be. You have to have some of the fun! 😛 )

    SELECT i.ITEMID AS CRItemID, i.PSTPRODUCTID AS CRProductID, i.NAMEALIAS AS CRName, i.CREATIONPERFORMER AS CRPerformer,

    i.CREATIONDURATION AS CRDuration, i.CREATIONSTATUS AS CRStatus, i.[MODIFIEDDATETIME] AS CRModifiedDateTime,

    p.CREpisodeTitle, p.CRComBrand, p.CRComProduct, p.CRComType, p.CRCountry, p.CRYear

    FROM dbo.InventTable i

    JOIN #Lookup_PstProdTmplData p

    ON p.TemplateID = i.ItemID

    As a bit of a sidebar, this isn't Oracle and we don't have the 30 character object name limitation. Consider NOT using abbreviations for table names in the future as they serve only to daze and confuse the uninitiated. For example, whoever designed these tables used the name "PstProdTmplData" for a table that should have been named "PostProductionTemplate". It's only 7 characters longer and there's no chance of someone misreading the "Tmp" in the original abbreviated name as meaning "Temporary" especially if the miss the "l" in the name.

    Also, we all know tables have data in them so the word "Data" in the table name is a bit superfluous.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Hi Jeff, thanks for your reply. I will give this a go when i am back at the office. Just one question for future reference. What is an EAV table?

  • eseosaoregie (4/13/2013)


    Hi Jeff, thanks for your reply. I will give this a go when i am back at the office. Just one question for future reference. What is an EAV table?

    Entity-Attribute-Value hence EAV.

  • TheSQLGuru (4/13/2013)


    Good points Jeff. I would definitely look at pre-populating the concatenated objects as temp tables. I had that in my notes I took while looking at alternatives for this and just missed putting it in the reply! :hehe:

    I looked at prepopulating those temp objects tho and I think in order to get the table key and the concatenated string is a double hit on the table for the methods I checked (except for cursor, which has is it's own issues obviously). SQL CLR into a temp object could still be best.

    Agreed on the double hit but it would be one-time and that's far better than hitting the table 6 times for every row in the outer query.

    While I agree that SQL CLR can work miracles with strings, I'm also aware that there is some overhead involved. I could certainly be wrong but I don't believe that SQL CLR would be much more effecient in this particular case.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • eseosaoregie (4/13/2013)


    Hi Jeff, thanks for your reply. I will give this a go when i am back at the office. Just one question for future reference. What is an EAV table?

    As Lynn pointed out, it stands for "Entity, Attribute, Value" which identifies the general structure of the EAV table. These tables offer incredible flexibility because the table structure doesn't need to change to "add a column" like a normalized table would. As is true with many other things, such flexibility comes at a great cost when trying to use the data. For example, most people make the mistake of storing the data as character based data which causes the meta-data for the data type of the data to be lost. And, as you've just seen, unless you work with pre-aggregated Cross Tabs or Pivots, the data can be quite difficult to interogate. For the most part, you can also forget about DRI (Declared Referential Integrity) and other forms of check constraints. The big advantages are, of course, the flexibility to add attributes (virtual columns, in this case) to the entities (individual templates for items, in this case) without any structural changes ever as well as only having to worry about a single, well formed index (although, an index on the data itself will be impractical if it exceeds 900 bytes in width). Another advantage is that there's never the need for a NULL in an EAV.

    In most cases, such tables should generally be avoided because the benefits usually don't come close to outweighing the benefits of normalized tables.

    In the case of the post production template table you have, the TemplateID is the "Entity" that will have many attributes (columns). The SpecID column is actually the attribute identifier or "column name" that the data would be stored in if it were a normal table. Of course, your Data column is the "Value" column.

    Again, these types of tables should generally be avoided because they violate most practical rules for relational databases, including but not limited to even the most basic rules of normalization. They're almost as bad as XML or any other tagged data structure when it comes to normalization and DRI. 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thanks for the explanation. After reading it I realized that I knew it as another name, open schema. Will let you know how I get on.

  • interesting......

    still stand by my first post ref table indexes....

    was intrigued with this....so have built a test harness that I hope goes someway to replicating OP real world environment...please feel free to play.

    test harness build take about 1-2mins

    USE [tempdb]

    GO

    /* create some sample data...*/

    IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[MATCHNAME]') AND type in (N'U'))

    DROP TABLE [MATCHNAME]

    GO

    IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[INVENTTABLE]') AND type in (N'U'))

    DROP TABLE [INVENTTABLE]

    GO

    IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[CRALTTITLES]') AND type in (N'U'))

    DROP TABLE [CRALTTITLES]

    GO

    IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[CRALTNUMBERS]') AND type in (N'U'))

    DROP TABLE [CRALTNUMBERS]

    GO

    IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[BOM]') AND type in (N'U'))

    DROP TABLE [BOM]

    GO

    IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[BOM]') AND type in (N'U'))

    DROP TABLE [BOM]

    GO

    IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[PSTPRODTMPLDATA]') AND type in (N'U'))

    DROP TABLE [PSTPRODTMPLDATA]

    GO

    SELECT TOP 500000 ----

    ITEMID = IDENTITY(INT, 1, 1)

    INTO INVENTTABLE

    FROM sys.all_columns ac1

    CROSS JOIN sys.all_columns ac2

    CROSS JOIN sys.all_columns ac3

    ALTER TABLE [dbo].[INVENTTABLE] ADD CONSTRAINT [PK_INVENTTABLE] PRIMARY KEY CLUSTERED ([ITEMID] ASC)

    GO

    SELECT TOP 2000000 ----

    CREATIONID = 1 + CAST(Abs(Checksum(Newid()) % 500000 ) AS INT),

    CRNUMBERTYPEID = CAST(Abs(Checksum(Newid()) % 100000 ) AS varchar(8)),

    CRNUMBER = CAST(Abs(Checksum(Newid()) % 20000 ) AS varchar(8)),

    CRSOCIETYCODE = CAST(Abs(Checksum(Newid()) % 2 ) AS varchar(8))

    INTO CRALTNUMBERS

    FROM sys.all_columns ac1

    CROSS JOIN sys.all_columns ac2

    CROSS JOIN sys.all_columns ac3

    CREATE NONCLUSTERED INDEX [NIC_CRALTNUMBERS] ON [dbo].[CRALTNUMBERS]

    ([CREATIONID] ASC )

    INCLUDE ([CRNUMBERTYPEID],[CRNUMBER],[CRSOCIETYCODE])

    GO

    SELECT TOP 4000000 ----

    CREATIONID = 1 + CAST(Abs(Checksum(Newid()) % 500000 ) AS INT),

    CRTITLE = CHAR(Abs(Checksum(Newid())) % 10 + 65)

    + CHAR(Abs(Checksum(Newid())) % 10 + 65)

    + CHAR(Abs(Checksum(Newid())) % 10 + 65)

    + CHAR(Abs(Checksum(Newid())) % 10 + 65)

    + CHAR(Abs(Checksum(Newid())) % 10 + 65)

    INTO CRALTTITLES

    FROM sys.all_columns ac1

    CROSS JOIN sys.all_columns ac2

    CROSS JOIN sys.all_columns ac3

    CREATE NONCLUSTERED INDEX [NIC_CRALTTITLES] ON [dbo].[CRALTTITLES]

    ([CREATIONID] ASC)

    INCLUDE ([CRTITLE])

    GO

    SELECT TOP 3000000 ----

    CREATIONID = 1 + CAST(Abs(Checksum(Newid()) % 500000 ) AS INT),

    FIRSTNAME = CHAR(Abs(Checksum(Newid())) % 26 + 65)

    + CHAR(Abs(Checksum(Newid())) % 26 + 65)

    + CHAR(Abs(Checksum(Newid())) % 26 + 65)

    + CHAR(Abs(Checksum(Newid())) % 26 + 65)

    + CHAR(Abs(Checksum(Newid())) % 26 + 65),

    NAME = CHAR(Abs(Checksum(Newid())) % 26 + 65)

    + CHAR(Abs(Checksum(Newid())) % 26 + 65)

    + CHAR(Abs(Checksum(Newid())) % 26 + 65)

    + CHAR(Abs(Checksum(Newid())) % 26 + 65)

    + CHAR(Abs(Checksum(Newid())) % 26 + 65)

    INTO MATCHNAME

    FROM sys.all_columns ac1

    CROSS JOIN sys.all_columns ac2

    CROSS JOIN sys.all_columns ac3

    CREATE NONCLUSTERED INDEX [NIC_MATCHNAME] ON [dbo].[MATCHNAME]

    ([CREATIONID] ASC)

    INCLUDE ([FIRSTNAME],[NAME])

    GO

    SELECT TOP 1000000 ----

    BOMID = 1 + CAST(Abs(Checksum(Newid()) % 300000 ) AS INT),

    ITEMID = 300001 + CAST(Abs(Checksum(Newid()) % 500000 ) AS INT)

    INTO BOM

    FROM sys.all_columns ac1

    CROSS JOIN sys.all_columns ac2

    CROSS JOIN sys.all_columns ac3

    CREATE NONCLUSTERED INDEX [NIC_BOM1] ON [dbo].[BOM]

    ([ITEMID] ASC,[BOMID] ASC)

    GO

    CREATE NONCLUSTERED INDEX [NIC_BOM2] ON [dbo].[BOM]

    ([BOMID] ASC,[ITEMID] ASC)

    GO

    SELECT TOP 5000000 ----

    TEMPLATEID = 1 + CAST(Abs(Checksum(Newid()) % 500000 ) AS INT),

    SPECID = CHAR(Abs(Checksum(Newid())) % 6 + 65),

    DATA = CAST(Rand(Checksum(Newid())) * 9999 AS VARCHAR(18))

    INTO PSTPRODTMPLDATA

    FROM sys.all_columns ac1

    CROSS JOIN sys.all_columns ac2

    CROSS JOIN sys.all_columns ac3

    CREATE NONCLUSTERED INDEX [NIC_PSTPRODTMPLDATA] ON [dbo].[PSTPRODTMPLDATA]

    ([SPECID] ASC,[TEMPLATEID] ASC)

    INCLUDE ([DATA])

    GO

    running the code below...with all the asscociated indexes takes about 1 min on a test server...2008R2 64 16gb RAM

    /* BUILD A RESULTS TABLE */

    SET STATISTICS IO , TIME ON;

    IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[TMP_RESULTS_TSQL]') AND type in (N'U'))

    DROP TABLE TMP_RESULTS_TSQL

    GO

    SELECT i.ITEMID AS CRItemID,

    STUFF

    ((SELECT ' ; ' + n.CRNUMBERTYPEID + '|#|' + n.CRNUMBER + CASE n.CRSOCIETYCODE WHEN '0' THEN '' ELSE '|#|' + n.CRSOCIETYCODE END

    FROM CRALTNUMBERS n

    WHERE n.CREATIONID = I.Itemid FOR XML PATH('')), 1, 3, '') AS CRNumbers,

    STUFF

    ((SELECT ' ; ' + t .CRTITLE

    FROM CRALTTITLES t

    WHERE t .CREATIONID = I.Itemid FOR XML PATH('')), 1, 3, '') AS CRTitles,

    STUFF

    ((SELECT ' ; ' + LTRIM(m.FIRSTNAME + ' ' + m.NAME)

    FROM MATCHNAME m

    WHERE m.CREATIONID = I.Itemid FOR XML PATH('')), 1, 3, '') AS CRNames ,

    STUFF

    ((SELECT ' ; ' + CAST ( b.BOMID as varchar)

    FROM BOM b

    WHERE b.ITEMID = I.Itemid FOR XML PATH('')), 1, 3, '') AS CRParents,

    STUFF

    ((SELECT ' ; ' + CAST ( b.ITEMID as varchar)

    FROM BOM b

    WHERE b.BOMID = I.Itemid FOR XML PATH('')), 1, 3, '') AS CRChildren,

    STUFF

    ((SELECT ' ; ' + p.DATA

    FROM PSTPRODTMPLDATA p

    WHERE p.TEMPLATEID = I.Itemid AND p.SPECID = 'A' FOR XML PATH('')), 1, 3, '') AS CREpisodeTitle,

    STUFF

    ((SELECT '; ' + p.DATA

    FROM PSTPRODTMPLDATA p

    WHERE p.TEMPLATEID = I.Itemid AND p.SPECID = 'B' FOR XML PATH('')), 1, 3, '') AS CRComBrand,

    STUFF

    ((SELECT ' ; ' + p.DATA

    FROM PSTPRODTMPLDATA p

    WHERE p.TEMPLATEID = I.Itemid AND p.SPECID = 'C' FOR XML PATH('')), 1, 3, '') AS CRComProduct,

    STUFF

    ((SELECT ' ; ' + p.DATA

    FROM PSTPRODTMPLDATA p

    WHERE p.TEMPLATEID = I.Itemid AND p.SPECID = 'D' FOR XML PATH('')), 1, 3, '') AS CRComType,

    STUFF

    ((SELECT ' ; ' + p.DATA

    FROM PSTPRODTMPLDATA p

    WHERE p.TEMPLATEID = I.Itemid AND p.SPECID = 'E' FOR XML PATH('')), 1, 3, '') AS CRCountry,

    STUFF

    ((SELECT ' ; ' + p.DATA

    FROM PSTPRODTMPLDATA p

    WHERE p.TEMPLATEID = I.Itemid AND p.SPECID = 'F' FOR XML PATH('')), 1, 3, '') AS CRYear

    INTO TMP_RESULTS_TSQL

    FROMINVENTTABLE i

    SET STATISTICS IO , TIME OFF;

    in order to try and improve on this I used SSIS and broke each "STUFF" clause into temp tables. indexed and then combined into a single result set....

    create an "Excecute SQL task" for each "Stuff"...

    eg

    IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[TMP_CRALTNUMBERS]') AND type in (N'U'))

    DROP TABLE TMP_CRALTNUMBERS

    GO

    SELECT ISNULL(CREATIONID,0) as creationid,

    STUFF (

    (

    SELECT ' ; ' + CRNUMBERTYPEID + '|#|' + CRNUMBER + CASE CRSOCIETYCODE WHEN '0' THEN '' ELSE '|#|' + CRSOCIETYCODE END

    FROM CRALTNUMBERS C2

    WHERE C1.CREATIONID = C2.CREATIONID

    FOR XML PATH('')),1,3,' ') AS CRNumbers

    INTO TMP_CRALTNUMBERS

    FROM CRALTNUMBERS C1

    GROUP BY CREATIONID

    ALTER TABLE [dbo].[TMP_CRALTNUMBERS] ADD CONSTRAINT [PK_TMP_CRALTNUMBERS] PRIMARY KEY CLUSTERED ([creationid] ASC)

    ;

    put all of these tasks in one "sequence container"...this allows SSIS to use multiple processing threads.

    on completion of above build a results table

    IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[TMP_RESULTS_SSIS]') AND type in (N'U'))

    DROP TABLE [TMP_RESULTS_SSIS]

    GO

    SELECT INVENTTABLE.ITEMID, TMP_CRALTNUMBERS.CRNumbers, TMP_CRALTTITLES.CRTitles, TMP_MATCHNAME.CRNames, TMP_A.A, tMP_B.A AS B, TMP_C.A AS C,

    TMP_D.A AS D, TMP_E.A AS E, TMP_F.A AS F, TMP_BOMP.CRParents, TMP_BOMC.CRcHILD

    INTO TMP_RESULTS_SSIS

    FROM INVENTTABLE LEFT OUTER JOIN

    TMP_BOMC ON INVENTTABLE.ITEMID = TMP_BOMC.BOMID LEFT OUTER JOIN

    TMP_BOMP ON INVENTTABLE.ITEMID = TMP_BOMP.ITEMID LEFT OUTER JOIN

    TMP_F ON INVENTTABLE.ITEMID = TMP_F.TEMPLATEiD LEFT OUTER JOIN

    TMP_E ON INVENTTABLE.ITEMID = TMP_E.TEMPLATEiD LEFT OUTER JOIN

    TMP_D ON INVENTTABLE.ITEMID = TMP_D.TEMPLATEiD LEFT OUTER JOIN

    TMP_C ON INVENTTABLE.ITEMID = TMP_C.TEMPLATEiD LEFT OUTER JOIN

    tMP_B ON INVENTTABLE.ITEMID = tMP_B.TEMPLATEiD LEFT OUTER JOIN

    TMP_A ON INVENTTABLE.ITEMID = TMP_A.TEMPLATEiD LEFT OUTER JOIN

    TMP_MATCHNAME ON INVENTTABLE.ITEMID = TMP_MATCHNAME.CREATIONID LEFT OUTER JOIN

    TMP_CRALTTITLES ON INVENTTABLE.ITEMID = TMP_CRALTTITLES.CREATIONID LEFT OUTER JOIN

    TMP_CRALTNUMBERS ON INVENTTABLE.ITEMID = TMP_CRALTNUMBERS.creationid

    this process took around 30 secs

    ________________________________________________________________
    you can lead a user to data....but you cannot make them think
    and remember....every day is a school day

  • Hi Guys,

    I have managed to make some headway with the long running query. In the first instance I took the advice J Livingston and indexed the source tables. With this the original query dropped from 25mins to 4min 50sec.

    I rewrote the script along the lines suggested by Jeff and the query ran in 3min 30 secs. I had to make a few adjustments to the Stuff statements as I checked again with the consumer of the data to find out what actually needed to be concatenated.

    The final script was as follows:

    WITH ctePreAggPstProdTmpl

    AS

    (

    SELECT TemplateID

    , SpecID --To be used as a join filter in another query.

    , SpecIDs =

    STUFF(

    (

    SELECT ' ; ' + Data FOR XML PATH('')),1,3,'')

    FROM dbo.PstProdTmplData

    WHERE SpecID IN

    (

    'Episode title'

    ,'Commercial brand'

    ,'Commercial product'

    ,'Commercial type'

    ,'Country of origin'

    ,'Year of production'

    )

    AND TemplateID IN (SELECT ItemID FROM dbo.InventTable) --Implicitly DISTINCT and as fast as a join.

    GROUP BY TemplateID, SpecID,Data

    )

    --Pivot Aggregate Data

    SELECT TemplateID = ISNULL(TemplateID,0) --Makes the column in the lookup table NOT NULL.

    , CREpisodeTitle = MAX(CASE WHEN SpecID = 'Episode title' THEN SpecIDs ELSE '' END)

    , CRComBrand = MAX(CASE WHEN SpecID = 'Commercial brand' THEN SpecIDs ELSE '' END)

    , CRComProduct = MAX(CASE WHEN SpecID = 'Commercial product' THEN SpecIDs ELSE '' END)

    , CRComType = MAX(CASE WHEN SpecID = 'Commercial type' THEN SpecIDs ELSE '' END)

    , CRCountry = MAX(CASE WHEN SpecID = 'Country of origin' THEN SpecIDs ELSE '' END)

    , CRYear = MAX(CASE WHEN SpecID = 'Year of production' THEN SpecIDs ELSE '' END)

    INTO #Lookup_PstProdTmplData

    FROM ctePreAggPstProdTmpl

    GROUP BY TemplateID

    ALTER TABLE #Lookup_PstProdTmplData

    ADD PRIMARY KEY CLUSTERED (TemplateID)

    ;

    -- CREATE CRAltNumber Lookup

    SELECT N1.[CREATIONID],

    STUFF

    ((SELECT ' ; ' + N2.CRNUMBERTYPEID + '|#|' + N2.CRNUMBER + CASE N2.CRSOCIETYCODE WHEN '0' THEN '' ELSE '|#|' + N2.CRSOCIETYCODE END

    FROM CRALTNUMBERS N2

    WHERE N1.CREATIONID = N2.CREATIONID FOR XML PATH('')), 1, 3, '') AS CRNumbers

    INTO #Lookup_CRALTNUMBERS

    FROM CRALTNUMBERS N1 WHERE N1.[CREATIONID] IN (SELECT ItemID FROM dbo.InventTable)

    GROUP BY N1.[CREATIONID]

    ALTER TABLE #Lookup_CRALTNUMBERS

    ADD PRIMARY KEY CLUSTERED (CREATIONID)

    -- CREATE CRTITLE Lookup

    SELECT N1.[CREATIONID],

    STUFF

    ((SELECT ' ; ' + N2.CRTITLE

    FROM [dbo].[CRALTTITLES] N2

    WHERE N1.CREATIONID = N2.CREATIONID FOR XML PATH('')), 1, 3, '') AS CRTitles

    INTO #Lookup_CRALTitles

    FROM [CRALTTITLES] N1 WHERE N1.[CREATIONID] IN (SELECT ItemID FROM dbo.InventTable)

    GROUP BY N1.[CREATIONID]

    ALTER TABLE #Lookup_CRALTitles

    ADD PRIMARY KEY CLUSTERED (CREATIONID)

    -- CREATE MATCHNAME Lookup

    SELECT N1.[CREATIONID],

    STUFF

    ((SELECT ' ; ' + LTRIM(N2.FIRSTNAME + ' ' + N2.NAME)

    FROM [dbo].[MATCHNAME] N2

    WHERE N1.CREATIONID = N2.CREATIONID FOR XML PATH('')), 1, 3, '') AS CRNames

    INTO #Lookup_MatchNames

    FROM [CRALTTITLES] N1 WHERE N1.[CREATIONID] IN (SELECT ItemID FROM dbo.InventTable)

    GROUP BY N1.[CREATIONID]

    ALTER TABLE #Lookup_MatchNames

    ADD PRIMARY KEY CLUSTERED (CREATIONID)

    -- CREATE BOM Parents Lookup

    SELECT b1.[ITEMID],

    STUFF

    ((SELECT ' ; ' + b2.BOMID

    FROM BOM b2

    WHERE b1.ITEMID = b2.ITEMID FOR XML PATH('')), 1, 3, '') AS CRParents

    INTO #Lookup_BOMParents

    FROM [dbo].[BOM] b1 WHERE b1.ITEMID IN (SELECT ItemID FROM dbo.InventTable)

    GROUP BY b1.[ITEMID]

    ALTER TABLE #Lookup_BOMParents

    ADD PRIMARY KEY CLUSTERED (ITEMID)

    --CREATE BOM Children Lookup

    SELECT b1.BOMID,

    STUFF

    ((SELECT ' ; ' + b2.ItemID

    FROM BOM b2

    WHERE b1.BOMID = b2.ITEMID FOR XML PATH('')), 1, 3, '') AS CRChildren

    INTO #Lookup_BOMChildren

    FROM [dbo].[BOM] b1 WHERE b1.BOMID IN (SELECT ItemID FROM dbo.InventTable)

    GROUP BY b1.BOMID

    ALTER TABLE #Lookup_BOMChildren

    ADD PRIMARY KEY CLUSTERED (BOMID)

    SELECT i.ITEMID AS CRItemID, i.PSTPRODUCTID AS CRProductID, i.NAMEALIAS AS CRName, i.CREATIONPERFORMER AS CRPerformer,

    i.CREATIONDURATION AS CRDuration, i.CREATIONSTATUS AS CRStatus, i.[MODIFIEDDATETIME] AS CRModifiedDateTime,

    c.CRNumbers, T.CRTitles, M.CRNames, B.CRParents, W.CRChildren, P.CREpisodeTitle, p.CRComBrand, p.CRComProduct, p.CRComType, p.CRCountry, p.CRYear

    FROM dbo.InventTable i

    LEFT JOIN #Lookup_PstProdTmplData p

    ON p.TemplateID = i.ItemID

    LEFT JOIN #Lookup_CRALTNUMBERS C

    ON C.CREATIONID = i.ItemID

    LEFT JOIN #Lookup_MatchNames M

    ON M.CreationID = I.ItemID

    LEFT JOIN #Lookup_CRALTitles T

    ON T.CREATIONID = i.ItemID

    LEFT JOIN #Lookup_BOMParents B

    ON B.ITEMID = i.ItemID

    LEFT JOIN #Lookup_BOMChildren W

    ON W.BOMID = I.ITEMID

    I will follow the advice of J Livingston and run within SSIS with separate SQL Tasks for each stuff component. Anymore increased performance will be a further bonus. Many thanks for the help and tips. Really helped change my mindset with this query. The comments about naming conventions have been taken on board 😉

Viewing 15 posts - 1 through 15 (of 18 total)

You must be logged in to reply to this topic. Login to reply