Can anyone come up with a better optimized query please?

  • Hello,

    I am working on a query (optimized) to get some extra info for a base table.

    Tables facts:

    cteWBS_Element: 11,220

    CJI3 AIS 2015 : 297,517

    2015_470_900_GLDetl1_combined : 74,237,481

    [2015Vendors]: 28,883

    Query:

    set rowcount 0;

    with cteWBS_Element([WBS Element]) AS

    (

    select distinct [Origins ID Value] from [Telus CAR].[dbo].[AIS]

    where [asset clas] in

    ('D10',

    'A70',

    'A12',

    'A32',

    'A80',

    'B14',

    'C10',

    'E10',

    'H40',

    'B20',

    'L30',

    'L31',

    'G11',

    'G14')

    and [Origins ID Value] <> ''

    )

    select

    AIS.*, v.vendor, v.Name as [Vendor Name]

    from [Telus CAR].[dbo].[CJI3 AIS 2015] AIS

    inner join cteWBS_Element cte on cte.[WBS Element] = AIS.[WBS Element]

    inner join [Telus CAR 2016 (2015 data)].[dbo].[2015_470_900_GLDetl1_combined] gld on gld.[WBS Element] = AIS.[WBS Element]

    inner join [Telus CAR 2016 (2015 data)].[dbo].[2015Vendors] v on v.vendor = gld.vendor

    result returns: 3,061,519

    Time spent: 3:27

    Question:

    The total time spent on it seems not bad but I believe the result is wrong, that's why I post it here to see if anyone can help.

    The base table is from [Telus CAR].[dbo].[CJI3 AIS 2015] AIS, I was expecting the result should be no more than the original size which is 297,517 as I was attempting to retrieve vendor info from other tables for it. how come the result is augmented to much larger than its original result considering I am using inner join.

    The tables structures are purely based on spreadsheets we received from client and each table contains dozens of columns.

    If you do want to see the table schema, let me know and I will post it.

    Thank you very much.

  • 1) Spooling back 3M+ rows to a calling program (SSMS I suppose?) is going to be SLOW.

    2) Can you please capture the actual query plan and post that up?

    3) We will likely need table create scripts too, with indexes/keys.

    4) You mention an exploding of rows. That almost certainly means your INNER JOINs are not "complete" and you are getting a cartesian product somewhere. Review your data and data associations very closely and you can probably find where you went astray. Check the query plan and see the various rowcounts to assist.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Thank you for your reply, I do not have access to the server in weekend but I will post the needed plan here on Monday.

  • TheSQLGuru (1/6/2017)


    1) Spooling back 3M+ rows to a calling program (SSMS I suppose?) is going to be SLOW.

    This is expected

    2) Can you please capture the actual query plan and post that up?

    See the attachment please

    3) We will likely need table create scripts too, with indexes/keys.

    All the current tables have no keys/indice unfortunately.

    4) You mention an exploding of rows. That almost certainly means your INNER JOINs are not "complete" and you are getting a cartesian product somewhere. Review your data and data associations very closely and you can probably find where you went astray. Check the query plan and see the various rowcounts to assist.

    This is the biggest problem, the [WBS Elements], which is used for inner join, does NOT have the same existences in both tables, in this case,

    select distinct [Origins ID Value] from [Telus CAR].[dbo].[AIS]

    where [asset clas] in

    returns 11220 [Origins ID Value] which is actual [WBS Element] that is used for the inner join with [CJI3 AIS 2015] (4441 unique [WBS Element]), however, the other huge table [2015_470_900_GLDetl1_combined] has 19335 unique [WBS Element] (note the table itself has 74237481 records.

    Any thought?

    Thank you again.

  • I did some extraction before hand, and the new query is much faster now, however, still the result gets exploded. Here is what I did:

    1. I generated two perm tables for the [wbs element]

    2. I modified the query as below:

    set rowcount 0;

    select

    AIS.*, v.vendor, v.Name as [Vendor Name]

    from [Telus CAR].[dbo].[CJI3 AIS 2015] AIS

    inner join AIS_WBS_Element abe on abe.[WBS Element] = ais.[WBS Element]

    inner join [Telus CAR 2016 (2015 data)].[dbo].GLDetl1_combined_WBS_Element_w_vendor gld on gld.[WBS Element] = AIS.[WBS Element]

    inner join [Telus CAR 2016 (2015 data)].[dbo].[2015Vendors] v on v.vendor = gld.vendor

    AIS_WBS_Element: 11220

    GLDetl1_combined_WBS_Element_w_vendor: 32105

    [Telus CAR 2016 (2015 data)].[dbo].[2015Vendors]: 28883

    with this modified query, I am expecting vendor info from [2015Vendors] to be added to the [CJI3 AIS 2015], as the [CJI3 AIS 2015] has 297517 records, the final result should be 297517 as well.

    However, my query gets: 543395 records.

    The sqlplan is attached here for review.

    Thanks again for any input

  • You do have a partial CROSS JOIN somewhere. Looking at the query plan, the hash match join has inputs with 297K and 907K rows, but the output is 129,682K rows. A full INNER JOIN should never produce more rows than the larger of the two inputs. Since we don't know your table structure and you haven't provided any sample data, we cannot advise what you need to do to make this a full INNER JOIN.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • halifaxdal (1/9/2017)


    TheSQLGuru (1/6/2017)


    1) Spooling back 3M+ rows to a calling program (SSMS I suppose?) is going to be SLOW.

    This is expected

    2) Can you please capture the actual query plan and post that up?

    See the attachment please

    3) We will likely need table create scripts too, with indexes/keys.

    All the current tables have no keys/indice unfortunately.

    4) You mention an exploding of rows. That almost certainly means your INNER JOINs are not "complete" and you are getting a cartesian product somewhere. Review your data and data associations very closely and you can probably find where you went astray. Check the query plan and see the various rowcounts to assist.

    This is the biggest problem, the [WBS Elements], which is used for inner join, does NOT have the same existences in both tables, in this case,

    select distinct [Origins ID Value] from [Telus CAR].[dbo].[AIS]

    where [asset clas] in

    returns 11220 [Origins ID Value] which is actual [WBS Element] that is used for the inner join with [CJI3 AIS 2015] (4441 unique [WBS Element]), however, the other huge table [2015_470_900_GLDetl1_combined] has 19335 unique [WBS Element] (note the table itself has 74237481 records.

    Any thought?

    Thank you again.

    NOTE: all of this is based just on your first post this morning, not the follow-up you did.

    A) The query will be even slower than the 3+M rows being inserted into the table because of the large table scans and hash-match joins.

    B) You provided the estimated plan, not the full-blown-to-competion actual plan.

    C) cteWBS_Element obviously cannot be introducing a cartesian because of the DISTINCT that instantiates the CTE. However, the below join could be doing so, and you didn't make any mention of that table:

    inner join [Telus CAR 2016 (2015 data)].[dbo].[2015_470_900_GLDetl1_combined] gld on gld.[WBS Element] = AIS.[WBS Element]

    So can this join (unless v.vendor is unique to the 2015Vendors table):

    inner join [Telus CAR 2016 (2015 data)].[dbo].[2015Vendors] v on v.vendor = gld.vendor

    D) Performance is also affected by a pair of CONVERT_IMPLICIT's whereby you are joining varchar to Nvarchar columns. BAD stuff that. You can find this in the plan XML:

    <Warnings>

    <PlanAffectingConvert ConvertIssue="Cardinality Estimate" Expression="CONVERT_IMPLICIT(nvarchar(255),[gld].[WBS Element],0)" />

    <PlanAffectingConvert ConvertIssue="Seek Plan" Expression="[Telus CAR].[dbo].[AIS].[Origins ID Value]=CONVERT_IMPLICIT(nvarchar(255),[gld].[WBS Element],0)" />

    </Warnings>

    I have been consulting on SQL Server for 20+ years so you should pay attention when I tell you that mismatched datatypes are the single worst thing I see in total from a performance perspective. Variables, literals and column types MUST BE EXACTLY THE SAME to the best of your languages ability.

    My guess here is that someone defaulted the import of the excel spreadsheets and the import wizard got you Nvarchar 255 columns.

    I note that even after you fix these the performance will still suck due to the other factors. :hehe:

    E) Speaking of performance killers, your query is asking for a 3.7GB memory grant. Hope you have a very large system or a very fast tempdb drive (where most of that will wind up if you don't have lots of free memory - further hurting performance. Actually, noting the estimated rows (907K) going into two of the joins but actual being 3M you will likely be spooling to tempdb even if you get the requested 3.7GB memory grant. Fixing the cartesian and also the CONVERT_IMPLICITs will hopefully help get a better estimate and things will roll along without spooling down to tempdb.

    F) It may not be a cartesian at all. You may actually have valid data matches that get you the counts you are getting. Check a count of each join you have into the big table to investigate this.

    Sadly I am crushed today and likely won't have more time to assist. 🙁

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • I think I have the answer here now:

    It turns out the [WBS Element] and vendor is NOT a 1:1 relationship, it is 1:n, there are cases one [WBS Element] has 17 vendors!

    That explains the data explosion.

    Anyway, thank you for watching.

  • After I created two interim tables in advance, the final query actually finished in only 8 seconds! So the lesson I learn here is for cases that involve big table, it would be better to do some pre-analysis and create some interim table(s) before the actual query is to be designed, that saves tons of time.

    I hope this help later reader.

  • halifaxdal (1/9/2017)


    I did some extraction before hand, and the new query is much faster now, however, still the result gets exploded. Here is what I did:

    1. I generated two perm tables for the [wbs element]

    2. I modified the query as below:

    set rowcount 0;

    select

    AIS.*, v.vendor, v.Name as [Vendor Name]

    from [Telus CAR].[dbo].[CJI3 AIS 2015] AIS

    inner join AIS_WBS_Element abe on abe.[WBS Element] = ais.[WBS Element]

    inner join [Telus CAR 2016 (2015 data)].[dbo].GLDetl1_combined_WBS_Element_w_vendor gld on gld.[WBS Element] = AIS.[WBS Element]

    inner join [Telus CAR 2016 (2015 data)].[dbo].[2015Vendors] v on v.vendor = gld.vendor

    AIS_WBS_Element: 11220

    GLDetl1_combined_WBS_Element_w_vendor: 32105

    [Telus CAR 2016 (2015 data)].[dbo].[2015Vendors]: 28883

    with this modified query, I am expecting vendor info from [2015Vendors] to be added to the [CJI3 AIS 2015], as the [CJI3 AIS 2015] has 297517 records, the final result should be 297517 as well.

    However, my query gets: 543395 records.

    The sqlplan is attached here for review.

    Thanks again for any input

    something I notice, the difference in your final rowcount, 297517 to 495198 is very close percentagewise to the difference between 11220 to 18630, which is the rowcount after the execution plan joins AIS_WBS_Element with GLDetl1_combined_WBS_Element_w_vendor.

    EDIT: OK looks like I'm a bit late on my analysis, sorry 😉

  • Chris Harshman (1/9/2017)


    halifaxdal (1/9/2017)


    I did some extraction before hand, and the new query is much faster now, however, still the result gets exploded. Here is what I did:

    1. I generated two perm tables for the [wbs element]

    2. I modified the query as below:

    set rowcount 0;

    select

    AIS.*, v.vendor, v.Name as [Vendor Name]

    from [Telus CAR].[dbo].[CJI3 AIS 2015] AIS

    inner join AIS_WBS_Element abe on abe.[WBS Element] = ais.[WBS Element]

    inner join [Telus CAR 2016 (2015 data)].[dbo].GLDetl1_combined_WBS_Element_w_vendor gld on gld.[WBS Element] = AIS.[WBS Element]

    inner join [Telus CAR 2016 (2015 data)].[dbo].[2015Vendors] v on v.vendor = gld.vendor

    AIS_WBS_Element: 11220

    GLDetl1_combined_WBS_Element_w_vendor: 32105

    [Telus CAR 2016 (2015 data)].[dbo].[2015Vendors]: 28883

    with this modified query, I am expecting vendor info from [2015Vendors] to be added to the [CJI3 AIS 2015], as the [CJI3 AIS 2015] has 297517 records, the final result should be 297517 as well.

    However, my query gets: 543395 records.

    The sqlplan is attached here for review.

    Thanks again for any input

    something I notice, the difference in your final rowcount, 297517 to 495198 is very close percentagewise to the difference between 11220 to 18630, which is the rowcount after the execution plan joins AIS_WBS_Element with GLDetl1_combined_WBS_Element_w_vendor.

    EDIT: OK looks like I'm a bit late on my analysis, sorry 😉

    Thanks Chris.

  • drew.allen (1/9/2017)


    You do have a partial CROSS JOIN somewhere. Looking at the query plan, the hash match join has inputs with 297K and 907K rows, but the output is 129,682K rows. A full INNER JOIN should never produce more rows than the larger of the two inputs. Since we don't know your table structure and you haven't provided any sample data, we cannot advise what you need to do to make this a full INNER JOIN.

    Drew

    Thanks Drew

  • TheSQLGuru (1/6/2017)


    1) Spooling back 3M+ rows to a calling program (SSMS I suppose?) is going to be SLOW.

    2) Can you please capture the actual query plan and post that up?

    3) We will likely need table create scripts too, with indexes/keys.

    4) You mention an exploding of rows. That almost certainly means your INNER JOINs are not "complete" and you are getting a cartesian product somewhere. Review your data and data associations very closely and you can probably find where you went astray. Check the query plan and see the various rowcounts to assist.

    Thanks Kevin.

Viewing 13 posts - 1 through 12 (of 12 total)

You must be logged in to reply to this topic. Login to reply