How can I combine the three queries to one and possibly improve performance

  • I wish I can get help from this forum, any suggestion is appreciated.

    I have a project with some existing sql queries, here are three I think should be combined into only one for the performance optimization purpose.

    For the sake of simplicity and confidentiality here are the modified codes:

    1. table1 inner join table2 to generate temp table
    2. from temp table create table cnt with the count for each type
    3. table1 inner join table3, and apply a where clause to generate the final table

    create table table1 (id int, region int, typeid int)
    insert into table1 (id, region, typeid ) values (1,2,1)
    insert into table1 (id, region, typeid ) values (2,3,1)
    insert into table1 (id, region, typeid ) values (3,4,2)
    insert into table1 (id, region, typeid ) values (4,1,2)
    insert into table1 (id, region, typeid ) values (5,1,2)
    insert into table1 (id, region, typeid ) values (6,2,4)

    create table table2 (id int, type varchar(10))
    insert into table2 (id, type) values (1,'A')
    insert into table2 (id, type) values (2,'B')
    insert into table2 (id, type) values (3,'C')
    insert into table2 (id, type) values (4,'D')

    table temp is the initial inner join of the two initial tables:
    create table temp as
    select a.*, b.* from table1 a
    inner join table2 b on a.typeid = b.id

    table cnt has the count for each type
    create table cnt as
    select c.type, count(1) as total from temp c group by type

    Final table (a.type is updated to a.typeid, thanks to Jonathan for pointing that):
    create table final as 
    select a.region, a.type, a.typeid from table1 a
    inner join cnt c on a.type = c.type
    where c.total > 2

    In the given sample data, the final table should contains only records with type = 2 because type2 has count > 2

    Kindly keep in mind  the current table1 and 2 contains millions of row counts.

    Thank you very much.

  • You can nest queries if you want... not that big a deal.

    SELECT x.Field1, x.Field2
    FROM (SELECT a.Field1, a.Field2
                 FROM MyTable a
                 WHERE...) x

    ???

    Oh, and don't forget the HAVING clause.

  • Thank you but it doesn't address the question, key point here is to shrink down the original size (over million row count) of table 1, from the last query:

    select a.region, a.type from table1 a 
    inner join cnt c on a.type = c.type
    where c.total > 2

    You can see it as table cnt is a derived table:

    create table cnt as
    select c.type, count(1) as total from temp c group by type

    And table temp is also a staging table from the original table 1:

    create table temp as
    select a.*, b.* from table1 a
    inner join table2 b on a.typeid = b.id

  • SELECT
     d.region,
     d.type
    FROM (
     SELECT
      t1.region,
      t2.type,
      total = COUNT(*) OVER(PARTITION BY t1.typeID)
     FROM table1 t1
     INNER JOIN table2 t2
      ON t2.id = t1.typeid
    ) d
    WHERE d.Total > 2

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • It could be more efficient to get only typeid from table1 in the first pass, like below.  This depends on indexing and the cardinality (how many t1 rows have typeids that appear at least 3 times vs overall total of t1 rows).


    SELECT t1.region, t1.type
    FROM (
        SELECT typeid, COUNT(*) AS typeid_count
        FROM dbo.table1
        GROUP BY type_id
        HAVING COUNT(*) >= 3
    ) AS typeid_counts
    INNER JOIN dbo.table1 t1 ON t1.typeid = typeid_counts.typeid
    INNER JOIN dbo.table2 t2 ON t2.id = t1.typeid

    Or it could perform better to get the region with the typeid from t1.  You probably want to look at the query plans for each one to decide which is better for your setup.


    SELECT typeid_counts.region, t2.type
    FROM (
        SELECT region, typeid, COUNT(*) OVER(PARTITION BY typeid) AS typeid_count
        FROM dbo.table1
    ) AS typeid_counts
    INNER JOIN dbo.table2 t2 ON t2.id = typeid_counts.typeid
    WHERE typeid_counts.typeid_count >= 3

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • OK, thank you guys but I doubt none of the replies catched it.

    As indicated in my reply to the first reply:

    From the last query:

    select a.region, a.type from table1 a 
    inner join cnt c on a.type = c.type
    where c.total > 2

    You can see it as table cnt is a derived table:

    create table cnt as
    select c.type, count(1) as total from temp c group by type

    And table temp is also a staging table from the original table 1:

    create table temp as
    select a.*, b.* from table1 a
    inner join table2 b on a.typeid = b.id

    Both Chris and Scott's queries are working on the table1 directly, it needs to talk to the temp table that is derived from table1, the row count is different between table1 and temp, temp is generated with table1 joining another table2, right?

    Thank you both.

  • PasLe Choix - Tuesday, December 11, 2018 12:57 PM

    OK, thank you guys but I doubt none of the replies catched it.

    As indicated in my reply to the first reply:

    From the last query:

    select a.region, a.type from table1 a 
    inner join cnt c on a.type = c.type
    where c.total > 2

    You can see it as table cnt is a derived table:

    create table cnt as
    select c.type, count(1) as total from temp c group by type

    And table temp is also a staging table from the original table 1:

    create table temp as
    select a.*, b.* from table1 a
    inner join table2 b on a.typeid = b.id

    Both Chris and Scott's queries are working on the table1 directly, it needs to talk to the temp table that is derived from table1, the row count is different between table1 and temp, temp is generated with table1 joining another table2, right?

    Thank you both.

    Your queries don't work.
    1. Both tables in the following join have a column call 'Id'. All columns on a table have to have distinct names.
    select a.*, b.* from table1 a
    inner join table2 b on a.typeid = b.id

    2. table1 does not have a column named 'type' so would give an error.
    select a.region, a.type from table1 a
    inner join cnt c on a.type = c.type
    where c.total > 2

  • I'm genuinely confused.

    Your q states"How can I combine the three queries to one".  We write queries that combine the 3 queries into a single query. 

    But then you object that we're not keeping the first query separate??  That seems contradictory to me.

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • Jonathan AC Roberts - Tuesday, December 11, 2018 1:57 PM

    PasLe Choix - Tuesday, December 11, 2018 12:57 PM

    OK, thank you guys but I doubt none of the replies catched it.

    As indicated in my reply to the first reply:

    From the last query:

    select a.region, a.type from table1 a 
    inner join cnt c on a.type = c.type
    where c.total > 2

    You can see it as table cnt is a derived table:

    create table cnt as
    select c.type, count(1) as total from temp c group by type

    And table temp is also a staging table from the original table 1:

    create table temp as
    select a.*, b.* from table1 a
    inner join table2 b on a.typeid = b.id

    Both Chris and Scott's queries are working on the table1 directly, it needs to talk to the temp table that is derived from table1, the row count is different between table1 and temp, temp is generated with table1 joining another table2, right?

    Thank you both.

    Your queries don't work.
    1. Both tables in the following join have a column call 'Id'. All columns on a table have to have distinct names.
    select a.*, b.* from table1 a
    inner join table2 b on a.typeid = b.id

    2. table1 does not have a column named 'type' so would give an error.
    select a.region, a.type from table1 a
    inner join cnt c on a.type = c.type
    where c.total > 2

    Good catch but please do understand that I simplified the original queries.
    1. a.* would include all the fields in Table1, and Table2 also contains a field with/without the same name that can be joined based on them, for the simplicity I call them ID.
    2. yes your second point is correct, sorry about that. it will error out, I'll made update to the original post if it is allowed.

  • PasLe Choix - Tuesday, December 11, 2018 12:57 PM

    OK, thank you guys but I doubt none of the replies catched it.

    As indicated in my reply to the first reply:

    From the last query:

    select a.region, a.type from table1 a 
    inner join cnt c on a.type = c.type
    where c.total > 2

    You can see it as table cnt is a derived table:

    create table cnt as
    select c.type, count(1) as total from temp c group by type

    And table temp is also a staging table from the original table 1:

    create table temp as
    select a.*, b.* from table1 a
    inner join table2 b on a.typeid = b.id

    Both Chris and Scott's queries are working on the table1 directly, it needs to talk to the temp table that is derived from table1, the row count is different between table1 and temp, temp is generated with table1 joining another table2, right?

    Thank you both.

    Heh... that's a bit crazy.  You said you wanted to combine the 3 queries to be more efficient.  You can go straight from the tables to the final answer without the use of any temporary structures whatsoever.  Why are you insisting that we follow the very query methodology that you said you wanted to avoid?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • ScottPletcher - Tuesday, December 11, 2018 2:47 PM

    I'm genuinely confused.

    Your q states"How can I combine the three queries to one".  We write queries that combine the 3 queries into a single query. 

    But then you object that we're not keeping the first query separate??  That seems contradictory to me.

    Hi Scott,

    It doesn't contradict.
    By combining queries doesn't mean to lose its initial function, right?

    I am sorry if the question is not clear to you, let me try to explain here:
    1. the tables (schema, sample data) are just for demo'ing the question, the original queries/tables are much much more complex, I can't post the original queries here for the reasons of: a. I will violate company's security policy; b. it will confuse most of people here because the business logic between the tables are complicated.  
    2. I have three queries, I like to combine them into one single query
    3. the original queries generate temp table from original table1 (by joining a second table)
    4. the temp table will be transformed (group by) into another reference table (cnt)
    5. the cnt table is used to filter the very original table1 and generate the table final

    I hope my explain above help clarify some confuse you have, I do understand your confuse though, I spent half an hour to simplify the original queries. 🙂

    Thank you and thanks to everyone who spent time reading this post. Any suggestion/hint is appreciated.

  • PasLe Choix - Tuesday, December 11, 2018 6:06 PM

    ScottPletcher - Tuesday, December 11, 2018 2:47 PM

    I'm genuinely confused.

    Your q states"How can I combine the three queries to one".  We write queries that combine the 3 queries into a single query. 

    But then you object that we're not keeping the first query separate??  That seems contradictory to me.

    Hi Scott,

    It doesn't contradict.
    By combining queries doesn't mean to lose its initial function, right?

    I am sorry if the question is not clear to you, let me try to explain here:
    1. the tables (schema, sample data) are just for demo'ing the question, the original queries/tables are much much more complex, I can't post the original queries here for the reasons of: a. I will violate company's security policy; b. it will confuse most of people here because the business logic between the tables are complicated.  
    2. I have three queries, I like to combine them into one single query
    3. the original queries generate temp table from original table1 (by joining a second table)
    4. the temp table will be transformed (group by) into another reference table (cnt)
    5. the cnt table is used to filter the very original table1 and generate the table final

    I hope my explain above help clarify some confuse you have, I do understand your confuse though, I spent half an hour to simplify the original queries. 🙂

    Thank you and thanks to everyone who spent time reading this post. Any suggestion/hint is appreciated.

    What will you use the breadcrumb trail of temp tables for?  They're totally unnecessary to get to the final answer.  As you said, the original table can have millions of rows.  Materializing all of that data in the form of temp tables for the interim answers is a serious waste of I/O, memory, disk space, and clock cycles.  Unless the temp tables are going to be used for something else, they're totally unnecessary to solve for the "end game" problem you defined.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden - Tuesday, December 11, 2018 6:08 PM

    PasLe Choix - Tuesday, December 11, 2018 6:06 PM

    ScottPletcher - Tuesday, December 11, 2018 2:47 PM

    I'm genuinely confused.

    Your q states"How can I combine the three queries to one".  We write queries that combine the 3 queries into a single query. 

    But then you object that we're not keeping the first query separate??  That seems contradictory to me.

    Hi Scott,

    It doesn't contradict.
    By combining queries doesn't mean to lose its initial function, right?

    I am sorry if the question is not clear to you, let me try to explain here:
    1. the tables (schema, sample data) are just for demo'ing the question, the original queries/tables are much much more complex, I can't post the original queries here for the reasons of: a. I will violate company's security policy; b. it will confuse most of people here because the business logic between the tables are complicated.  
    2. I have three queries, I like to combine them into one single query
    3. the original queries generate temp table from original table1 (by joining a second table)
    4. the temp table will be transformed (group by) into another reference table (cnt)
    5. the cnt table is used to filter the very original table1 and generate the table final

    I hope my explain above help clarify some confuse you have, I do understand your confuse though, I spent half an hour to simplify the original queries. 🙂

    Thank you and thanks to everyone who spent time reading this post. Any suggestion/hint is appreciated.

    What will you use the breadcrumb trail of temp tables for?  They're totally unnecessary to get to the final answer.  As you said, the original table can have millions of rows.  Materializing all of that data in the form of temp tables for the interim answers is a serious waste of I/O, memory, disk space, and clock cycles.  Unless the temp tables are going to be used for something else, they're totally unnecessary to solve for the "end game" problem you defined.

    "Materializing all of that data in the form of temp tables for the interim answers is a serious waste" ---- Thank you Jeff, I cannot agree more with you on this. In reality sometimes we have no choice but to compromise in certain extent, right? And that is exactly why I am seeking help here. Those three queries are the existing queries in the current pipeline, and I see there is a where clause at the end, I wonder if I can use that where clause somehow earlier from the beginning so  that the involved data volume could be brought down, if yes, the volume would be dramatically brought and the efficiency of the new query would be dramatically improved.  Thanks.

  • PasLe Choix - Monday, December 10, 2018 7:41 PM

    Final table (a.type is updated to a.typeid, thanks to Jonathan for pointing that):
    create table final as 
    select a.region, a.type, a.typeid from table1 a
    inner join cnt c on a.type = c.type
    where c.total > 2

    where you have inner join cnt c on a.type = c.type 
    There is no type on table1, so what should it be?

  • PasLe Choix - Wednesday, December 12, 2018 7:25 AM

    Jeff Moden - Tuesday, December 11, 2018 6:08 PM

    PasLe Choix - Tuesday, December 11, 2018 6:06 PM

    ScottPletcher - Tuesday, December 11, 2018 2:47 PM

    I'm genuinely confused.

    Your q states"How can I combine the three queries to one".  We write queries that combine the 3 queries into a single query. 

    But then you object that we're not keeping the first query separate??  That seems contradictory to me.

    Hi Scott,

    It doesn't contradict.
    By combining queries doesn't mean to lose its initial function, right?

    I am sorry if the question is not clear to you, let me try to explain here:
    1. the tables (schema, sample data) are just for demo'ing the question, the original queries/tables are much much more complex, I can't post the original queries here for the reasons of: a. I will violate company's security policy; b. it will confuse most of people here because the business logic between the tables are complicated.  
    2. I have three queries, I like to combine them into one single query
    3. the original queries generate temp table from original table1 (by joining a second table)
    4. the temp table will be transformed (group by) into another reference table (cnt)
    5. the cnt table is used to filter the very original table1 and generate the table final

    I hope my explain above help clarify some confuse you have, I do understand your confuse though, I spent half an hour to simplify the original queries. 🙂

    Thank you and thanks to everyone who spent time reading this post. Any suggestion/hint is appreciated.

    What will you use the breadcrumb trail of temp tables for?  They're totally unnecessary to get to the final answer.  As you said, the original table can have millions of rows.  Materializing all of that data in the form of temp tables for the interim answers is a serious waste of I/O, memory, disk space, and clock cycles.  Unless the temp tables are going to be used for something else, they're totally unnecessary to solve for the "end game" problem you defined.

    "Materializing all of that data in the form of temp tables for the interim answers is a serious waste" ---- Thank you Jeff, I cannot agree more with you on this. In reality sometimes we have no choice but to compromise in certain extent, right? And that is exactly why I am seeking help here. Those three queries are the existing queries in the current pipeline, and I see there is a where clause at the end, I wonder if I can use that where clause somehow earlier from the beginning so  that the involved data volume could be brought down, if yes, the volume would be dramatically brought and the efficiency of the new query would be dramatically improved.  Thanks.

    We still have a disconnect in thought here.  The question is, why do you need the interim temp tables in your original code?  They're not necessary to get to the final goal in the last query.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 1 through 15 (of 21 total)

You must be logged in to reply to this topic. Login to reply