How can I combine the three queries to one and possibly improve performance

Question

Post reply

How can I combine the three queries to one and possibly improve performance

PasLe Choix

Say Hey Kid

Points: 665
More actions
December 10, 2018 at 7:41 pm

#378797

I wish I can get help from this forum, any suggestion is appreciated.
I have a project with some existing sql queries, here are three I think should be combined into only one for the performance optimization purpose.
For the sake of simplicity and confidentiality here are the modified codes:
1. table1 inner join table2 to generate temp table
2. from temp table create table cnt with the count for each type
3. table1 inner join table3, and apply a where clause to generate the final table
create table table1 (id int, region int, typeid int) insert into table1 (id, region, typeid ) values (1,2,1) insert into table1 (id, region, typeid ) values (2,3,1) insert into table1 (id, region, typeid ) values (3,4,2) insert into table1 (id, region, typeid ) values (4,1,2) insert into table1 (id, region, typeid ) values (5,1,2) insert into table1 (id, region, typeid ) values (6,2,4)
create table table2 (id int, type varchar(10)) insert into table2 (id, type) values (1,'A') insert into table2 (id, type) values (2,'B') insert into table2 (id, type) values (3,'C') insert into table2 (id, type) values (4,'D')
table temp is the initial inner join of the two initial tables:
create table temp as select a.*, b.* from table1 a inner join table2 b on a.typeid = b.id
table cnt has the count for each type
create table cnt as select c.type, count(1) as total from temp c group by type
Final table (a.type is updated to a.typeid, thanks to Jonathan for pointing that):
create table final as select a.region, a.type, a.typeid from table1 a inner join cnt c on a.type = c.type where c.total > 2
In the given sample data, the final table should contains only records with type = 2 because type2 has count > 2
Kindly keep in mind the current table1 and 2 contains millions of row counts.
Thank you very much.

Viewing 15 posts - 1 through 15 (of 21 total)

You must be logged in to reply to this topic. Login to reply

pietlinden SSC Guru Points: 63421 More actions · Answer 1

You can nest queries if you want... not that big a deal.

SELECT x.Field1, x.Field2
FROM (SELECT a.Field1, a.Field2
FROM MyTable a
WHERE...) x

???

Oh, and don't forget the HAVING clause.

PasLe Choix Say Hey Kid Points: 665 More actions · Answer 2

Thank you but it doesn't address the question, key point here is to shrink down the original size (over million row count) of table 1, from the last query:

select a.region, a.type from table1 a 
inner join cnt c on a.type = c.type 
where c.total > 2

You can see it as table cnt is a derived table:

create table cnt as
select c.type, count(1) as total from temp c group by type

And table temp is also a staging table from the original table 1:

create table temp as
select a.*, b.* from table1 a 
inner join table2 b on a.typeid = b.id

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 3

SELECT d.region, d.type FROM ( SELECT t1.region, t2.type, total = COUNT(*) OVER(PARTITION BY t1.typeID) FROM table1 t1 INNER JOIN table2 t2 ON t2.id = t1.typeid ) d WHERE d.Total > 2

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

ScottPletcher SSC Guru Points: 100942 More actions · Answer 4

It could be more efficient to get only typeid from table1 in the first pass, like below. This depends on indexing and the cardinality (how many t1 rows have typeids that appear at least 3 times vs overall total of t1 rows).

SELECT t1.region, t1.type FROM ( SELECT typeid, COUNT(*) AS typeid_count FROM dbo.table1 GROUP BY type_id HAVING COUNT(*) >= 3 ) AS typeid_counts INNER JOIN dbo.table1 t1 ON t1.typeid = typeid_counts.typeid INNER JOIN dbo.table2 t2 ON t2.id = t1.typeid

Or it could perform better to get the region with the typeid from t1. You probably want to look at the query plans for each one to decide which is better for your setup.

SELECT typeid_counts.region, t2.type FROM ( SELECT region, typeid, COUNT(*) OVER(PARTITION BY typeid) AS typeid_count FROM dbo.table1 ) AS typeid_counts INNER JOIN dbo.table2 t2 ON t2.id = typeid_counts.typeid WHERE typeid_counts.typeid_count >= 3

SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

PasLe Choix Say Hey Kid Points: 665 More actions · Answer 5

OK, thank you guys but I doubt none of the replies catched it.

As indicated in my reply to the first reply:

From the last query:

select a.region, a.type from table1 a 
inner join cnt c on a.type = c.type 
where c.total > 2

You can see it as table cnt is a derived table:

create table cnt as
select c.type, count(1) as total from temp c group by type

And table temp is also a staging table from the original table 1:

create table temp as
select a.*, b.* from table1 a 
inner join table2 b on a.typeid = b.id

Both Chris and Scott's queries are working on the table1 directly, it needs to talk to the temp table that is derived from table1, the row count is different between table1 and temp, temp is generated with table1 joining another table2, right?

Thank you both.

Jonathan AC Roberts SSCoach Points: 18257 More actions · Answer 6

PasLe Choix - Tuesday, December 11, 2018 12:57 PM
OK, thank you guys but I doubt none of the replies catched it.
As indicated in my reply to the first reply:
From the last query:
select a.region, a.type from table1 a 
inner join cnt c on a.type = c.type 
where c.total > 2
You can see it as table cnt is a derived table:
create table cnt as
select c.type, count(1) as total from temp c group by type 
And table temp is also a staging table from the original table 1:
create table temp as
select a.*, b.* from table1 a 
inner join table2 b on a.typeid = b.id 
Both Chris and Scott's queries are working on the table1 directly, it needs to talk to the temp table that is derived from table1, the row count is different between table1 and temp, temp is generated with table1 joining another table2, right?
Thank you both.

Your queries don't work.
1. Both tables in the following join have a column call 'Id'. All columns on a table have to have distinct names.
select a.*, b.* from table1 a inner join table2 b on a.typeid = b.id
2. table1 does not have a column named 'type' so would give an error.
select a.region, a.type from table1 a inner join cnt c on a.type = c.type where c.total > 2

ScottPletcher SSC Guru Points: 100942 More actions · Answer 7

I'm genuinely confused.

Your q states"How can I combine the three queries to one". We write queries that combine the 3 queries into a single query.

But then you object that we're not keeping the first query separate?? That seems contradictory to me.

SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

PasLe Choix Say Hey Kid Points: 665 More actions · Answer 8

Jonathan AC Roberts - Tuesday, December 11, 2018 1:57 PM
PasLe Choix - Tuesday, December 11, 2018 12:57 PM
OK, thank you guys but I doubt none of the replies catched it.
As indicated in my reply to the first reply:
From the last query:
select a.region, a.type from table1 a 
inner join cnt c on a.type = c.type 
where c.total > 2
You can see it as table cnt is a derived table:
create table cnt as
select c.type, count(1) as total from temp c group by type 
And table temp is also a staging table from the original table 1:
create table temp as
select a.*, b.* from table1 a 
inner join table2 b on a.typeid = b.id 
Both Chris and Scott's queries are working on the table1 directly, it needs to talk to the temp table that is derived from table1, the row count is different between table1 and temp, temp is generated with table1 joining another table2, right?
Thank you both.
Your queries don't work.
1. Both tables in the following join have a column call 'Id'. All columns on a table have to have distinct names.
select a.*, b.* from table1 a inner join table2 b on a.typeid = b.id
2. table1 does not have a column named 'type' so would give an error.
select a.region, a.type from table1 a inner join cnt c on a.type = c.type where c.total > 2

Good catch but please do understand that I simplified the original queries.
1. a.* would include all the fields in Table1, and Table2 also contains a field with/without the same name that can be joined based on them, for the simplicity I call them ID.
2. yes your second point is correct, sorry about that. it will error out, I'll made update to the original post if it is allowed.

Jeff Moden SSC Guru Points: 1003853 More actions · Answer 9

PasLe Choix - Tuesday, December 11, 2018 12:57 PM
OK, thank you guys but I doubt none of the replies catched it.
As indicated in my reply to the first reply:
From the last query:
select a.region, a.type from table1 a 
inner join cnt c on a.type = c.type 
where c.total > 2
You can see it as table cnt is a derived table:
create table cnt as
select c.type, count(1) as total from temp c group by type 
And table temp is also a staging table from the original table 1:
create table temp as
select a.*, b.* from table1 a 
inner join table2 b on a.typeid = b.id 
Both Chris and Scott's queries are working on the table1 directly, it needs to talk to the temp table that is derived from table1, the row count is different between table1 and temp, temp is generated with table1 joining another table2, right?
Thank you both.

Heh... that's a bit crazy. You said you wanted to combine the 3 queries to be more efficient. You can go straight from the tables to the final answer without the use of any temporary structures whatsoever. Why are you insisting that we follow the very query methodology that you said you wanted to avoid?

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

PasLe Choix Say Hey Kid Points: 665 More actions · Answer 10

ScottPletcher - Tuesday, December 11, 2018 2:47 PM
I'm genuinely confused.
Your q states"How can I combine the three queries to one". We write queries that combine the 3 queries into a single query.
But then you object that we're not keeping the first query separate?? That seems contradictory to me.

Hi Scott,

It doesn't contradict.
By combining queries doesn't mean to lose its initial function, right?

I am sorry if the question is not clear to you, let me try to explain here:
1. the tables (schema, sample data) are just for demo'ing the question, the original queries/tables are much much more complex, I can't post the original queries here for the reasons of: a. I will violate company's security policy; b. it will confuse most of people here because the business logic between the tables are complicated.
2. I have three queries, I like to combine them into one single query
3. the original queries generate temp table from original table1 (by joining a second table)
4. the temp table will be transformed (group by) into another reference table (cnt)
5. the cnt table is used to filter the very original table1 and generate the table final

I hope my explain above help clarify some confuse you have, I do understand your confuse though, I spent half an hour to simplify the original queries. 🙂

Thank you and thanks to everyone who spent time reading this post. Any suggestion/hint is appreciated.

Jeff Moden SSC Guru Points: 1003853 More actions · Answer 11

PasLe Choix - Tuesday, December 11, 2018 6:06 PM
ScottPletcher - Tuesday, December 11, 2018 2:47 PM
I'm genuinely confused.
Your q states"How can I combine the three queries to one". We write queries that combine the 3 queries into a single query.
But then you object that we're not keeping the first query separate?? That seems contradictory to me.
Hi Scott,
It doesn't contradict.
By combining queries doesn't mean to lose its initial function, right?
I am sorry if the question is not clear to you, let me try to explain here:
1. the tables (schema, sample data) are just for demo'ing the question, the original queries/tables are much much more complex, I can't post the original queries here for the reasons of: a. I will violate company's security policy; b. it will confuse most of people here because the business logic between the tables are complicated.
2. I have three queries, I like to combine them into one single query
3. the original queries generate temp table from original table1 (by joining a second table)
4. the temp table will be transformed (group by) into another reference table (cnt)
5. the cnt table is used to filter the very original table1 and generate the table final
I hope my explain above help clarify some confuse you have, I do understand your confuse though, I spent half an hour to simplify the original queries. 🙂
Thank you and thanks to everyone who spent time reading this post. Any suggestion/hint is appreciated.

What will you use the breadcrumb trail of temp tables for? They're totally unnecessary to get to the final answer. As you said, the original table can have millions of rows. Materializing all of that data in the form of temp tables for the interim answers is a serious waste of I/O, memory, disk space, and clock cycles. Unless the temp tables are going to be used for something else, they're totally unnecessary to solve for the "end game" problem you defined.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

PasLe Choix Say Hey Kid Points: 665 More actions · Answer 12

Jeff Moden - Tuesday, December 11, 2018 6:08 PM
PasLe Choix - Tuesday, December 11, 2018 6:06 PM
ScottPletcher - Tuesday, December 11, 2018 2:47 PM
I'm genuinely confused.
Your q states"How can I combine the three queries to one". We write queries that combine the 3 queries into a single query.
But then you object that we're not keeping the first query separate?? That seems contradictory to me.
Hi Scott,
It doesn't contradict.
By combining queries doesn't mean to lose its initial function, right?
I am sorry if the question is not clear to you, let me try to explain here:
1. the tables (schema, sample data) are just for demo'ing the question, the original queries/tables are much much more complex, I can't post the original queries here for the reasons of: a. I will violate company's security policy; b. it will confuse most of people here because the business logic between the tables are complicated.
2. I have three queries, I like to combine them into one single query
3. the original queries generate temp table from original table1 (by joining a second table)
4. the temp table will be transformed (group by) into another reference table (cnt)
5. the cnt table is used to filter the very original table1 and generate the table final
I hope my explain above help clarify some confuse you have, I do understand your confuse though, I spent half an hour to simplify the original queries. 🙂
Thank you and thanks to everyone who spent time reading this post. Any suggestion/hint is appreciated.
What will you use the breadcrumb trail of temp tables for? They're totally unnecessary to get to the final answer. As you said, the original table can have millions of rows. Materializing all of that data in the form of temp tables for the interim answers is a serious waste of I/O, memory, disk space, and clock cycles. Unless the temp tables are going to be used for something else, they're totally unnecessary to solve for the "end game" problem you defined.

"Materializing all of that data in the form of temp tables for the interim answers is a serious waste" ---- Thank you Jeff, I cannot agree more with you on this. In reality sometimes we have no choice but to compromise in certain extent, right? And that is exactly why I am seeking help here. Those three queries are the existing queries in the current pipeline, and I see there is a where clause at the end, I wonder if I can use that where clause somehow earlier from the beginning so that the involved data volume could be brought down, if yes, the volume would be dramatically brought and the efficiency of the new query would be dramatically improved. Thanks.

Jonathan AC Roberts SSCoach Points: 18257 More actions · Answer 13

PasLe Choix - Monday, December 10, 2018 7:41 PM
Final table (a.type is updated to a.typeid, thanks to Jonathan for pointing that):
create table final as select a.region, a.type, a.typeid from table1 a inner join cnt c on a.type = c.type where c.total > 2

where you have inner join cnt c on a.type = c.type
There is no type on table1, so what should it be?

Jeff Moden SSC Guru Points: 1003853 More actions · Answer 14

PasLe Choix - Wednesday, December 12, 2018 7:25 AM
Jeff Moden - Tuesday, December 11, 2018 6:08 PM
PasLe Choix - Tuesday, December 11, 2018 6:06 PM
ScottPletcher - Tuesday, December 11, 2018 2:47 PM
I'm genuinely confused.
Your q states"How can I combine the three queries to one". We write queries that combine the 3 queries into a single query.
But then you object that we're not keeping the first query separate?? That seems contradictory to me.
Hi Scott,
It doesn't contradict.
By combining queries doesn't mean to lose its initial function, right?
I am sorry if the question is not clear to you, let me try to explain here:
1. the tables (schema, sample data) are just for demo'ing the question, the original queries/tables are much much more complex, I can't post the original queries here for the reasons of: a. I will violate company's security policy; b. it will confuse most of people here because the business logic between the tables are complicated.
2. I have three queries, I like to combine them into one single query
3. the original queries generate temp table from original table1 (by joining a second table)
4. the temp table will be transformed (group by) into another reference table (cnt)
5. the cnt table is used to filter the very original table1 and generate the table final
I hope my explain above help clarify some confuse you have, I do understand your confuse though, I spent half an hour to simplify the original queries. 🙂
Thank you and thanks to everyone who spent time reading this post. Any suggestion/hint is appreciated.
What will you use the breadcrumb trail of temp tables for? They're totally unnecessary to get to the final answer. As you said, the original table can have millions of rows. Materializing all of that data in the form of temp tables for the interim answers is a serious waste of I/O, memory, disk space, and clock cycles. Unless the temp tables are going to be used for something else, they're totally unnecessary to solve for the "end game" problem you defined.
"Materializing all of that data in the form of temp tables for the interim answers is a serious waste" ---- Thank you Jeff, I cannot agree more with you on this. In reality sometimes we have no choice but to compromise in certain extent, right? And that is exactly why I am seeking help here. Those three queries are the existing queries in the current pipeline, and I see there is a where clause at the end, I wonder if I can use that where clause somehow earlier from the beginning so that the involved data volume could be brought down, if yes, the volume would be dramatically brought and the efficiency of the new query would be dramatically improved. Thanks.

We still have a disconnect in thought here. The question is, why do you need the interim temp tables in your original code? They're not necessary to get to the final goal in the last query.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)