September 18, 2017 at 2:59 am
Problem statement : I have 2 tables "DEPT" which contains department details and "EMP" which contains employee details.
I am joining based on deptno and getting department name and department location from DEPT table.
I get these details using 2 ways. way-1 using joins and way-2 using co-related sub-queries.
Question is Performance wise which is better, JOINS or CO-RELATED Sub-queries ?? Also, i my below scenario, from DEPT table I am fetching only 2 colums i.e (dname,loc).
Imagine there is table like DEPT and I had to fetch 10 columns from that table. And if, say, Emp table has 10 million records, then which way is efficient to make the query run faster??
If there are any other alternate ways which i might know , please leave some comments/suggestions on those lines as well.
CREATE TABLE DEPT
(DEPTNO INT NOT NULL CONSTRAINT PK_DEPT PRIMARY KEY,
DNAME VARCHAR(14) ,
LOC VARCHAR(13)
) ;
go
INSERT INTO DEPT VALUES (10,'ACCOUNTING','NEW YORK');
INSERT INTO DEPT VALUES (20,'RESEARCH','DALLAS');
INSERT INTO DEPT VALUES (30,'SALES','CHICAGO');
INSERT INTO DEPT VALUES (40,'OPERATIONS','BOSTON');
go
--DROP TABLE EMP;
CREATE TABLE EMP
(EMPNO INT NOT NULL CONSTRAINT PK_EMP PRIMARY KEY, -- DT_I4 4 Signed integers
ENAME VARCHAR(10), -- DT_STR VARCHAR = NON- UNICODE. NEED TO CONVERT ALL EXCEL
JOB VARCHAR(9),
MGR INT,
HIREDATE DATETIME, -- DT_DATE
SAL NUMERIC(7,2), -- DT_NUMERIC
COMM NUMERIC(7,2),
DEPTNO INT CONSTRAINT FK_DEPTNO REFERENCES DEPT(DEPTNO)
)
go
INSERT INTO EMP VALUES(7369,'SMITH','CLERK',7902,'1980-12-17',800,NULL,20);
INSERT INTO EMP VALUES(7499,'ALLEN','SALESMAN',7698,'1981-2-20',1600,300,30);
INSERT INTO EMP VALUES(7521,'WARD','SALESMAN',7698,'1981-2-22',1250,500,30);
INSERT INTO EMP VALUES(7566,'JONES','MANAGER',7839,'1981-4-2',2975,NULL,20);
INSERT INTO EMP VALUES(7654,'MARTIN','SALESMAN',7698,'1981-9-28',1250,1400,30);
INSERT INTO EMP VALUES(7698,'BLAKE','MANAGER',7839,'1981-5-1',2850,NULL,30);
INSERT INTO EMP VALUES(7782,'CLARK','MANAGER',7839,'1981-6-9',2450,NULL,10);
INSERT INTO EMP VALUES(7788,'SCOTT','ANALYST',7566,'1987-6-12',3000,NULL,20);
INSERT INTO EMP VALUES(7839,'KING','PRESIDENT',NULL,'1981-2-12',5000,NULL,10);
INSERT INTO EMP VALUES(7844,'TURNER','SALESMAN',7698,'1981-9-8',1500,0,30);
INSERT INTO EMP VALUES(7876,'ADAMS','CLERK',7788,'1987-01-22',1100,NULL,20);
INSERT INTO EMP VALUES(7900,'JAMES','CLERK',7698,'1981-12-06',950,NULL,30);
INSERT INTO EMP VALUES(7902,'FORD','ANALYST',7566,'1981-12-08',3000,NULL,20);
INSERT INTO EMP VALUES(7934,'MILLER','CLERK',7782,'1982-01-12',1300,NULL,10);
go
select * from dept
go
select * from emp
go
-- way-1 : using joins
select
e.empno as Employeeno,
e.ename as EmployeeEname,
e.sal,
e.deptno,
d.dname as DepartmentNamme,
d.LOC as DepartmentLocation
from emp e
inner join
dept d on e.deptno = d.deptno
go
-- way2 : using co-related subquery
select
e.empno as Employeeno,
e.ename as EmployeeEname,
e.sal,
e.deptno,
(select dname from dept where deptno = e.DEPTNO) DepartmentNamme,
(select LOC from dept where deptno = e.deptno) as DepartmentLocation
from emp e
Thanks in advance.
Sam
September 18, 2017 at 3:11 am
Did you test and compare both queries execution plans?
In general, I prefer using JOIN ( and only switch to alternatives when proven way better for that given solution + document it !! )
Johan
Learn to play, play to learn !
Dont drive faster than your guardian angel can fly ...
but keeping both feet on the ground wont get you anywhere :w00t:
- How to post Performance Problems
- How to post data/code to get the best help[/url]
- How to prevent a sore throat after hours of presenting ppt
press F1 for solution, press shift+F1 for urgent solution 😀
Need a bit of Powershell? How about this
Who am I ? Sometimes this is me but most of the time this is me
September 18, 2017 at 3:13 am
This might seem a silly thing to say, but what have you done to check yourself? The best way to see which query performs better is to look at the query plans.
Open up SSMS, and put your two statements into a new query window. Click the Include Actual Execution Plan Button (or Ctrl+M) and then run the queries. You'll get a new tab in your results pane called "Execution plan". Have a look at the two plans and you'll see there's a clear winner. 🙂
P.s. When providing dates, try to remain consistent. You have both yyyy-MM-dd and yyyy-M-dd formatting in your query. It makes it harder to insert the data when you have language problems. 🙂
Thom~
Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
Larnu.uk
September 18, 2017 at 6:06 am
In addition to the execution plans, I'd also use SET STATISTICS IO ON to take a look at the total number of reads from both tables.
September 18, 2017 at 7:41 am
The first has to read the Dept table once (for the join), the second has to read it twice (once for each subquery). The first is easier to read and extend.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
September 18, 2017 at 7:58 am
Thom A - Monday, September 18, 2017 3:13 AMThis might seem a silly thing to say, but what have you done to check yourself? The best way to see which query performs better is to look at the query plans.Open up SSMS, and put your two statements into a new query window. Click the Include Actual Execution Plan Button (or Ctrl+M) and then run the queries. You'll get a new tab in your results pane called "Execution plan". Have a look at the two plans and you'll see there's a clear winner. 🙂
P.s. When providing dates, try to remain consistent. You have both yyyy-MM-dd and yyyy-M-dd formatting in your query. It makes it harder to insert the data when you have language problems. 🙂
Ah, be careful now. Not being a simple contrarian here... I just worry about people that rely only on the plan because even the actual plan relies heavily on estimates. There are times when things like % Of Batch (for example) is 100% backwards from the actual performance realized by two queries. The execution plan, by itself, will not always determine the "clear winner". Only adequate accuracy and performance tests will do that.
Also, one of the two queries may seem the better even after that but, with some analysis, the right index might make the other query run 10's or 100's of times faster than you could ever make the first query run. Oddly enough, the removal of an index can and has been known to do the same thing.
Heh... to summarize, "It Depends" and "Must Look Eye". 😉
--Jeff Moden
Change is inevitable... Change for the better is not.
September 19, 2017 at 9:26 am
There's a fundamental difference between those queries.
The first is an INNER join to dept, obviously, as explicitly stated, whereas the second is a LEFT OUTER JOIN, although only implicitly.
This could in theory have a big impact on a query plan and the query's ultimate performance, although in this specific case I doubt that would be so.
Viz:
select *
from (values(1),(2)) t1(col1)
inner join (values(1)) t2(col1) on t2.col1 = t1.col1
select t1.*,
(select col1 from (values(1)) t2(col1) where t2.col1 = t1.col1) as t2_col1
from (values(1),(2)) t1(col1)
SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.
Viewing 7 posts - 1 through 6 (of 6 total)
You must be logged in to reply to this topic. Login to reply