Click here to monitor SSC
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


Database Query


Database Query

Author
Message
Pravasis
Pravasis
SSC-Addicted
SSC-Addicted (417 reputation)SSC-Addicted (417 reputation)SSC-Addicted (417 reputation)SSC-Addicted (417 reputation)SSC-Addicted (417 reputation)SSC-Addicted (417 reputation)SSC-Addicted (417 reputation)SSC-Addicted (417 reputation)

Group: General Forum Members
Points: 417 Visits: 214
Hi,

I have couple of questions related to sql server, though I googled it, did not get any definite answer. I am sure one of you SSC guru would help me to understand SQL Server in a better way :-).

Why the joining between integer columns are faster?
How the sql server joins strings internally, does it checks character by character or uses ASCII to compare strings?
Does the database engine performs an order by on the group by column first to group the data easily?

Thanks in advance.

Regards,
Pravasis
Mansfield
Mansfield
Old Hand
Old Hand (370 reputation)Old Hand (370 reputation)Old Hand (370 reputation)Old Hand (370 reputation)Old Hand (370 reputation)Old Hand (370 reputation)Old Hand (370 reputation)Old Hand (370 reputation)

Group: General Forum Members
Points: 370 Visits: 394
1. Joining integer columns (provided that they are correctly indexed) is faster than joining other data types because it is relatively straightforward (see below)

2. It depends. Varchar and nvarchar columns are joined according to rules specified by the columns' collation.

3. It depends. Run the specific query and check the execution plan. If the group by column is the clustered index I would expect SQL Server exploit the natural sort order.
Pravasis
Pravasis
SSC-Addicted
SSC-Addicted (417 reputation)SSC-Addicted (417 reputation)SSC-Addicted (417 reputation)SSC-Addicted (417 reputation)SSC-Addicted (417 reputation)SSC-Addicted (417 reputation)SSC-Addicted (417 reputation)SSC-Addicted (417 reputation)

Group: General Forum Members
Points: 417 Visits: 214
Mansfield (4/11/2013)
1. Joining integer columns (provided that they are correctly indexed) is faster than joining other data types because it is relatively straightforward (see below)

2. It depends. Varchar and nvarchar columns are joined according to rules specified by the columns' collation.

3. It depends. Run the specific query and check the execution plan. If the group by column is the clustered index I would expect SQL Server exploit the natural sort order.


Thanks for your reply. I am still not sure why the joining of int is faster.

If I have an identity column(bigint) and a 4 character character code. Which I should be given preference while joining. Lets assume none of them have index.

Regards,
Pravasis
Jeff Moden
Jeff Moden
SSC-Forever
SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)

Group: General Forum Members
Points: 45398 Visits: 39942
Pravasis (4/14/2013)
Mansfield (4/11/2013)
1. Joining integer columns (provided that they are correctly indexed) is faster than joining other data types because it is relatively straightforward (see below)

2. It depends. Varchar and nvarchar columns are joined according to rules specified by the columns' collation.

3. It depends. Run the specific query and check the execution plan. If the group by column is the clustered index I would expect SQL Server exploit the natural sort order.


Thanks for your reply. I am still not sure why the joining of int is faster.

If I have an identity column(bigint) and a 4 character character code. Which I should be given preference while joining. Lets assume none of them have index.


Let's NOT assume that "none of them have index" because proper indexing is one of the most important things there is for performance.

That, not withstanding, a 4 character character-code will join faster than a BIGINT simply because BIGINT has 8 bytes and, even though it has to go through collation checks, the 4 character character_code is half that width.

If the comparison is done with an INT instead of a BIGINT, there will be a virtual tie if you use Latin1_General_Bin collation on the character-code.

Here's some test data. Change the "CAST" in each table build to build columns with different types.
-- DROP TABLE dbo.Test, dbo.IntLookup, VarCharLookup
GO
SELECT TOP 1000000
PKCol = IDENTITY(INT,1,1)
, IntCol = CAST(ABS(CHECKSUM(NEWID()))%9+1 AS INT)
, VarCharCol = CAST(REPLICATE(SUBSTRING('ABCDEFGHIJ',ABS(CHECKSUM(NEWID()))%9+1,1),4) AS CHAR(4)) COLLATE LATIN1_General_Bin
INTO dbo.Test
FROM sys.all_columns ac1
CROSS JOIN sys.all_columns ac2
;
ALTER TABLE dbo.Test ADD PRIMARY KEY CLUSTERED (PKCol);
CREATE INDEX IX_Test_IntCol ON dbo.Test (IntCol);
CREATE INDEX IX_Test_VarCharCol ON dbo.Test (VarCharCol);

SELECT PkCol = ISNULL(CAST(PkCol AS INT),0)
, DescCol
INTO dbo.IntLookup
FROM (
SELECT 1,'First' UNION ALL
SELECT 2,'Second' UNION ALL
SELECT 3,'Third' UNION ALL
SELECT 4,'Fourth' UNION ALL
SELECT 5,'Fifth' UNION ALL
SELECT 6,'Sixth' UNION ALL
SELECT 7,'Seventh' UNION ALL
SELECT 8,'Eighth' UNION ALL
SELECT 9,'Nineth'
) d (PkCol, DescCol)
;
ALTER TABLE dbo.IntLookup ADD PRIMARY KEY CLUSTERED (PKCol);

SELECT PkCol = ISNULL(CAST(PkCol AS CHAR(4)),'') COLLATE LATIN1_GENERAL_BIN
, DescCol
INTO dbo.VarCharLookup
FROM (
SELECT 'AAAA','First' UNION ALL
SELECT 'BBBB','Second' UNION ALL
SELECT 'CCCC','Third' UNION ALL
SELECT 'DDDD','Fourth' UNION ALL
SELECT 'EEEE','Fifth' UNION ALL
SELECT 'FFFF','Sixth' UNION ALL
SELECT 'GGGG','Seventh' UNION ALL
SELECT 'HHHH','Eighth' UNION ALL
SELECT 'IIII','Nineth'
) d (PkCol, DescCol)
;
ALTER TABLE dbo.VarCharLookup ADD PRIMARY KEY CLUSTERED (PKCol);



Here's the test code which dumps the results to a variable to take display times out of the picture.
SET STATISTICS TIME ON;
DECLARE @BitBucket VARCHAR(10);
SELECT @BitBucket = lu.DescCol
FROM dbo.Test t
JOIN dbo.IntLookup lu
ON t.IntCol = lu.PKCol;
SET STATISTICS TIME OFF;
GO
SET STATISTICS TIME ON;
DECLARE @BitBucket VARCHAR(10);
SELECT @BitBucket = lu.DescCol
FROM dbo.Test t
JOIN dbo.VarCharLookup lu
ON t.VarCharCol = lu.PKCol COLLATE LATIN1_General_Bin;
SET STATISTICS TIME OFF;



Here are the results from my older/slower machine for the code in the condition it is posted.

(1 row(s) affected)

SQL Server Execution Times:
CPU time = 1125 ms, elapsed time = 1218 ms.

(1 row(s) affected)

SQL Server Execution Times:
CPU time = 1125 ms, elapsed time = 1201 ms.



The bottom line here is to make sure the data in the join columns is "right sized" for maximum efficiency.

--Jeff Moden

RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
Although they tell us that they want it real bad, our primary goal is to ensure that we dont actually give it to them that way.
Although change is inevitable, change for the better is not.
Just because you can do something in PowerShell, doesnt mean you should. Wink

Helpful Links:
How to post code problems
How to post performance problems
Forum FAQs
Pravasis
Pravasis
SSC-Addicted
SSC-Addicted (417 reputation)SSC-Addicted (417 reputation)SSC-Addicted (417 reputation)SSC-Addicted (417 reputation)SSC-Addicted (417 reputation)SSC-Addicted (417 reputation)SSC-Addicted (417 reputation)SSC-Addicted (417 reputation)

Group: General Forum Members
Points: 417 Visits: 214
Thanks Jeff.
I was exactly looking for something like
The bottom line here is to make sure the data in the join columns is "right sized" for maximum efficiency.
.

Thanks for the detailed clarification. Appreciate it.

Regards,
Pravasis
kapil_kk
kapil_kk
SSCrazy
SSCrazy (2.4K reputation)SSCrazy (2.4K reputation)SSCrazy (2.4K reputation)SSCrazy (2.4K reputation)SSCrazy (2.4K reputation)SSCrazy (2.4K reputation)SSCrazy (2.4K reputation)SSCrazy (2.4K reputation)

Group: General Forum Members
Points: 2437 Visits: 2763
great explanation Jeff :-)

_______________________________________________________________
To get quick answer follow this link:
http://www.sqlservercentral.com/articles/Best+Practices/61537/
Eric M Russell
Eric M Russell
SSCarpal Tunnel
SSCarpal Tunnel (4.7K reputation)SSCarpal Tunnel (4.7K reputation)SSCarpal Tunnel (4.7K reputation)SSCarpal Tunnel (4.7K reputation)SSCarpal Tunnel (4.7K reputation)SSCarpal Tunnel (4.7K reputation)SSCarpal Tunnel (4.7K reputation)SSCarpal Tunnel (4.7K reputation)

Group: General Forum Members
Points: 4653 Visits: 9579
Pravasis (4/14/2013)
Mansfield (4/11/2013)
1. Joining integer columns (provided that they are correctly indexed) is faster than joining other data types because it is relatively straightforward (see below)

2. It depends. Varchar and nvarchar columns are joined according to rules specified by the columns' collation.

3. It depends. Run the specific query and check the execution plan. If the group by column is the clustered index I would expect SQL Server exploit the natural sort order.


Thanks for your reply. I am still not sure why the joining of int is faster.

If I have an identity column(bigint) and a 4 character character code. Which I should be given preference while joining. Lets assume none of them have index.

A BigInt has a width of 8 bytes, which is twice the width of the Char(4) column. So the Char identifier column may prove faster, especially if it's a large table and SQL Server has to build a hash table in background to facilitate the join.
Either way, the joined column needs to be indexed, which will take advantage of a B-tree style search and record exclusion. Without an index on the joined column, SQL Server will be forced to perform a full table scan and read every record.
It's also important that you join between columns of the same data type. For example, join int to int and char(4) to char(4), otherwise you will end up with type casting at runtime, which is a performace hit.
Also, avoid joining on concatentated columns or functions, because that will most likely result in a non-indexable expression.


"The universe is complicated and for the most part beyond your control, but your life is only as complicated as you choose it to be."
Jeff Moden
Jeff Moden
SSC-Forever
SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)

Group: General Forum Members
Points: 45398 Visits: 39942
Pravasis (4/15/2013)
Thanks Jeff.
I was exactly looking for something like
The bottom line here is to make sure the data in the join columns is "right sized" for maximum efficiency.
.

Thanks for the detailed clarification. Appreciate it.


kapil_kk (4/15/2013)
great explanation Jeff :-)


Thank you both. Glad I could help.

--Jeff Moden

RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
Although they tell us that they want it real bad, our primary goal is to ensure that we dont actually give it to them that way.
Although change is inevitable, change for the better is not.
Just because you can do something in PowerShell, doesnt mean you should. Wink

Helpful Links:
How to post code problems
How to post performance problems
Forum FAQs
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search