Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase

Database Query Expand / Collapse
Author
Message
Posted Thursday, April 11, 2013 2:45 PM
SSC-Addicted

SSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-Addicted

Group: General Forum Members
Last Login: Wednesday, July 30, 2014 4:07 AM
Points: 413, Visits: 207
Hi,

I have couple of questions related to sql server, though I googled it, did not get any definite answer. I am sure one of you SSC guru would help me to understand SQL Server in a better way .

Why the joining between integer columns are faster?
How the sql server joins strings internally, does it checks character by character or uses ASCII to compare strings?
Does the database engine performs an order by on the group by column first to group the data easily?

Thanks in advance.


Regards,
Pravasis
Post #1441484
Posted Thursday, April 11, 2013 2:56 PM
Old Hand

Old HandOld HandOld HandOld HandOld HandOld HandOld HandOld Hand

Group: General Forum Members
Last Login: Tuesday, July 15, 2014 2:20 PM
Points: 364, Visits: 384
1. Joining integer columns (provided that they are correctly indexed) is faster than joining other data types because it is relatively straightforward (see below)

2. It depends. Varchar and nvarchar columns are joined according to rules specified by the columns' collation.

3. It depends. Run the specific query and check the execution plan. If the group by column is the clustered index I would expect SQL Server exploit the natural sort order.
Post #1441489
Posted Sunday, April 14, 2013 1:31 AM
SSC-Addicted

SSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-Addicted

Group: General Forum Members
Last Login: Wednesday, July 30, 2014 4:07 AM
Points: 413, Visits: 207
Mansfield (4/11/2013)
1. Joining integer columns (provided that they are correctly indexed) is faster than joining other data types because it is relatively straightforward (see below)

2. It depends. Varchar and nvarchar columns are joined according to rules specified by the columns' collation.

3. It depends. Run the specific query and check the execution plan. If the group by column is the clustered index I would expect SQL Server exploit the natural sort order.


Thanks for your reply. I am still not sure why the joining of int is faster.

If I have an identity column(bigint) and a 4 character character code. Which I should be given preference while joining. Lets assume none of them have index.


Regards,
Pravasis
Post #1442089
Posted Sunday, April 14, 2013 12:42 PM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Yesterday @ 9:58 AM
Points: 36,995, Visits: 31,517
Pravasis (4/14/2013)
Mansfield (4/11/2013)
1. Joining integer columns (provided that they are correctly indexed) is faster than joining other data types because it is relatively straightforward (see below)

2. It depends. Varchar and nvarchar columns are joined according to rules specified by the columns' collation.

3. It depends. Run the specific query and check the execution plan. If the group by column is the clustered index I would expect SQL Server exploit the natural sort order.


Thanks for your reply. I am still not sure why the joining of int is faster.

If I have an identity column(bigint) and a 4 character character code. Which I should be given preference while joining. Lets assume none of them have index.


Let's NOT assume that "none of them have index" because proper indexing is one of the most important things there is for performance.

That, not withstanding, a 4 character character-code will join faster than a BIGINT simply because BIGINT has 8 bytes and, even though it has to go through collation checks, the 4 character character_code is half that width.

If the comparison is done with an INT instead of a BIGINT, there will be a virtual tie if you use Latin1_General_Bin collation on the character-code.

Here's some test data. Change the "CAST" in each table build to build columns with different types.
-- DROP TABLE dbo.Test, dbo.IntLookup, VarCharLookup
GO
SELECT TOP 1000000
PKCol = IDENTITY(INT,1,1)
, IntCol = CAST(ABS(CHECKSUM(NEWID()))%9+1 AS INT)
, VarCharCol = CAST(REPLICATE(SUBSTRING('ABCDEFGHIJ',ABS(CHECKSUM(NEWID()))%9+1,1),4) AS CHAR(4)) COLLATE LATIN1_General_Bin
INTO dbo.Test
FROM sys.all_columns ac1
CROSS JOIN sys.all_columns ac2
;
ALTER TABLE dbo.Test ADD PRIMARY KEY CLUSTERED (PKCol);
CREATE INDEX IX_Test_IntCol ON dbo.Test (IntCol);
CREATE INDEX IX_Test_VarCharCol ON dbo.Test (VarCharCol);

SELECT PkCol = ISNULL(CAST(PkCol AS INT),0)
, DescCol
INTO dbo.IntLookup
FROM (
SELECT 1,'First' UNION ALL
SELECT 2,'Second' UNION ALL
SELECT 3,'Third' UNION ALL
SELECT 4,'Fourth' UNION ALL
SELECT 5,'Fifth' UNION ALL
SELECT 6,'Sixth' UNION ALL
SELECT 7,'Seventh' UNION ALL
SELECT 8,'Eighth' UNION ALL
SELECT 9,'Nineth'
) d (PkCol, DescCol)
;
ALTER TABLE dbo.IntLookup ADD PRIMARY KEY CLUSTERED (PKCol);

SELECT PkCol = ISNULL(CAST(PkCol AS CHAR(4)),'') COLLATE LATIN1_GENERAL_BIN
, DescCol
INTO dbo.VarCharLookup
FROM (
SELECT 'AAAA','First' UNION ALL
SELECT 'BBBB','Second' UNION ALL
SELECT 'CCCC','Third' UNION ALL
SELECT 'DDDD','Fourth' UNION ALL
SELECT 'EEEE','Fifth' UNION ALL
SELECT 'FFFF','Sixth' UNION ALL
SELECT 'GGGG','Seventh' UNION ALL
SELECT 'HHHH','Eighth' UNION ALL
SELECT 'IIII','Nineth'
) d (PkCol, DescCol)
;
ALTER TABLE dbo.VarCharLookup ADD PRIMARY KEY CLUSTERED (PKCol);

Here's the test code which dumps the results to a variable to take display times out of the picture.
SET STATISTICS TIME ON;
DECLARE @BitBucket VARCHAR(10);
SELECT @BitBucket = lu.DescCol
FROM dbo.Test t
JOIN dbo.IntLookup lu
ON t.IntCol = lu.PKCol;
SET STATISTICS TIME OFF;
GO
SET STATISTICS TIME ON;
DECLARE @BitBucket VARCHAR(10);
SELECT @BitBucket = lu.DescCol
FROM dbo.Test t
JOIN dbo.VarCharLookup lu
ON t.VarCharCol = lu.PKCol COLLATE LATIN1_General_Bin;
SET STATISTICS TIME OFF;

Here are the results from my older/slower machine for the code in the condition it is posted.
(1 row(s) affected)

SQL Server Execution Times:
CPU time = 1125 ms, elapsed time = 1218 ms.

(1 row(s) affected)

SQL Server Execution Times:
CPU time = 1125 ms, elapsed time = 1201 ms.



The bottom line here is to make sure the data in the join columns is "right sized" for maximum efficiency.


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #1442124
Posted Monday, April 15, 2013 4:57 AM
SSC-Addicted

SSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-Addicted

Group: General Forum Members
Last Login: Wednesday, July 30, 2014 4:07 AM
Points: 413, Visits: 207
Thanks Jeff.
I was exactly looking for something like
The bottom line here is to make sure the data in the join columns is "right sized" for maximum efficiency.
.

Thanks for the detailed clarification. Appreciate it.


Regards,
Pravasis
Post #1442251
Posted Monday, April 15, 2013 5:59 AM


SSCommitted

SSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommitted

Group: General Forum Members
Last Login: Friday, July 25, 2014 12:34 AM
Points: 1,888, Visits: 2,326
great explanation Jeff


_______________________________________________________________
To get quick answer follow this link:
http://www.sqlservercentral.com/articles/Best+Practices/61537/
Post #1442275
Posted Monday, April 15, 2013 3:26 PM


SSCommitted

SSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommitted

Group: General Forum Members
Last Login: Friday, August 29, 2014 4:17 PM
Points: 1,651, Visits: 4,709
Pravasis (4/14/2013)
Mansfield (4/11/2013)
1. Joining integer columns (provided that they are correctly indexed) is faster than joining other data types because it is relatively straightforward (see below)

2. It depends. Varchar and nvarchar columns are joined according to rules specified by the columns' collation.

3. It depends. Run the specific query and check the execution plan. If the group by column is the clustered index I would expect SQL Server exploit the natural sort order.


Thanks for your reply. I am still not sure why the joining of int is faster.

If I have an identity column(bigint) and a 4 character character code. Which I should be given preference while joining. Lets assume none of them have index.

A BigInt has a width of 8 bytes, which is twice the width of the Char(4) column. So the Char identifier column may prove faster, especially if it's a large table and SQL Server has to build a hash table in background to facilitate the join.
Either way, the joined column needs to be indexed, which will take advantage of a B-tree style search and record exclusion. Without an index on the joined column, SQL Server will be forced to perform a full table scan and read every record.
It's also important that you join between columns of the same data type. For example, join int to int and char(4) to char(4), otherwise you will end up with type casting at runtime, which is a performace hit.
Also, avoid joining on concatentated columns or functions, because that will most likely result in a non-indexable expression.
Post #1442508
Posted Monday, April 15, 2013 4:15 PM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Yesterday @ 9:58 AM
Points: 36,995, Visits: 31,517
Pravasis (4/15/2013)
Thanks Jeff.
I was exactly looking for something like
The bottom line here is to make sure the data in the join columns is "right sized" for maximum efficiency.
.

Thanks for the detailed clarification. Appreciate it.


kapil_kk (4/15/2013)
great explanation Jeff


Thank you both. Glad I could help.


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #1442528
« Prev Topic | Next Topic »

Add to briefcase

Permissions Expand / Collapse