Database Query

Question

Database Query

Pravasis

Default port

Points: 1455
More actions
April 11, 2013 at 2:45 pm

#292321

Hi,
I have couple of questions related to sql server, though I googled it, did not get any definite answer. I am sure one of you SSC guru would help me to understand SQL Server in a better way :-).
Why the joining between integer columns are faster?
How the sql server joins strings internally, does it checks character by character or uses ASCII to compare strings?
Does the database engine performs an order by on the group by column first to group the data easily?
Thanks in advance.
Regards,
Pravasis

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply

Mansfield SSCrazy Points: 2396 More actions · Answer 1

1. Joining integer columns (provided that they are correctly indexed) is faster than joining other data types because it is relatively straightforward (see below)

2. It depends. Varchar and nvarchar columns are joined according to rules specified by the columns' collation.

3. It depends. Run the specific query and check the execution plan. If the group by column is the clustered index I would expect SQL Server exploit the natural sort order.

Pravasis Default port Points: 1455 More actions · Answer 2

Mansfield (4/11/2013)
1. Joining integer columns (provided that they are correctly indexed) is faster than joining other data types because it is relatively straightforward (see below)
2. It depends. Varchar and nvarchar columns are joined according to rules specified by the columns' collation.
3. It depends. Run the specific query and check the execution plan. If the group by column is the clustered index I would expect SQL Server exploit the natural sort order.

Thanks for your reply. I am still not sure why the joining of int is faster.

If I have an identity column(bigint) and a 4 character character code. Which I should be given preference while joining. Lets assume none of them have index.

Regards,
Pravasis

Jeff Moden SSC Guru Points: 1003894 More actions · Answer 3

Pravasis (4/14/2013)
Mansfield (4/11/2013)
1. Joining integer columns (provided that they are correctly indexed) is faster than joining other data types because it is relatively straightforward (see below)
2. It depends. Varchar and nvarchar columns are joined according to rules specified by the columns' collation.
3. It depends. Run the specific query and check the execution plan. If the group by column is the clustered index I would expect SQL Server exploit the natural sort order.
Thanks for your reply. I am still not sure why the joining of int is faster.
If I have an identity column(bigint) and a 4 character character code. Which I should be given preference while joining. Lets assume none of them have index.

Let's NOT assume that "none of them have index" because proper indexing is one of the most important things there is for performance.

That, not withstanding, a 4 character character-code will join faster than a BIGINT simply because BIGINT has 8 bytes and, even though it has to go through collation checks, the 4 character character_code is half that width.

If the comparison is done with an INT instead of a BIGINT, there will be a virtual tie if you use Latin1_General_Bin collation on the character-code.

Here's some test data. Change the "CAST" in each table build to build columns with different types.

-- DROP TABLE dbo.Test, dbo.IntLookup, VarCharLookup

GO

SELECT TOP 1000000

PKCol = IDENTITY(INT,1,1)

, IntCol = CAST(ABS(CHECKSUM(NEWID()))%9+1 AS INT)

, VarCharCol = CAST(REPLICATE(SUBSTRING('ABCDEFGHIJ',ABS(CHECKSUM(NEWID()))%9+1,1),4) AS CHAR(4)) COLLATE LATIN1_General_Bin

INTO dbo.Test

FROM sys.all_columns ac1

CROSS JOIN sys.all_columns ac2

;

ALTER TABLE dbo.Test ADD PRIMARY KEY CLUSTERED (PKCol);

CREATE INDEX IX_Test_IntCol ON dbo.Test (IntCol);

CREATE INDEX IX_Test_VarCharCol ON dbo.Test (VarCharCol);

SELECT PkCol = ISNULL(CAST(PkCol AS INT),0)

, DescCol

INTO dbo.IntLookup

FROM (

SELECT 1,'First' UNION ALL

SELECT 2,'Second' UNION ALL

SELECT 3,'Third' UNION ALL

SELECT 4,'Fourth' UNION ALL

SELECT 5,'Fifth' UNION ALL

SELECT 6,'Sixth' UNION ALL

SELECT 7,'Seventh' UNION ALL

SELECT 8,'Eighth' UNION ALL

SELECT 9,'Nineth'

) d (PkCol, DescCol)

;

ALTER TABLE dbo.IntLookup ADD PRIMARY KEY CLUSTERED (PKCol);

SELECT PkCol = ISNULL(CAST(PkCol AS CHAR(4)),'') COLLATE LATIN1_GENERAL_BIN

, DescCol

INTO dbo.VarCharLookup

FROM (

SELECT 'AAAA','First' UNION ALL

SELECT 'BBBB','Second' UNION ALL

SELECT 'CCCC','Third' UNION ALL

SELECT 'DDDD','Fourth' UNION ALL

SELECT 'EEEE','Fifth' UNION ALL

SELECT 'FFFF','Sixth' UNION ALL

SELECT 'GGGG','Seventh' UNION ALL

SELECT 'HHHH','Eighth' UNION ALL

SELECT 'IIII','Nineth'

) d (PkCol, DescCol)

;

ALTER TABLE dbo.VarCharLookup ADD PRIMARY KEY CLUSTERED (PKCol);

Here's the test code which dumps the results to a variable to take display times out of the picture.

SET STATISTICS TIME ON;

DECLARE @Bitbucket VARCHAR(10);

SELECT @Bitbucket = lu.DescCol

FROM dbo.Test t

JOIN dbo.IntLookup lu

ON t.IntCol = lu.PKCol;

SET STATISTICS TIME OFF;

GO

SET STATISTICS TIME ON;

DECLARE @Bitbucket VARCHAR(10);

SELECT @Bitbucket = lu.DescCol

FROM dbo.Test t

JOIN dbo.VarCharLookup lu

ON t.VarCharCol = lu.PKCol COLLATE LATIN1_General_Bin;

SET STATISTICS TIME OFF;

Here are the results from my older/slower machine for the code in the condition it is posted.

(1 row(s) affected)

SQL Server Execution Times:

CPU time = 1125 ms, elapsed time = 1218 ms.

(1 row(s) affected)

SQL Server Execution Times:

CPU time = 1125 ms, elapsed time = 1201 ms.

The bottom line here is to make sure the data in the join columns is "right sized" for maximum efficiency.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Pravasis Default port Points: 1455 More actions · Answer 4

Thanks Jeff.

I was exactly looking for something like

The bottom line here is to make sure the data in the join columns is "right sized" for maximum efficiency.

.

Thanks for the detailed clarification. Appreciate it.

Regards,
Pravasis

kapil_kk SSC-Insane Points: 21316 More actions · Answer 5

great explanation Jeff 🙂

_______________________________________________________________
To get quick answer follow this link:
http://www.sqlservercentral.com/articles/Best+Practices/61537/

Eric M Russell SSC Guru Points: 125524 More actions · Answer 6

Pravasis (4/14/2013)
Mansfield (4/11/2013)
1. Joining integer columns (provided that they are correctly indexed) is faster than joining other data types because it is relatively straightforward (see below)
2. It depends. Varchar and nvarchar columns are joined according to rules specified by the columns' collation.
3. It depends. Run the specific query and check the execution plan. If the group by column is the clustered index I would expect SQL Server exploit the natural sort order.
Thanks for your reply. I am still not sure why the joining of int is faster.
If I have an identity column(bigint) and a 4 character character code. Which I should be given preference while joining. Lets assume none of them have index.

A BigInt has a width of 8 bytes, which is twice the width of the Char(4) column. So the Char identifier column may prove faster, especially if it's a large table and SQL Server has to build a hash table in background to facilitate the join.

Either way, the joined column needs to be indexed, which will take advantage of a B-tree style search and record exclusion. Without an index on the joined column, SQL Server will be forced to perform a full table scan and read every record.

It's also important that you join between columns of the same data type. For example, join int to int and char(4) to char(4), otherwise you will end up with type casting at runtime, which is a performace hit.

Also, avoid joining on concatentated columns or functions, because that will most likely result in a non-indexable expression.

"Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

Jeff Moden SSC Guru Points: 1003894 More actions · Answer 7

Pravasis (4/15/2013)
Thanks Jeff.
I was exactly looking for something like
The bottom line here is to make sure the data in the join columns is "right sized" for maximum efficiency.
.
Thanks for the detailed clarification. Appreciate it.

kapil_kk (4/15/2013)
great explanation Jeff 🙂

Thank you both. Glad I could help.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)