Database Query

  • Hi,

    I have couple of questions related to sql server, though I googled it, did not get any definite answer. I am sure one of you SSC guru would help me to understand SQL Server in a better way :-).

    Why the joining between integer columns are faster?

    How the sql server joins strings internally, does it checks character by character or uses ASCII to compare strings?

    Does the database engine performs an order by on the group by column first to group the data easily?

    Thanks in advance.

    Regards,
    Pravasis

  • 1. Joining integer columns (provided that they are correctly indexed) is faster than joining other data types because it is relatively straightforward (see below)

    2. It depends. Varchar and nvarchar columns are joined according to rules specified by the columns' collation.

    3. It depends. Run the specific query and check the execution plan. If the group by column is the clustered index I would expect SQL Server exploit the natural sort order.

  • Mansfield (4/11/2013)


    1. Joining integer columns (provided that they are correctly indexed) is faster than joining other data types because it is relatively straightforward (see below)

    2. It depends. Varchar and nvarchar columns are joined according to rules specified by the columns' collation.

    3. It depends. Run the specific query and check the execution plan. If the group by column is the clustered index I would expect SQL Server exploit the natural sort order.

    Thanks for your reply. I am still not sure why the joining of int is faster.

    If I have an identity column(bigint) and a 4 character character code. Which I should be given preference while joining. Lets assume none of them have index.

    Regards,
    Pravasis

  • Pravasis (4/14/2013)


    Mansfield (4/11/2013)


    1. Joining integer columns (provided that they are correctly indexed) is faster than joining other data types because it is relatively straightforward (see below)

    2. It depends. Varchar and nvarchar columns are joined according to rules specified by the columns' collation.

    3. It depends. Run the specific query and check the execution plan. If the group by column is the clustered index I would expect SQL Server exploit the natural sort order.

    Thanks for your reply. I am still not sure why the joining of int is faster.

    If I have an identity column(bigint) and a 4 character character code. Which I should be given preference while joining. Lets assume none of them have index.

    Let's NOT assume that "none of them have index" because proper indexing is one of the most important things there is for performance.

    That, not withstanding, a 4 character character-code will join faster than a BIGINT simply because BIGINT has 8 bytes and, even though it has to go through collation checks, the 4 character character_code is half that width.

    If the comparison is done with an INT instead of a BIGINT, there will be a virtual tie if you use Latin1_General_Bin collation on the character-code.

    Here's some test data. Change the "CAST" in each table build to build columns with different types.

    -- DROP TABLE dbo.Test, dbo.IntLookup, VarCharLookup

    GO

    SELECT TOP 1000000

    PKCol = IDENTITY(INT,1,1)

    , IntCol = CAST(ABS(CHECKSUM(NEWID()))%9+1 AS INT)

    , VarCharCol = CAST(REPLICATE(SUBSTRING('ABCDEFGHIJ',ABS(CHECKSUM(NEWID()))%9+1,1),4) AS CHAR(4)) COLLATE LATIN1_General_Bin

    INTO dbo.Test

    FROM sys.all_columns ac1

    CROSS JOIN sys.all_columns ac2

    ;

    ALTER TABLE dbo.Test ADD PRIMARY KEY CLUSTERED (PKCol);

    CREATE INDEX IX_Test_IntCol ON dbo.Test (IntCol);

    CREATE INDEX IX_Test_VarCharCol ON dbo.Test (VarCharCol);

    SELECT PkCol = ISNULL(CAST(PkCol AS INT),0)

    , DescCol

    INTO dbo.IntLookup

    FROM (

    SELECT 1,'First' UNION ALL

    SELECT 2,'Second' UNION ALL

    SELECT 3,'Third' UNION ALL

    SELECT 4,'Fourth' UNION ALL

    SELECT 5,'Fifth' UNION ALL

    SELECT 6,'Sixth' UNION ALL

    SELECT 7,'Seventh' UNION ALL

    SELECT 8,'Eighth' UNION ALL

    SELECT 9,'Nineth'

    ) d (PkCol, DescCol)

    ;

    ALTER TABLE dbo.IntLookup ADD PRIMARY KEY CLUSTERED (PKCol);

    SELECT PkCol = ISNULL(CAST(PkCol AS CHAR(4)),'') COLLATE LATIN1_GENERAL_BIN

    , DescCol

    INTO dbo.VarCharLookup

    FROM (

    SELECT 'AAAA','First' UNION ALL

    SELECT 'BBBB','Second' UNION ALL

    SELECT 'CCCC','Third' UNION ALL

    SELECT 'DDDD','Fourth' UNION ALL

    SELECT 'EEEE','Fifth' UNION ALL

    SELECT 'FFFF','Sixth' UNION ALL

    SELECT 'GGGG','Seventh' UNION ALL

    SELECT 'HHHH','Eighth' UNION ALL

    SELECT 'IIII','Nineth'

    ) d (PkCol, DescCol)

    ;

    ALTER TABLE dbo.VarCharLookup ADD PRIMARY KEY CLUSTERED (PKCol);

    Here's the test code which dumps the results to a variable to take display times out of the picture.

    SET STATISTICS TIME ON;

    DECLARE @Bitbucket VARCHAR(10);

    SELECT @Bitbucket = lu.DescCol

    FROM dbo.Test t

    JOIN dbo.IntLookup lu

    ON t.IntCol = lu.PKCol;

    SET STATISTICS TIME OFF;

    GO

    SET STATISTICS TIME ON;

    DECLARE @Bitbucket VARCHAR(10);

    SELECT @Bitbucket = lu.DescCol

    FROM dbo.Test t

    JOIN dbo.VarCharLookup lu

    ON t.VarCharCol = lu.PKCol COLLATE LATIN1_General_Bin;

    SET STATISTICS TIME OFF;

    Here are the results from my older/slower machine for the code in the condition it is posted.

    (1 row(s) affected)

    SQL Server Execution Times:

    CPU time = 1125 ms, elapsed time = 1218 ms.

    (1 row(s) affected)

    SQL Server Execution Times:

    CPU time = 1125 ms, elapsed time = 1201 ms.

    The bottom line here is to make sure the data in the join columns is "right sized" for maximum efficiency.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thanks Jeff.

    I was exactly looking for something like

    The bottom line here is to make sure the data in the join columns is "right sized" for maximum efficiency.

    .

    Thanks for the detailed clarification. Appreciate it.

    Regards,
    Pravasis

  • great explanation Jeff 🙂

    _______________________________________________________________
    To get quick answer follow this link:
    http://www.sqlservercentral.com/articles/Best+Practices/61537/

  • Pravasis (4/14/2013)


    Mansfield (4/11/2013)


    1. Joining integer columns (provided that they are correctly indexed) is faster than joining other data types because it is relatively straightforward (see below)

    2. It depends. Varchar and nvarchar columns are joined according to rules specified by the columns' collation.

    3. It depends. Run the specific query and check the execution plan. If the group by column is the clustered index I would expect SQL Server exploit the natural sort order.

    Thanks for your reply. I am still not sure why the joining of int is faster.

    If I have an identity column(bigint) and a 4 character character code. Which I should be given preference while joining. Lets assume none of them have index.

    A BigInt has a width of 8 bytes, which is twice the width of the Char(4) column. So the Char identifier column may prove faster, especially if it's a large table and SQL Server has to build a hash table in background to facilitate the join.

    Either way, the joined column needs to be indexed, which will take advantage of a B-tree style search and record exclusion. Without an index on the joined column, SQL Server will be forced to perform a full table scan and read every record.

    It's also important that you join between columns of the same data type. For example, join int to int and char(4) to char(4), otherwise you will end up with type casting at runtime, which is a performace hit.

    Also, avoid joining on concatentated columns or functions, because that will most likely result in a non-indexable expression.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • Pravasis (4/15/2013)


    Thanks Jeff.

    I was exactly looking for something like

    The bottom line here is to make sure the data in the join columns is "right sized" for maximum efficiency.

    .

    Thanks for the detailed clarification. Appreciate it.

    kapil_kk (4/15/2013)


    great explanation Jeff 🙂

    Thank you both. Glad I could help.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply