Understanding the difference between Join and Where filters - SQL 2008R2

  • I'm new to the TSQL world (coming from visual basic) and am working with large tables (125+ million rows) and having perfomance issues. This sp takes over an hour to run. Is there a better way to optimize this code? Table2 has 40,525,850 rows and Table1 ends up with 125,350,605 rows.

    I'm creating two indexes within the code:

    CREATE NONCLUSTERED INDEX IDX_Table2

    ON Table2 (Batch,Flag,Num,Date1 )

    INCLUDE (Id);

    CREATE NONCLUSTERED INDEX IDX_Table1

    ON Table1 (Batch,Flag,Num);

    INSERT INTO Table5

    SELECT DISTINCT

    CD.Desc AS Desc,

    SD.Num AS Num,

    SD.Batch AS Batch,

    CD.CodeId AS CodeId,

    SD.Type AS Type,

    CH.Id AS Id,

    FROM Table1 SD

    JOIN Table2 CH

    ON CH.Num=SD.Num

    AND CH.Batch= @Batch

    AND CH.Flag= 0

    AND SD.Batch = @Batch

    AND SD.Flag= 0

    JOIN Table3 M

    ON M.Code=SD.Code

    AND (Map=1 OR Map=@Map)

    AND CH.Date1 BETWEEN M.Date2 AND M.Date3

    JOIN Table4 CD

    ON M.CodeId=CD.CodeId

    WHERE SD.Code IS NOT NULL AND RTRIM(LTRIM(SD.Code ))<>'';

    Also, will the following code produce the same result? Or will moving the filters to the where clause make the join take longer? How does SQL process the code?

    INSERT INTO Table5

    SELECT DISTINCT

    CD.Desc AS Desc,

    SD.Num AS Num,

    SD.Batch AS Batch,

    CD.CodeId AS CodeId,

    SD.Type AS Type,

    CH.Id AS Id,

    FROM Table1 SD

    JOIN Table2 CH

    ON CH.Num=SD.Num

    AND CH.Batch= SD.Batch

    JOIN Table3 M

    ON M.Code=SD.Code

    AND (Map=1 OR Map=@Map)

    AND CH.Date1 BETWEEN M.Date2 AND M.Date3

    JOIN Table4 CD

    ON M.CodeId=CD.CodeId

    WHERE SD.Code IS NOT NULL AND RTRIM(LTRIM(SD.Code ))<>''

    AND CH.Flag= 0

    AND SD.Batch = @Batch

    AND SD.Flag= 0

  • 1. Using

    RTRIM(LTRIM(SD.Code ))<>''

    will be very slow - if you've got an index on SD.Code it probably won't be used. Can you tidy the data in the table so you don't need RTRIM(LTRIM() & add an index?

    2. You could consider using

    FROM Table1 SD WITH (NOLOCK) JOIN Table2 WITH (NOLOCK) etc.

    as long as the table isn't being updated - this saves time as no read locks are issued.

    3. Not sure about the pros & cons of including the filters in the WHERE clause...

  • For inner joins there is no difference whatsoever. For outer joins, moving the filter from where to join changes the logic of the query. Hence it's not a matter of performance, it's a matter of which gives you the correct results.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • laurie-789651 (9/24/2012)


    2. You could consider using

    FROM Table1 SD WITH (NOLOCK) JOIN Table2 WITH (NOLOCK) etc.

    as long as the table isn't being updated - this saves time as no read locks are issued.

    And incorrect results are possible, not just dirty reads, duplicate or missing rows. If there are no changes being made, there's little gain from nolock, the overhead of taking locks is not that high, and it introduces the potential for incorrect results when changes are being made.

    See - http://blogs.msdn.com/b/davidlean/archive/2009/04/06/sql-server-nolock-hint-other-poor-ideas.aspx

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • So your saying that this code still produces the same result in with the same efficiency? (I took out another join and moved it to the where clause)

    INSERT INTO Table5

    SELECT DISTINCT

    CD.Desc AS Desc,

    SD.Num AS Num,

    SD.Batch AS Batch,

    CD.CodeId AS CodeId,

    SD.Type AS Type,

    CH.Id AS Id,

    FROM Table1 SD

    JOIN Table2 CH

    ON CH.Num=SD.Num

    JOIN Table3 M

    ON M.Code=SD.Code

    AND (Map=1 OR Map=@Map)

    AND CH.Date1 BETWEEN M.Date2 AND M.Date3

    JOIN Table4 CD

    ON M.CodeId=CD.CodeId

    WHERE SD.Code IS NOT NULL AND RTRIM(LTRIM(SD.Code ))<>''

    AND CH.Batch = @Batch

    AND CH.Flag= 0

    AND SD.Batch = @Batch

    AND SD.Flag= 0

  • You'll have better performance if you change the following statement

    WHERE SD.Code IS NOT NULL AND RTRIM(LTRIM(SD.Code ))<>''

    with

    WHERE SD.Code <> ''

    The results are the same and you can test them if you want.

    Here's a script.

    DECLARE @Tabletable(

    mystringchar(15))

    INSERT @Table

    VALUES( ''),( ' '),( ' '),

    ( 'a'),( ' b'),( 'c '),

    ( ' '), (NULL)

    SELECT * FROM @Table

    WHERE mystring <> ''

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • Here is a quick example of how moving the filter from the join to the WHERE clause can have an effect on the result of your queries (if you're using OUTER JOINS):

    Although there is a filter in the first query on the Forename, SQL still includes the other records. This is because SQL server does the inner join internally and applies the filter then adds all missing rows from the left table.

    Hope this makes sense.

    CREATE TABLE #Employee (ID INT IDENTITY(1, 1), Forename VARCHAR(20))

    INSERT INTO #Employee(Forename)

    SELECT 'Abu Dina' UNION ALL

    SELECT 'SQL4n00bs' UNION ALL

    SELECT 'Test' UNION ALL

    SELECT '1337'

    CREATE TABLE #Location (ID INT IDENTITY(1, 1), Employee_ID INT, Location VARCHAR(50))

    INSERT INTO #Location(Employee_ID, Location)

    SELECT 1, 'UK' UNION ALL

    SELECT 2, 'Manchester'

    SELECT a.* , b.*

    FROM #Employee AS a

    LEFT JOIN #Location AS b

    ON a.id = b.employee_id

    and a.forename = 'Abu Dina'

    SELECT a.* , b.*

    FROM #Employee as a

    LEFT JOIN #Location as b

    ON a.id = b.employee_id

    WHERE a.forename = 'Abu Dina'

    DROP TABLE #Employeea

    DROP TABLE #Location

    ---------------------------------------------------------

    It takes a minimal capacity for rational thought to see that the corporate 'free press' is a structurally irrational and biased, and extremely violent, system of elite propaganda.
    David Edwards - Media lens[/url]

    Society has varying and conflicting interests; what is called objectivity is the disguise of one of these interests - that of neutrality. But neutrality is a fiction in an unneutral world. There are victims, there are executioners, and there are bystanders... and the 'objectivity' of the bystander calls for inaction while other heads fall.
    Howard Zinn

  • newbie2 (9/24/2012)


    AND RTRIM(LTRIM(SD.Code ))<>'';

    The RTRIM and LTRIM are unnecessary here. SQL ignores trailing spaces when checking string equality, hence '' = ' ', true no matter how many spaces you have.

    That predicate can be reduced to AND SD.Code != ''. That != will also eliminate nulls, so you can remove the SD.Code IS NOT NULL as well.

    Neither of the nonclustered indexes that you've created look optimal. Rather consider

    Table1: Index key (Code, Batch, Flag, Num) Include (Type)

    Table2: Index key (Flag, Batch, Num, Date1) Include (ID)

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • As previously stated this is high potential of the issue.

    WHERE SD.Code IS NOT NULL AND RTRIM(LTRIM(SD.Code ))<>''

    That renders your query nonSARGable. You could change that be simply:

    WHERE SD.Code > ''

    This will still find any value that is not '' and NULL will already be excluded.

    --EDIT--

    Had a desk meeting while posting and got pulled away. Seems that Gail already posted much the same as I did. ๐Ÿ˜›

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 โ€“ Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Thank you all for your help!

  • Here's another option that moves the string manipulation and identification of NULLs to the SELECT statement instead of using it in a JOIN or WHERE clause where the 'SARG-ability' of the statement may be an issue. It sets blanks and NULLs to some value (in this example it's '0') and then the Where clause can exclude that value. This option would need to be tested against other options of course.

    SELECT

    [Desc]

    ,[Num]

    ,[Batch]

    ,[CodeId]

    ,[Type]

    ,[Id]

    FROM

    (

    SELECT DISTINCT

    CD.[Desc] AS [Desc]

    ,SD.[Num] AS [Num]

    ,SD.[Batch] AS [Batch]

    ,CD.[CodeId] AS [CodeId]

    ,SD.[Type] AS [Type]

    ,CH.[Id] AS [Id]

    ,ISNULL(NULLIF(RTRIM(LTRIM(SD.[Code])),''),'0') AS [Code]

    FROM

    Table1 SD

    JOIN

    Table2 CH

    ON CH.Num = SD.Num

    AND CH.Batch = @Batch

    AND CH.Flag = 0

    AND SD.Batch = @Batch

    AND SD.Flag = 0

    JOIN

    Table3 M

    ON M.Code = SD.Code

    AND (Map = 1

    OR Map = @Map)

    AND CH.Date1 BETWEEN M.Date2 AND M.Date3

    JOIN

    Table4 CD

    ON M.CodeId = CD.CodeId

    ) AS Result

    WHERE

    [Code] <> '0'

  • GilaMonster (9/24/2012)


    laurie-789651 (9/24/2012)


    2. You could consider using

    FROM Table1 SD WITH (NOLOCK) JOIN Table2 WITH (NOLOCK) etc.

    as long as the table isn't being updated - this saves time as no read locks are issued.

    And incorrect results are possible, not just dirty reads, duplicate or missing rows. If there are no changes being made, there's little gain from nolock, the overhead of taking locks is not that high, and it introduces the potential for incorrect results when changes are being made.

    See - http://blogs.msdn.com/b/davidlean/archive/2009/04/06/sql-server-nolock-hint-other-poor-ideas.aspx

    A quick little demo to show incorrect results from NOLOCK, even when the data you are reading is static.

    http://www.jasonstrate.com/2012/06/the-side-effect-of-nolock/

  • Steven Willis (9/25/2012)


    Here's another option that moves the string manipulation and identification of NULLs to the SELECT statement instead of using it in a JOIN or WHERE clause where the 'SARG-ability' of the statement may be an issue. It sets blanks and NULLs to some value (in this example it's '0') and then the Where clause can exclude that value. This option would need to be tested against other options of course.

    SELECT

    [Desc]

    ,[Num]

    ,[Batch]

    ,[CodeId]

    ,[Type]

    ,[Id]

    FROM

    (

    SELECT DISTINCT

    CD.[Desc] AS [Desc]

    ,SD.[Num] AS [Num]

    ,SD.[Batch] AS [Batch]

    ,CD.[CodeId] AS [CodeId]

    ,SD.[Type] AS [Type]

    ,CH.[Id] AS [Id]

    ,ISNULL(NULLIF(RTRIM(LTRIM(SD.[Code])),''),'0') AS [Code]

    FROM

    Table1 SD

    JOIN

    Table2 CH

    ON CH.Num = SD.Num

    AND CH.Batch = @Batch

    AND CH.Flag = 0

    AND SD.Batch = @Batch

    AND SD.Flag = 0

    JOIN

    Table3 M

    ON M.Code = SD.Code

    AND (Map = 1

    OR Map = @Map)

    AND CH.Date1 BETWEEN M.Date2 AND M.Date3

    JOIN

    Table4 CD

    ON M.CodeId = CD.CodeId

    ) AS Result

    WHERE

    [Code] <> '0'

    There's nothing wrong with your reasoning, Steven - it looks like a winner. However, I strongly suspect that the execution plan of your query would be identical to this:

    SELECT DISTINCT

    CD.[Desc] AS [Desc]

    ,SD.[Num] AS [Num]

    ,SD.[Batch] AS [Batch]

    ,CD.[CodeId] AS [CodeId]

    ,SD.[Type] AS [Type]

    ,CH.[Id] AS [Id]

    ,ISNULL(NULLIF(RTRIM(LTRIM(SD.[Code])),''),'0') AS [Code]

    FROM

    Table1 SD

    JOIN

    Table2 CH

    ON CH.Num = SD.Num

    AND CH.Batch = @Batch

    AND CH.Flag = 0

    AND SD.Batch = @Batch

    AND SD.Flag = 0

    JOIN

    Table3 M

    ON M.Code = SD.Code

    AND (Map = 1

    OR Map = @Map)

    AND CH.Date1 BETWEEN M.Date2 AND M.Date3

    JOIN

    Table4 CD

    ON M.CodeId = CD.CodeId

    WHERE

    ISNULL(NULLIF(RTRIM(LTRIM(SD.[Code])),''),'0') <> '0'

    โ€œWrite the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.โ€ - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Yes, those two will be identical.

    While the string manipulation's in the select, that derived column is used in the where, so it's absolutely the same as having the string manipulation in the where.

    Both of these, for example, are not SARGable and will simplify to the same query structure.

    SELECT <columns> FROM SomeTable

    WHERE Substring(SomeColumn,2,3) = 'abc'

    SELECT * FROM (

    SELECT <columns>, Substring(SomeColumn,2,3) as TrimmedString FROM SomeTable

    ) sub

    WHERE sub.TrimmedString = 'abc'

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • GilaMonster (9/26/2012)


    Yes, those two will be identical.

    While the string manipulation's in the select, that derived column is used in the where, so it's absolutely the same as having the string manipulation in the where.

    Both of these, for example, are not SARGable and will simplify to the same query structure.

    SELECT <columns> FROM SomeTable

    WHERE Substring(SomeColumn,2,3) = 'abc'

    SELECT * FROM (

    SELECT <columns>, Substring(SomeColumn,2,3) as TrimmedString FROM SomeTable

    ) sub

    WHERE sub.TrimmedString = 'abc'

    Thanks for that explanation.

    Just thinking out loud...could there ever be a benefit to having a calculated column that converts any nulls to something non-null for queries/indexes? I suppose the best solution is to not allow nulls but I import lots of data where I'm stuck with what I get. Would it be better to convert the data to a blank or some other value during insert/update even if the value is truly an unknown? What other options are there for avoiding an IsNull conversion or a 'WHERE col = val [or/and] col [is/is not] null"...or is that even something to avoid?

    ย 

Viewing 15 posts - 1 through 14 (of 14 total)

You must be logged in to reply to this topic. Login to reply