Join Operations – Hash Match

  • Comments posted to this topic are about the item Join Operations – Hash Match

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw[/url]
    Learn Extended Events

  • Fantastic article, Jason!

    I know what a challenge it is to set up a realistic test scenario that illustrates the behavior you're trying to explain, and you did a great job with that. I also learned some new (to me) details about the way the hash match works.

    Bookmarking now for future reference 🙂

  • Thanks Troy. Much appreciated.

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw[/url]
    Learn Extended Events

  • Great article Jason! An enjoyable and informative read.

    One question. In the section where you're describing left-deep/right-deep, is it safe to assume you're saying that the context only applies to a Hash Merge? In the table column, you don't refer to left-deep/right-deep like you do in the first two rows. Clearly though there is some impact caused to all the JOIN types by the different predicates and in each case it seems to be in the same direction.

    A very tiny complaint too. You made me search for your first article because you didn't hyperlink it into this one! Shame on you. I found it though and I'm reading it now.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • dwain.c (3/4/2014)


    Great article Jason! An enjoyable and informative read.

    One question. In the section where you're describing left-deep/right-deep, is it safe to assume you're saying that the context only applies to a Hash Merge? In the table column, you don't refer to left-deep/right-deep like you do in the first two rows. Clearly though there is some impact caused to all the JOIN types by the different predicates and in each case it seems to be in the same direction.

    A very tiny complaint too. You made me search for your first article because you didn't hyperlink it into this one! Shame on you. I found it though and I'm reading it now.

    Thanks Dwain.

    Per the left deep right deep and effect on the other join operators - I will have to look at that closer.

    Sorry about that link. It was at the top (of the article) and probably could have been more visible.

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw[/url]
    Learn Extended Events

  • SQLRNNR (3/4/2014)


    Sorry about that link. It was at the top (of the article) and probably could have been more visible.

    Pardon my aging eyes. The link is there I just didn't see it. Complaint withdrawn.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • dwain.c (3/4/2014)


    SQLRNNR (3/4/2014)


    Sorry about that link. It was at the top (of the article) and probably could have been more visible.

    Pardon my aging eyes. The link is there I just didn't see it. Complaint withdrawn.

    No problem. I can see how it would easily be missed. It could be a lot more visible (probably by including the title of the article 😉 ).

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw[/url]
    Learn Extended Events

  • Hi Jason,

    a quick question, because I'm just not getting it straight in my head.

    You said "For any joins, use the first (top) input to build the hash table and the second (bottom) input to probe the hash table"

    Kimberly Tripp said [SQLSkills IE1, module 10 Indexing strategies, Slide 37, Join Strategies]

    "Hash join

    Two-phase operation (build, then probe): Build table(smaller set) and probe table (larger set) allowing SQL to join extremely large sets - in MEMORY (can spill)"

    If you are both correct, which is a distinct possibility, that would indicate that you are talking about different things.

    When you say 'first(top) input', do you mean the first table the from clause, or the ordering in the plan post optimisation, or something else ?

    Can you help me to clear up my confusion between the differences [in my understanding] between your article and Kimberly's course?

    Thanks

    Simon

  • I haven't taken any of the IE courses by SQLSkills so couldn't compare what is actually on the slides and the context.

    But from what you posted it looks like the same thing is being said - just in a different way.

    Both reference the hash being a two phase operation. Both statements reference the hash table as the first table being built (which is the top table in the execution plan) and the second table being the bottom input from the plan and that it probes the first table (in other words probes the hash table that was built from the top input).

    First and second table are according to the execution plan and not the order you had them listed in your query.

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw[/url]
    Learn Extended Events

  • OK, Thanks, got it now.

    'Input' is not necessarily the same as 'table in the from clause'

    Regards

    Simon

  • I just read this for the first time this morning. Very informative and we'll written. Great work Jason.

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • Thank you Alan.

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw[/url]
    Learn Extended Events

  • Thanks for the training.

  • simon.dobner 32307 (5/4/2014)


    OK, Thanks, got it now.

    'Input' is not necessarily the same as 'table in the from clause'

    Regards

    Simon

    This is a great article, and I think this question in the comments adds a lot of value for people new to query optimisation. SQL is supposed to be a declarative language; you tell it what output you want, not how to execute. It *shouldn't* matter what order you put the tables in your join clause, and doesn't, until a certain point where you have lots of tables and other complexities to be considered which leads to lots of possible combination ways that the SQL query engine can re-arrange your query to determine the best possible plan.

    The number of possible plans goes up exponentially with the number of tables and other complexities, so there is a point where the query engine gives up searching for the optimal plan and just goes with the best it could find in that limited time period. If it didn't do that the it's possible that the search for the best plan could take a lot longer than just running a *good enough* plan. It could takes years.

    SQL query plan optimisation is NP hard.

    So what do you do? Well if you see the query plan using a larger table as the hash table on top to hash join to a smaller table then perhaps rearranging the tables ordering in the join clause will help. Maybe, but there's so many factors that go into generating a plan that you can't take table join order in isolation. The query engine certainly starts somewhere, and if it were possible to shortcut the checking of plans so that it finds an optimal one sooner then certainly worth trying. Easier said than done.

    Whatever you do don't go using these join hints to force the type of join. There's very few occasions where you can outfox the query engine like that. What might be optimal for a certain state of data and certain predicates might be disastrous for another set. Data (state) changes. Programmers dream about stateless databases (that's my favourite bad joke ever :p )

    Hash matches are good for unsorted data where one table is large and the other table is small enough to make a practical hash table. If down the track the data changes, or indexes change and then both tables are now similarly sized and pre-sorted then a merge join is likely optimal and if you've told it specifically to hash join then you've just missed out on a massive performance gain happening for you. (with lots of *it depends* caveats like caching of plans, parallelism, etc etc)

Viewing 14 posts - 1 through 13 (of 13 total)

You must be logged in to reply to this topic. Login to reply