• TomThomson (5/1/2014)


    gbritton1 (5/1/2014)


    Before I answered this question, I read

    It reads, in part:

    The hash join has two inputs: the build input and probe input. The query optimizer assigns these roles so that the smaller of the two inputs is the build input.

    then later:

    The hash join first scans or computes the entire build input and then builds a hash table in memory.

    I'm having trouble reconciling the official description with the "correct" answer.

    Without looking anything up, I answered smallest (ie build) input first. When I saw the answer, I constructed a test (using SQL Server 2012) to see what happened. Given one table with 65537 rows and another with 393222 rows (the sizes of teh test tables I built - admittedly not very big, but big enough to refute a silly "correct" answer) I found that whether I wrote the join so that the small table was the first or second referenced in the select clause and whether the small table was teh first or second listed in teh from clause, the actual (not estimated but actual) execution plan took the smaller table as first (ie build) table.

    So I don't just have trouble reconciling the official correct answer with the documentation, I have clear and solid experimantal evidence that the officially correct answer is just plain wrong.

    Tom - See my earlier post... there is no need for the answer to be wrong ( run test using HASH join hint) to match with your tests so far. When writing a hash join using the Hint you put the smaller to the right to make it the build table. Try you experiment with the Hash Join hint and see what happens.

    Your experiment proves what the Query Optimizer does, but not how it is done.

    Of course I would not know about any of this if I had not learned all of this by having to work with an Execution Plan that for some reason had chosen the Larger (2 million records) set as the Build list years ago.