Need help with a simple query from a one to many table relation.

  • ChrisM@Work (10/8/2015)


    Greg Edwards-268690 (10/8/2015)


    NOT EXISTS, from what I understand, is a simple Boolean check.

    Aggregates must first gather all results.

    I am not at all surprised it is generally faster.

    Many times I found it to take a user query that churned, sometimes never finishing, into one that returned results very quickly.

    A very good item to have in your toolbox.

    Absolutely - but I've also seen Scott's suggestion of a derived table with aggregation to leave an EXISTS check in the dust. It depends on the shape of the data. EXISTS almost always wins when a) the searched column is a long way from unique or b) you have a small number of probes into a much larger probed table.

    Exactly. In this case, when I looked at this, my knee-jerk reaction was no way is this faster. IO was going to be much higher. That being said, I suspect that it may perform on the actual data set as opposed to the example provided.

    Once again, there are few absolutes and it really depends.

    Michael L John
    If you assassinate a DBA, would you pull a trigger?
    To properly post on a forum:
    http://www.sqlservercentral.com/articles/61537/

  • ChrisM@Work (10/8/2015)


    Greg Edwards-268690 (10/8/2015)


    NOT EXISTS, from what I understand, is a simple Boolean check.

    Aggregates must first gather all results.

    I am not at all surprised it is generally faster.

    Many times I found it to take a user query that churned, sometimes never finishing, into one that returned results very quickly.

    A very good item to have in your toolbox.

    Absolutely - but I've also seen Scott's suggestion of a derived table with aggregation to leave an EXISTS check in the dust. It depends on the shape of the data. EXISTS almost always wins when a) the searched column is a long way from unique or b) you have a small number of probes into a much larger probed table.

    Exactly. A simple boolean check could take billions of I/Os, and an aggregate almost none, it just depends on the specific table and how it is set up.

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • Deleted... Basically said the same thing as Drew...

    Note to self: Click the next page button BEFORE posting...

  • ScottPletcher (10/7/2015)And, yes, with the appropriate index, the NOT EXISTS becomes better. But not as the tables were presented.

    I agree with you. As originally posted, a NOT EXISTS should have to table scan the entire tblRecipeAllergens table. Trivially fixed with the index Michael L John applied.

    "knutpet" - since there is a relatively small list of Allergens and they recur a lot, I would recommend creating a separate "Allergens" table as:

    create table Allergens (

    AllergenID int identity(1,1),

    AllergenName varchar(32) not null);

    Then - use the AllergenID in the tblRecipeAllergen table instead of the Allergen name. That will make that table (and associated indexes) much more compact and faster to search, as the search will be on integer values instead of string matches. Assuming that there are less than 32,000 distinct allergens, you could even declare AllergenID as a "smallint" in both tables to further reduce the table sizes.

    I am not sure how many distinct allergens exist, but that table is probably so small as to not need any indexes, although you could define AllergenID as the primary key for safety.

  • Even more important would be properly (uniquely) clustering the tblRecipeAllergen table on ( Recipe_number, Allergen ) (and removing AllergenId from the table, as it's not needed).

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • Hi everybody.

    I seen the different solution and have now try them in the real environment.

    As I said in the beginning, My sql skills are not the best, but i know how to use the different tools, write simple sql code and ++

    And I have deep respect for you guys who can find solution so fast.

    My skills are more related towards Classic Asp, VB from many years ago.

    I have try the solution from John Mitchell, ChrisM and ScottPletcher, and see big variation in speed.

    I try them on a table with 32.000 recipe and linked to a allergen/recipe table with 323.000 records. This table is made from a stored procedure on the fly.

    I have try different variation of recipe group and allergen, and find the following after several test.

    The solution from John Mitchell takes an average of 1,06 sec to run on my chosen criteria.

    The solution from ChrisM takes an average of 0,9 sec to run on my chosen criteria. Make use of Not Exists

    The solution from Scott Pletcher takes an average of 0,12 sec to run on my chosen criteria. Make use of Group By and the case-then-else function.:-)

    I have just tested the speed, nothing else.

    I'm not familiar with SQL Query Profiler and have just check the speed of a query when it runs in the query windows in management Studio and check the elapsed time.

    Each query I have run for 10 times, and use the exactly same data and criteria.

    I also seen the advice from Kim Crosser, and some of them are implemented.

    I got an allergentable with just 28 records, and use the Id of the allergen to relate to ingredient in the recipe.

    I also think about writing the allergen into the recipe table, but i feel that this is a solution that is not the best. It will cause 28 new columns.

    Next week I will implement this on my website, and then have some more speed test and get some input from my readers. A solution like this for people with allergy and diet problem, this is helpful. My biggest challenge is to update and maintain the data.

    For me it is interesting to see how much the speed can be influence of using Not Exists, Group By, Indexes ++

    And I think that this can bee pretty important for other one who build up query like this. When query should run on a website, speed is the most important factor

    For the record:

    My server is a Dell PowerEdge R420 with 64 GB RDIMM Ram and Intel Xeon E5-2440 2.40GHz. Windows 2012 Server with standard SQL 2012 server

    MVH

    Knut

Viewing 6 posts - 16 through 20 (of 20 total)

You must be logged in to reply to this topic. Login to reply