Help with understanding how SQL behaves with inner joins

  • Hi All,

    I am joining two tables using one column between them and i have a where condition on one of the 2 tables.

    When i check the execution plan , it's index seek for the table with where condition but it's and index scan for other table.

    The other table is big one so that hurts performance and i am trying to optimize it as much as possible.

    Is there a way to rewrite the query again to fix that or can changing the sql that has no where condition to a scalar valued function to avoid scanning for the whole index on that table.

    I will be grateful for ideas about improving that.

    Thanks in advance.

    Nader

  • That plan is probably the best available to SQL Server to execute that query.

    Please post:

    1) table and index definitions

    2) the exact query you are running

    3) the actual execution plan for the query. You can enable capturing the actual execution plan by pressing CTRL+M before executing.

    Keep in mind that "scan" does not always equate to bad: in many situations it can be much better than "seek".

    -- Gianluca Sartori

  • If you're not using any calculations or functions in your join conditions, and you don't have an implicit conversion going on, you're probably reading the whole table or most of it. An index scan is correct in those cases.

    For detailed performance help, read this article: http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • Hi All,

    Thanks for your replies.

    I have attached the scripts as requested for your kind review.

    Best Regards

    Nader

  • DOes your real City tale only have Malaysian cities in it? I am guessing not, but in this small example there are only 695 rows in the city table and you are asking for * (all columns) so SQL Server is not going to bother doing an index seek then turn around and do a lookup to get the rest of the columns from the table to satisfy your select-column-list. In short, the QO thinks it is cheaper to scan the entire table, getting all columns for your query in the process. There is a tipping point on the number of rows your query will return and also on the columns where a scan makes more sense.

    If this example is too small a scope since you cannot post your entire City table here, post an actual execution from your system so we can see the counts and what the QO is deciding for you.

    A few comments on your query form that may help you on this issue but will definitely help you in general down the line:

    1. Always qualify your table names with the schema name. This saves time for the engine and helps it find your data faster.

    2. Avoid * when returning data. Only select the columns you need to give the best chance of using indexes and also to reduce data being retrieved and returned that won't be used.

    3. Use ANSI-style joins instead of joining tables via the WHERE-clause. It will make it easier to spot problems with your logic since the join-predicates and filtering-predicates will be separated.

    select cty.CityName

    from dbo.Country1 cnt

    join dbo.City1 cty on cnt.CountryID = cty.CountryID

    where cnt.CountryName = 'malaysia';

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • Hi Orlando,

    Thanks for your reply.

    I have to generate example because the real data is too big and contains some sensitive data, unfortunately i am not allowed to send.

    The example i sent is same as real one.

    I attached the execution plan for real one in this post.

    Thanks again.

    Nader

  • Can you post the execution plans please? Pictures of them are pretty useless as all the interesting stuff is in the properties of the operators.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • I hope it's ok this time.

    Thanks

  • What i need to check is the index scan on Patient table.

    This is a big table and accessed by most functions in system.

    Is there anyway to reduce the count of reads from that table to the number of records coming from first sub query?

  • Try to replace the cartesian products (cross joins) with inner joins in your querys. It could make your query faster.

    See this https://technet.microsoft.com/en-us/library/ms191472(v=sql.105).aspx

    Igor Micev,My blog: www.igormicev.com

  • I did that now but still on patient table it's an index scan operation and big number of reading on table.

  • nadersam (2/16/2016)


    What i need to check is the index scan on Patient table.

    This is a big table and accessed by most functions in system.

    Is there anyway to reduce the count of reads from that table to the number of records coming from first sub query?

    The query is horribly written but appears to be functional.

    The patient table is accessed using the index [_dta_index_PATIENT_5_336420668__K10_25_26_27_28] which appears to be suboptimal. Ideally you want PatientID as the leading edge of the index. Rather than changing this index or creating a new one, I recommend you post the scripts for all of the indexes on the patient table - it's quite likely that an index could be tweaked.

    Presumably there's a computed column patengname on the patient table. Your index should INCLUDE the columns used by the expression.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • At the beginning i had this query as a view with normal joins but noticed the index scan on patient table and high reads also.

    I thought if i change the view to an sql that utilizes the sub queries things might be better.

    My idea was to join the patient table on the outcome of first sub query(filtered table by where condition)

    instead of joining on the whole table.

    Yes i am already using the included column feature and the index used is already like that.

    I attached the indexes on that table.

    You mentioned that the sql is horribly written, can you tell me why please.

    Thanks

    Nader

  • Your Patient table is horribly overindexed. That's what you get when you accept all the recommendations from the database tuning advisor.

    I recommend that you look at the utilization of those indexes and try to delete unused ones and merge the others as much as possible.

    That said, the execution plan does not seem unreasonable, based on your query and your indexes.

    You have no filter predicate on the patient table, so reading the whole table and merging it with the results of the common table expression may not be completely off track.

    What would probably help here is an index on (patient_id), include(patengname). Maybe a smaller index can speed up the query.

    -- Gianluca Sartori

  • More suggestions:

    1) Get rid of those NOLOCK hints: they won't make the query faster, but will potentially return incorrect data (reading rows multiple times or not at all).

    2) Are you sure you need that DISTINCT in the common table expression "OD"? If the cardinality of the two tables is not 1:1, you may want to use a subquery to return DISTINCT storekey, store_code from ASTOLOC.

    3) ODORDERS returns a moderate amount of rows, but you may squeeze some more perf out of it with a filtered index with filter predicate STATUS = 0. I would go down this route as a last resort.

    -- Gianluca Sartori

Viewing 15 posts - 1 through 15 (of 17 total)

You must be logged in to reply to this topic. Login to reply