Accessing lookup tables

  • Hi, I'm creating SQL views that access lookup tables to return the full description from a table that holds a list of values. Each SQL query can have ten or so of these lookup fields.

    There are two ways I know of to do this: one is to put a SELECT statement inside the SELECT list; the other is to use a LEFT OUTER JOIN to the lookup table in the FROM clause of the query.

    In my limited knowledge of these things they both look the same and take about the same time to execute; the record count is anywhere from 2,000 to 30,000 records. Which way is the best way to do this, or does it really matter?
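
    For illustration, the two forms look like this (the table and column names here are made up, not my real schema):

        -- Form 1: scalar subquery in the SELECT list
        select a.asset_id,
               (select l.descr
                from LookupTable l
                where l.code = a.status_code) as Status_Descr
        from Asset a;

        -- Form 2: LEFT OUTER JOIN to the lookup table
        select a.asset_id, l.descr as Status_Descr
        from Asset a
        left outer join LookupTable l on l.code = a.status_code;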

    Bruce

  • By 'select in select' do you mean subqueries? In that case it is better to use joins than subqueries. You may not be noticing any difference currently because of the small number of rows.

    You can use the following statements before running both queries to see the cost:

    SET STATISTICS IO ON

    SET STATISTICS TIME ON
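
    For example, wrapped around a query under test (the SELECT here is just a placeholder for whichever query you're comparing; the IO and TIME figures appear on the Messages tab):

        SET STATISTICS IO ON;
        SET STATISTICS TIME ON;

        -- the query under test goes here
        SELECT 1;

        SET STATISTICS IO OFF;
        SET STATISTICS TIME OFF;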

    ----------------------------------------------------------------------------------------------------------------------------------------------------
    Roshan Joe

    Jeff Moden - Forum Etiquette: How to post data/code on a forum to get the best help

  • Be very cautious in what you're doing. From my reading of it, you're creating a view that does a whole series of JOIN operations (and a JOIN is, in most cases, the better way to go than a correlated sub-query) between a main table and a bunch of lookup tables. Presumably you'd do this for each of your main tables. Then you would use the views in subsequent queries so that you don't have to write the JOIN operations to the lookup tables. Correct?

    If so, this is a common practice, but it's frequently regarded as a code smell that leads to issues. Each query (and a view is a query, not a table) has to be approached by the optimizer and dealt with. When you have lots and lots of JOIN operations, each of these has to be examined for how it's going to be best dealt with for the query in question. The issue arises when you're not using all those tables.

    Let's simplify things for an example. Let's say we have two views. Each view joins 10 tables. Then, we're going to join the views together in a new query, so we're hitting 20 tables. Now we'll assume two different approaches. Approach #1, we use every column from both views. In this case, the work the optimizer does is fully justified because we're using all the columns, so it's not resolving JOIN operations that aren't needed and it doesn't have to go through the simplification process. If you always query this way, well, you might be moving a lot of unnecessary data, but you're not intentionally hurting the optimization process. Approach #2, we're only going to select some of the columns. Now the optimizer has to go through the process of simplification. It has to decide which joins it needs and which ones it doesn't. It has to do this for every possible combination of the joins in question. This usually leads to an optimization timeout. This means that the optimizer was unable to find a good enough plan for your query and your performance will seriously suffer.
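
    To make Approach #2 concrete, here's a sketch with made-up view and table names:

        create view vAssetFull as
        select a.asset_id, l1.descr as Status, l2.descr as Owner /* ...plus 8 more lookups... */
        from Asset a
        left join Lookup l1 on l1.code = a.status_code
        left join Lookup l2 on l2.code = a.owner_code;

        -- Only one column is used, so the optimizer has to work out
        -- which of the view's joins can be eliminated:
        select asset_id from vAssetFull;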

    I would absolutely not condone this approach. Is it a pain to have to write the JOIN to each lookup table each time you need one? Yes. Is it better for performance purposes in SQL Server to do this? Yes. T-SQL just doesn't lend itself to the kind of code reuse that other types of languages can do.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • joeroshan (11/29/2016)


    By 'select in select' do you mean subqueries? In that case it is better to use joins than subqueries. You may not be noticing any difference currently because of the small number of rows.

    You can use the following statements before running both queries to see the cost:

    SET STATISTICS IO ON

    SET STATISTICS TIME ON

    I no longer advocate for using STATISTICS IO. It puts a serious load on the system and can negatively impact performance so that you get inaccurate TIME measurements. Instead, especially for 2012 and greater, I advocate for using Extended Events. They put much less load on the system. Yes, you have to set it up and look somewhere other than Messages for output, but you'll get accurate measures.
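
    For example, a minimal Extended Events session (the session name and file target path are illustrative; adjust to suit your server):

        CREATE EVENT SESSION QueryCost ON SERVER
        ADD EVENT sqlserver.sql_batch_completed
            (ACTION (sqlserver.sql_text))
        ADD TARGET package0.event_file (SET filename = N'C:\Temp\QueryCost.xel');

        ALTER EVENT SESSION QueryCost ON SERVER STATE = START;
        -- run the queries being compared, then:
        ALTER EVENT SESSION QueryCost ON SERVER STATE = STOP;
        DROP EVENT SESSION QueryCost ON SERVER;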

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Grant Fritchey (11/29/2016)


    joeroshan (11/29/2016)


    By 'select in select' do you mean subqueries? In that case it is better to use joins than subqueries. You may not be noticing any difference currently because of the small number of rows.

    You can use the following statements before running both queries to see the cost:

    SET STATISTICS IO ON

    SET STATISTICS TIME ON

    I no longer advocate for using STATISTICS IO. It puts a serious load on the system and can negatively impact performance so that you get inaccurate TIME measurements. Instead, especially for 2012 and greater, I advocate for using Extended Events. They put much less load on the system. Yes, you have to set it up and look somewhere other than Messages for output, but you'll get accurate measures.

    Thanks for reminding me of that again, Grant. I remember you mentioning it in your session on performance tuning at SQL Saturday Cambridge, which I was lucky enough to attend. Need to kick old habits 🙂

    ----------------------------------------------------------------------------------------------------------------------------------------------------
    Roshan Joe

    Jeff Moden - Forum Etiquette: How to post data/code on a forum to get the best help

  • I'm still working through your diagnosis, Grant; thanks for that. FYI, this is an example of one of the queries with fewer lookups and fewer tables. These are all tables and, as you can see, the lookups are all in one table... so far. Notice the first query, using joins, still has a subquery in it. The table structure in this database is very unusual, which unfortunately means a lot of hardcoding is necessary.

    select distinct cast(attr.assnbri as integer) Asset_Number, ar.long_descr,
        attr1.seln_code1 Main_Use, attr1.seln_code2 Diameter_mm, attr1.seln_code3 Material,
        lnmaint.fdescr Lining_Material, lntec.fdescr Lining_Technique, lntype.fdescr Lining_Type,
        attr1.val_num1 Length_m, attr1.val_alpha2 Upstream_MH_UID, attr1.val_num3 USIL_mAHD_UID,
        attr1.val_alpha4 Downstream_MH_UID, attr1.val_num5 DSIL_mAHD_UID,
        (select fdescr from SELN_CODE where SELN_TYPE = 'AM_TRNCH' and SELN_CODE = attr1.val_alpha8) Trenchless_Method,
        format(attr1.val_datei9, 'dd/MM/yyyy') Camera_Inspection,
        uflds.val_alpha17 Drawing_Ref,
        own.fdescr Owner_by, own2.fdescr Controlled_by, maint.fdescr Maintained_by,
        format(attr1.val_num1, '0.00') Length_m, format(attr1.val_num2, '0.00') Alignment_m,
        format(attr1.val_num3, '0.00') Depth, format(uflds.val_datei16, 'dd/MM/yyyy') Construction_Date
    from ASSET ar
        inner join ASSET_ATTR attr on ar.ASSNBRI = attr.ASSNBRI
        left outer join ASSET_ATTR as attr1 on attr.assnbri = attr1.assnbri
        left outer join ASSET_ATTR as attr2 on attr.assnbri = attr2.assnbri
        left outer join SELN_CODE lnmaint on lnmaint.SELN_CODE = attr1.seln_code8 and lnmaint.SELN_TYPE = 'AM_LINEM'
        left outer join SELN_CODE lntec on lntec.SELN_CODE = attr1.seln_code9 and lntec.SELN_TYPE = 'AM_LINET'
        left outer join SELN_CODE lntype on lntype.SELN_CODE = attr1.seln_code10 and lntype.SELN_TYPE = 'AM_LINTY'
        left outer join SELN_CODE own on own.SELN_CODE = attr2.val_alpha1 and own.SELN_TYPE = 'AM_OWNER'
        left outer join SELN_CODE own2 on own2.SELN_CODE = attr2.val_alpha2 and own2.SELN_TYPE = 'AM_OWNER'
        left outer join SELN_CODE maint on maint.SELN_CODE = attr2.val_alpha3 and maint.SELN_TYPE = 'AM_MAINT'
        left outer join ASSET_UF as uflds on attr.assnbri = uflds.assnbri
    where attr.reg_name = 'WATER'
        and attr1.ATTR_UNIQUE_ID = '3F1E7E49AE27434DADE55AC92175F7545B263815'
        and attr2.attr_unique_id = '66F79E38CBFD422F8B2A70590D029996FC6B4CB5'

    select distinct cast(attr.assnbri as integer) Asset_Number, ar.long_descr,
        attr1.seln_code1 Main_Use, attr1.seln_code2 Diameter_mm, attr1.seln_code3 Material,
        (select fdescr from SELN_CODE where SELN_TYPE = 'AM_LINEM' and SELN_CODE = attr1.seln_code8) Lining_Material,
        (select fdescr from SELN_CODE where SELN_TYPE = 'AM_LINET' and SELN_CODE = attr1.seln_code9) Lining_Technique,
        (select fdescr from SELN_CODE where SELN_TYPE = 'AM_LINTY' and SELN_CODE = attr1.seln_code10) Lining_Type,
        attr1.val_num1 Length_m, attr1.val_alpha2 Upstream_MH_UID, attr1.val_num3 USIL_mAHD_UID,
        attr1.val_alpha4 Downstream_MH_UID, attr1.val_num5 DSIL_mAHD_UID,
        (select fdescr from SELN_CODE where SELN_TYPE = 'AM_TRNCH' and SELN_CODE = attr1.val_alpha8) Trenchless_Method,
        format(attr1.val_datei9, 'dd/MM/yyyy') Camera_Inspection,
        uflds.val_alpha17 Drawing_Ref,
        (select fdescr from SELN_CODE where SELN_TYPE = 'AM_OWNER' and SELN_CODE = attr2.val_alpha1) Owner_by,
        (select fdescr from SELN_CODE where SELN_TYPE = 'AM_OWNER' and SELN_CODE = attr2.val_alpha2) Controlled_by,
        (select fdescr from SELN_CODE where SELN_TYPE = 'AM_MAINT' and SELN_CODE = attr2.val_alpha3) Maintained_by,
        format(attr1.val_num1, '0.00') Length_m, format(attr1.val_num2, '0.00') Alignment_m,
        format(attr1.val_num3, '0.00') Depth, format(uflds.val_datei16, 'dd/MM/yyyy') Construction_Date
    from ASSET ar
        inner join ASSET_ATTR attr on ar.ASSNBRI = attr.ASSNBRI
        left outer join ASSET_ATTR as attr1 on attr.assnbri = attr1.assnbri
        left outer join ASSET_ATTR as attr2 on attr.assnbri = attr2.assnbri
        --left outer join ASSET_ATTR as attr3 on attr.assnbri = attr3.assnbri
        left outer join ASSET_UF as uflds on attr.assnbri = uflds.assnbri
    where attr.reg_name = 'WATER'
        and attr1.ATTR_UNIQUE_ID = '3F1E7E49AE27434DADE55AC92175F7545B263815'
        and attr2.attr_unique_id = '66F79E38CBFD422F8B2A70590D029996FC6B4CB5'

    I imagine there's a much smarter way of doing this.

    Bruce

  • This is the fastest after some changes to get rid of the DISTINCT clause:

    select cast(ar.assnbri as integer) Asset_Number, ar.long_descr,
        attr1.seln_code1 Main_Use, attr1.seln_code2 Diameter_mm, attr1.seln_code3 Material,
        (select fdescr from SELN_CODE where SELN_TYPE = 'AM_LINEM' and SELN_CODE = attr1.seln_code8) Lining_Material,
        (select fdescr from SELN_CODE where SELN_TYPE = 'AM_LINET' and SELN_CODE = attr1.seln_code9) Lining_Technique,
        (select fdescr from SELN_CODE where SELN_TYPE = 'AM_LINTY' and SELN_CODE = attr1.seln_code10) Lining_Type,
        attr1.val_num1 Length_m, attr1.val_alpha2 Upstream_MH_UID, attr1.val_num3 USIL_mAHD_UID,
        attr1.val_alpha4 Downstream_MH_UID, attr1.val_num5 DSIL_mAHD_UID,
        (select fdescr from SELN_CODE where SELN_TYPE = 'AM_TRNCH' and SELN_CODE = attr1.val_alpha8) Trenchless_Method,
        format(attr1.val_datei9, 'dd/MM/yyyy') Camera_Inspection,
        uflds.val_alpha17 Drawing_Ref,
        (select fdescr from SELN_CODE where SELN_TYPE = 'AM_OWNER' and SELN_CODE = attr2.val_alpha1) Owner_by,
        (select fdescr from SELN_CODE where SELN_TYPE = 'AM_OWNER' and SELN_CODE = attr2.val_alpha2) Controlled_by,
        (select fdescr from SELN_CODE where SELN_TYPE = 'AM_MAINT' and SELN_CODE = attr2.val_alpha3) Maintained_by,
        format(attr1.val_num1, '0.00') Length_m, format(attr1.val_num2, '0.00') Alignment_m,
        format(attr1.val_num3, '0.00') Depth, format(uflds.val_datei16, 'dd/MM/yyyy') Construction_Date
    from REG_ASSET ar
        inner join REG_ASSET_ATTR attr1 on ar.ASSNBRI = attr1.ASSNBRI
            and attr1.ATTR_UNIQUE_ID = '3F1E7E49AE27434DADE55AC92175F7545B263815'
        left outer join REG_ASSET_ATTR attr2 on ar.ASSNBRI = attr2.ASSNBRI
            and attr2.ATTR_UNIQUE_ID = '66F79E38CBFD422F8B2A70590D029996FC6B4CB5'
        left outer join REG_ASSET_UF as uflds on ar.assnbri = uflds.assnbri
    where attr1.reg_name = 'WATER'

    Bruce

  • Oh wow, it's a muck table. If you ever get a shot at a redesign... this should be public enemy #1.

    The first query is the preferred method, not the second. The second will work, but it's likely to lead to some poor choices by the optimizer. Even though it's a muck table, you still want to treat a JOIN as a JOIN and not try to force it into correlated sub-queries.

    However, don't take my word for it. Generate an execution plan for each query and compare them: which one uses the indexes better, has fewer scans, more seeks, and so on? That will tell you a lot about which query is preferable.
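
    If you want to capture a plan as XML directly, SET SHOWPLAN_XML returns the estimated plan without executing the query (the SELECT is a placeholder; each SET has to be in its own batch, hence the GO separators):

        SET SHOWPLAN_XML ON;
        GO
        -- the query under test goes here
        SELECT 1;
        GO
        SET SHOWPLAN_XML OFF;
        GO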

    In general, looking through the query, I didn't spot any traditional issues (beyond the single lookup table, that's such a problematic design method). It looks fundamentally sound. Are you hitting problems beyond just the need to keep referencing that single table (which is a problem)?

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Ah, I didn't see the second post there, sorry. So, the correlated queries are running faster? Hmmm.... I'd want to see both execution plans to understand why.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Thanks Grant, I like what you say; it makes sense, and as you can see it's ugly. I've attached a couple of txt files containing the XML of the execution plan for each query.

    I'm glad the problem is common; I thought it was just me. I'm currently reading the article Lookup Table Madness; unfortunately I'm stuck with this DB.

    And thanks, Joe, for your input.

    It appears I should be using the joins. The views I will create will be accessed from another Server\DB and joined using spatial joins to bring attributes into mapping.

    Bruce

  • It looks like they are not MUCK tables, as they leave generalisation to the application: they use an identifier for each unique list in one large (dare I say it) lookup table. This does enable the front-end user to be as general or specific as they wish.

    Bruce

  • The plan with the join looks better, since you are getting a seek on the Asset table. But in both cases the left join on the asset_attr table goes for a scan, as opposed to the inner join on the same table with the same predicates, which gets a seek. It might be because the estimated number of rows is high compared to the table's total row count. Or it may be because of stale statistics.

    Can you please post the actual plan? If the estimated and actual numbers of rows are very different, you may need to update statistics.
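
    For example, against the ASSET_ATTR table from your queries (FULLSCAN is optional; sampled updates are cheaper on large tables):

        -- When were ASSET_ATTR's statistics last updated?
        SELECT name, STATS_DATE(object_id, stats_id) AS last_updated
        FROM sys.stats
        WHERE object_id = OBJECT_ID('dbo.ASSET_ATTR');

        -- Refresh them if they look stale
        UPDATE STATISTICS dbo.ASSET_ATTR WITH FULLSCAN;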

    ----------------------------------------------------------------------------------------------------------------------------------------------------
    Roshan Joe

    Jeff Moden - Forum Etiquette: How to post data/code on a forum to get the best help

  • Both queries went to Full optimization, which is good. No timeouts. You have several scans and a key lookup on [DB1].[dbo].[ASSET_ATTR]. Those are my concerns. It means that the WHERE clause just isn't filtering. In one case, the entire table is estimated as being returned. In the second, about 50,000 rows out of 187,000 (or thereabouts, going by memory) are being returned. Filtering the data is going to lead to better performance. 13,000 rows, unless you're doing some kind of aggregation on the client (and that might not be a great choice), is a lot of data to be moving around. Humans don't look at 13,000 rows of information.

    Presumably the actual queries that you're going to run against these views is going to supply filtering information? If so, I wouldn't focus so much on tuning the views and would instead focus on ensuring that the queries against the views are tuned adequately. You will get a different execution plan, and therefore different behavior, when you add a WHERE clause to the query against the view.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Joe - it might be better to use inner joins for both asset-to-attribute joins, which return a match of 14,538 records, 40,355 fewer than the 54,893 records (the total row count) returned using a left join.

    I can't post the actual .sqlplan file, as I changed the db and table names because this is a public forum. Send me your email address and I will send them. The files I supplied are the execution plan exported to XML and saved as a txt file with only the name changes, so they should reflect the true plan.

    Grant - I hope the WHERE clause filters the asset table, which limits the attribute table joins, which as above could by the look of things be changed to inner joins on both. Because of the (lack of) speed, not only in the queries but also in the network and virtual servers, at the GIS end the query is turned into a feature class (table) updated each night, to speed up mapping that may be accessed anywhere in the world. Regardless, the WHERE clause will be applied once the data is loaded into mapping applications.

    Does this explain why the subqueries are working faster... should I go that way?

    thanks

    Bruce

  • Bruce-12445 (11/30/2016)


    Joe - it might be better to use inner joins for both asset-to-attribute joins, which return a match of 14,538 records, 40,355 fewer than the 54,893 records (the total row count) returned using a left join.

    I can't post the actual .sqlplan file, as I changed the db and table names because this is a public forum. Send me your email address and I will send them. The files I supplied are the execution plan exported to XML and saved as a txt file with only the name changes, so they should reflect the true plan.

    Hi, altering to an inner join is up to you, based on your requirements. I understand that you cannot share the actual plan. But you can look at the operation costing 60% and compare estimated rows vs actual rows to see whether your statistics are all right.

    And as Grant said, if your frequent queries have additional filters, then you can focus more on those.

    ----------------------------------------------------------------------------------------------------------------------------------------------------
    Roshan Joe

    Jeff Moden - Forum Etiquette: How to post data/code on a forum to get the best help
