SQL & the JOIN Operator

  • David Walker-278941 (10/7/2009)


    Jeff Moden (10/7/2009)


    Actually, they ARE becoming required. The use of table names instead of aliases has been deprecated.

    Even if they weren't, short table names may work fine but not longer ones... they really clutter up the works so far as readability.

    What really needs to be taught is the correct type of table aliasing. Table aliases should mean something that a simple table name may not be able to infer as well as being shorter.

    I call BS on your "deprecated" comment. Deprecated by whom? And table aliases do not "clutter" anything up. Words are easier for the human brain to read than abbreviations; I think that's well known from reading studies. We read words more often than abbreviations of those words.

    Still, BS on table names becoming "deprecated".

    Interesting you used an abbreviation in your argument against abbreviations.:w00t:

    Now for my "mini rant". I apologize in advance for the tone.:-D

    I call BS on your "Words are easier for the human brain to read than abbreviations". We use abbreviations in common language all the time and never think twice. Do you go by Dave, or David? Would you be more likely to say "Microsoft Structured Query Language Server", or "MS SQL Server"? Do you write "Okay", or "OK"? I'll stop here, I think you get my point.

    When using abbreviations in technical papers, it's common courtesy to spell phrases out the first time with abbreviations included in parentheses (parens) to introduce them. Isn't it interesting how similar that is to the way aliases are used in a query!

    Speaking of queries, the ones in the article are simple examples, but when you're pulling many columns and joining on multiple keys, repeating full table names over and over gets tedious to read and write. I actually would prefer "xv" over "ExceptionVisits" if I was pulling many columns and/or had complicated selection criteria. Of course I don't know what "sg", "rr", "rp", and "rpv" refer to, but if I worked with the database for a few days and the aliases were used consistently I'd know exactly what they were. If I was slow to pick it up and the aliases REALLY bothered me, I'd simply do a text substitution in my favorite editor to generate a query with table names repeated ad nausea.

    As I said before though, it really is a matter of personal preference.

  • Andy - I would agree.

    Only thing that bothers me is when people use meaningless abbreviations like

    select a.somefield, b.somefield

    from table1 a, table2data b

    A and B are not too informative....

  • Thanks for a lucid article. I'm from the "old school" so this cemented what I had picked up ad hoc.

    I have a quick question however. I have, not infrequently, had to write queries joining A to B then joining B to C. What is the syntax for that construct?

  • SuperDBA-207096 (10/7/2009)


    Andy - I would agree.

    Only thing that bothers me is when people use meaningless abbreviations like

    select a.somefield, b.somefield

    from table1 a, table2data b

    A and B are not too informative....

    Hence Jeff's post on "What really needs to be taught is the correct type of table aliasing. Table aliases should mean something that a simple table name may not be able to infer as well as being shorter.":-)

    Andy - great tone, style and content!:hehe:







    **ASCII stupid question, get a stupid ANSI !!!**

  • Richard Gibbins (10/7/2009)


    Thanks for a lucid article. I'm from the "old school" so this cemented what I had picked up ad hoc.

    I have a quick question however. I have, not infrequently, had to write queries joining A to B then joining B to C. What is the syntax for that construct?

    It is the same. You are just joining to the result set. Here is an example. I need to querry the system tables (SQL 2000) to get the column names and data types for a table. If you must know I'm writing code that writes code.

    SELECT

    syscolumns.colid AS 'Column Number',

    '[' + syscolumns.name + ']' AS 'Column Name',

    systypes.name +

    CASE WHEN systypes.xusertype IN (165,167,175,231,239) THEN

    '(' + Convert(varchar(10),syscolumns.length/2) +')'

    ELSE

    ''

    END AS 'Data Type'

    FROM sysobjects

    INNER JOIN syscolumns ON sysobjects.id = syscolumns.id

    INNER JOIN systypes ON syscolumns.xusertype = systypes.xusertype

    WHERE

    sysobjects.xtype = 'u'

    AND sysobjects.name = 'Customer'

    ORDER BY syscolumns.colid

    The querry is against sysobjects to get information to get information from the table (your A). The first INNER JOIN get information about each column in the table (your A to B). The seccond INNER JOIN gets information about the data type for a particular column (your B to C). does that help?

    ATBCharles Kincaid

  • Charles Kincaid (10/7/2009)


    '[' + syscolumns.name + ']' AS 'Column Name',

    Please...please please use QUOTENAME for things like this 🙂

  • Chris.Strolia-Davis (10/7/2009)


    SQL Noob (10/7/2009)


    is there any performance difference doing a union all compared to a join when you need to get all the rows from two or more tables? i have a database where i do union all on some data in 20 tables or so when running a report and it seems to take a long time

    I'm not certain I understand your question.

    A UNION ALL combines the data from two similar sets into a single set with all members of each set.

    A JOIN matches the data in one set with the data in another set.

    Based on my current knowledge of SQL, I believe it might be theoretically possible to replicate a UNION ALL in a JOIN (although even in theory I think it would be dependent on your table structure), but it would be atypical and in all likelihood it would take longer to run than a straight "union all".

    I have not tested this theory.

    I'll admit, even as I attempt to consider an alternative that involves a JOIN, I still envision needing a UNION ALL in the query and a requirement for each record to have a unique id across all tables.

    i have a database where i dump logs into from 50 or so domain controllers, sql servers and other servers. each server has one or more tables. to get all failed logons from the last 2 days i query all the tables and link them by union all statements.

    would it be faster to just find something to join it on? i'm even looking at combining a lot of the tables into 3-5 tables and a few lookup tables instead of using union all

  • davidpenton (10/7/2009)


    Please...please please use QUOTENAME for things like this 🙂

    Sorry. I had copied this querry from the net and fixed the joins. The original just had all the tables in the WHERE separated by commas. I reworked it to use the INNER JION so when asked about the syntax I thought immediately about my work in progress.

    ATBCharles Kincaid

  • Charles Kincaid (10/7/2009)

    FROM sysobjects

    INNER JOIN syscolumns ON sysobjects.id = syscolumns.id

    INNER JOIN systypes ON syscolumns.xusertype = systypes.xusertype

    The querry is against sysobjects to get information to get information from the table (your A). The first INNER JOIN get information about each column in the table (your A to B). The seccond INNER JOIN gets information about the data type for a particular column (your B to C). does that help?

    Yes, thanks, it does. I was expecting a more explicit syntax since, as far as I can tell, the whole point of the join operator is to make explicit what was previously implied. With this syntax, we revert to "read carefully" mode.

    Richard

  • You can add that the use of the words INNER and OUTER is not necessary.

  • Speaking of queries, the ones in the article are simple examples, but when you're pulling many columns and joining on multiple keys, repeating full table names over and over gets tedious to read and write...

    I think it really just depends on the situation. I often abbrevate (much to the chagrin of my boss) because, like you, I don't find it feasible to string together 8-9 fields with REALLY long names as is often necessary for one of our databases. It's one of those "that's great in practice, but how does it work in theory" arguments...

    IMHO. 🙂

  • Good Article,

    I have a question on Joins.Can anyone help me.

    When I run below query it is running for 40 mins as it is using Table Spool.

    Query:

    select a.*,b.*

    from table a

    left join table b

    on a.id = b.id

    and a.date1 between b.date1 and b.date2

    or b.date2 is null

    Modifing this query is running in 5 mins.

    select a.*,b.*

    from table a

    left join table b

    on a.id = b.id

    and (a.date1 between b.date1 and b.date2 or b.date2 is null)

    But I am not sure whether both queries results same output.

    Can you explain how does it works while we use multiple join conditions using venn diagrams.

    (considering order of precedence)

    Thanks,

    Varun.

  • Inner join Result set Key1 value last tuple should be 8 not 6;-)

  • David Walker-278941 (10/7/2009)


    Jeff Moden (10/7/2009)


    Actually, they ARE becoming required. The use of table names instead of aliases has been deprecated.

    Even if they weren't, short table names may work fine but not longer ones... they really clutter up the works so far as readability.

    What really needs to be taught is the correct type of table aliasing. Table aliases should mean something that a simple table name may not be able to infer as well as being shorter.

    I call BS on your "deprecated" comment. Deprecated by whom? And table aliases do not "clutter" anything up. Words are easier for the human brain to read than abbreviations; I think that's well known from reading studies. We read words more often than abbreviations of those words.

    Still, BS on table names becoming "deprecated".

    My humble apologies... I couldn't understand your outrage until I read my own post from the quote you included. I was typing too fast and left out a critical word making your comment partially correct... I should have said...

    Actually, they ARE becoming required. The use of table names instead of aliases has been [font="Arial Black"]partially [/font]deprecated.

    To satisfy the intellectually curious and prevent further phat phingering on my part, please see the following link...

    http://msdn.microsoft.com/en-us/library/ms143729.aspx

    There, you will find the following entry (I've pivoted it here for readability on this forum).

    Category: Transact-SQL

    Deprecated feature: Three-part and four-part column references in SELECT list

    Replacement: Two-part names is the standard-compliant behavior.

    Feature name: More than two-part column name

    Feature ID: 3

    I also take exception to table names being easier to read than aliases for several reasons:

    1. To stay consistent with what most accept as a best practice, you would also have to include the owner (in 2k and below) or the schema (in 2k5 and above) name which would further add to the clutter if the table names where already lengthy and of widely varying length. That would also mean that to include table names as part of the SELECT list, you would have to omit (according to the deprecation) the third part which would be the owner or schema name.

    2. Of course, as has already been pointed out, certain structures require table aliasing. Self joins, correlated sub-queries, derived tables (inline views), CTE's (also a form of inline view), and identically named tables with different owners/schemas certainly fall into that category. With that in mind, consistency of code is also considered by many to be a best practice (some will {thankfully} even alias a single table query in to make it easier on others if they need to add another table in a future modification).

    3. Personal preference on what is considered to be readable certainly comes into play. However, if someone in my shop prefers something like the following ...

    SELECT dbo.some_long_table_name.some_column1,

    dbo.some_longer_table_name.some_column5,

    dbo.some_other_table_name.some_column3,

    dbo.some_other_table_name.yada_yada + dbo.some_other_table_name.something + dbo.some_other_table_name.discount + dbo.some_other_table_name.something_else AS Amount,

    dbo.some_long_table_name.rhubarb,

    dbo.some_long_table_name.macarroni,

    dbo.some_longer_table_name.state,

    dbo.some_other_table_name.state_of_mind,

    dbo.some_long_table_name.state_of_affairs,

    dbo.some_longer_table_name.furlongs_per_fortnight

    dbo.some_other_table_name.total_cost,

    dbo.some_other_table_name.hammer_for,

    dbo.some_other_table_name.more_rhubarb,

    dbo.some_long_table_name.do_dahs,

    dbo.some_longer_table_name.widget_ratio,

    dbo.some_other_table_name.sawdust_pump

    FROM dbo.some_long_table_name

    INNER JOIN dbo.some_longer_table_name

    ON dbo.some_long_table_name.some_long_column_name = dbo.some_longer_table_name.some_long_column_name

    INNER JOIN dbo.some_other_table_name

    ON dbo.some_long_table_name.short_column1 = dbo.some_other_table_name.short_column1

    AND dbo.some_long_table_name.short_column2 = dbo.some_other_table_name.short_column2

    AND (dbo.some_long_table_name.short_column3 = dbo.some_other_table_name.short_column3 OR @parameter IS NULL)

    AND dbo.some_long_table_name.short_column4 = dbo.some_other_table_name.short_column4

    … over something like the following …

    SELECT cust.some_column1,

    invhdr.some_column5,

    invdtl.some_column3,

    invdtl.yada_yada + invdtl.something + invdtl.discount + invdtl.something_else AS Amount,

    cust.rhubarb,

    cust.macarroni,

    invhdr.state,

    invdtl.state_of_mind,

    cust.state_of_affairs,

    invhdr.furlongs_per_fortnight

    invdtl.total_cost,

    invdtl.hammer_for,

    invdtl.more_rhubarb,

    cust.do_dahs,

    invhdr.widget_ratio,

    dbo.some_other_table_name.sawdust_pump

    FROM dbo.some_long_table_name AS cust

    INNER JOIN dbo.some_longer_table_name AS invhdr

    ON cust.some_long_column_name = invhdr.some_long_column_name

    INNER JOIN dbo.some_other_table_name AS invdtl

    ON cust.short_column1 = invdtl.short_column1

    AND cust.short_column2 = invdtl.short_column2

    AND (cust.short_column3 = invdtl.short_column3 OR @parameter IS NULL)

    AND cust.short_column4 = invdtl.short_column4

    … some serious re-educating as to the high value of properly aliased tables will commence in very short order... even if they don't like abbreviations. 😉 Not only are the aliases shorter "words" to read, they reduce the clutter AND they add meaning especially when the tables names may not (usually due to poor naming conventions).

    4. Even when tables are well named, table aliases can lend to readability by providing meaning that wouldn't ordinarily be there especially on certain self joins. For example...

    SELECT inv.IvoiceID,

    inv.Amount,

    adj1.Amount AS Refund,

    adj2.Amount AS Discount

    FROM dbo.InvoiceDetail inv

    LEFT JOIN dbo.Adjustments adj1 ON inv.InvoiceID = adj1.InvoiceID AND adj1.AdjustmentType = 1

    LEFT JOIN dbo.Adjustments adj2 ON inv.InvoiceID = adj2.InvoiceID AND adj2.AdjustmentType = 2

    ... makes for some pretty well formed code but the aliases could use some tweaking to make the ON clauses easier to understand without having to look back at the tables themselves...

    SELECT inv.IvoiceID,

    inv.Amount,

    adjref.Amount AS Refund,

    adjdisc.Amount AS Discount

    FROM dbo.InvoiceDetail inv

    LEFT JOIN dbo.Adjustments adjref ON inv.InvoiceID = adjref.InvoiceID AND adjref.AdjustmentType = 1

    LEFT JOIN dbo.Adjustments adjdisc ON inv.InvoiceID = adjdisc.InvoiceID AND adjdisc.AdjustmentType = 2

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • sql.varun (10/7/2009)


    Good Article,

    I have a question on Joins.Can anyone help me.

    When I run below query it is running for 40 mins as it is using Table Spool.

    Query:

    select a.*,b.*

    from table a

    left join table b

    on a.id = b.id

    and a.date1 between b.date1 and b.date2

    or b.date2 is null

    Modifing this query is running in 5 mins.

    select a.*,b.*

    from table a

    left join table b

    on a.id = b.id

    and (a.date1 between b.date1 and b.date2 or b.date2 is null)

    But I am not sure whether both queries results same output.

    Can you explain how does it works while we use multiple join conditions using venn diagrams.

    (considering order of precedence)

    Thanks,

    Varun.

    Convert both queries to create a Temp table as a result set using SELECT/INTO and then do a full outer join to compare the results.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 31 through 45 (of 98 total)

You must be logged in to reply to this topic. Login to reply