Unusual use of the WHERE clause

  • I am working an exercise problem where the solution provided has a WHERE clause used in a way I have not seen before.  The question asks,  "to show the name and numbers of all salesmen who had more than one customer."

    Although the SQL online documentation shows an example of using WHERE with a comparison operator, I don't understand how it  is being used below.  I understand the notation "columnName" + comparison operator + expression, but I can't make sense of how it is used below.  Let's say the subquery returns a count of 3.  The WHERE clause would now evaluate to 1 < 3.  How would you read this?

    SELECTsalesman_id,name 
    FROMsalesman a
    WHERE1 < (SELECT COUNT(*) FROM customer WHERE salesman_id=a.salesman_id);

    Additionally in SSMS, I can right click on a table and select Script Table as, then select CREATE or INSERT INTO.  Is there a way to do something similar where I can quickly generate a script that has INSERT INTO which also lists the values in the table?  By doing this, anybody can easily recreate a table.

    Edit:  Suppose I wanted to see the output after each individual step/execution.  Can the debugger show me each individual row that is returned until I step through the entire query?  I started reading about execution plans, but I don't think that will give me what I am looking for.  Any suggestions?

     

  • Select Columns from Table where 1 is smaller than the COUNT on the subset of resultsets from the subquery in the where clause.

    Or in other words: Show only results where you have more than one resultset available.

  • DinoRS, if I am reading that query correctly, isn't it saying "show only results where you have less than 1 result set available" not "more"?

    The WHERE clause is saying "WHERE 1 < (a set of data)".  The subquery is likely going to return more than 1 value UNLESS you only have 1 salesman_id value.

    Michael.Leach2015 - think of that last part of the where as a subquery returning a single INT column which is a count of customers grouped by the salesman_id but only where the salesman_id exists in the salesman table.  The subquery will return 1 integer per salesman_id that exists in both tables.

    That query though I think will always return nothing won't it?  I am trying to think of a case where a COUNT(*) on customer will return 0 when comparing a salesman_id to the salesman table.

     

    As for your second question about using the GUI to re-generate a table, I am not aware of any built-in way to do this, but I wouldn't want to do it with any large tables anyways.  If the script gets too long, you will start seeing Out Of Memory exceptions thrown in SSMS.  These are SSMS errors though, not SQL Server errors so you don't need to worry about the SQL instance crashing or anything like that.

    But if you are looking for a quick and easy way to create a duplicate of a table, the EASIEST (although not recommended) way would be something like:

    SELECT *
    INTO table1_bak
    FROM table1;

    The reason I don't like this method is the datatypes are being determined entirely by the SQL engine this way.  My preferred method is to use CREATE TABLE to build up the new table, then select into the new table.  Plus, SELECT * is good to avoid where possible.

    That being said, I will use the above type of query if I am needing to fix some bad data in the database and want to have a backup of the original table prior to me making my changes JUST in case I break something, but even then I prefer to create the new table and INSERT INTO it.  On a test/dev system I may use some bad habits (like SELECT * INTO), but I wouldn't run something like that on live.

    I also don't use the GUI for much apart from intellisense.  I tend to avoid using a lot of the GUI features as I find them to be troublesome at times.  The scripting table as (or procedure or trigger or whatever) usually works fine, but I find at times the right-click menus in SSMS take forever to populate and in the time it takes for the menu to pop up, I could have just done a TSQL query to get the same result.  I have one system that if I right click on a trigger, it takes 3-5 minutes before the menu pops up and all I can do is sit and wait for it to pop up as it freezes SSMS.  I have another system where if I connect to it in the Object Explorer, it will show the name and take 2 minutes before it allows me to expand and after expanding, takes another 2 minutes before the agent option will show up and expanding that?  you guessed it... another 2 minutes.  Mind you, if I match the SSMS version to the SQL Instance version, this latency goes away, but then I end up having 6 copies of SSMS running depending on what I am connecting to which also becomes a pain in the behind.

    The above is all just my opinion on what you should do. 
    As with all advice you find on a random internet forum - you shouldn't blindly follow it.  Always test on a test server to see if there is negative side effects before making changes to live!
    I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.

  • Mr. Brian Gale wrote:

    The WHERE clause is saying "WHERE 1 < (a set of data)".  The subquery is likely going to return more than 1 value UNLESS you only have 1 salesman_id value.

    Michael.Leach2015 - think of that last part of the where as a subquery returning a single INT column which is a count of customers grouped by the salesman_id but only where the salesman_id exists in the salesman table.  The subquery will return 1 integer per salesman_id that exists in both tables.

    That query though I think will always return nothing won't it?  I am trying to think of a case where a COUNT(*) on customer will return 0 when comparing a salesman_id to the salesman table.

    Regarding the part where you said WHERE 1 < (a set of data) how would you read that?  I think that is the part that I am not getting.  1 what?  1 column?  1 row?  Something else?

    There are more than one salesman IDs.  Also, some of those repeat when the tables are linked because a salesman can have more than one customer.  The attached pic shows what these two tables (rows 1-4 and 5-9) look like if they were joined.  The other pic shows the result of the query I am trying to understand.

    Mr. Brian Gale wrote:

    As for your second question about using the GUI to re-generate a table, I am not aware of any built-in way to do this, but I wouldn't want to do it with any large tables anyways.  If the script gets too long, you will start seeing Out Of Memory exceptions thrown in SSMS.  These are SSMS errors though, not SQL Server errors so you don't need to worry about the SQL instance crashing or anything like that.

    I was asking so that I can include the script in future posts, so that anybody reading my posts can quickly recreate the tables with the exact data.  If I wanted to show the actual data values I am using for any given example, what is the best way to convey that when I post something?  Should I include a screenshot, upload an Excel file with individual values in each cell?  Is there another method people prefer?

     

    Attachments:
    You must be logged in to view attached files.
  • Also, my brain was thinking about that wrong.  What DinoRS put is correct in that "Or in other words: Show only results where you have more than one resultset available.".  That was a case of me answering before having coffee.

    The WHERE is doing a literal comparison of 1 to a set of data.  the "1" is the literal numerical value of 1.  So looking at the results of your INNER JOIN, you can see that salesman_ID 5001 has 2 rows and salesman_id 5002 has 2 rows, the rest have 1.  So you are checking for WHERE 1 < (set of values).  If salesman_id = 5001 or 5002, then it comes back as WHERE 1 < 2, but when it checks salesman_id 5003, that comes back as WHERE 1 < 1 which is false so it gets excluded.

     

    For posting on here, my approach is to manually generate the values in SSMS.  Generally, I can simplify my tables as the problem is only on a handful of columns and then script it out manually for a small sample set isn't bad.  If you have so much data that you can't type it out manually for the forum to help, chances are nobody on the forum will be willing to run the script to create the data and try to help.  USUALLY you can get a good idea of the sample set of data with about 3-5 rows from a table and making fake data makes it even easier.  In your example, you would only need to include 2 columns from Salesman and 1 column from customer to create a workable sample data set.  Something like:

    DECLARE @salesman TABLE (salesman_id, name)
    DECLARE @customer TABLE (salesman_id)
    INSERT INTO @salesman
    VALUES (1, 'A'),
    (2, 'B'),
    (3, 'C')
    INSERT INTO @customer
    VALUES (1),
    (2),
    (3),
    (1)

    That took me maybe 2 minutes to write as I checked my email in the  middle of it and that gives us a sample of the data that is good enough to work with.  The query would then return 1 row with an ID of 1 and a name of 'A'.  If I delete the 4th row from @customer (which based on the table design results in deleting rows 1 and 4 as there is no way to distinguish between the 2 different values for 1), we would get no results.

    The above is all just my opinion on what you should do. 
    As with all advice you find on a random internet forum - you shouldn't blindly follow it.  Always test on a test server to see if there is negative side effects before making changes to live!
    I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.

  • Mr. Brian Gale wrote:

    Also, my brain was thinking about that wrong.  What DinoRS put is correct in that "Or in other words: Show only results where you have more than one resultset available.".  That was a case of me answering before having coffee.

    The WHERE is doing a literal comparison of 1 to a set of data.  the "1" is the literal numerical value of 1.  So looking at the results of your INNER JOIN, you can see that salesman_ID 5001 has 2 rows and salesman_id 5002 has 2 rows, the rest have 1.  So you are checking for WHERE 1 < (set of values).  If salesman_id = 5001 or 5002, then it comes back as WHERE 1 < 2, but when it checks salesman_id 5003, that comes back as WHERE 1 < 1 which is false so it gets excluded.

    For posting on here, my approach is to manually generate the values in SSMS.  Generally, I can simplify my tables as the problem is only on a handful of columns and then script it out manually for a small sample set isn't bad.  If you have so much data that you can't type it out manually for the forum to help, chances are nobody on the forum will be willing to run the script to create the data and try to help.  USUALLY you can get a good idea of the sample set of data with about 3-5 rows from a table and making fake data makes it even easier.  In your example, you would only need to include 2 columns from Salesman and 1 column from customer to create a workable sample data set.  Something like:

    DECLARE @salesman TABLE (salesman_id, name)
    DECLARE @customer TABLE (salesman_id)
    INSERT INTO @salesman
    VALUES (1, 'A'),
    (2, 'B'),
    (3, 'C')
    INSERT INTO @customer
    VALUES (1),
    (2),
    (3),
    (1)

    That took me maybe 2 minutes to write as I checked my email in the  middle of it and that gives us a sample of the data that is good enough to work with.  The query would then return 1 row with an ID of 1 and a name of 'A'.  If I delete the 4th row from @customer (which based on the table design results in deleting rows 1 and 4 as there is no way to distinguish between the 2 different values for 1), we would get no results.

    I think I see it now.  Thank you for reply.

  • .

  • The query is doing following:

    • Get counts for every salesman_id from customer table (SQL Server automatically GROUP BY by it as you are not returning any columns, so it takes ALL records that matches WHERE clause)
    • Compare if it is greater than 1 and if it is then return the record

    For example, if you do:

    SELECT * FROM salesman WHERE 1 < 3

    It would return all records from salesman as 1 is always less than 3. In your query it is not always true, as COUNT can be different for different salesmen.

    I agree that it is VERY unusual syntax and there are a lot of different ways how you can re-write it. Here are few examples you that should return the same result:

    SELECT
    salesman_id
    ,name
    FROM salesman a
    JOIN (SELECT
    COUNT(*) qty
    ,salesman_id
    FROM customer
    GROUP BY salesman_id) c
    ON c.salesman_id = a.salesman_id
    WHERE 1 < c.qty;

    SELECT
    salesman_id
    ,name
    FROM salesman a
    CROSS APPLY (SELECT
    COUNT(*) qty
    FROM customer C
    WHERE c.salesman_id = a.salesman_id) c
    ON c.salesman_id = a.salesman_id
    WHERE 1 < c.qty;

    Does it make sense? You can also check the actual not filtered results (is it what you meant by "step by step"?):

    SELECT
    salesman_id
    ,name
    , c.qty
    FROM salesman a
    LEFT JOIN (SELECT
    COUNT(*) qty
    ,salesman_id
    FROM customer
    GROUP BY salesman_id) c
    ON c.salesman_id = a.salesman_id

     

    For the 2nd question about scripting table with data, you need to do following:

    • Right click on database Tasks-->Generate Scripts
    • Choose Objects --> Select specific database objects
    • Pick your table and hit next
    • Click Advanced
    • Pick General --> Types of data to script --> Schema and data
    • Next Next Next....
  • wow, kinda amazed I didn't overthink this one 😉 it might look unusual but not more than

    • WHERE 1 = 2
    •  HAVING 1 < 2
    •  GROUP BY 1, 2
  • I've seen this "constant" "equality test / compare" "expression" used in other languages, and its a thing due to (in some languages) the inherent ambiguity between "=" the test and "=" the assignment. If you have a constant on the left of the comparison expression, it cannot be assigned to and its a bit of a safety thing.

    As an example (somewhat similar), there was a kernel patch for linux that had the usual buncha crazy c code but one of the expressions that was hard to catch looked like this (paraphrasing from memory)

    if (uuid = 0)
    {
    /* do something */
    }

    which essentially said if the user had root privileges then do the "do something", HOWEVER, what really happened was uuid was ASSIGNED the value 0 and thus privileges were escalated ALWAYS. In this case, c's comparison for equality is always "==" instead of "=" but non the less, the above is valid syntax as the result of an assignment is the value assigned (I think, could be wrong).

    I know the code probably isn't right, just posting from memory.

    So as a result of this, some coders like to put the constant part of the comparison to the left to avoid an accidental assignment.

     

     

  • drop table if exists #salesman;
    go
    create table #salesman(salesman_id int, salesman varchar(20));
    go

    insert #salesman(salesman_id, salesman) values
    (1, 'Tedd'),
    (2, 'Fred'),
    (3, 'Ed');

    drop table if exists #customer;
    go
    create table #customer(customer_id int, salesman_id int, customer varchar(20));
    go

    insert #customer(customer_id, salesman_id, customer) values
    (1, 1, 'Jay'),
    (2, 2, 'Ray'),
    (3, 2, 'Fay'),
    (4, 3, 'May'),
    (5, 3, 'Day'),
    (6, 3, 'Bay');


    SELECTsalesman_id,salesman
    FROM#salesman a
    WHERE1 < (SELECT COUNT(*) FROM #customer WHERE salesman_id=a.salesman_id);

    /* this is the same query */
    select s.salesman_id, s.salesman
    from
    #salesman s
    join
    #customer c on s.salesman_id=c.salesman_id
    group by s.salesman_id, s.salesman
    having count(*)>1;

    Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

  • Another way

    select s.salesman_id, s.salesman
    from
    #salesman s
    cross apply
    (SELECT COUNT(*) cnt FROM #customer c WHERE c.salesman_id=s.salesman_id) xa
    where
    xa.cnt>1;

    Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

  • michael.leach2015 wrote:

    I am working an exercise problem where the solution provided has a WHERE clause used in a way I have not seen before.  The question asks,  "to show the name and numbers of all salesmen who had more than one customer."

    Although the SQL online documentation shows an example of using WHERE with a comparison operator, I don't understand how it  is being used below.  I understand the notation "columnName" + comparison operator + expression, but I can't make sense of how it is used below.  Let's say the subquery returns a count of 3.  The WHERE clause would now evaluate to 1 < 3.  How would you read this?

    SELECTsalesman_id,name 
    FROMsalesman a
    WHERE1 < (SELECT COUNT(*) FROM customer WHERE salesman_id=a.salesman_id);

    DinoRS hit the nail on the head for this one.

    DinoRS wrote:

    Or in other words: Show only results where you have more than one resultset available.

    The only things I can add to that is that when more than one table is involved, all columns names should be properly prefix with the related table alias just to avoid future confusion by readers and, if changes are made, by the optimizer itself.

    The other thing I'd do is to (always) use the two part naming convention for objects in queries.  It doesn't take much, adds "bullet proofing" if someone decides to add other schemas with identical table names, and helps the code perform just a little better because it doesn't have go though the throws of checking if the person or thing executing the code has its own schema and, if not, checking to see if there's something in the master database and, if not, finally checking if the object is in the dbo schema.

    In other words, I'd rewrite the code as follows....

     SELECT a.salesman_id
    ,a.name
    FROM dbo.salesman a
    WHERE 1 < (SELECT COUNT(*) FROM dbo.customer c WHERE c.salesman_id = a.salesman_id)
    ;
    michael.leach2015 wrote:

    Additionally in SSMS, I can right click on a table and select Script Table as, then select CREATE or INSERT INTO.  Is there a way to do something similar where I can quickly generate a script that has INSERT INTO which also lists the values in the table?  By doing this, anybody can easily recreate a table.

    The method you described is how most people do such things to experiment with provided of course that they do it in a different database or change the schema or object name and any constraint names (which must be unique in a given database).

    If you want data with that, you can right click on the database, select [tasks], and drill down to [generate scripts] where it will allow to to generate both the code to create the table and a couple of related things and the data.  I never generate the data that way because it generates it in a very RBAR fashion where every row is a full-blown INSERT/VALUES statement that takes a fair bit to run compared to fuller sets of data.

    Of course, you could always do a SELECT/INTO in a couple of different forms depending on your needs.  Of course, that won't create any of the indexes that are on the table.

    michael.leach2015 wrote:

    Edit:  Suppose I wanted to see the output after each individual step/execution.  Can the debugger show me each individual row that is returned until I step through the entire query?  I started reading about execution plans, but I don't think that will give me what I am looking for.  Any suggestions?

    I can't actually help there.  I never use the debugger.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • michael.leach2015 wrote:

    Although the SQL online documentation shows an example of using WHERE with a comparison operator, I don't understand how it  is being used below.  I understand the notation "columnName" + comparison operator + expression, but I can't make sense of how it is used below.  Let's say the subquery returns a count of 3.  The WHERE clause would now evaluate to 1 < 3.  How would you read this?

    SELECT salesman_id,name 
    FROM salesman a
    WHERE 1 < (SELECT COUNT(*) FROM customer WHERE salesman_id=a.salesman_id);

    1 < 3 would evaluate to true, so the row would be returned.

  • Are we being confused by the order of the operation?

    WHERE 1 < @x

    is functionally the same as

    WHERE @x > 1

    The second form is more intuitive to me, but YMMV.

     

     

    • This reply was modified 3 years, 9 months ago by  GaryV.

Viewing 15 posts - 1 through 15 (of 17 total)

You must be logged in to reply to this topic. Login to reply