The Empty Set vs No Result

  • So, I've got somewhat of an academic question, but one that might have some practical implications... depending on the answer.

    So suppose I've got the following table...

    CREATE TABLE Foo

    (

    FooId INT PRIMARY KEY,

    Bar VARCHAR(50),

    UserId INT

    )

    And another table like this...

    CREATE TABLE Users

    (

    UserId INT PRIMARY KEY,

    UserName VARCHAR(50)

    )

    Now suppose, I've got a stored procedure that contains a simple query. From a practical standpoint, is there any difference between

    DECLARE @userid INT;

    SELECT @userid = UserId FROM users WHERE userName = 'myUserName';

    IF (@UserId Is Not Null)

    BEGIN

    SELECT Bar

    FROM Foo

    WHERE UserId = @userid

    END

    ... and ...

    SELECT Bar

    FROM Foo F

    INNER JOIN Users U

    ON F.UserId = U.UserId

    WHERE U.userName = 'myUserName';

    From a purely academic standpoint, there is a big difference. As the first one might not return any result, but the second one will always return a result. That result just might be an empty set. That I get.

    But from a practical standpoint, is there any difference? In performance? To ADO.NET?

    I know what I would expect, but I'm curios what others think or have seen.

    Thanks,
    MKE Data Guy

  • This reminds me of a situation I encountered where a stored procedure was querying an Analysis Services cube, storing rows in a temp table and then returning results to Reporting Services. Sometimes (on certain dates / regions) the cube would not return any results. Worse, when it happens it doesn't even return the column headers, so the insert into the temp table doesn't then insert a null record or nothing at all, it just has an error.

    In other cases, not all measures are returned, and rather than the empty measure column returned as NULL, they are just left out of the result set completely so you have a column mismatch e.g. the insert is expecting 10 columns but sometimes there's only 9.

    In some database development I've seen the use of Return Values in stored procs in communicate success or failure back to the front end code. Better is to use TRY/CATCH and RAISERROR which is the fix I used on the previous Analysis Services example. Specifically I didn't raise errors because often that has its own problems, I just used try to test the returned rows and catch to insert either null row or limited columns as required.

    I guess some sort of message always needs to be passed back to the calling code, and whether you are relying on the database provider/connection handler to do that implicitly or you handle it yourself, not returning anything at all is going to result in a time-out at best and an endless wait at worst, as opposed to an empty set which is an actual a "thing" that you can do something with.

    Like your question though, that's an academic answer because mostly a non-result will in practice be handled for you and "something" will be returned.

  • So, I did a little playing around myself, and discovered that as far as performance goes there is zero difference. At least with the size data set that I was using.

    As for ADO.NET, if you use a DataAdapter to fill a DataSet, ADO.NET won't create anything if there's no result returned. But, if you return an empty set, ADO.NET is at least able to build a DataTable. Which means, at least in theory, you won't have to check if a table was created, only that it has rows... A little less work on the .NET side.

    I really like the idea that something is always returned. Even if it is just an empty set. Consistency is so nice. 😀 But really, there's not a whole lot of difference either way.

    Thanks,
    MKE Data Guy

  • I'm glad you did the testing, and I agree with your conclusion. Testing for rows will be way more consistent and robust than testing for nothingness.

    Sounds almost existential

  • Knowledge Draftsman (5/15/2013)


    So, I did a little playing around myself, and discovered that as far as performance goes there is zero difference. At least with the size data set that I was using.

    How many rows was that? I ask because I can see the first query taking twice as long if something is found.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff,

    So, it was a fairly small dataset, one table had a couple hundred rows. The other had 3 rows. And, I can certainly see that as the row count got bigger that you might see a difference appear.

    When, I did my testing I started using SQL Server Profiler. Not seeing any difference there, I wrote a .NET program so I could count Ticks to see if I'd see any difference with that level of precision. I was expecting that because you're doing two queries in the first procedure, it would take longer. Maybe not twice as long, but longer still. I ran the procedures 10 to 15 times each. I knew that the first time the procedure would be compiled and an execution plan stored. So, I wanted to run several times to see what it would really be like. And like I said, there was zero difference... Not even one tick different.

    Thanks,
    MKE Data Guy

  • When writing a procedure I always assume that some of the data will be bad or non-existant. Thus, I almost always set up defaults so SOMETHING is returned to the client app. Something like this

    DECLARE UserID AS INT

    SELECT

    ROW_NUMBER() OVER (PARTITION BY UserID ORDER BY UserName) AS RowNum

    ,ISNULL(NULLIF(UserId,''),0 AS UserID

    ,ISNULL(NULLIF(UserName,''), as UserName

    FROM

    Users

    WHERE

    UserID = @user-id

    So if nothing is returned I still get at least one row although it may be empty:

    RowNum UserID UserName

    1 0 [blank]

     

  • Knowledge Draftsman (5/14/2013)


    DECLARE @userid INT;

    SELECT @userid = UserId FROM users WHERE userName = 'myUserName';

    IF (@UserId Is Not Null)

    BEGIN

    SELECT Bar

    FROM Foo

    WHERE UserId = @userid

    END

    Sounds like you already figured out for yourself that the set-based JOIN method is preferred over the conditional branch (and Jeff has pointed out the scaling issue in performing two queries when you could do just one), but there's another way to demonstrate that the IF method contains a code smell. That is that you could extend this construct ad absurdum: let's assume that apart from it's relation to Users, Foo also has a relation to Project, Inventory, MarketingCampaign, etc. As the relations extend, the more branches you would have to construct and the more individual queries you would have to run to prove existence before you actually retrieved the desired result.

    If the requirement was that the procedure should return nothing (as opposed to the empty set) if any of the dependencies didn't exist, you're probably correct that you'd need an IF construct; but you could construct your query with the INNER JOIN method, SELECT the results into a table variable and return the contents of that table variable only if it contains rows. In this way you'd leverage the more scalable query construct. Still, I'm with you that I'd prefer the procedure to produce either the appropriate contents or the empty set as this seems to provide the app layer with a more consistent result (the other way feels a bit like using 0 to represent a NULL value).

  • Thanks everybody!

    I actually saw the IF construct in several stored procedures that I was asked to review, and it made me wonder why somebody would do that. Cause, I'd probably just do the JOIN construct, and not even think to use the IF. I guess it stuck out to me 'cause it had a kind of funky smell. 🙂

    Anyway I'm going to suggest that we (the company I work for) change the IF construct to the JOIN construct. The .NET code is already checking for a null dataset, null table and empty rows collection. So, it shouldn't hurt anything on the .NET side. It'll just be making checks it doesn't need to anymore.

    Thanks,
    MKE Data Guy

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply