Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase

The Empty Set vs No Result Expand / Collapse
Author
Message
Posted Tuesday, May 14, 2013 7:47 AM
Old Hand

Old HandOld HandOld HandOld HandOld HandOld HandOld HandOld Hand

Group: General Forum Members
Last Login: Wednesday, September 10, 2014 5:48 AM
Points: 318, Visits: 107
So, I've got somewhat of an academic question, but one that might have some practical implications... depending on the answer.

So suppose I've got the following table...

CREATE TABLE Foo
(
FooId INT PRIMARY KEY,
Bar VARCHAR(50),
UserId INT
)

And another table like this...

CREATE TABLE Users
(
UserId INT PRIMARY KEY,
UserName VARCHAR(50)
)

Now suppose, I've got a stored procedure that contains a simple query. From a practical standpoint, is there any difference between

DECLARE @UserId INT;
SELECT @UserId = UserId FROM users WHERE userName = 'myUserName';

IF (@UserId Is Not Null)
BEGIN
SELECT Bar
FROM Foo
WHERE UserId = @UserId
END

... and ...

SELECT Bar
FROM Foo F
INNER JOIN Users U
ON F.UserId = U.UserId
WHERE U.userName = 'myUserName';

From a purely academic standpoint, there is a big difference. As the first one might not return any result, but the second one will always return a result. That result just might be an empty set. That I get.

But from a practical standpoint, is there any difference? In performance? To ADO.NET?

I know what I would expect, but I'm curios what others think or have seen.


Thanks,
MKE Data Guy
Post #1452582
Posted Wednesday, May 15, 2013 12:03 AM


SSC-Addicted

SSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-Addicted

Group: General Forum Members
Last Login: Thursday, December 18, 2014 4:24 AM
Points: 495, Visits: 895
This reminds me of a situation I encountered where a stored procedure was querying an Analysis Services cube, storing rows in a temp table and then returning results to Reporting Services. Sometimes (on certain dates / regions) the cube would not return any results. Worse, when it happens it doesn't even return the column headers, so the insert into the temp table doesn't then insert a null record or nothing at all, it just has an error.

In other cases, not all measures are returned, and rather than the empty measure column returned as NULL, they are just left out of the result set completely so you have a column mismatch e.g. the insert is expecting 10 columns but sometimes there's only 9.

In some database development I've seen the use of Return Values in stored procs in communicate success or failure back to the front end code. Better is to use TRY/CATCH and RAISERROR which is the fix I used on the previous Analysis Services example. Specifically I didn't raise errors because often that has its own problems, I just used try to test the returned rows and catch to insert either null row or limited columns as required.

I guess some sort of message always needs to be passed back to the calling code, and whether you are relying on the database provider/connection handler to do that implicitly or you handle it yourself, not returning anything at all is going to result in a time-out at best and an endless wait at worst, as opposed to an empty set which is an actual a "thing" that you can do something with.

Like your question though, that's an academic answer because mostly a non-result will in practice be handled for you and "something" will be returned.
Post #1452928
Posted Wednesday, May 15, 2013 6:11 AM
Old Hand

Old HandOld HandOld HandOld HandOld HandOld HandOld HandOld Hand

Group: General Forum Members
Last Login: Wednesday, September 10, 2014 5:48 AM
Points: 318, Visits: 107
So, I did a little playing around myself, and discovered that as far as performance goes there is zero difference. At least with the size data set that I was using.

As for ADO.NET, if you use a DataAdapter to fill a DataSet, ADO.NET won't create anything if there's no result returned. But, if you return an empty set, ADO.NET is at least able to build a DataTable. Which means, at least in theory, you won't have to check if a table was created, only that it has rows... A little less work on the .NET side.

I really like the idea that something is always returned. Even if it is just an empty set. Consistency is so nice. But really, there's not a whole lot of difference either way.


Thanks,
MKE Data Guy
Post #1453037
Posted Wednesday, May 15, 2013 5:59 PM


SSC-Addicted

SSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-Addicted

Group: General Forum Members
Last Login: Thursday, December 18, 2014 4:24 AM
Points: 495, Visits: 895
I'm glad you did the testing, and I agree with your conclusion. Testing for rows will be way more consistent and robust than testing for nothingness.

Sounds almost existential
Post #1453295
Posted Wednesday, May 15, 2013 8:55 PM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Today @ 5:57 PM
Points: 35,777, Visits: 32,451
Knowledge Draftsman (5/15/2013)
So, I did a little playing around myself, and discovered that as far as performance goes there is zero difference. At least with the size data set that I was using.


How many rows was that? I ask because I can see the first query taking twice as long if something is found.


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #1453309
Posted Thursday, May 16, 2013 6:16 AM
Old Hand

Old HandOld HandOld HandOld HandOld HandOld HandOld HandOld Hand

Group: General Forum Members
Last Login: Wednesday, September 10, 2014 5:48 AM
Points: 318, Visits: 107
Jeff,

So, it was a fairly small dataset, one table had a couple hundred rows. The other had 3 rows. And, I can certainly see that as the row count got bigger that you might see a difference appear.

When, I did my testing I started using SQL Server Profiler. Not seeing any difference there, I wrote a .NET program so I could count Ticks to see if I'd see any difference with that level of precision. I was expecting that because you're doing two queries in the first procedure, it would take longer. Maybe not twice as long, but longer still. I ran the procedures 10 to 15 times each. I knew that the first time the procedure would be compiled and an execution plan stored. So, I wanted to run several times to see what it would really be like. And like I said, there was zero difference... Not even one tick different.


Thanks,
MKE Data Guy
Post #1453468
Posted Friday, May 17, 2013 7:48 AM
SSC-Addicted

SSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-Addicted

Group: General Forum Members
Last Login: Sunday, September 29, 2013 1:24 AM
Points: 429, Visits: 1,721
When writing a procedure I always assume that some of the data will be bad or non-existant. Thus, I almost always set up defaults so SOMETHING is returned to the client app. Something like this


DECLARE UserID AS INT

SELECT
ROW_NUMBER() OVER (PARTITION BY UserID ORDER BY UserName) AS RowNum
,ISNULL(NULLIF(UserId,''),0 AS UserID
,ISNULL(NULLIF(UserName,''), as UserName
FROM
Users
WHERE
UserID = @UserID


So if nothing is returned I still get at least one row although it may be empty:


RowNum UserID UserName
1 0 [blank]


 
Post #1454012
Posted Friday, May 17, 2013 8:06 AM
SSC Veteran

SSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC Veteran

Group: General Forum Members
Last Login: Today @ 3:34 PM
Points: 271, Visits: 1,717
Knowledge Draftsman (5/14/2013)

DECLARE @UserId INT;
SELECT @UserId = UserId FROM users WHERE userName = 'myUserName';

IF (@UserId Is Not Null)
BEGIN
SELECT Bar
FROM Foo
WHERE UserId = @UserId
END



Sounds like you already figured out for yourself that the set-based JOIN method is preferred over the conditional branch (and Jeff has pointed out the scaling issue in performing two queries when you could do just one), but there's another way to demonstrate that the IF method contains a code smell. That is that you could extend this construct ad absurdum: let's assume that apart from it's relation to Users, Foo also has a relation to Project, Inventory, MarketingCampaign, etc. As the relations extend, the more branches you would have to construct and the more individual queries you would have to run to prove existence before you actually retrieved the desired result.

If the requirement was that the procedure should return nothing (as opposed to the empty set) if any of the dependencies didn't exist, you're probably correct that you'd need an IF construct; but you could construct your query with the INNER JOIN method, SELECT the results into a table variable and return the contents of that table variable only if it contains rows. In this way you'd leverage the more scalable query construct. Still, I'm with you that I'd prefer the procedure to produce either the appropriate contents or the empty set as this seems to provide the app layer with a more consistent result (the other way feels a bit like using 0 to represent a NULL value).
Post #1454024
Posted Friday, May 17, 2013 8:29 AM
Old Hand

Old HandOld HandOld HandOld HandOld HandOld HandOld HandOld Hand

Group: General Forum Members
Last Login: Wednesday, September 10, 2014 5:48 AM
Points: 318, Visits: 107
Thanks everybody!

I actually saw the IF construct in several stored procedures that I was asked to review, and it made me wonder why somebody would do that. Cause, I'd probably just do the JOIN construct, and not even think to use the IF. I guess it stuck out to me 'cause it had a kind of funky smell.

Anyway I'm going to suggest that we (the company I work for) change the IF construct to the JOIN construct. The .NET code is already checking for a null dataset, null table and empty rows collection. So, it shouldn't hurt anything on the .NET side. It'll just be making checks it doesn't need to anymore.


Thanks,
MKE Data Guy
Post #1454038
« Prev Topic | Next Topic »

Add to briefcase

Permissions Expand / Collapse