Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase «««2829303132»»»

There Must Be 15 Ways To Lose Your Cursors… Part 2: Just Put It in a S Expand / Collapse
Author
Message
Posted Thursday, July 22, 2010 1:44 PM


Ten Centuries

Ten CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen Centuries

Group: General Forum Members
Last Login: Monday, April 14, 2014 4:18 PM
Points: 1,276, Visits: 1,132
Tom.Thomson (7/22/2010)
peter-757102 (7/22/2010)
I find this sort of thing many more times disturbing/dangerous then using a quirky update deliberatly and with full knowledge and documentation. Why? Because I suspect hardly anyone will know they have to protect their expressions to such a silly extend and the world therefore must be full of non-working code. In fact I expect that ** nearly all ** string processing code in use today to be sensitive to this problem. I knew expression order in the where clause sometimes must be enforced, but never that this problem extended beyond the where clause and into the select clause. Nor have I ever read of this problem anywhere!

I find this a total mess and hope it is just an SQL Server 2008 implementation bug that can be fixed and not standard behavior. Because if it is standard, I think the people responsible for setting the standard are in dire need of some practical re-education. And I express myself quite mildly here.


Are you sure this bug (if it's not a bug I will give up using T-SQL as it's not a useful language) is restricted to string processing code? Why not simple arithmetic too, so that I have to write
select case when T1.y <> 0.0 
then T1.x/T1.y
else null
end as ratio
from T1
where IsNull(T1.y,0.0) <> 0.0

or something equally ridiculous?

There is an interesting statement in BoL at http://msdn.microsoft.com/en-us/library/ms190623.aspx
BoL
A SELECT statement defines only the following:

* The format of the result set. This is specified mostly in the select list. However, other clauses such as ORDER BY and GROUP BY also affect the final form of the result set.

* The tables that contain the source data. This is specified in the FROM clause.

* How the tables are logically related for the purposes of the SELECT statement. This is defined in the join specifications, which may appear in the WHERE clause or in an ON clause following FROM.

* The conditions that the rows in the source tables must satisfy to qualify for the SELECT statement. These are specified in the WHERE and HAVING clauses.

(similar statements are included in BoL for SQLS 2008, SQLS2005, SQLS 2000)

It's noteworthy that the select statement does not, according to this BoL entry, specify how the values in the result set relate to the values in the tables from which the results are derived! If that were true, the language would serve no useful function - but evidently someone at MS believes it to be true.

Perhaps the bug you describe is related to this bizarre belief ("qualify for the SELECT statement" is open to all sorts of interpretation, isn't it?). But I hope it's just an accidental bug.


Sorry to hear you're leaving T-SQL Unfortunately you probably won't find much love in any dialect of SQL. The problem (?) is that the SQL standard doesn't define physical processing steps. All that is left to the vendors. The standard simply says here's the keywords, here's the definition of the keywords, here's what you put in and here's what you get out. It defines logical processing order, but not the physical steps required to get from A to B. Every vendor uses their own physical implementation optimizations to get results back quickly, and that often means minimizing scans.

Let's walk through this briefly and see what steps would be required to implement the original query to perform it in logical order steps:

select
left( 'abc', T1.v )
from
T1
where
T1.v >= 0
;

Now let's say we have 10 million rows in T1 and 5 million of them match the where clause (T1.v >= 0). Here's what we need to do:

1. Scan the 10 million row table (look at every single row) and compare to 0.
2. Dump the 5 million rows that match the WHERE clause into a temp table or other intermediate storage.
3. Scan the 5 million rows that match the WHERE clause and perform the projection against them (left( 'abc', T1.v )).

We just scanned 15 million rows and stored 5 million rows in temporary storage to fulfill this query. Now let's look at one way we can optimize this:

1. Scan the 10 million row table.
2. Execute the WHERE clause and the projection at the same time on each row since both clauses are using the same columns.

Obviously the so-called "buggy" execution plan requires the developer to be more careful, but gracefully handling invalid function parameter and divide-by-zero errors are not new problems in programming and shouldn't come as a surprise to most devs. Obviously if you wanted to force the physical order of execution you can always forcefully serialize the intermediate results yourself:

CREATE TABLE #t
(
v INT
);

INSERT INTO #t (v)
SELECT T1.v
FROM T1
WHERE T1.v >= 0;

SELECT LEFT( 'abc', #t.v)
FROM #t;

Mike C
Post #957537
Posted Thursday, July 22, 2010 3:56 PM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Today @ 4:00 PM
Points: 35,967, Visits: 30,258
Mike C (7/22/2010)
Obviously the so-called "buggy" execution plan requires the developer to be more careful, but gracefully handling invalid function parameter and divide-by-zero errors are not new problems in programming and shouldn't come as a surprise to most devs.


Hmmm... sounds just like the rules for a quirky update.


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

"Change is inevitable. Change for the better is not." -- 04 August 2013
(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #957633
Posted Thursday, July 22, 2010 3:57 PM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Today @ 4:00 PM
Points: 35,967, Visits: 30,258
peter-757102 (7/22/2010)
Jeff Moden (7/22/2010)
Peter,

I've not tried this particular example but I'm pretty sure that you don't actually need the index hint when you use an ORDER BY to get the correct order. The MAXDOP 1 is the important part.


Check the link Jeff, you see that it is required to make it function as expected as the data is retrieved from a non-clustered covering index instead of the clustered key. In itself not a problem, where it not that the concatenation is processed BEFORE the order by takes effect, resulting in just one tiny string instead of a concatenation of all strings.


Thanks, Peter. I'll check.


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

"Change is inevitable. Change for the better is not." -- 04 August 2013
(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #957635
Posted Thursday, July 22, 2010 4:17 PM


Ten Centuries

Ten CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen Centuries

Group: General Forum Members
Last Login: Monday, April 14, 2014 4:18 PM
Points: 1,276, Visits: 1,132
Jeff Moden (7/22/2010)
Mike C (7/22/2010)
Obviously the so-called "buggy" execution plan requires the developer to be more careful, but gracefully handling invalid function parameter and divide-by-zero errors are not new problems in programming and shouldn't come as a surprise to most devs.


Hmmm... sounds just like the rules for a quirky update.


What's that? If you divide two numbers make sure the dividend is not zero? . Most devs learned that rule in grade school :)
Post #957648
Posted Thursday, July 22, 2010 4:38 PM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Today @ 4:00 PM
Points: 35,967, Visits: 30,258
Mike C (7/22/2010)
Jeff Moden (7/22/2010)
Mike C (7/22/2010)
Obviously the so-called "buggy" execution plan requires the developer to be more careful, but gracefully handling invalid function parameter and divide-by-zero errors are not new problems in programming and shouldn't come as a surprise to most devs.


Hmmm... sounds just like the rules for a quirky update.


What's that? If you divide two numbers make sure the dividend is not zero? . Most devs learned that rule in grade school :)


Nah... I was talking about the developer being careful.


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

"Change is inevitable. Change for the better is not." -- 04 August 2013
(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #957657
Posted Thursday, July 22, 2010 4:58 PM


Ten Centuries

Ten CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen Centuries

Group: General Forum Members
Last Login: Monday, April 14, 2014 4:18 PM
Points: 1,276, Visits: 1,132
Jeff Moden (7/22/2010)
Mike C (7/22/2010)
Jeff Moden (7/22/2010)
Mike C (7/22/2010)
Obviously the so-called "buggy" execution plan requires the developer to be more careful, but gracefully handling invalid function parameter and divide-by-zero errors are not new problems in programming and shouldn't come as a surprise to most devs.


Hmmm... sounds just like the rules for a quirky update.


What's that? If you divide two numbers make sure the dividend is not zero? . Most devs learned that rule in grade school :)


Nah... I was talking about the developer being careful.


Ahhh, now I understand I suppose, like the quirky update, with the right combination of query hints, options, hardware, and indexes you could probably find a way to force the query optimizer to execute the WHERE clause first. At least until the optimizer comes up with a new plan. Knowing that the physical order of execution within a query is not guaranteed would make me hesitant to trust that solution though. I think I'd rather go with the slightly simpler solution, like this:

select
left( 'abc', NULLIF(T1.v, 0) )
from
T1
where
T1.v >= 0
;

Or this:

select T1.x / NULLIF(T1.y, 0) as ratio
from T1
where T1.y <> 0

But then again I tend to enjoy the simple things in life.
Post #957665
Posted Thursday, July 22, 2010 7:16 PM


SSCrazy Eights

SSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy Eights

Group: General Forum Members
Last Login: Today @ 8:37 AM
Points: 8,287, Visits: 8,738
Mike C (7/22/2010)
Sorry to hear you're leaving T-SQL Unfortunately you probably won't find much love in any dialect of SQL. The problem (?) is that the SQL standard doesn't define physical processing steps. All that is left to the vendors. The standard simply says here's the keywords, here's the definition of the keywords, here's what you put in and here's what you get out.

I think you are missing the point. The (silly) arithmetic example isn't what it's about, what it's about is the elimination of semantics from the T-SQL language. The BoL entry I quoted says quite clearly and plainly that the select statement + the state of the database when it is executed do NOT between them define what you get out. For example if I execute the following code
set transaction isolation level serializable
begin tran
create table #T (a int check (a=1) default (1), b int check (b=2) default(2))
insert #T (TABLOCKX) values (1,2)
select T.a+T.b as c from #T T
commit tran

I will get a result set of some rows with a single column, probably named c. It's not really clear that in the row(s) I get in that result set the single column c will have type int. It's far from clear that I'll get just one row. in fact the BoL page says that the select statement doesn't specify that. It is explicitly stated in that BoL page that this select statement is NOT guaranteed to return me the value 3 in column of the single (if I do get just one) row, since there the select statement specifies no relation between the values in the source tables and the values in the result set, except perhaps as affected by where and having clauses, if that is conceivable (unlikely, except maybe for aggregates): certainly not by the select clause or its output list, and it's the output list that matters here.

When this kind of idiocy appears in BoL we all should be terrified: that may be the MS philopsophy behind which parts of the select statement's apparent semantics (enshrined in the ANSI/ISO SQL standards, I believe, but apparently not in T-SQL) which the optimser (for SQL, if not for T-SQL) is required not to destroy. Or it may be just that someone who hasn't a clue has been writing chunks of BoL - and from your comment on this issue (and on that raised by another which prompted me to comment) I think you've failed to understand all of (1) just what that quotation says and (2) what most of us believe the semantics of SQL is (per ANSI and per ISO, despite what MS sometimes asserts to the contrary on silly BoL pages like the one I quoted).

I don't much care about the "quirky update". If one is adequately careful it works, provided one is careful to include adequate sanity checks in the code to ensure that it still works even if MS does something completely insane with the optimiser (as, according to Pete-<numbers> - sorry , I forgot which numbers - they have, although this particular insanity doesn't affect quirky update any more than it affects ordinary selects). I can't see any problem in using it (I'm a pessimist: SQL is such a defective language and no-one is going to fix it so MS will leave this rather silly extension in, with restrictions of course, instead of implementing a proper reduction operator; neither MS nor the standards bodies care about extended relational algebra operators, particularly not about the functional ones; but they will never be able to afford to destroy quirky update until they introduce some of those operators, so we can carry on using it with care). Of course "adequately careful" is fairly extreme - just as it is for fully specified, fully supported, and not even deprecated MS stuff (I've been bitten, sorry to sound sour about MS consistency).


Tom
Post #957701
Posted Thursday, July 22, 2010 9:16 PM


Ten Centuries

Ten CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen Centuries

Group: General Forum Members
Last Login: Monday, April 14, 2014 4:18 PM
Points: 1,276, Visits: 1,132
Tom.Thomson (7/22/2010)
Mike C (7/22/2010)
Sorry to hear you're leaving T-SQL Unfortunately you probably won't find much love in any dialect of SQL. The problem (?) is that the SQL standard doesn't define physical processing steps. All that is left to the vendors. The standard simply says here's the keywords, here's the definition of the keywords, here's what you put in and here's what you get out.

I think you are missing the point. The (silly) arithmetic example isn't what it's about, what it's about is the elimination of semantics from the T-SQL language. The BoL entry I quoted says quite clearly and plainly that the select statement + the state of the database when it is executed do NOT between them define what you get out. For example if I execute the following code
set transaction isolation level serializable
begin tran
create table #T (a int check (a=1) default (1), b int check (b=2) default(2))
insert #T (TABLOCKX) values (1,2)
select T.a+T.b as c from #T T
commit tran

I will get a result set of some rows with a single column, probably named c. It's not really clear that in the row(s) I get in that result set the single column c will have type int. It's far from clear that I'll get just one row. in fact the BoL page says that the select statement doesn't specify that. It is explicitly stated in that BoL page that this select statement is NOT guaranteed to return me the value 3 in column of the single (if I do get just one) row, since there the select statement specifies no relation between the values in the source tables and the values in the result set, except perhaps as affected by where and having clauses, if that is conceivable (unlikely, except maybe for aggregates): certainly not by the select clause or its output list, and it's the output list that matters here.

When this kind of idiocy appears in BoL we all should be terrified: that may be the MS philopsophy behind which parts of the select statement's apparent semantics (enshrined in the ANSI/ISO SQL standards, I believe, but apparently not in T-SQL) which the optimser (for SQL, if not for T-SQL) is required not to destroy. Or it may be just that someone who hasn't a clue has been writing chunks of BoL - and from your comment on this issue (and on that raised by another which prompted me to comment) I think you've failed to understand all of (1) just what that quotation says and (2) what most of us believe the semantics of SQL is (per ANSI and per ISO, despite what MS sometimes asserts to the contrary on silly BoL pages like the one I quoted).

I don't much care about the "quirky update". If one is adequately careful it works, provided one is careful to include adequate sanity checks in the code to ensure that it still works even if MS does something completely insane with the optimiser (as, according to Pete-<numbers> - sorry , I forgot which numbers - they have, although this particular insanity doesn't affect quirky update any more than it affects ordinary selects). I can't see any problem in using it (I'm a pessimist: SQL is such a defective language and no-one is going to fix it so MS will leave this rather silly extension in, with restrictions of course, instead of implementing a proper reduction operator; neither MS nor the standards bodies care about extended relational algebra operators, particularly not about the functional ones; but they will never be able to afford to destroy quirky update until they introduce some of those operators, so we can carry on using it with care). Of course "adequately careful" is fairly extreme - just as it is for fully specified, fully supported, and not even deprecated MS stuff (I've been bitten, sorry to sound sour about MS consistency).


I agree, I am completely missing your point. It seems like you have more than one point to make in there, but I'm not getting any of them. Let me see if I can try to understand your points:

1. You're upset about code formatting by developers that makes the result of SQL unclear. Fortunately that's easily fixable, though it requires a commitment from the developers who write code. BTW, I believe the output from your code sample above will actually be an error message.

SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

BEGIN TRANSACTION;

CREATE TABLE #T
(
a INTEGER DEFAULT (1),
b INTEGER DEFAULT (2),

CONSTRAINT Allow_Nothing_But_1_In_a CHECK (a = 1),
CONSTRAINT Allow_Nothing_But_2_In_b CHECK (b = 2)
);

INSERT INTO #T (a, b) WITH (TABLOCKX)
VALUES (1,2);

SELECT (T.a + T.b) AS c -- This query returns 1 column; an INTEGER + an INTEGER yields an INTEGER
FROM #T T;

COMMIT TRANSACTION;

2. You're upset about mistakes and problematic documentation. I often find mistakes in technical documentation -- spent nearly a week picking apart the SQLCMD utility documentation in BOL when 2005 was first released, for instance. Fortunately there are a couple of ways to constructively deal with that issue. http://connect.microsoft.com and the Feedback button on the BOL and MSDN pages come to mind. There's also the web forums, which MS' documentation team often frequents soliciting feedback.

3. You don't like SQL in general. That's perfectly understandable -- there are a lot of people who don't like SQL, and that's a personal choice. Of course every query language, programming language, scripting language, markup language, or any other form of computer-based language has idiosyncracies and issues. Whether it's data type casting (implicit, explicit), lack of features, etc., there will always be room for improvement. MS accepts feature requests on their Connect website also.
Post #957727
Posted Friday, July 23, 2010 3:36 AM
Old Hand

Old HandOld HandOld HandOld HandOld HandOld HandOld HandOld Hand

Group: General Forum Members
Last Login: Wednesday, April 16, 2014 3:35 AM
Points: 319, Visits: 2,151
I very much like SQL even while over the time I encounter or read about quite a few quirks. The problem I have with the case I brought forward is that it complicates code even for basic simple functionality which is the opposite of what I like so much about SQL. The real life case was a bit more complex, but my example demonstrated the principle quite nicely.

In its heart SQL is of course a data retrieval language and it does that excelent IMHO, often far beyond my expectations. But we frequently go out of this boundary and include all sorts of processing beyond simple test and include processing in select clauses as well. It is here that quirks show up that can spoil the party if you rightfully desire always perfectly working code. It can get complex and ugly (unreadable) real fast if you got many conditions to be met.

The SQL principle of describing what you want and let the server figure out how to get those results to you is perfectly fine. But it could be so much better if those exceptions where handled more graceful and never spoiled the end result at all. With this I mean that an exception in a row that does not contribute to the end result must be kept hidden IMHO. This would boost the usefulness and robustness of the language many times over without complicating work for the query writer.

Anyway, its an interesting sub-discussion and a bit off-topic here ;). I really have to make a simple as possible, working example case to demonstrate the issue. The code in question had run for over a year without any issue and suddenly there it was, broken for no obvious reason. Lucky for me, it was generic functionality and I am used to encapsulate this sort of logic in functions. I was able to fix this in one spot instead of re-writing and complicating many separate queries.
Post #957842
Posted Friday, July 23, 2010 6:15 AM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Today @ 4:00 PM
Points: 35,967, Visits: 30,258
Mike C (7/22/2010)
Jeff Moden (7/22/2010)
Mike C (7/22/2010)
Jeff Moden (7/22/2010)
Mike C (7/22/2010)
Obviously the so-called "buggy" execution plan requires the developer to be more careful, but gracefully handling invalid function parameter and divide-by-zero errors are not new problems in programming and shouldn't come as a surprise to most devs.


Hmmm... sounds just like the rules for a quirky update.


What's that? If you divide two numbers make sure the dividend is not zero? . Most devs learned that rule in grade school :)


Nah... I was talking about the developer being careful.


Ahhh, now I understand I suppose, like the quirky update, with the right combination of query hints, options, hardware, and indexes you could probably find a way to force the query optimizer to execute the WHERE clause first. At least until the optimizer comes up with a new plan. Knowing that the physical order of execution within a query is not guaranteed would make me hesitant to trust that solution though.


No... you don't understand, Mike. People jump through all sorts of hoops with indexes and whether or not to use include and whether or not to use covering indexes. A great amount of time is spent explaining clustered indexes and the fact that non clustered indexes are appended with the clustered index and how to defrag and rebuild indexes. Why? To trick the optimizer into doing things faster and that's all I've done with the quirky update. You say it's not quaranteed but even a SELECT with a WHERE clause isn't guaranteed. SQL SERVER 2000 had to have a hot fix after sp2 because if parallelism occurred during an UPDATE that had a WHERE clause with an IS NULL in it, the IS NULL was ignored and all rows were updated. I know this because it happened to the folks at work. NOTHING is guaranteed to work so use everything that you can. Or not... use a cursor if it makes you nervous. Just stop bad mouthing folks that have found things that work that you can't break if you follow some simple rules... guaranteed or not and whether they go against your grain or not.


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

"Change is inevitable. Change for the better is not." -- 04 August 2013
(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #957902
« Prev Topic | Next Topic »

Add to briefcase «««2829303132»»»

Permissions Expand / Collapse