Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase «««2829303132»»

There Must Be 15 Ways To Lose Your Cursors… Part 2: Just Put It in a S Expand / Collapse
Author
Message
Posted Friday, July 23, 2010 6:44 AM


Ten Centuries

Ten CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen Centuries

Group: General Forum Members
Last Login: Wednesday, September 24, 2014 1:20 PM
Points: 1,276, Visits: 1,135
Jeff Moden (7/23/2010)
Mike C (7/22/2010)
Jeff Moden (7/22/2010)
Mike C (7/22/2010)
Jeff Moden (7/22/2010)
Mike C (7/22/2010)
Obviously the so-called "buggy" execution plan requires the developer to be more careful, but gracefully handling invalid function parameter and divide-by-zero errors are not new problems in programming and shouldn't come as a surprise to most devs.


Hmmm... sounds just like the rules for a quirky update.


What's that? If you divide two numbers make sure the dividend is not zero? . Most devs learned that rule in grade school :)


Nah... I was talking about the developer being careful.


Ahhh, now I understand I suppose, like the quirky update, with the right combination of query hints, options, hardware, and indexes you could probably find a way to force the query optimizer to execute the WHERE clause first. At least until the optimizer comes up with a new plan. Knowing that the physical order of execution within a query is not guaranteed would make me hesitant to trust that solution though.


No... you don't understand, Mike. People jump through all sorts of hoops with indexes and whether or not to use include and whether or not to use covering indexes. A great amount of time is spent explaining clustered indexes and the fact that non clustered indexes are appended with the clustered index and how to defrag and rebuild indexes. Why? To trick the optimizer into doing things faster and that's all I've done with the quirky update. You say it's not quaranteed but even a SELECT with a WHERE clause isn't guaranteed. SQL SERVER 2000 had to have a hot fix after sp2 because if parallelism occurred during an UPDATE that had a WHERE clause with an IS NULL in it, the IS NULL was ignored and all rows were updated. I know this because it happened to the folks at work. NOTHING is guaranteed to work so use everything that you can. Or not... use a cursor if it makes you nervous. Just stop bad mouthing folks that have found things that work that you can't break if you follow some simple rules... guaranteed or not and whether they go against your grain or not.


I'm in total agreement with you 99.9% of the time on just about every topic Jeff, and you know that well. The fact that people spend a "lot of time" tweaking indexes to improve performance is not the same as spending time tweaking indexes to "change the content of your end result"; at least not in my mind. The two concepts to me seem disconnected.

I'm not particularly interested in convincing you that indexes (or lack of them) should not change the content of your results. Or number of processors, etc. If you believe the content of your output should be determined by your indexes, index hints, etc. then go for it.

At any rate, I respect your opinion and since you feel I'm "bad mouthing folks", I'm outta here and done with this thread.

Out here.

Mike C
Post #957914
Posted Friday, July 23, 2010 9:45 AM
SSCrazy

SSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazy

Group: General Forum Members
Last Login: Friday, September 5, 2014 2:00 PM
Points: 2,160, Visits: 2,191
Tom.Thomson (7/22/2010)
[quote]Mike C (7/22/2010)
set transaction isolation level serializable
begin tran
create table #T (a int check (a=1) default (1), b int check (b=2) default(2))
insert #T (TABLOCKX) values (1,2)
select T.a+T.b as c from #T T
commit tran

I will get a result set of some rows with a single column, probably named c. It's not really clear that in the row(s) I get in that result set the single column c will have type int. It's far from clear that I'll get just one row. in fact the BoL page says that the select statement doesn't specify that. It is explicitly stated in that BoL page that this select statement is NOT guaranteed to return me the value 3 in column of the single (if I do get just one) row, since there the select statement specifies no relation between the values in the source tables and the values in the result set, except perhaps as affected by where and having clauses, if that is conceivable (unlikely, except maybe for aggregates): certainly not by the select clause or its output list, and it's the output list that matters here.


I don't understand, that is a basic SELECT statement with no quirky update or aggregation, so since you have one record in the table you will get one record returned. (Well if the syntax errors are fixed first.) If you have 2 records in the table you would get two records back with a single column that contained a value of 3. (Unless one of the records contains a NULL in either column in which case the corresponding row will return NULL for the value of the C column.)
Post #958057
Posted Friday, July 23, 2010 12:48 PM


SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: Today @ 4:57 PM
Points: 7,745, Visits: 9,493
UMG Developer (7/23/2010)
I don't understand, that is a basic SELECT statement with no quirky update or aggregation, so since you have one record in the table you will get one record returned. (Well if the syntax errors are fixed first.) If you have 2 records in the table you would get two records back with a single column that contained a value of 3. (Unless one of the records contains a NULL in either column in which case the corresponding row will return NULL for the value of the C column.)

That of copurse was precisely my point - we know what theoutput from that will be so that passage in BoL that says that I can't know that the result has a single column containing "3" is nonsense, so whoever wrote that passager clearly wasn't thinking staright about what is guaranteed despite the optimzer's reordering of operations. The worry is that this sloppy thinking may at some point impact development in MS, resulting in some nasty bugs.

Maybe I should have used an even simpler example, like this: if that BoL passage were correct we could not be sure that the result set from "select 1 as c" has the value 1 in the single field c of its single row.


Tom
Post #958180
Posted Friday, July 23, 2010 1:24 PM


SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: Today @ 4:57 PM
Points: 7,745, Visits: 9,493
Mike C (7/22/2010)
Let's walk through this briefly and see what steps would be required to implement the original query to perform it in logical order steps:

select
left( 'abc', T1.v )
from
T1
where
T1.v >= 0
;


Now let's say we have 10 million rows in T1 and 5 million of them match the where clause (T1.v >= 0). Here's what we need to do:

1. Scan the 10 million row table (look at every single row) and compare to 0.
2. Dump the 5 million rows that match the WHERE clause into a temp table or other intermediate storage.
3. Scan the 5 million rows that match the WHERE clause and perform the projection against them (left( 'abc', T1.v )).

We just scanned 15 million rows and stored 5 million rows in temporary storage to fulfill this query. Now let's look at one way we can optimize this:

1. Scan the 10 million row table.
2. Execute the WHERE clause and the projection at the same time on each row since both clauses are using the same columns.


I notice that you refer to a projection followed by a transformation (or mapping - people use both terms) as a projection; that's potentially dangerous thinking when discussing optimisation, since transformations (mappings) can be very expensive, while projections are fairly cheap.

In your "optimized" version we have 10 million invocations of left instead of 5 million. Seems to me to be too many! Surely the optimizer can handle transformations at the same time as restrictions (or would "select sum(T1.v*5) from T1 where T1.v>=0" create a table in interim storage to store the restriction's result in, and then scan that table to compute the multiplication and do the sum - if so it's inferior to an optimizer I saw a fresh young graduate throw together in a few months back in the 80s)? If it can do that in a single scan without a temporary table for intermediate results why should it not do the example using left that way too?

What I expect the system to do is:
scan the 10 million row table, executing the projection and the restriction [the where clause] on each row (you can't execute the where clause without doing the projection, since it examines c, the only column in the projection), and executing the transformation and passing the transformed row to the output if (only if) the row is included by the where clause.

No extra scans, no temporary tables, and no unneccessary transformations. And no violation of the declarative semantics of relational algebra.

Anyway, I expect the semantics to be such that the result of the select is not modified by rows which do not "qualify for the select" ("qualify for the select" is Microsoft's terminology, not mine). Maybe that's because I expect to see at least lip service paid to relational algebra.

I can't make a similar argument for the "quirky update", and I would prefer to see SQL add some concepts from functional algebra with good clear semantics to the select and update statements, and then T-SQL could adopt those to replace the current quirky update . But I don't see it happening.

My worry about the passage I quoted from the BoL documentation is twofold: (a) it's wrong (outrageously wrong); (b) it may indicate a tendency in Microsdoft to believe that clever tricks in the optimiser justify overriding the semantics of SQL, and if so that's very dangerous.


Tom
Post #958221
Posted Saturday, July 24, 2010 1:52 PM
SSC Journeyman

SSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC Journeyman

Group: General Forum Members
Last Login: Friday, September 12, 2014 7:03 PM
Points: 89, Visits: 481
ganotedp (7/20/2010)
"Business Logic"?!? That's a highly, um, "elastic" concept. I'm with Alex Papadimoulis on the topic:
http://thedailywtf.com/Articles/The-Mythical-Business-Layer.aspx



With due respect to Alex, it is not as elastic as you think. A good barometer for whether a given piece of logic belongs in the database is whether it can be written without cursors and without special CLR routines. If it cannot, the odds are high that it does not belong in the database. A database, specifically a OLTP database, is best at storing data, enforcing data integrity and serving data. It is not designed for complicated calculations. It is not made to be a reporting tool (as opposed to simply serving data to a reporting tool). Just as Excel can be misused as a database, a database can be misused as a repository for complicated calculations. It is a testament to T-SQL that the language is sufficiently powerful to be misused for these purposes.
Post #958424
Posted Saturday, July 24, 2010 2:59 PM


SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: Today @ 4:57 PM
Points: 7,745, Visits: 9,493
Thomas-282729 (7/24/2010)
[quote]With due respect to Alex, it is not as elastic as you think. A good barometer for whether a given piece of logic belongs in the database is whether it can be written without cursors and without special CLR routines. If it cannot, the odds are high that it does not belong in the database. A database, specifically a OLTP database, is best at storing data, enforcing data integrity and serving data. It is not designed for complicated calculations. It is not made to be a reporting tool (as opposed to simply serving data to a reporting tool). Just as Excel can be misused as a database, a database can be misused as a repository for complicated calculations. It is a testament to T-SQL that the language is sufficiently powerful to be misused for these purposes.

The relational calculus with propositional calculus and transactional control, which is what a language supporting the relational model is supposed to be, is Turing Complete as soon as you add the "while <boolean expression> <compound statement>" construct; so it is sufficiently powerful to be used for anything that is computable. With its while loops SQL in general (and T-SQL in particular) although it only pays lip service to the idea of supporting relational algebra with predicate calculus, is indeed Turing complete. That isn't at all remarkable - there have been very very few programming languages which weren't and it isn't any sort of testament to T-SQL.

The term "business logic" is sufficiently vague in meaning that it can include many things. Any reasonable use of the term includes things which clearly are properties of the business data model - for example check constraints and uniqueness constraints. Forbidding the database to implement business logic would forbid database designers from normalising their schemata: after all progressing from 2NF to 3NF is the process of ensuring that all simple functional dependencies are enforced by the schema and its keys and constraints, and functional dependencies are clearly about business logic. So I have to disagree strongly with the idea that there should be no business logic in the database.

edit: first version didn't mention while loop - I was being a bit sloppy.


Tom
Post #958434
Posted Saturday, July 24, 2010 5:20 PM
SSC Journeyman

SSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC Journeyman

Group: General Forum Members
Last Login: Friday, September 12, 2014 7:03 PM
Points: 89, Visits: 481
Tom.Thomson (7/24/2010)
Thomas-282729 (7/24/2010)
[quote]With due respect to Alex, it is not as elastic as you think. A good barometer for whether a given piece of logic belongs in the database is whether it can be written without cursors and without special CLR routines. If it cannot, the odds are high that it does not belong in the database. A database, specifically a OLTP database, is best at storing data, enforcing data integrity and serving data. It is not designed for complicated calculations. It is not made to be a reporting tool (as opposed to simply serving data to a reporting tool). Just as Excel can be misused as a database, a database can be misused as a repository for complicated calculations. It is a testament to T-SQL that the language is sufficiently powerful to be misused for these purposes.

The relational calculus with propositional calculus, which is what a language supporting the relational model is supposed to be is Turing Complete; so it is sufficiently powerful to be used for anything that is computable. SQL in general (and T-SQL in particular) although it only pays lip service to the idea of supporting relational algebra with predicate calculus, is indeed Turing complete. That isn't at all remarkable - there have been very very few programming languages which weren't and it isn't any sort of testament to T-SQL.

The term "business logic" is sufficiently vague in meaning that it can include many things. Any reasonable use of the term includes things which clearly are properties of the business data model - for example check constraints and uniqueness constraints. Forbidding the database to implement business logic would forbid database designers from normalising their schemata: after all progressing from 2NF to 3NF is the process of ensuring that all simple functional dependencies are enforced by the schema and its keys and constraints, and functional dependencies are clearly about business logic. So I have to disagree strongly with the idea that there should be no business logic in the database.


Much like Alex's article, I think you are confusing data integrity with business logic which are both needed for any database driven solution. Check constraints and uniqueness constraints relate to the integrity of the data. Rules such as "in march, sales from reps with a certain sale volume threshold and for products of certain types should have an interest rate discount based the users credit score accounting for a series of values some of which come from services" are difficult to implement and properly test in a database.

Yes, the term "business logic" is vague. DRI in and of itself is arguably a form of business logic. Each set of tools for supporting business logic have their strengths and weaknesses. Databases are great at set-based logic. They are poor at complex calculations or calculations which require iterative solutions. For example, it is very difficult to create multi-threaded solutions in a database. They are simply not designed for that. Further, databases are substantially more expensive to scale out than middle-tier libraries or services.

I never said that absolutely no business logic should be in the database. Instead, I suggested that if you are required to implement an iterative solution in the database using cursors, it does not belong in the database. The database is not the end-all-be-all repository for business logic. There are some forms of logic that simply do not belong in the database. Sometimes the answer for how to do something in T-SQL is not to do it T-SQL.
Post #958443
Posted Sunday, July 25, 2010 12:58 PM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Today @ 4:20 PM
Points: 35,265, Visits: 31,755
Mike C (7/23/2010)
I'm in total agreement with you 99.9% of the time on just about every topic Jeff, and you know that well. The fact that people spend a "lot of time" tweaking indexes to improve performance is not the same as spending time tweaking indexes to "change the content of your end result"; at least not in my mind. The two concepts to me seem disconnected.

I'm not particularly interested in convincing you that indexes (or lack of them) should not change the content of your results. Or number of processors, etc. If you believe the content of your output should be determined by your indexes, index hints, etc. then go for it.

At any rate, I respect your opinion and since you feel I'm "bad mouthing folks", I'm outta here and done with this thread.

Out here.

Mike C


Man... didn't mean for it to come to that especially since you're absolutely correct... we do agree on 99.9% of everything. Sorry, Mike... my fault for being a bit over-zealous here because of the really terrible bad-mouthing I've taken in the past on the subject. Didn't mean to snap at one of the good guys.


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #958509
Posted Wednesday, August 11, 2010 10:45 AM


Grasshopper

GrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopper

Group: General Forum Members
Last Login: Sunday, May 19, 2013 8:22 AM
Points: 17, Visits: 148
SQL Developers definitely think differently! In my mind's eye, it's more of a 3D approach, while RBAR is pretty "flat".

BTW, the need for set-based thought processes isn't peculiar to SQL Server. Tom Kyte has a "mantra" that, though tailored for Oracle, can be modified to apply SQL Server development. His mantra:

o You should do it in a single SQL statement if at all possible.
o If you cannot do it in a single SQL Statement, then do it in PL/SQL.
o If you cannot do it in PL/SQL, try a Java Stored Procedure.
o If you cannot do it in Java, do it in a C external procedure.
o If you cannot do it in a C external routine, you might want to seriously
think about why it is you need to do it!
Post #967605
Posted Saturday, August 14, 2010 11:18 AM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Today @ 4:20 PM
Points: 35,265, Visits: 31,755
Heh... Tom is definitely one of the good guys.

Part of his last statement in his mantra is actually part of the first statement in mine...

"you might want to seriously think about why it is you need to do it!"



--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #969405
« Prev Topic | Next Topic »

Add to briefcase «««2829303132»»

Permissions Expand / Collapse