More RBAR and "Tuning" UPDATEs

  • Another excellent article Jeff.

  • Hi all,

    First of all, what a great article. I think you guys (Jeff and all the other contributors here) have really helped clarify an area that to many DBAs appears to be black magic.

    One of the interests of all DBAs is making sure that code is as portable as possible, i.e. it doesn't matter whether we use Oracle (9i+) or SQL 2005+, so in my humble opinion the best solution here is the table -> query -> alias with inner join method.

    I agree that indexing has to be considered whenever designing a query, and certainly that the RBAR syndrome should be avoided.

    Again, well done and thanks for putting together what has been a most enlightening article.

    🙂

  • Jeff, as I read your article I worked through it and saw the improvement your change made; and then Hugo's sample worked great too. Then to read the comment that good indexes made them both faster :w00t: ... indexes!

    That the same indexes on the original sample made it the FASTEST was just a reminder that there are many ways to make a difference.

    Thank you both and the rest for making this forum a useful place to visit.

    😀

  • This was indeed an excellent article. And I must second the fact that RBAR would catch on a lot faster if it was pronounced 'AR-BAR'.

    ---
    Timothy A Wiseman
    SQL Blog: http://timothyawiseman.wordpress.com/

  • Hey, before I get started on this, "thank you" to all of the folks that took time to provide some feedback, whether it was pro or con. That's what makes this forum so great... exchange of ideas. It's a shame that some folks have to get nasty or condescending, but that's kind of a lesson all by itself. 😉

    Ok... here we go...

    Heh... dang it... I knew that Microsoft example was going to bite me for what I was trying to show, but I used it anyway because I got in a hurry.

    Yes, in this particular case, the simple addition of an index, created by the Index Tuning Wizard, did in fact cause the original version to beat out the two rewrites. :blush: But now you know that if you have a correlated sub-query and an index doesn't help, you know how to attack the problem for a rewrite.

    While we're on the subject of subqueries... someone said that the article said sub-queries are RBAR and are bad. I didn't say that at all. Go back and look... I said correlated sub-queries are RBAR and are bad. No matter how you swing it, each correlated subquery gets executed once for each row of the outer query. Single correlations sometimes do, in fact, run faster than a join in the presence of a good and proper index, and Hugo did a good job of demonstrating that.

    I also said that code portability is a myth and some folks pretty much hammered on that for quite some time. They also hammered on using SQL Server's proprietary UPDATE that has a FROM clause because it's not ANSI. Then they wrote that it shouldn't ever be used because Microsoft is saying it'll be deprecated. HEH, THEN they turned around and admitted to using it a couple of paragraphs later. So, I ask you... out of all you folks that claim that portable code through the use of ANSI-only commands should be the goal of every DBA and Developer, how many of you actually pull that off? None? Maybe a couple of GUI programmers who never have to write batch code? That's kinda what I thought.

    The non-ANSI extensions of each RDBMS engine are just too valuable and too powerful not to use. I know of no ANSI-only "believers" that actually write ANSI-compliant SQL 100% of the time. They're just like the rest of us and will use whatever tools are necessary to get the job done. So, give that tired ol' lecture a break, please. 😉

    And, if Microsoft ever deprecates the FROM clause in the UPDATE statement, I want you to know that me and about a million other people who write T-SQL in countries around the world are gonna march on Redmond and give Bill Gates and his crew the worst "pink belly" they've ever had for screwing up several billion stored procedures.

    While we're at it, some folks worry about the documented "fact" that the UPDATE FROM will cause "indeterminate results". Let me just say that a correctly laid out UPDATE FROM with the target table in the FROM clause has never caused an error for me. It's always done exactly what I've wanted. Maybe a "monster" update like the one I spoke of in the article will bite you, like it did me, but never a properly laid out UPDATE.
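
    Just to make "properly laid out" concrete, here's a minimal sketch of the risky pattern (my own illustration, using the test tables we'll build below): the target is joined one-to-many, so which source row "wins" is left to the whim of the execution plan...

    -- RISKY: SourceTable has many rows per Account, so the value that lands
    -- in TheSum is not defined... this is the "indeterminate results" case.
    UPDATE tt
       SET TheSum = st.Amount
      FROM dbo.TargetTable tt
     INNER JOIN dbo.SourceTable st
        ON tt.Account = st.Account
    -- SAFE: the layouts later in this post join the target to a pre-aggregated
    -- derived table, so there's exactly one source row per target row.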

    Ok, let's get down to an example that'll really show why I generally don't allow correlated sub-queries in my shop... especially on Updates...

    First, you need to make a Tally table which we'll use to build some other test data. You can find the code at the following URL... change the object of the TOP clause to 1,000,000 rows please. Here's the URL for how to build one...

    http://www.sqlservercentral.com/scripts/Advanced+SQL/62486/

    ... and, yeah, I got this one right. 😛
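
    If you don't want to chase the link right now, here's a minimal sketch of the classic cross-join approach (the script at that URL may differ in its details):

    --===== Build a 1,000,000 row Tally table of sequential numbers
    SELECT TOP 1000000
           IDENTITY(INT,1,1) AS N
      INTO dbo.Tally
      FROM master.dbo.syscolumns sc1
     CROSS JOIN master.dbo.syscolumns sc2

    --===== Add the clustered PK that the lookups below depend on
    ALTER TABLE dbo.Tally
      ADD CONSTRAINT PK_Tally_N PRIMARY KEY CLUSTERED (N)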

    Ok, let's build the test data (SourceTable) and a table that we'll update (TargetTable)...

    --===================================================================
    -- Build the test tables and the data using the Tally table
    --===================================================================
    CREATE TABLE TargetTable
    (
     Account CHAR(1) NOT NULL PRIMARY KEY CLUSTERED,
     TheSum  INTEGER NOT NULL DEFAULT 0,
     TheAvg  INTEGER NOT NULL DEFAULT 0,
     TheCnt  INTEGER NOT NULL DEFAULT 0
    )

    CREATE TABLE SourceTable
    (
     SourceTableID INT IDENTITY(1,1) PRIMARY KEY CLUSTERED,
     Account       CHAR(1) NOT NULL,
     Amount        INTEGER NOT NULL
    )

    INSERT INTO TargetTable (Account)
    SELECT CHAR(N)
      FROM dbo.Tally
     WHERE N BETWEEN 65 AND 90 --ASCII "A" = 65, ASCII "Z" = 90

    INSERT INTO SourceTable (Account, Amount)
    SELECT CHAR((N%26)+65) AS Account,
           (N%7)           AS Amount
      FROM dbo.Tally

    --===== Show the sample content of each table
    SELECT * FROM TargetTable
    SELECT TOP 100 * FROM SourceTable

    Ok... now the code for the correlated subquery method and the UPDATE FROM method using a derived table (same as a CTE, but works in all versions of SQL Server 😛 )

    --===== The Correlated Subquery Method
    PRINT 'Correlated Subquery...'
    SET STATISTICS IO ON
    SET STATISTICS TIME ON

    UPDATE TargetTable
       SET TheSum = (SELECT SUM(SourceTable.Amount)
                       FROM SourceTable
                      WHERE TargetTable.Account = SourceTable.Account),
           TheAvg = (SELECT AVG(SourceTable.Amount)
                       FROM SourceTable
                      WHERE TargetTable.Account = SourceTable.Account),
           TheCnt = (SELECT COUNT(SourceTable.Amount)
                       FROM SourceTable
                      WHERE TargetTable.Account = SourceTable.Account)

    SET STATISTICS TIME OFF
    SET STATISTICS IO OFF
    PRINT REPLICATE('=',70)

    --===== The UPDATE FROM Method
    PRINT 'Update From...'
    SET STATISTICS IO ON
    SET STATISTICS TIME ON

    UPDATE TargetTable
       SET TheSum = d.TheSum,
           TheAvg = d.TheAvg,
           TheCnt = d.TheCnt
      FROM dbo.TargetTable tt
     INNER JOIN
           (SELECT st.Account, SUM(st.Amount) AS TheSum, AVG(st.Amount) AS TheAvg, COUNT(st.Amount) AS TheCnt
              FROM dbo.SourceTable st
             GROUP BY st.Account) d
        ON tt.Account = d.Account

    SET STATISTICS TIME OFF
    SET STATISTICS IO OFF
    PRINT REPLICATE('=',70)
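
    And, for the CTE fans on 2k5, here's the same UPDATE FROM restated with a CTE in place of the derived table (my restatement, functionally equivalent):

    ;WITH d AS
    (
     SELECT st.Account, SUM(st.Amount) AS TheSum, AVG(st.Amount) AS TheAvg, COUNT(st.Amount) AS TheCnt
       FROM dbo.SourceTable st
      GROUP BY st.Account
    )
    UPDATE tt
       SET TheSum = d.TheSum,
           TheAvg = d.TheAvg,
           TheCnt = d.TheCnt
      FROM dbo.TargetTable tt
     INNER JOIN d
        ON tt.Account = d.Account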

    Ok... run the code several times to get some timings before we add an index... you'll find that the UPDATE FROM runs in about 2/5ths the time of the correlated subquery. And take a peek at the read counts for each... Hmmm... the read counts for the correlated subquery are exactly three times those for the UPDATE FROM... any guess why that might be? (Wait for it, folks) 😀

    Now, let's add the index that's most beneficial to both queries... I cheated and created this index using the Index Tuning Wizard on the correlated sub-query, just to be fair.

    CREATE INDEX [SourceTable3] ON [dbo].[SourceTable]([Account], [Amount])

    Run the code several times again. Notice the timings and notice the reads. Not exactly a 3:1 ratio anymore because of the index, but still close to 3:1. And the UPDATE FROM still wins.

    Now... wanna see something interesting? I've removed the calculation for "TheCnt" from both queries in the following code... run it again several times...

    --===== The Correlated Subquery Method
    PRINT 'Correlated Subquery...'
    SET STATISTICS IO ON
    SET STATISTICS TIME ON

    UPDATE TargetTable
       SET TheSum = (SELECT SUM(SourceTable.Amount)
                       FROM SourceTable
                      WHERE TargetTable.Account = SourceTable.Account),
           TheAvg = (SELECT AVG(SourceTable.Amount)
                       FROM SourceTable
                      WHERE TargetTable.Account = SourceTable.Account)

    SET STATISTICS TIME OFF
    SET STATISTICS IO OFF
    PRINT REPLICATE('=',70)

    --===== The UPDATE FROM Method
    PRINT 'Update From...'
    SET STATISTICS IO ON
    SET STATISTICS TIME ON

    UPDATE TargetTable
       SET TheSum = d.TheSum,
           TheAvg = d.TheAvg
      FROM dbo.TargetTable tt
     INNER JOIN
           (SELECT st.Account, SUM(st.Amount) AS TheSum, AVG(st.Amount) AS TheAvg
              FROM dbo.SourceTable st
             GROUP BY st.Account) d
        ON tt.Account = d.Account

    SET STATISTICS TIME OFF
    SET STATISTICS IO OFF
    PRINT REPLICATE('=',70)

    Notice that the times, scans, and reads went down for the correlated sub-query but stayed the same for the UPDATE FROM? Let's remove "TheAvg" from both and see what happens...

    --===== The Correlated Subquery Method
    PRINT 'Correlated Subquery...'
    SET STATISTICS IO ON
    SET STATISTICS TIME ON

    UPDATE TargetTable
       SET TheSum = (SELECT SUM(SourceTable.Amount)
                       FROM SourceTable
                      WHERE TargetTable.Account = SourceTable.Account)

    SET STATISTICS TIME OFF
    SET STATISTICS IO OFF
    PRINT REPLICATE('=',70)

    --===== The UPDATE FROM Method
    PRINT 'Update From...'
    SET STATISTICS IO ON
    SET STATISTICS TIME ON

    UPDATE TargetTable
       SET TheSum = d.TheSum
      FROM dbo.TargetTable tt
     INNER JOIN
           (SELECT st.Account, SUM(st.Amount) AS TheSum
              FROM dbo.SourceTable st
             GROUP BY st.Account) d
        ON tt.Account = d.Account

    SET STATISTICS TIME OFF
    SET STATISTICS IO OFF
    PRINT REPLICATE('=',70)

    Notice that the same thing happened. The times, scans, and reads went down for the correlated sub-query but stayed mostly the same for the UPDATE FROM... in fact, the correlated sub-query now beats the UPDATE FROM in every total category.

    And THAT's a big part of what I was trying to get across in the article, folks. But I took a shortcut and blew it. One correlated sub-query does not necessarily make bad code so far as performance goes. But their effect is cumulative... 1 isn't bad, 2 gets beat pretty easily by the UPDATE FROM and a derived table (or CTE), 3 is getting a lot worse... etc, etc. And nothing you do with an index is gonna fix that!

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • p.s. All the timings, scan counts, and read counts in my previous post were from SQL Server 2k... the optimizer in 2k5 was a little kinder to the correlated sub-query... but not a whole lot.

    --Jeff Moden



  • Jeff, good article and good response. I also agree that the tone of some of the comments was unnecessary and inappropriate.

    That said, there are two things that I would like to reply to. I will address the performance of Updates below and take up the question of Standards in a later reply.

    Jeff Moden (3/14/2008)


    ... I said correlated sub-queries are RBAR and are bad. No matter how you swing it, each correlated subquery gets executed once for each row of the outer query.

    All I can say is “Not in My Experience.” Although Sql2000 did have some problems in this area (performance “bugs”, IMHO), these were mostly addressed in Sql2005. And if you think about the set-logic of what is being expressed, there really is no good reason in many cases why correlated subqueries in UPDATE should be significantly slower than outer joins (there is one major exception which I will get to in a minute). In fact, since they only have to return 1 column instead of all of them, there is every reason to think that they could be faster than joins (as they are in some other environments).

    There is certainly no reason why the optimizer should have to abandon set-oriented query tasks and that is not what I see in the query plans. I have some simple examples that demonstrate this:

    --=== set up the environment ===--
    select * into Objs from master..sysobjects
    select * into Schemas from master.sys.schemas
    select * into SchemasPk from master.sys.schemas
    ALTER TABLE dbo.SchemasPk ADD CONSTRAINT PK_SchemasPk PRIMARY KEY CLUSTERED (schema_id)
    GO

    --=== test queries: ===
    -- correlated subquery:
    update Objs set schema_ver = (Select principal_id From Schemas Where [schema_id]=[uid])

    -- outer join:
    update Objs set schema_ver = principal_id From Objs left join Schemas On [schema_id]=[uid]

    -- correlated subquery w/ lookup key:
    update Objs set schema_ver = (Select principal_id From SchemasPk Where [schema_id]=[uid])

    -- outer join w/ lookup key:
    update Objs set schema_ver = principal_id From Objs left join SchemasPk On [schema_id]=[uid]

    --=== cursor example w/ lookup key: ===
    DECLARE test_cursor CURSOR FOR
     Select [uid] From Objs
     FOR Update of schema_ver

    Declare @uid int, @Schema int

    OPEN test_cursor
    FETCH NEXT FROM test_cursor INTO @uid
    WHILE (@@FETCH_STATUS = 0)
    BEGIN
        Update Objs
           Set schema_ver = (Select principal_id From SchemasPk Where [schema_id]=@uid)
         Where Current OF test_cursor
        FETCH NEXT FROM test_cursor INTO @uid
    END
    CLOSE test_cursor
    DEALLOCATE test_cursor

    As you can see, subqueries and joins are very close, with each being faster in one of the two cases. And neither is at all like the RBAR cursor example, which is twice as slow. In fact, if you factor out the cost of the UPDATE itself, it is 10 times as slow!

    There is one common case where Joins are clearly superior to Subqueries though: when there are multiple subqueries with the same correlation predicate that could be satisfied with a single join. All of the examples that you give are in this category and in these cases I do agree, the join method is clearly superior and should be used.

    But for the other cases either is fine, and the subquery method does have some advantages:

    1. Easier for beginners to learn and understand
    2. Conformance to standards

    Jeff Moden (3/14/2008)


    And THAT's a big part of what I was trying to get across in the article, folks. But I took a shortcut and blew it. One correlated sub-query does not necessarily make bad code so far as performance goes. But their effect is cumulative... 1 isn't bad, 2 gets beat pretty easily by the UPDATE FROM and a derived table (or CTE), 3 is getting a lot worse... etc, etc. And nothing you do with an index is gonna fix that!

    We are almost in agreement here. When those multiple subqueries could be merged into one Join, then you are definitely right: Subqueries are slower, though not really RBAR as I have understood your use of that term in the past.

    However, if those multiple subqueries could only be replaced with an equal number of joins, then it is not so clear which is better.
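
    A quick sketch of that second case (the AccountGroups table and its columns are hypothetical, purely for illustration): the two subqueries correlate on different predicates, so a join rewrite would still need two joins and merges nothing.

    -- Two different correlation predicates: these cannot be merged into one join.
    Update TargetTable
       Set TheSum = (Select SUM(st.Amount) From dbo.SourceTable st
                      Where st.Account = TargetTable.Account),
           TheCnt = (Select COUNT(*) From dbo.AccountGroups ag -- hypothetical
                      Where ag.GroupLeader = TargetTable.Account)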

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • All I can say is “Not in My Experience.” Although Sql2000 did have some problems in this area (performance “bugs”, IMHO) these were mostly addressed in Sql2005.

    Thanks for that tip, Barry. I'll check it out.

    --Jeff Moden



  • Barry, you're correct... the single column update example yields nearly identical results for both query types in 2k5.

    [font="Courier New"]Correlated Subquery...

    Table 'TargetTable'. Scan count 1, logical reads 54, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'SourceTable'. Scan count 1, logical reads 1490, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    (26 row(s) affected)

    (1 row(s) affected)

    SQL Server Execution Times:

    CPU time = 891 ms, elapsed time = 994 ms.

    ======================================================================

    Update From...

    Table 'TargetTable'. Scan count 0, logical reads 52, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'SourceTable'. Scan count 1, logical reads 1490, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    (26 row(s) affected)

    (1 row(s) affected)

    SQL Server Execution Times:

    CPU time = 890 ms, elapsed time = 1003 ms.

    ======================================================================

    [/font]

    That being said, it doesn't look like the single correlated sub-query has any performance advantage over the UPDATE FROM, and it loses in the 2- and 3-column foot races. Except for chasing some attempt at code portability, or maybe some newbie to SS having an easier time of it, I really don't see an advantage in 2k5 to ever using a correlated subquery, not even for single-column updates. In 2k, it does edge out the join method on a properly indexed table.

    I actually can't wait for the MERGE command to finally come out. I hope they do it right... I use it in Oracle instead of correlated sub-queries in UPDATEs, even if the query isn't meant to be an "upsert", because it uses joins instead of correlation and that makes it easier to troubleshoot. I dunno... maybe I'm just weird :hehe: ... I find it's a lot easier to read, troubleshoot, and write joins than correlated subqueries. Hard for me to believe that some newbie who's never written SQL would find a correlated sub-query easier to understand than a join.
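
    If MERGE shows up the way it's been announced for 2k8, I'd expect the single-column test to look something like this (treat the syntax as an assumption until it ships):

    MERGE dbo.TargetTable AS tt
    USING (SELECT Account, SUM(Amount) AS TheSum
             FROM dbo.SourceTable
            GROUP BY Account) AS d
       ON tt.Account = d.Account
     WHEN MATCHED THEN
          UPDATE SET TheSum = d.TheSum;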

    Anyway, thanks again for the feedback. You and the rest of the folks have been awesome as always.

    --Jeff Moden



  • As to the matter of conformance to standards...

    Jeff Moden (3/14/2008)


    I also said that code portability is a myth and some folks pretty much hammered on that for quite some time.

    For the purposes of discussion it is useful to distinguish between Standards Compliance, which is a practice, and Portability, which is a potential outcome and benefit of that practice. This is important because there are other potential benefits.

    Jeff Moden (3/14/2008)


    ...HEH, THEN they turned around and admitted to using it a couple of paragraphs later. So, I ask you... out of all you folks that claim that portable code through the use of ANSI-only commands should be the goal of every DBA and Developer, how many of you actually pull that off? None? Maybe a couple of GUI programmers who never have to write batch code? That's kinda what I thought.

    This does not fairly represent the position of those of us who support SQL standards, Jeff, as it confuses striving for our goal with attaining our goal. I have never known anyone who seriously thought that we should only ever use standard SQL. We cannot, because we know the reality. SQL is not like C/C++, Java, etc., because unlike them no one truly implements the darn standards.

    Sure, some vendors are better than others, but even the best of them fall woefully short, IMHO, and that leaves us with large holes in what can be done with standard SQL in any specific implementation. Worse, many standard features that are implemented are not optimized by the vendor as well as their own extensions (which is what happened to columnar subqueries in Updates before Sql2005).

    So, the position of myself and most others is "Don't use non-compliant SQL, except where you need to," because we all recognize that sooner or later we have to use non-compliant SQL to fulfill the objectives set for us. The differences that we have are primarily over where we draw that line.

    The non-Ansi extensions of each RDBMS engine are just too valuable and too powerful to not use.

    As above, that is only half of the story. The other half is that because the vendors (not just Microsoft) are not implementing the Standards completely enough or of sufficient performance, that makes their vendor-specific extensions seem more valuable.

    I know of no ANSI only "believers" that actually write ANSI only compliant SQL 100% of the time. They're just like the rest of us and will use whatever tools are necessary to get the job done.

    There is a huge difference between "necessary" and "preferred". When there is a choice, we should choose to use the standard tools.

    So, give that tired ol' lecture a break, please. 😉

    *sigh* Sorry, Jeff, I can no more give it up than I could give up trying to stamp out Cursors (and the people that I oversee say the same thing about that).

    And, if Microsoft ever deprecates the FROM clause in the UPDATE statement, I want you to know that me and about a million other people who write T-SQL in countries around the world are gonna march on Redmond and give Bill Gates and his crew the worst "pink belly" they've ever had for screwing up several billion stored procedures.

    Been there, done that, with more people and when Microsoft (or a predecessor) was much weaker. It is shockingly ineffective. Also, it won't be a million people. Remember, because of the poor documentation, most people have been writing them wrong all along.

    Don't get me wrong, Jeff, 25 years ago I agreed with you and in fact made many of the very same arguments. And if it comes to that, I will be right out there with you. It just won't work. Why? Because while we are out there with our torches the powers that be will be looking out at us and saying to each other: "Who are they kidding? It is still far cheaper for them to change this one thing in their code than it would be for them to convert to another vendor."

    And why can they say this? Because:

    1) None of the SQL-based products are conformant enough to achieve full portability.

    2) This is because they don't feel enough pressure from us to make their products truly conformant.

    3) and this is because we do not even use the standards that they have implemented.

    And this all highlights the most important aspect of this: Vendor-specific extensions benefit the Vendor far more than they benefit us (because it locks us into their products). Standards benefit us.

    And this is true even within the same vendor, over time. For instance, I started using the INFORMATION_SCHEMA tables as soon as they were available to me, in preference to the system objects wherever that was possible. Now, almost a decade later, every single clause I wrote that used INFORMATION_SCHEMA tables still works on the latest release of SQL Server (2005) and I expect them to still work on 2008.

    On the other hand, I have had to rewrite thousands of lines of SQL that touched system objects or other Microsoft-specific extensions because their names changed or their columns changed or their functionality changed or their location changed or whatever. INFORMATION_SCHEMA still works exactly as it originally did because it is a standard and Microsoft can't "twiddle" with it the way that they twiddle with their own stuff.
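
    A small example of the kind of clause I mean (the table name is just for illustration):

    -- Standard metadata query: runs unchanged on SQL Server 2000 and 2005
    SELECT TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME, DATA_TYPE
      FROM INFORMATION_SCHEMA.COLUMNS
     WHERE TABLE_NAME = 'SourceTable'
    -- vs. the version-specific route: sysobjects/syscolumns in 2000,
    -- sys.objects/sys.columns from 2005 on... names and columns both changed.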

    So that is why I say:

    - coalesce() is better than isnull(), and

    - cast() is better than convert(), unless you need the format_style argument, and

    - column subqueries are better than FROM..JOINs in UPDATEs, unless there is a performance difference that is significant enough to matter to you.

    So if your UPDATE with Joins executes in 0.1 seconds and the same one with subqueries takes 10 seconds, then by all means, use the Joins.

    But if the Join version takes 0.10 seconds and the subquery version takes 0.11 seconds, then as far as I am concerned that is a clear win for the standards-compliant SQL.
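
    To make the first two of those concrete, here's a quick sketch of the standard spellings next to the T-SQL-specific ones (borrowing the test tables from earlier in the thread; same results either way):

    -- Standard:
    SELECT COALESCE(TheSum, 0),
           CAST(TheAvg AS VARCHAR(10))
      FROM dbo.TargetTable

    -- T-SQL-specific equivalents:
    SELECT ISNULL(TheSum, 0),
           CONVERT(VARCHAR(10), TheAvg)
      FROM dbo.TargetTable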

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • Dang, Barry... good stuff, but awfully long for saying that folks preach the standards right up to the point where the standards don't work for them, and then they violate them. 😛 Heh... kinda like saying "Thou shalt not commit adultery (unless you're really horny)." 😀

    Me? I'll tell you up front that whatever RDBMS I use, I'm going to use the tool that I think gets the job done with the best performance and scalability, all the time, and that usually doesn't involve anything close to ANSI Standard code even if it can be done using that standard, because it'll usually be slower. And, no, I'm not talking about the difference between .10 and .11 seconds... on large batch stored procedures, I'm talking about the difference between 24 hours of run time and 15 minutes for the same thing.

    I've only been in 2 migrations from SQL Server to Oracle and the resulting code conversions, but I can say that any attempt at writing ANSI code, usually at the expense of performance or scalability, is usually not worth it, especially considering how rare such migrations are. The conversion of the code wasn't what took the time... it was the regression testing, and that would have to be done even for a 100% ANSI code migration (which simply doesn't exist). So, I don't even try.

    A good example is what Microsoft wrote themselves... take a look at the Information_Schema views and tell me there's enough information to write the likes of sp_SpaceUsed. I got tired of the lame-o output and wrote a version myself, and it does the whole database at once. Do I expect that code to be able to withstand the test of time from 6.5 to 2008? No... just 2000, but, as it turns out, they put views in 2005 that are the equivalent of the system tables in 2000, so it still works just fine in 2k5. 😉 If they don't put the same system views in, I can write a simple view from whatever their current system is and my code still works.
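
    If it ever came to that, the shim I have in mind would be something like this (hypothetical, just to show the idea):

    -- Hypothetical compatibility view: expose the 2k5 catalog under the old
    -- names my code already uses, so the old procedures keep working untouched.
    CREATE VIEW dbo.my_sysobjects AS
    SELECT name,
           id    = object_id,
           xtype = type
      FROM sys.objects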

    A couple of things we do agree on: Microsoft, like any company of their stature, takes user requests into consideration a lot less than users would like (although the uprising in support of FoxPro several years back seemed to work). Of course, there's a lot of apathy on the part of users, too... look at the Poll for SP3 on 2k5... There's almost 3/4 of a million subscribers on the site and just a bit more than, what, a couple hundred votes on the Poll? That's disgusting.

    --Jeff Moden



  • Jeff Moden (3/15/2008)


    Dang, Barry... good stuff, but awfully long for saying that folks preach the standards right up to the point where the standards don't work for them, and then they violate them. 😛 Heh... kinda like saying "Thou shalt not commit adultery (unless you're really horny)." 😀

    Yes, but that's desire, not necessity.

    And, no, I'm not talking about the difference between .10 and .11 seconds... on large batch stored procedures, I'm talking about the difference between 24 hours of run time and 15 minutes for the same thing.

    Then we are in agreement; I would do the same thing.

    The conversion of the code wasn't what took the time... it was the regression testing, and that would have to be done even for a 100% ANSI code migration (which simply doesn't exist).

    Compare that to where C/C++ programmers have been for over 15 years: they can write new code targeted to run on multiple different platforms right from the start AND expect it to work. And their testing is all automated. That's where I want to be. And after 20+ years of SQL and standards, that is where we should be. It would vastly improve both the value and the importance of the work that we do.

    A good example is what Microsoft wrote themselves... take a look at the Information_Schema views and tell me there's enough information to write the likes of sp_SpaceUsed.

    No, but then that is not really what INFORMATION_SCHEMA was targeted at by the ANSI committee. It's really about DDL and metadata, not about server and database administration. And so, it is entirely possible to drive an entire ERD system or a DW code generator from just the INFORMATION_SCHEMA views (and I have done the latter). On the other hand, my operations automation procedures have to rely on the system objects and procedures, just like sp_SpaceUsed.

    A couple of things we do agree on: Microsoft, like any company of their stature, takes user requests into consideration a lot less than users would like (although the uprising in support of FoxPro several years back seemed to work).

    That one did surprise a lot of people. Mostly because so many others had failed under similar circumstances.

    Of course, there is nothing more empowering for users and customers than strong support for standards. If just two viable vendors were fully ANSI compliant, the OMB could start to push for compliance as a requirement for Federal purchases, and THAT would change the whole ball game. But the big vendors recognize that that is not in their best interest, and so nothing happens.

    Of course, there's a lot of apathy on the part of users, too... look at the Poll for SP3 on 2k5... There's almost 3/4 of a million subscribers on the site and just a bit more than, what, a couple hundred votes on the Poll? That's disgusting.

    Exactly the problem that we have with standards.

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • Compare that to where C/C++ programmers have been for over 15 years: they can write new code targeted to run on multiple different platforms right from the start AND expect it to work

    So, there's absolutely nothing in C/C++ that is ever outside the standards amongst different vendors? I find that very difficult to believe in this very competitive world.

    Standards are nice... but if you never have anything that's non-standard, then you have nothing new to become standard in the future. I'm going to keep pushing the envelope. 😉

    --Jeff Moden



  • Jeff Moden (3/15/2008)


    Compare that to where C/C++ programmers have been for over 15 years: they can write new code targeted to run on multiple different platforms right from the start AND expect it to work

    So, there's absolutely nothing in C/C++ that is ever outside the standards amongst different vendors? I find that very difficult to believe in this very competitive world.

    Standards are nice... but if you never have anything that's non-standard, then you have nothing new to become standard in the future. I'm going to keep pushing the envelope. 😉

    I am not willing to shell out money to get my hands on the actual SQL standard from ANSI, but considering some of the dreck that's been coming out of a lot of the so-called "working groups" (XPath/XQuery or CSS, to name a few) - I have to say that a healthy amount of skepticism ought to be applied to those "standards". I don't know how many of those drafts have come out where they just can't manage to put out a consistent document, and I keep hearing the same criticism about "standard" SQL: heavy on syntax rules, but light on the semantics backing the syntax up.

    I mean - really. At what point did ANSI become the one and only authority on what SHOULD be in structured query language? (They're not, since to the best of my knowledge there are still 2 bodies trying to put out standards on SQL.) And who's out "proving" that the so-called experts are in fact proposing what really IS the very best way?

    Are they really that much better at figuring this out than the rest of us? Or is this another occasion where we're getting theory shoved at us that has no chance at a decent implementation, based on how hardware and software work these days?

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Coming to SQL Server after being introduced to other DBMSs first, I am surprised that the location of the predicate relating the summarised aggregate sub-query to the table being updated has any effect on query execution. Whether it is expressed in the WHERE clause of a correlated sub-query referring back to the table being updated, or returned from a derived sub-query and specified in a join, to me they are different spellings of the same logical update.

    Thanks for pointing out that the join predicate is more appropriate than a correlated sub-query WHERE predicate when the column is not covered by an appropriate index.
