Question regarding the difference between two ways to populate a new table in T-SQL

  • Hi. I was recently told that the following block of code:

    select a.customerid,
           a.customerName
    into NewTable
    from (
          select customerid,
                 customername
          from database.dbo.customer
          union
          select customerid,
                 customername
          from database_1.dbo.customer
         ) a

    is better performing and preferred to:

    select customerid,
           customername
    into NewTable
    from database.dbo.customer
    union
    select customerid,
           customername
    from database_1.dbo.customer

    Can anyone comment on this and help me understand why putting the query in a subquery is better performing or preferable to just putting the results into a table? Thank you.

  • How did you test the performance?

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • I checked the execution plans of each, and they are the same. I am of the opinion that the subquery is not going to improve performance of the query, and the execution plans seem to prove me correct. I prefer the straight "select into" version because it is less code and, in my opinion, is a cleaner version.

    I was wondering if anyone in the community knows why I would be told, by a database architect, that the subquery version of this query would be better performing. I was forced to change all of my queries to use the subquery version, rather than a straight select into. I was not given a reason that satisfied me, so I am hoping someone in this community can tell me why I would be directed in this manner.

    Thank you for your time.
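
    For anyone who wants to repeat the comparison, something along these lines captures the estimated plans for both versions side by side without creating any tables (just a sketch, using the placeholder names from the original post, so substitute your real database names; SET SHOWPLAN_TEXT has to sit alone in its own batch, hence the GO separators):

    -- Capture estimated plans only; nothing below is executed and no tables are created.
    SET SHOWPLAN_TEXT ON;
    GO
    -- Version 1: derived table
    select a.customerid, a.customerName
    into NewTable
    from (select customerid, customername from database.dbo.customer
          union
          select customerid, customername from database_1.dbo.customer) a;
    -- Version 2: straight SELECT ... INTO
    select customerid, customername
    into NewTable2
    from database.dbo.customer
    union
    select customerid, customername
    from database_1.dbo.customer;
    GO
    SET SHOWPLAN_TEXT OFF;
    GO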

  • Cathy DePaolo (8/22/2014)


    I checked the execution plans of each, and they are the same. I am of the opinion that the subquery is not going to improve performance of the query, and the execution plans seem to prove me correct. I prefer the straight "select into" version because it is less code and, in my opinion, is a cleaner version.

    I was wondering if anyone in the community knows why I would be told, by a database architect, that the subquery version of this query would be better performing. I was forced to change all of my queries to use the subquery version, rather than a straight select into. I was not given a reason that satisfied me, so I am hoping someone in this community can tell me why I would be directed in this manner.

    Thank you for your time.

    Because that is the syntax the architect is comfortable with. There is no valid reason that one is preferred over the other. Those two queries are functionally equivalent and it is no surprise that the execution plans are identical.
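
    If you want to convince yourself of that equivalence on the data itself, a quick check is to EXCEPT one form from the other (a rough sketch using the placeholder names from the original post; swap the two halves and run it again to confirm both directions):

    -- Rows from the plain UNION that the derived-table version does not return.
    -- UNION already removes duplicates, so an empty result here (and with the
    -- halves swapped) means the two forms return exactly the same rows.
    SELECT customerid, customername FROM database.dbo.customer
    UNION
    SELECT customerid, customername FROM database_1.dbo.customer
    EXCEPT
    SELECT a.customerid, a.customername
    FROM (SELECT customerid, customername FROM database.dbo.customer
          UNION
          SELECT customerid, customername FROM database_1.dbo.customer) a;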

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Moden's splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Cathy DePaolo (8/22/2014)


    I checked the execution plans of each, and they are the same. I am of the opinion that the subquery is not going to improve performance of the query, and the execution plans seem to prove me correct. I prefer the straight "select into" version because it is less code and, in my opinion, is a cleaner version.

    I was wondering if anyone in the community knows why I would be told, by a database architect, that the subquery version of this query would be better performing. I was forced to change all of my queries to use the subquery version, rather than a straight select into. I was not given a reason that satisfied me, so I am hoping someone in this community can tell me why I would be directed in this manner.

    Thank you for your time.

    Your architect is incorrect as others have pointed out and the time you spent changing all of your relevant queries was pointlessly wasted.

    The exercise isn't a complete waste of time however - you're hopefully encouraged by the responses here to ask the next self-appointed expert who comes along to prove their crazy-*** theory before you waste good company time implementing it. You might also question what significant opportunities for improvement this particular database architect was failing to see.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • ChrisM@Work (8/22/2014)


    Cathy DePaolo (8/22/2014)


    I checked the execution plans of each, and they are the same. I am of the opinion that the subquery is not going to improve performance of the query, and the execution plans seem to prove me correct. I prefer the straight "select into" version because it is less code and, in my opinion, is a cleaner version.

    I was wondering if anyone in the community knows why I would be told, by a database architect, that the subquery version of this query would be better performing. I was forced to change all of my queries to use the subquery version, rather than a straight select into. I was not given a reason that satisfied me, so I am hoping someone in this community can tell me why I would be directed in this manner.

    Thank you for your time.

    Your architect is incorrect as others have pointed out and the time you spent changing all of your relevant queries was pointlessly wasted.

    The exercise isn't a complete waste of time however - you're hopefully encouraged by the responses here to ask the next self-appointed expert who comes along to prove their crazy-*** theory before you waste good company time implementing it. You might also question what significant opportunities for improvement this particular database architect was failing to see.

    +1000

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • Cathy DePaolo (8/22/2014)


    I checked the execution plans of each, and they are the same. I am of the opinion that the subquery is not going to improve performance of the query, and the execution plans seem to prove me correct. I prefer the straight "select into" version because it is less code and, in my opinion, is a cleaner version.

    I was wondering if anyone in the community knows why I would be told, by a database architect, that the subquery version of this query would be better performing. I was forced to change all of my queries to use the subquery version, rather than a straight select into. I was not given a reason that satisfied me, so I am hoping someone in this community can tell me why I would be directed in this manner.

    Thank you for your time.

    Gosh, be careful now. In this case, it works out, but the execution plans never prove anything having to do with performance. Not even the ACTUAL execution plans. The numbers that show up in an execution plan are frequently just estimates, and the "% of batch" number can be 100% "wrong" (if not wrong, then very deceiving).

    First, build the following test table so we have some known example data to work with...

    /**********************************************************************************************************************
     Purpose:
     Create a voluminous test table with various types of highly randomized data.
    --Jeff Moden
    **********************************************************************************************************************/
    --===== Conditionally drop the test table to make reruns easier
    IF OBJECT_ID('tempdb..#JBMTest','U') IS NOT NULL
       DROP TABLE #JBMTest
    ;
    --===== Create and populate a 1,000,000 row test table.
         -- "SomeID" has a range of 1 to 1,000,000 unique numbers
         -- "SomeInt" has a range of 1 to 50,000 non-unique numbers
         -- "SomeLetters2" has a range of "AA" to "ZZ" non-unique 2 character strings
         -- "SomeMoney" has a range of 0.00 to 100.00 non-unique numbers
         -- "SomeDateTime" has a range of >=01/01/2000 and <01/01/2020 non-unique date/times
         -- "SomeDate" has a range of >=01/01/2000 and <01/01/2020 non-unique "whole dates"
         -- "SomeName" contains random characters at random lengths from 2 to 20 characters
     SELECT TOP 1000000
            SomeID       = IDENTITY(INT,1,1),
            SomeInt      = ABS(CHECKSUM(NEWID())) % 50000 + 1,
            SomeLetters2 = CHAR(ABS(CHECKSUM(NEWID())) % (ASCII('Z')-ASCII('A')+1) + ASCII('A'))
                         + CHAR(ABS(CHECKSUM(NEWID())) % (ASCII('Z')-ASCII('A')+1) + ASCII('A')),
            SomeMoney    = CAST(RAND(CHECKSUM(NEWID())) * 100 AS DECIMAL(9,2)), --Note rounding
            SomeDateTime = RAND(CHECKSUM(NEWID())) * DATEDIFF(dd,'2000','2020') + CAST('2000' AS DATETIME),
            SomeDate     = ABS(CHECKSUM(NEWID())) % DATEDIFF(dd,'2000','2020') + CAST('2000' AS DATETIME),
            SomeName     = RIGHT(NEWID(),ABS(CHECKSUM(NEWID())) % 19 + 2)
       INTO #JBMTest
       FROM sys.all_columns ac1
      CROSS JOIN sys.all_columns ac2
    ;
    ALTER TABLE #JBMTest
      ADD CONSTRAINT PK_#JBMTest PRIMARY KEY CLUSTERED (SomeID)
    ;

    Then, from the same window, run this code with the actual execution plan turned on. The second snippet actually takes twice as long as the first.

    --===== Execution plan says these are the same...
         -- Are they?
    SET STATISTICS TIME,IO ON;
    SELECT TOP 10 *
      FROM #JBMTest
     ORDER BY SomeDate;
    SET STATISTICS TIME OFF;
    PRINT '=================================================='
    SET STATISTICS TIME,IO ON;
    SET ROWCOUNT 10
    SELECT *
      FROM #JBMTest
     ORDER BY SomeDate
    SET ROWCOUNT 0
    SET STATISTICS TIME OFF;
    PRINT '=================================================='

    Both pieces of code return exactly the same information from the test table. The Actual Execution plan says that both will take "50% of batch". But if you look at the "Messages" tab, you see the true story.

    (10 row(s) affected)

    Table '#JBMTest____________________________________________________________________________________________________________000000018739'. Scan count 5, logical reads 6831, physical reads 0, read-ahead reads 12, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    [font="Arial Narrow"] CPU time = 764 ms, elapsed time = 291 ms.[/font]

    ==================================================

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 0 ms.

    (10 row(s) affected)

    Table '#JBMTest____________________________________________________________________________________________________________000000018739'. Scan count 1, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    [font="Arial Black"] CPU time = 1716 ms, elapsed time = 1584 ms.[/font]

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 0 ms.

    ==================================================

    The "% of batch% can actually be 100% wrong as in the following code example. If you don't already have a Tally table, this code will build one for you. If you already have one, then skip that part...

    This is to build the Tally table...

    --===================================================================
    -- Create a Tally table from 1 to 11000
    --===================================================================
    --===== Create and populate the Tally table on the fly.
     SELECT TOP 11000
            IDENTITY(INT,1,1) AS N
       INTO dbo.Tally
       FROM Master.sys.ALL_Columns ac1
      CROSS JOIN Master.sys.ALL_Columns ac2
    ;
    --===== Add a CLUSTERED Primary Key to maximize performance
    ALTER TABLE dbo.Tally
      ADD CONSTRAINT PK_Tally_N
          PRIMARY KEY CLUSTERED (N) WITH FILLFACTOR = 100
    ;
    --===== Allow the general public to use it
    GRANT SELECT ON dbo.Tally TO PUBLIC
    ;

    Using the Tally table above, this next bit of code shows that the execution plan can be quite deceiving. It appears to say that the first query will take 0% of the batch and the second query takes 100% when nearly the exact opposite is true.

    /****************************************************************************************
     Purpose:
     This code demonstrates that the estimated and actual execution plans in SQL Server can
     be 100% INCORRECT and that the execution plan should only be relied on to provide hints
     as to what may be wrong with a query rather than an absolute indication. This code runs
     in SQL Server 2005 and up only.

     The code generates 30 years' worth of dates starting with 2000-01-01 using two
     different methods. The first method uses a recursive CTE and the second method uses
     an on-the-fly creation of a "Tally" table. The output of each method is directed to a
     "throw-away" variable to take display delays out of the picture.

     Please see the separate documentation included in this package for additional
     explanations and a copy of the actual execution plan.

    --Jeff Moden (28 Sep 2009)
    ****************************************************************************************/

    SET NOCOUNT ON
    --=======================================================================================
    -- Recursive method shown by (Name with-held)
    --=======================================================================================
    PRINT '========== Recursive method =========='
    --===== Turn on some performance counters ===============================================
    SET STATISTICS IO ON
    SET STATISTICS TIME ON
    DECLARE @Bitbucket DATETIME --Holds display output so display times aren't measured.
    --===== Execute the code being tested ===================================================
    DECLARE @DateVal DATETIME
    SET @DateVal = '2000-01-01'
    ;with mycte as
    (
        select @DateVal AS DateVal
        union all
        select DateVal + 1
          from mycte
         where DateVal + 1 < DATEADD(yy, 30, @DateVal)
    )
    select @Bitbucket = d.dateval
      from mycte d
    OPTION (MAXRECURSION 0)
    --===== Turn off the performance counters and print a separator =========================
    SET STATISTICS TIME OFF
    SET STATISTICS IO OFF
    PRINT REPLICATE('=',90)
    GO

    --=======================================================================================
    -- Tally table method by Jeff Moden
    --=======================================================================================
    PRINT '========== Tally table method =========='
    --===== Turn on some performance counters ===============================================
    SET STATISTICS IO ON
    SET STATISTICS TIME ON
    DECLARE @Bitbucket DATETIME --Holds display output so display times aren't measured.
    --===== Execute the code being tested ===================================================
    DECLARE @StartDate AS DATETIME
    SET @StartDate = '2000-01-01'
    SELECT TOP (DATEDIFF(dd,@StartDate,DATEADD(yy,30,@StartDate)))
           @Bitbucket = @StartDate-1+t.N
      FROM dbo.Tally t
     ORDER BY N
    --===== Turn off the performance counters and print a separator =========================
    SET STATISTICS TIME OFF
    SET STATISTICS IO OFF
    PRINT REPLICATE('=',90)
    GO

    Here are the run results. Notice that what was marked as 0% of batch takes almost all of the time and that which was marked as 100% of batch takes almost zero time.

    ========== Recursive method ==========

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 0 ms.

    Table 'Worktable'. Scan count 2, logical reads 65749, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    [font="Arial Black"] CPU time = 93 ms, elapsed time = 109 ms.[/font]

    ==========================================================================================

    ========== Tally table method ==========

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 0 ms.

    Table 'Tally'. Scan count 1, logical reads 20, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    [font="Arial Black"] CPU time = 0 ms, elapsed time = 2 ms.[/font]

    ==========================================================================================

    The reason for that disparity in that last example is that the execution plan is for just the first iteration of the Recursive CTE.

    The ONLY way to prove that one method is better than the other is to run them and measure them.

    As to the examples you gave, I join the others in asking what proof the DA has that one should be better than the other. As a man much wiser than I once said, "A single test is worth a thousand expert opinions". With that in mind, here's a test for what you posted...

    First, here's the code to build 2 test DBs and populate them with test data...

    --=====================================================================================================================
    -- Create two test DBs. These may be dropped in their entirety when done testing
    --=====================================================================================================================
    /***********************
    DROP DATABASE TestDB;
    DROP DATABASE TestDB1
    DROP TABLE #TestData
    ***********************/
    CREATE DATABASE TestDB;
    CREATE DATABASE TestDB1;
    GO
    --=====================================================================================================================
    -- Create the test data in each DB
    --=====================================================================================================================
    --===== Create a common collection of data so that the random names are the same for each ID later on.
     SELECT TOP 1000000
            CustomerID   = IDENTITY(INT,1,1)
           ,CustomerName = CAST(NEWID() AS VARCHAR(36))
       INTO #TestData
       FROM master.sys.all_columns ac1
      CROSS JOIN master.sys.all_columns ac2
    ;
    --===== Randomly distribute half into one database
     SELECT TOP 500000 *
       INTO TestDB.dbo.Customer
       FROM #TestData
      ORDER BY NEWID()
    ;
    --===== Randomly distribute half into the other database.
         -- Note that there will be some overlap with the other database.
     SELECT TOP 500000 *
       INTO TestDB1.dbo.Customer
       FROM #TestData
      ORDER BY NEWID()
    ;
    --===== Do a little cleanup
    DROP TABLE #TestData
    ;

    And here's the code to test the two methods...

    --=====================================================================================================================
    -- Test each method from the original post for performance
    --=====================================================================================================================
    SET STATISTICS TIME,IO ON
    ;
    --===== The altered method
     select a.customerid,
            a.customerName
       into NewTable
       from (
            select customerid,
                   customername
              from TestDB.dbo.customer
             union
            select customerid,
                   customername
              from TestDB1.dbo.customer
            ) a
    ;
    SET STATISTICS TIME,IO OFF
    ;
    DROP TABLE dbo.NewTable
    SET STATISTICS TIME,IO ON
    ;
    --===== Original method
     select customerid,
            customername
       into NewTable1
       from TestDB.dbo.customer
      union
     select customerid,
            customername
       from TestDB1.dbo.customer
    ;
    SET STATISTICS TIME,IO OFF
    ;
    DROP TABLE dbo.NewTable1
    ;
    PRINT '============================================================================================='
    PRINT '============================================================================================='
    GO 5

    What I get on my machine is two nearly identical scripts taking turns winning except for the first iteration which doesn't count in this kind of testing.

    Beginning execution loop

    Table 'Customer'. Scan count 5, logical reads 3291, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Customer'. Scan count 5, logical reads 3291, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 2184 ms, elapsed time = 1230 ms.

    (750114 row(s) affected)

    Table 'Customer'. Scan count 5, logical reads 3291, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Customer'. Scan count 5, logical reads 3291, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 2089 ms, elapsed time = 852 ms.

    (750114 row(s) affected)

    =============================================================================================

    =============================================================================================

    Table 'Customer'. Scan count 5, logical reads 3291, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Customer'. Scan count 5, logical reads 3291, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 1982 ms, elapsed time = 1152 ms.

    (750114 row(s) affected)

    Table 'Customer'. Scan count 5, logical reads 3291, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Customer'. Scan count 5, logical reads 3291, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 2061 ms, elapsed time = 1136 ms.

    (750114 row(s) affected)

    =============================================================================================

    =============================================================================================

    Table 'Customer'. Scan count 5, logical reads 3291, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Customer'. Scan count 5, logical reads 3291, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 2057 ms, elapsed time = 903 ms.

    (750114 row(s) affected)

    Table 'Customer'. Scan count 5, logical reads 3291, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Customer'. Scan count 5, logical reads 3291, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 2045 ms, elapsed time = 1089 ms.

    (750114 row(s) affected)

    =============================================================================================

    =============================================================================================

    Table 'Customer'. Scan count 5, logical reads 3291, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Customer'. Scan count 5, logical reads 3291, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 2012 ms, elapsed time = 1036 ms.

    (750114 row(s) affected)

    Table 'Customer'. Scan count 5, logical reads 3291, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Customer'. Scan count 5, logical reads 3291, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 1995 ms, elapsed time = 1012 ms.

    (750114 row(s) affected)

    =============================================================================================

    =============================================================================================

    Table 'Customer'. Scan count 5, logical reads 3291, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Customer'. Scan count 5, logical reads 3291, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 2013 ms, elapsed time = 1089 ms.

    (750114 row(s) affected)

    Table 'Customer'. Scan count 5, logical reads 3291, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Customer'. Scan count 5, logical reads 3291, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 2090 ms, elapsed time = 1081 ms.

    (750114 row(s) affected)

    =============================================================================================

    =============================================================================================

    Batch execution completed 5 times.

    I have to agree with the others... changing the code on the recommendation from the DA was a waste of time on my machine. That, notwithstanding, neither you nor the DA will know for sure until you run this test and others (for different scenarios).

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • ChrisM@Work (8/22/2014)


    Cathy DePaolo (8/22/2014)


    I checked the execution plans of each, and they are the same. I am of the opinion that the subquery is not going to improve performance of the query, and the execution plans seem to prove me correct. I prefer the straight "select into" version because it is less code and, in my opinion, is a cleaner version.

    I was wondering if anyone in the community knows why I would be told, by a database architect, that the subquery version of this query would be better performing. I was forced to change all of my queries to use the subquery version, rather than a straight select into. I was not given a reason that satisfied me, so I am hoping someone in this community can tell me why I would be directed in this manner.

    Thank you for your time.

    Your architect is incorrect as others have pointed out and the time you spent changing all of your relevant queries was pointlessly wasted.

    The exercise isn't a complete waste of time however - you're hopefully encouraged by the responses here to ask the next self-appointed expert who comes along to prove their crazy-*** theory before you waste good company time implementing it. You might also question what significant opportunities for improvement this particular database architect was failing to see.

    As already said, +1000!

    There's nothing wrong with challenging a DA when it comes to performance because, and with absolutely no malice in my heart, DAs don't actually need to know how to write a lick of code, never mind high performance code, to do their jobs correctly.

    It's a real shame that this DA didn't do a substantial bit of testing before issuing this replacement edict. Reminds me of the "experts" that one of my previous companies hired before I joined... they recommended replacing all cursors with Temp Tables and WHILE loops. No one asked for or offered proof. They spent 6 months doing the replacements and the code actually ran slower.

    Titles mean nothing in this area. Trust no one when it comes to such recommendations (not even me). Always ask to see some substantial coded proof or create the tests yourself. You could save the company a whole lot of time and aggravation. If you take the time to do some additional "what if" experimentation in the process, you might actually come up with a way to make a real improvement in performance.

    Sorry... I get ticked at such "experts". I'll put the soapbox away now.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Sean Lange (8/22/2014)


    There is no valid reason that one is preferred over the other.

    Actually, there is 😀 ... the first code is both shorter to write and easier to read, not to mention that the original code didn't cost extra to have a change put in that didn't matter. 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • BTW, for anyone interested but especially for Cathy...

    The following two articles explain the basics of how to quickly and easily build a million rows of test data, and the resulting code only takes seconds to execute, so you can test over and over if need be. Let your imagination run wild.

    http://www.sqlservercentral.com/articles/Data+Generation/87901/

    http://www.sqlservercentral.com/articles/Test+Data/88964/
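
    The core pattern in both articles is the same pseudo-cursor trick used in the test code earlier in this thread; a bare-bones version looks something like this (just a sketch, and the temp table name is only an example):

    --===== Cross join two roomy system views as a row source, then let
         -- SELECT...INTO materialize a million sequentially numbered rows in one pass.
     SELECT TOP 1000000
            N = IDENTITY(INT,1,1)
       INTO #MillionRows
       FROM sys.all_columns ac1
      CROSS JOIN sys.all_columns ac2
    ;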

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (8/23/2014)


    DAs don't actually need to know how to write a lick of code, never mind high performance code, to do their jobs correctly.

    Hmm???

    😎

    BTW +100 for the soapbox speech

  • Jeff Moden (8/23/2014)


    There's nothing wrong with challenging a DA when it comes to performance

    up to there I agree with you 100%

    because, and with absolutely no malice in my heart, DAs don't actually need to know how to write a lick of code, never mind high performance code, to do their jobs correctly.

    And I don't agree with that at all; a DA had better be able to produce a schema design which enables high-performance code to be written, and if the DA knows nothing about code, and in particular nothing about high-performance code, that DA is not going to be able to do that.

    Tom

  • Jeff Moden (8/23/2014)


    Cathy DePaolo (8/22/2014)


    I checked the execution plans of each, and they are the same. I am of the opinion that the subquery is not going to improve performance of the query, and the execution plans seem to prove me correct. I prefer the straight "select into" version because it is less code and, in my opinion, is a cleaner version.

    I was wondering if anyone in the community knows why I would be told, by a database architect, that the subquery version of this query would be better performing. I was forced to change all of my queries to use the subquery version, rather than a straight select into. I was not given a reason that satisfied me, so I am hoping someone in this community can tell me why I would be directed in this manner.

    Thank you for your time.

    Gosh, be careful now. In this case, it works out, but the execution plans never prove anything having to do with performance. Not even the ACTUAL execution plans.

    Sorry, but I'm going to disagree with you there.

    If two queries have identical execution plans, then they will (in the absence of blocking or waits) perform identically.

    I don't mean same costs (costs are always estimated), I mean identical plans. If the operators are the same and the row counts are the same and the number of executions for each operator is the same, then the two queries will be executed the same way because the plan is the 'recipe' given to the execution engine.

    It's a good way to tell if two different ways of writing a query are equivalent. If they produce identical plans, then they have to be logically equivalent.

    I enjoy doing this to people who insist on 'subqueries in the from clause are BAD' or 'subqueries in the from clause are good' (I've seen both recently), as I can just write the queries with and without the subquery (logically equivalent ways) and then show that the execution plans are absolutely, 100%, completely identical in every way and hence there's no difference between the two queries by the time they reach the query execution engine.
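
    If you want to see that for yourself after running both forms, one quick check is to compare the plan hashes the engine recorded (a sketch; assumes SQL Server 2008 or later and that both statements are still in the plan cache):

    -- Matching query_plan_hash values mean the optimizer handed the execution
    -- engine the same plan shape for both statements.
    SELECT qs.query_plan_hash,
           qs.query_hash,
           LEFT(st.text, 100) AS statement_start
      FROM sys.dm_exec_query_stats AS qs
     CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
     WHERE st.text LIKE '%customer%'   -- illustrative filter; adjust to find the two statements
     ORDER BY qs.last_execution_time DESC;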

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • TomThomson (8/24/2014)


    Jeff Moden (8/23/2014)


    There's nothing wrong with challenging a DA when it comes to performance

    up to there I agree with you 100%

    because, and with absolutely no malice in my heart, DAs don't actually need to know how to write a lick of code, never mind high performance code, to do their jobs correctly.

    And I don't agree with that at all; a DA had better be able to produce a schema design which enables high-performance code to be written, and if the DA knows nothing about code, and in particular nothing about high-performance code, that DA is not going to be able to do that.

    We wouldn't want to leave the design of the internals of a building to the builders, would we?

    😎

  • Jeff Moden (8/23/2014)


    Sean Lange (8/22/2014)


    There is no valid reason that one is preferred over the other.

    Actually, there is 😀 ... the first code is both shorter to write and easier to read, not to mention that the original code didn't cost extra to have a change put in that didn't matter. 😉

    The "valid" reasons here are subject to personal preference (other then fewer keystrokes). I would not have changed the original because I too think that is easier to read. I certainly would never force somebody to change their code because I don't like the style in which they wrote it. Now if performance was measurably an issue that would be another story.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Moden's splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/
