Expert Performance Indexing in Azure SQL and SQL Server 2022

  • Comments posted to this topic are about the item Expert Performance Indexing in Azure SQL and SQL Server 2022

  • When it’s written “free ebook to download”, I’m expecting to be free of cost not freely able to download and pay for a ebook…

  • CozzaroNero wrote:

    When it’s written “free ebook to download”, I’m expecting to be free of cost not freely able to download and pay for a ebook…

    I was just going to write something similar.  Not a book... the comment above. 😀

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Apologies, put the URL in the wrong field.

  • Thanks, Steve.

     

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • There is a great problem in the book, when we are talking about performance. In a HEAP - on page 3 - data is NOT ordered as inserted. We reuse space on already allocated pages if possible, or allocate a new page elsewhere in the file - maybe earlier, because of drop of another object or maybe later in the file. It is why the execution time can be high when inserting into a HEAP.  The following - not optimized - small script shows the principle used for a HEAP.

    USE master;
    GO
    DROP DATABASE IF EXISTS TestDB;
    GO
    CREATE DATABASE TestDB;
    GO
    USE TestDB;

    CREATE TABLE dbo.HeapData
    (
    ID INTNOT NULLIDENTITY
    CONSTRAINT PK_HeapDataPRIMARY KEY NONCLUSTERED,
    Txt VARCHAR (1000) NOT NULL
    );

    CREATE TABLE dbo.t1
    (
    ID INT NOT NULLIDENTITY
    CONSTRAINT PK_t1 PRIMARY KEY,
    Txt CHAR (1000) NOT NULL
    );
    GO
    SET NOCOUNT ON;
    GO
    INSERT INTO dbo.t1 (Txt) VALUES
    ('xxxx');
    GO 5000
    INSERT INTO dbo.HeapData (Txt) VALUES
    (REPLICATE ('xxxxxxxxxx', 90));
    GO 50
    INSERT INTO dbo.t1 (Txt)
    SELECT Txt
    FROM dbo.t1;
    GO 5
    INSERT INTO dbo.HeapData (Txt) VALUES
    (REPLICATE ('xxxxxxxxxx', 70));
    GO 5000
    SELECTFPLC.file_id, FPLC.page_id, COUNT (*)
    FROM dbo.HeapData CROSS APPLY sys.fn_PhysLocCracker (%%physloc%%) AS FPLC
    GROUP BY FPLC.file_id, FPLC.page_id
    ORDER BY FPLC.file_id, FPLC.page_id;
    GO
    DELETE
    FROM dbo.HeapData
    WHERE ID % 2 = 1;

    CHECKPOINT;
    GO
    INSERT INTO dbo.HeapData (Txt) VALUES
    (REPLICATE ('xxxxxxxxxx', 20));
    GO 5000
    SELECTFPLC.file_id, FPLC.page_id, COUNT (*)
    FROM dbo.HeapData CROSS APPLY sys.fn_PhysLocCracker (%%physloc%%) AS FPLC
    GROUP BY FPLC.file_id, FPLC.page_id
    ORDER BY FPLC.file_id, FPLC.page_id;
    GO
    DROP TABLE dbo.t1;
    GO
    CHECKPOINT;
    WAITFOR DELAY '0:0:20';
    GO
    INSERT INTO dbo.HeapData (Txt)
    SELECT Txt
    FROM dbo.HeapData;
    GO 2
    SELECTFPLC.file_id, FPLC.page_id, COUNT (*)
    FROM dbo.HeapData CROSS APPLY sys.fn_PhysLocCracker (%%physloc%%) AS FPLC
    GROUP BY FPLC.file_id, FPLC.page_id
    ORDER BY FPLC.file_id, FPLC.page_id;

    SELECT HeapData.ID, FPLC.file_id, FPLC.page_id
    FROM dbo.HeapData CROSS APPLY sys.fn_PhysLocCracker (%%physloc%%) AS FPLC
    ORDER BY FPLC.file_id, FPLC.page_id;

    SELECT HeapData.ID, FPLC.file_id, FPLC.page_id
    FROM dbo.HeapData CROSS APPLY sys.fn_PhysLocCracker (%%physloc%%) AS FPLC
    ORDER BY HeapData.ID;
    GO
    -- file_id is ignored because we only have one file
    -- page_id_diff shows both positive and negative values
    -- the next row in ID order is both on later and earlier pages

    WITH Data
    AS
    (
    SELECT HeapData.ID, FPLC.file_id, FPLC.page_id
    FROM dbo.HeapData CROSS APPLY sys.fn_PhysLocCracker (%%physloc%%) AS FPLC
    )
    SELECT d1.ID, d2.ID, d1.page_id, d2.page_id, d2.page_id - d1.page_id AS page_id_diff
    FROM Data AS d1 INNER JOIN Data AS d2 ONd1.ID = d2.ID - 1 OR
    d1.ID = d2.ID - 2-- we delete ID % 2 = 1 rows
    WHEREd1.file_id = d2.file_id AND
    d1.page_id <> d2.page_id;

  • That's great but without the book, the uncommented code you posted doesn't explain at all what the code is supposed to be teaching.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I strongly disagree. The book is Advanced/Expert, so I have to assume that it is not only read by beginners. So my code and example is just a simple example.

    It is clearly stated on page 3 that data appears physically in the same order as it is inserted. This may, among other things, be the reason why I see statements such as that it is not necessary to sort data, since they are accessed in the same order as inserted, that using HEAP is fast, since data does not have to be inserted at a specific position but just lastly, .... I have a script that shows that when inserting millions of rows, using HEAP will cause double the execution time compared to a Clustered index.

    Unfortunately, I also still see that in various posts it is written that data in a Clustered index is physically stored according to the Cluster Key. With a simple SELECT with table hint NOLOCK, it can be clearly seen that data in a Clustered Index is only logically ordered by Cluster Key - IAM and not Cluster Key is used to access data.

    Incidentally, it is decidedly problematic when someone writes first and last in a table. Since a table always consists of unordered rows, the result is only sorted if ORDER BY is specified at the same time. It is still seen that DISTINCT or GROUP BY results in a sorted result - a serious error that will not be corrected by repeating.

    If on page 3 such a serious error is brushed off, there is reason to react. And there is no need to read further.

  • A few comments

    First, if you have issues with the book, they ought to be reported to the author/publisher.

    Second, many books, and much code, has errors. It could be early in the code or later, but an error in one place doesn't make the entire thing wrong.

    Third, the clustered index does store data in physical order on pages. The pagse/extents could get out of order, which is fragmentation, but that doesn't change the physical nature of storing the data. It is stored in the clustered key order.  I think this might be more a verbiage argument, but I'd be interested in a repro that shows differently. Or an article if you feel people are misinformed.

  • When the book is profiled several times on your site, it must also be reasonable that it is the same place with the same readers who are warned. It is not just a small error in some code, but a serious error in terms of understanding. The first row inserted is the first row in the table. The second row inserted is the second row in the table. The third row is the third row in the table. It could be simply expressed by the fact that all rows that are inserted are inserted into already allocated pages. If there is no space, a new page is allocated. This way, misunderstandings can be avoided and wrong claims can be avoided.

    It is correct that data is physically arranged within a page when a Clustered Index is defined. But not within an extent and within the files the table is stored in. If a table is stored on 20,000 pages, it can hardly be called a physical order if the first pages in Cluster Key order are page 1:245, the next is page 1 :9999, the third is page 1:27, etc. This can be easily proven by selecting data from a table where the hint NOLOCK is specified. Unfortunately, this hint is also standard for some.

    But if your site cannot tolerate serious errors being pointed out, it just has to be taken into account.

  • Some of the pages from my book published on Kindle.

    Attachments:
    You must be logged in to view attached files.
  • Steve Jones - SSC Editor wrote:

    Third, the clustered index does store data in physical order on pages. The pagse/extents could get out of order, which is fragmentation, but that doesn't change the physical nature of storing the data. It is stored in the clustered key order.  I think this might be more a verbiage argument, but I'd be interested in a repro that shows differently. Or an article if you feel people are misinformed.

    After all this time, it's unfortunate to see just how many people believe in that myth.  One was written quite nicely more than a decade ago, courtesy of Wayne Sheffield.

    https://blog.waynesheffield.com/wayne/archive/2012/10/does-a-clustered-index-really-physically-store-the-rows-in-key-order/

    The summary at the bottom of his article says it all as a tl;dr...

    The slot array is always maintained in key value order, and the rows on each page are those that should be on that page as reflected by the key value of the row. However, the physical storage of the row on that page is not necessarily maintained in key value order. Changes to data so that the data no longer fits into its original space, or inserts that are not in the key value order can cause the actual data to not be physically stored in the key order and will be stored at the beginning of the unused space at the end of the page. When the unused space is consolidated, existing rows are moved up the page, but they are not resorted.

    The same is true for non-clustered indexes.

     

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • This was removed by the editor as SPAM

  • "However, the physical storage of the row on that page is not necessarily maintained in key value order" - the only important is, that the rows only can be accessed in slot order and in this way in index key order. Data in extents and in the file can be accessed in Key order or in physical order by specifying NOLOCK. With NOLOCK SQL Server use the IAM.

     

    Attachments:
    You must be logged in to view attached files.
  • The storage on the page isn't important because you can't read a row. you can only read a page. Rows on the page can move if they are updated, but it's not a material difference in how the index performs.

Viewing 15 posts - 1 through 15 (of 15 total)

You must be logged in to reply to this topic. Login to reply