Expert Performance Indexing in Azure SQL and SQL Server 2022

Question

Expert Performance Indexing in Azure SQL and SQL Server 2022

Site Owners

SSC Guru

Points: 80342
More actions
March 17, 2023 at 12:00 am

#4155550

Comments posted to this topic are about the item Expert Performance Indexing in Azure SQL and SQL Server 2022

January 9, 2024 at 7:37 am

This was removed by the editor as SPAM

Viewing 15 posts - 1 through 15 (of 15 total)

You must be logged in to reply to this topic. Login to reply

CozzaroNero SSC-Addicted Points: 467 More actions · Answer 1

When it’s written “free ebook to download”, I’m expecting to be free of cost not freely able to download and pay for a ebook…

Jeff Moden SSC Guru Points: 1004591 More actions Follow · Answer 2

CozzaroNero wrote:

When it’s written “free ebook to download”, I’m expecting to be free of cost not freely able to download and pay for a ebook…

I was just going to write something similar. Not a book... the comment above.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Steve Jones - SSC Editor SSC Guru Points: 737961 More actions · Answer 3

Steve Jones - SSC Editor

SSC Guru

Points: 737961

March 31, 2023 at 3:45 pm

#4167626

Apologies, put the URL in the wrong field.

Jeff Moden SSC Guru Points: 1004591 More actions Follow · Answer 4

Thanks, Steve.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

carsten.saastamoinen Old Hand Points: 341 More actions · Answer 5

There is a great problem in the book, when we are talking about performance. In a HEAP - on page 3 - data is NOT ordered as inserted. We reuse space on already allocated pages if possible, or allocate a new page elsewhere in the file - maybe earlier, because of drop of another object or maybe later in the file. It is why the execution time can be high when inserting into a HEAP. The following - not optimized - small script shows the principle used for a HEAP.

USE master;
GO
DROP DATABASE IF EXISTS TestDB;
GO
CREATE DATABASE TestDB;
GO
USE TestDB;

CREATE TABLE  dbo.HeapData
(
ID INTNOT NULLIDENTITY
CONSTRAINT PK_HeapDataPRIMARY KEY NONCLUSTERED,
Txt VARCHAR (1000) NOT NULL
);

CREATE TABLE dbo.t1
(
ID INT NOT NULLIDENTITY
CONSTRAINT PK_t1 PRIMARY KEY,
Txt CHAR (1000) NOT NULL
);
GO
SET NOCOUNT ON;
GO
INSERT INTO dbo.t1 (Txt) VALUES
('xxxx');
GO 5000
INSERT INTO dbo.HeapData (Txt) VALUES
(REPLICATE ('xxxxxxxxxx', 90));
GO 50
INSERT INTO dbo.t1 (Txt) 
SELECT Txt
FROM dbo.t1;
GO 5
INSERT INTO dbo.HeapData (Txt) VALUES
(REPLICATE ('xxxxxxxxxx', 70));
GO 5000
SELECTFPLC.file_id, FPLC.page_id, COUNT (*)
FROM dbo.HeapData CROSS APPLY sys.fn_PhysLocCracker (%%physloc%%) AS FPLC
GROUP BY FPLC.file_id, FPLC.page_id
ORDER BY FPLC.file_id, FPLC.page_id;
GO
DELETE 
FROM dbo.HeapData
WHERE ID % 2 = 1;

CHECKPOINT;
GO
INSERT INTO dbo.HeapData (Txt) VALUES
(REPLICATE ('xxxxxxxxxx', 20));
GO 5000
SELECTFPLC.file_id, FPLC.page_id, COUNT (*)
FROM dbo.HeapData CROSS APPLY sys.fn_PhysLocCracker (%%physloc%%) AS FPLC
GROUP BY FPLC.file_id, FPLC.page_id
ORDER BY FPLC.file_id, FPLC.page_id;
GO
DROP TABLE dbo.t1;
GO
CHECKPOINT;
WAITFOR DELAY '0:0:20';
GO
INSERT INTO dbo.HeapData (Txt)
SELECT Txt
FROM dbo.HeapData;
GO 2
SELECTFPLC.file_id, FPLC.page_id, COUNT (*)
FROM dbo.HeapData CROSS APPLY sys.fn_PhysLocCracker (%%physloc%%) AS FPLC
GROUP BY FPLC.file_id, FPLC.page_id
ORDER BY FPLC.file_id, FPLC.page_id;

SELECT HeapData.ID, FPLC.file_id, FPLC.page_id
FROM dbo.HeapData CROSS APPLY sys.fn_PhysLocCracker (%%physloc%%) AS FPLC
ORDER BY FPLC.file_id, FPLC.page_id;

SELECT HeapData.ID, FPLC.file_id, FPLC.page_id
FROM dbo.HeapData CROSS APPLY sys.fn_PhysLocCracker (%%physloc%%) AS FPLC
ORDER BY HeapData.ID;
GO
-- file_id is ignored because we only have one file
-- page_id_diff shows both positive and negative values 
-- the next row in ID order is both on later and earlier pages

WITH Data
AS
(
SELECT HeapData.ID, FPLC.file_id, FPLC.page_id
FROM dbo.HeapData CROSS APPLY sys.fn_PhysLocCracker (%%physloc%%) AS FPLC
)
SELECT d1.ID, d2.ID, d1.page_id, d2.page_id,  d2.page_id - d1.page_id AS page_id_diff 
FROM Data AS d1 INNER JOIN Data AS d2 ONd1.ID = d2.ID - 1 OR 
d1.ID = d2.ID - 2-- we delete ID % 2 = 1 rows
WHEREd1.file_id = d2.file_id AND 
d1.page_id <> d2.page_id;

This reply was modified 1 year, 4 months ago by carsten.saastamoinen.

Jeff Moden SSC Guru Points: 1004591 More actions Follow · Answer 6

That's great but without the book, the uncommented code you posted doesn't explain at all what the code is supposed to be teaching.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

carsten.saastamoinen Old Hand Points: 341 More actions · Answer 7

I strongly disagree. The book is Advanced/Expert, so I have to assume that it is not only read by beginners. So my code and example is just a simple example.

It is clearly stated on page 3 that data appears physically in the same order as it is inserted. This may, among other things, be the reason why I see statements such as that it is not necessary to sort data, since they are accessed in the same order as inserted, that using HEAP is fast, since data does not have to be inserted at a specific position but just lastly, .... I have a script that shows that when inserting millions of rows, using HEAP will cause double the execution time compared to a Clustered index.

Unfortunately, I also still see that in various posts it is written that data in a Clustered index is physically stored according to the Cluster Key. With a simple SELECT with table hint NOLOCK, it can be clearly seen that data in a Clustered Index is only logically ordered by Cluster Key - IAM and not Cluster Key is used to access data.

Incidentally, it is decidedly problematic when someone writes first and last in a table. Since a table always consists of unordered rows, the result is only sorted if ORDER BY is specified at the same time. It is still seen that DISTINCT or GROUP BY results in a sorted result - a serious error that will not be corrected by repeating.

If on page 3 such a serious error is brushed off, there is reason to react. And there is no need to read further.

Steve Jones - SSC Editor SSC Guru Points: 737961 More actions · Answer 8

A few comments

First, if you have issues with the book, they ought to be reported to the author/publisher.

Second, many books, and much code, has errors. It could be early in the code or later, but an error in one place doesn't make the entire thing wrong.

Third, the clustered index does store data in physical order on pages. The pagse/extents could get out of order, which is fragmentation, but that doesn't change the physical nature of storing the data. It is stored in the clustered key order. I think this might be more a verbiage argument, but I'd be interested in a repro that shows differently. Or an article if you feel people are misinformed.

carsten.saastamoinen Old Hand Points: 341 More actions · Answer 9

When the book is profiled several times on your site, it must also be reasonable that it is the same place with the same readers who are warned. It is not just a small error in some code, but a serious error in terms of understanding. The first row inserted is the first row in the table. The second row inserted is the second row in the table. The third row is the third row in the table. It could be simply expressed by the fact that all rows that are inserted are inserted into already allocated pages. If there is no space, a new page is allocated. This way, misunderstandings can be avoided and wrong claims can be avoided.

It is correct that data is physically arranged within a page when a Clustered Index is defined. But not within an extent and within the files the table is stored in. If a table is stored on 20,000 pages, it can hardly be called a physical order if the first pages in Cluster Key order are page 1:245, the next is page 1 :9999, the third is page 1:27, etc. This can be easily proven by selecting data from a table where the hint NOLOCK is specified. Unfortunately, this hint is also standard for some.

But if your site cannot tolerate serious errors being pointed out, it just has to be taken into account.

carsten.saastamoinen Old Hand Points: 341 More actions · Answer 10

Some of the pages from my book published on Kindle.

This reply was modified 1 year, 4 months ago by carsten.saastamoinen.

Attachments:

You must be logged in to view attached files.

Jeff Moden SSC Guru Points: 1004591 More actions Follow · Answer 11

Steve Jones - SSC Editor wrote:

Third, the clustered index does store data in physical order on pages. The pagse/extents could get out of order, which is fragmentation, but that doesn't change the physical nature of storing the data. It is stored in the clustered key order. I think this might be more a verbiage argument, but I'd be interested in a repro that shows differently. Or an article if you feel people are misinformed.

After all this time, it's unfortunate to see just how many people believe in that myth. One was written quite nicely more than a decade ago, courtesy of Wayne Sheffield.

https://blog.waynesheffield.com/wayne/archive/2012/10/does-a-clustered-index-really-physically-store-the-rows-in-key-order/

The summary at the bottom of his article says it all as a tl;dr...

The slot array is always maintained in key value order, and the rows on each page are those that should be on that page as reflected by the key value of the row. However, the physical storage of the row on that page is not necessarily maintained in key value order. Changes to data so that the data no longer fits into its original space, or inserts that are not in the key value order can cause the actual data to not be physically stored in the key order and will be stored at the beginning of the unused space at the end of the page. When the unused space is consolidated, existing rows are moved up the page, but they are not resorted.
The same is true for non-clustered indexes.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

carsten.saastamoinen Old Hand Points: 341 More actions · Answer 12

"However, the physical storage of the row on that page is not necessarily maintained in key value order" - the only important is, that the rows only can be accessed in slot order and in this way in index key order. Data in extents and in the file can be accessed in Key order or in physical order by specifying NOLOCK. With NOLOCK SQL Server use the IAM.

This reply was modified 1 year, 4 months ago by carsten.saastamoinen.

Attachments:

You must be logged in to view attached files.

Steve Jones - SSC Editor SSC Guru Points: 737961 More actions · Answer 13

The storage on the page isn't important because you can't read a row. you can only read a page. Rows on the page can move if they are updated, but it's not a material difference in how the index performs.

Expert Performance Indexing in Azure SQL and SQL Server 2022

Attachments:

Attachments:

Cookies on SQLServerCentral