How to Unclog SQL Server

,

Not long ago, we were notified of a query timing out in an application after three minutes.  We were consulted to help find out why.  No, it was not SQL Server itself timing out, we explained.  SQL Server doesn’t understand what a time out is.  Something is just taking too long to run on the application, and we need to figure out what it is.

It didn’t take a lot of time to find the culprit.  It was a query that was bringing in gobs of data to SQL Server and choking the life out of it.  One of our DBAs got the same query to run in seconds.  How?  By using temp tables and bringing in the data more appropriately.

SQL Server is a powerful, set-based application, which can process a lot of data.  That processing capacity is not unlimited, however, and it doesn’t mean we shouldn’t be good citizens when we use it.  When processing huge chunks of data, keep the following in mind:

  • Filter early rather than late (columns and rows)! I’ll show you in the demo below how much it can help.
  • Remember that a badly constructed query can take down tempdb, or really slow it down.

Most people will either choose temporary tables, derived tables or table variables as homes for their temporary structures.  How do you know which to choose?  There is no magic (unless you’re Paul White). However, there are some guidelines:

  • Table variables are for when the data must survive a transaction rollback, or with SSIS, or to prevent unwanted recompiles, but know that the optimizer will assume that any query to the structure will return 1 row. That can be good, or really, really bad. This, by the way, will be addressed in the SQL Server 2019 release.s
  • Derived tables are great when needing to bring data locally.
  • Temp tables offer the advantage of indexing, which can really help. When in doubt, try this first
  • On rare occasions, I have even seen a stacked CTE outperform every other option.

Please copy and paste the scripts below, which uses AdventureWorks2012. Each should be run separately.

Script 1

Let's say that I need to get rowcounts of all the tables on the server with over 5000 rows.  I write a cursor to go out and get the rowcounts, then I filter for the rows over 5000.  What are my logical reads?

/* Script 1 -  Setup tables and SET changes */
DECLARE @DB INT = (SELECT DB_ID() AS AdventureWorks2012);
DBCC FLUSHPROCINDB(@DB);
GO
USE AdventureWorks2012;
GO
SET NOCOUNT ON;
SET STATISTICS TIME, IO ON;
IF OBJECT_ID('tempdb..#RowCountTablePrelim') IS NOT NULL
    DROP TABLE #RowCountTablePrelim;
IF OBJECT_ID('tempdb..#RowCountTableFinal') IS NOT NULL
    DROP TABLE #RowCountTableFinal;
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
CREATE TABLE #RowCountTablePrelim
(
    TableName sysname NULL,
    [RowCount] BIGINT NULL
);
CREATE TABLE #RowCountTableFinal
(
    TableName sysname NULL,
    [RowCount] BIGINT NULL
);

First, let's grab the row counts for all the tables on the server.  We only want tables big enough to consider (> 5000 rows). This will spit out separate row counts for each table partition as a separate entry, but we'll deal with that in a minute.

 

/* Script 2 - Add data */
DECLARE @dbName sysname;
DECLARE @dbCursor CURSOR;
DECLARE @dbSQL NVARCHAR(MAX);
SET @dbCursor = CURSOR FAST_FORWARD LOCAL FOR
SELECT name
FROM sys.databases
WHERE source_database_id IS NULL
      AND database_id > 4
      AND is_read_only = 0
      AND state_desc = 'ONLINE'
ORDER BY name;
OPEN @dbCursor;
FETCH NEXT FROM @dbCursor
INTO @dbName;
WHILE (@@FETCH_STATUS = 0)
BEGIN
    SET @dbSQL
        = N'USE [' + @dbName + N'];
       INSERT INTO #RowCountTablePrelim
       SELECT ''['  + @dbName
          + N']'' + ''.'' + ''['' + SCHEMA_NAME(schema_id) + '']'' + ''.'' + ''['' + t.name + '']'' AS TableName,
       SUM(ps.row_count) AS [RowCount]
       FROM sys.tables AS t
       INNER JOIN sys.dm_db_partition_stats AS ps
       ON t.object_id = ps.object_id
       AND ps.index_id < 2
    GROUP BY t.schema_id, t.name, ps.row_count;';
    PRINT @dbSQL;
    EXECUTE sys.sp_executesql @dbSQL;
    FETCH NEXT FROM @dbCursor
    INTO @dbName;
END;
CLOSE @dbCursor;
DEALLOCATE @dbCursor;

Let's clean up the counts by adding all the partition counts for each table into one final number

/* Script 3 - clean up the data and insert into tables */
INSERT INTO #RowCountTableFinal
(
    TableName,
    [RowCount]
)
SELECT TableName,
       SUM([RowCount])
FROM #RowCountTablePrelim
GROUP BY ROLLUP(TableName);
SELECT TableName,
       [RowCount]
FROM #RowCountTablePrelim
WHERE [RowCount] > 5000
ORDER BY [RowCount] DESC;

DROP TABLE #RowCountTableFinal;
DROP TABLE #RowCountTablePrelim;

Plugging the output of the STATS IO into Richie Rump’s Statistics Parser, we get the following:

Now, what if we had filtered early? Let's do that.

/* Script 4 - reload the data differently */
IF OBJECT_ID('tempdb..#RowCountTablePrelim') IS NOT NULL
    DROP TABLE #RowCountTablePrelim;
IF OBJECT_ID('tempdb..#RowCountTableFinal') IS NOT NULL
    DROP TABLE #RowCountTableFinal;
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
CREATE TABLE #RowCountTablePrelim
(
    TableName sysname NULL,
    [RowCount] BIGINT NULL
);
CREATE TABLE #RowCountTableFinal
(
    TableName sysname NULL,
    [RowCount] BIGINT NULL
);
DECLARE @db2Name sysname;
DECLARE @db2Cursor CURSOR;
DECLARE @db2SQL NVARCHAR(MAX);
SET @db2Cursor = CURSOR FAST_FORWARD LOCAL FOR
SELECT name
FROM sys.databases
WHERE source_database_id IS NULL
      AND database_id > 4
      AND is_read_only = 0
      AND state_desc = 'ONLINE'
ORDER BY name;
OPEN @db2Cursor;
FETCH NEXT FROM @db2Cursor
INTO @db2Name;
WHILE (@@FETCH_STATUS = 0)
BEGIN
    SET @db2SQL
        = N'USE [' + @db2Name + N'];
       INSERT INTO #RowCountTablePrelim
       SELECT ''['  + @db2Name
          + N']'' + ''.'' + ''['' + SCHEMA_NAME(schema_id) + '']'' + ''.'' + ''['' + t.name + '']'' AS TableName,
       SUM(ps.row_count) AS [RowCount]
       FROM sys.tables AS t
       INNER JOIN sys.dm_db_partition_stats AS ps
       ON t.object_id = ps.object_id
       AND ps.index_id < 2
       WHERE ps.row_count > 5000
       GROUP BY t.schema_id, t.name, ps.row_count;';
    PRINT @db2SQL;
    EXECUTE sys.sp_executesql @db2SQL;
    FETCH NEXT FROM @db2Cursor
    INTO @db2Name;
END;
CLOSE @db2Cursor;
DEALLOCATE @db2Cursor;
INSERT INTO #RowCountTableFinal
(
    TableName,
    [RowCount]
)
SELECT TableName,
       SUM([RowCount])
FROM #RowCountTablePrelim
GROUP BY ROLLUP(TableName);
SELECT TableName,
       [RowCount]
FROM #RowCountTablePrelim
--WHERE [RowCount] > 5000  _Filtered early instead of late
ORDER BY [RowCount] DESC;
DROP TABLE #RowCountTableFinal;
DROP TABLE #RowCountTablePrelim;

Now look:

Only 23% of the scans, and 19% of the logical reads!  I call that progress.

Why table variables can work against you

Let's try another look where we show why an actual execution plan is the best choice to understand what is happening.

SET STATISTICS TIME, IO OFF;
DECLARE @NotAGreatIdea TABLE
(
    RID INT IDENTITY(1, 1),
    Col1 UNIQUEIDENTIFIER
);
DECLARE @i INT = 1;
WHILE @i < 100
BEGIN
    INSERT INTO @NotAGreatIdea
    (
        Col1
    )
    VALUES
    (NEWID());
    SET @i = @i + 1;
END;
SELECT RID,
       Col1
FROM @NotAGreatIdea
WHERE RID < 40;

Here’s what you’ll get in the execution plan:

Not a really good guess. Imagine if you had a million rows…

Now, let's try a temp table

CREATE TABLE #ThisOughtaDoIt
(
    RID INT IDENTITY(1, 1),
    Col1 UNIQUEIDENTIFIER
);
GO
DECLARE @i INT = 1
WHILE @i < 100
 BEGIN
  INSERT INTO #ThisOughtaDoIt
  (
    Col1
  )
  VALUES
  (NEWID());
 SET @i = @i + 1
END
CREATE CLUSTERED INDEX cix_rid ON #ThisOughtaDoIt (RID ASC, Col1);
GO
SELECT RID, Col1    
FROM #ThisOughtaDoIt
WHERE RID < 40;
DROP TABLE #ThisOughtaDoIt;

Much better!

What about a derived table?

 

CREATE TABLE dbo.DerivedTableQuery (
    RID INT IDENTITY(1, 1),
    Col1 UNIQUEIDENTIFIER
);
GO
DECLARE @i INT = 1
WHILE @i < 100
 BEGIN
  INSERT INTO dbo.DerivedTableQuery
  (
      Col1
  )
  VALUES
  (NEWID());
  SET @i = @i + 1
 END
CREATE CLUSTERED INDEX cix_rid ON dbo.DerivedTableQuery (RID ASC, Col1);
GO
SELECT RID, Col1 FROM (SELECT RID, Col1 FROM dbo.DerivedTableQuery) AS dtq
WHERE RID < 40;
GO
DROP TABLE dbo.DerivedTableQuery;

And the results…

The only way you are really going to know which temporary structure to use is to test it out.  There is a time and place for all your options (and at least you have more than one option!).  Don’t be afraid to test them all, and find the best one before you deploy.

Rate

5 (2)

Share

Share

Rate

5 (2)