SQLServerCentral Article

Basic Checks on a New AWS RDS SQL Server

,

Introduction

In this article, we shall imagine that we were just recruited by a new company and handed over a new Amazon RDS SQL Server instance. Our first job would be to understand the instance and key databases, tables, and other server-level and database-level objects in the instance. Just for fun, we are going to do this purely using T-SQL from beginning to end. Here are some of the basic checks we shall do:

  1. Database list
  2. Size of databases
  3. Busiest databases
  4. Backup History
  5. Top tables by size

Connecting to the Amazon RDS Instance Using Azure Data Studio

In a previous article, we established a connection from an Amazon EC2 instance to our existing Amazon RDS instance. We shall use Azure Data Studio installed in this Amazon EC2 instance to explore the SQL Server instance (See Figure 1).

Amazon RDS - Interface for Azure Data Studio inside an Amazon EC2 instance Figure 1: Create a Connection Using Azure Data Studio

In the connection details pane, we provide the details given. Our “server” is the endpoint URL from our Amazon RDS console (Figure 2).

Graphical user interface, text, application, email Description automatically generated Figure 2: Instance Connection Parameters on Azure Data Studio

 

On the home page, our dashboard displays information about our instance already including:

  1. The SQL Server version – 15.0.4198.2
  2. The SQL Server Edition – Express Edition
  3. The host computer name – EC2AMAZ-4GCDCNN (this is the Amazon EC2 instance name at OS level)
  4. The Operating System – Windows Server 2016 Datacenter 10.0
  5. Backup Status - 2 backups within the last 24 hours
  6. Database Sizes - depicted as a graph, measured in MB with rdsadmin being the largest
Amazon RDS - Summary of instance properties as seen on Azure Studio Figure 3: Amazon RDS Instance Overview on Azure Data Studio

The databases tab also display the list of six databases hosted on this instance. While it is great to have this information visually, as mentioned earlier, we shall explore this instance using T-SQL code.

List of Databases in the Instance as seen in Azure Data Studio Figure 4: List of Databases in the Instance

We start by clicking the connections button. This gives us an interface similar to SQL Server Management Studio. We can option a new query window by right-clicking the connection name – Amazon RDS Connection. Recall that this is the name we used to define our connection when logging on using Azure Data Studio.

Amazon RDS - New Query on the Instance on Azure Data Studio Figure 5: New Query on the Instance

List of Databases with Key Properties

The code in Listing 1 extracts information about the databases on our Amazon RDS instance by querying the system table sys.databases. Before querying this table, we can examine the columns available using the system stored procedure sp_columns (see Listing 1a). In Listing 1b we go ahead to build a query using the columns that are of interest to us.

Listing 1: List of databases in the instance

-- Listing 1: List of databases in the instance
-- Listing 1a: sp_columns displays a list of columns so us select what we need
USE master
GO
EXEC sp_columns databases;
-- Listing 1b: As an example, we select the following columns
USE master
GO
SELECT
    [name]
    , [database_id]
    , [user_access_desc]
    , [state_desc]
    , [recovery_model]
    , [recovery_model_desc]
    , [containment_desc]
    , [catalog_collation_type_desc]
    , [is_accelerated_database_recovery_on]
    , [is_memory_optimized_enabled]
    , [physical_database_name]
FROM sys.databases;
Output of Listing 1 - Database in Amazon RDS Instance Figure 6: Output of Listing 1 - Database in Amazon RDS Instance

You can get the full description of inbuilt SQL Server system tables, catalog views, stored procedures, functions and more by doing a search with the object name in SQL Server documentation. The full description of sys.databases can be found here for example: sys.databases (Transact-SQL) - SQL Server | Microsoft Learn

Accessing Documentation for SQL Server System Objects Figure 7: Accessing Documentation for SQL Server System Objects

List of Databases with Sizes in Megabytes

Listing 2 is a script that shows the size of all databases on the Amazon RDS instance. This script leverages the sp_msforeachdb stored procedure which can loop through all databases on a SQL Server instance. For each the database, the script specified in the @command1 parameter is executed for each database. The overall script is built such that the data generated by the core script is written to a table named [DB Size].

Notice the use of simple calculations such as data_size + log_size)/1024/1024 AS [Total DB Size (GB)] (computed columns) to ensure the eventual output is in a desirable unit or measurement. We also use the ORDER BY clause to ensure the largest databases are shown first in the result set.

Listing 2: Check Size of All Databases

-- Listing 2: Check Size of All Databases
-- Check Current size of all databases --
PRINT 'Checking Database Sizes'
USE tempdb
GO
CREATE TABLE [DB Size] ([Database Name] VARCHAR(300),[Total Size (MB)] INT,[Total Size (GB)] INT)
GO
INSERT INTO [DB Size]
EXEC sp_MSforeachdb @command1=
'USE [?]
select DB_NAME(), ((a.data_size + b.log_size)/1024) [Total DB Size (MB)],
((a.data_size + b.log_size)/1024/1024) [Total DB Size (GB)] from
(select SUM(size*8) data_size from sysfiles where filename like ''%mdf'' or filename like ''%ndf'') a,
(select SUM(size*8) log_size from sysfiles where filename like ''%ldf'') b'
SELECT * FROM [DB Size] ORDER BY [Total Size (MB)] DESC
DROP TABLE [DB Size]
GO

Figure 6 Shows the output of this query. Because this is a new instance, rdsadmin happens to be the largest instance followed by msdb.

List of dabases in Amazon RDS instance - query output Figure 8: Database Size in Amazon RDS Instance Expressed in MB

 

Check Database Activity Based on Index Usage

Listing 2 is a script that shows the number and timing of user seeks, scans and lookups of indexes in each database. These numbers are indicative of how busy a database is. Panjak Patel gives a good summary of what index seeks, index scans and table scans mean in this short article.

Gail Shaw, a well-known SQL Server MVP and MCM differentiates User Seeks and Scans as follows:

“User_seeks will be the number of times a user query could use a seek on the index (execution plan shows an index seek operation), user scans is the number of times that a user query had to scan the index to get the necessary results. Any select without a where clause will be a scan.”

In his characteristic deep-dive fashion, Brent Ozar another MCM-cum-MVP illustrates index seeks and scans further in this article.

I tend to think of an index seek like looking for something in an ordered list, example a list of numbers 1 to 10. If you are asked to find the number 7, you can easily split the list in two since it is order and quickly determine that 7 will be in the second half. Index seeks are generally fast. For an index scan operation, the engine must look through the entire physical structure of the index. A table scan means the engine looks through the entire table (typically the case for heaps). An index lookup simply means a non-clustered index is used to fulfil part of the result set required by a query.

The discussion on indexes might be a bit of a digression but the point is user interactions with indexes in the database are an indication of how busy the database is.

Listing 3: Check Database Usage

-- Listing 3: Check Database Usage --
PRINT 'Checking Database Usage'
SELECT db_name(database_id) [Database]
, SUM(user_seeks) [Total User Seeks]
, SUM(user_scans) [Total User Scans]
, SUM(user_lookups) [Total User Lookups]
, SUM(user_updates) [Total User Updates]
, MAX(last_user_seek) [Last User Seek]
, MAX(last_user_scan) [Last User Scan]
, MAX(last_user_lookup)[Last User Lookup]
, MAX(last_user_update) [Last User Update]
FROM sys.dm_db_index_usage_stats
GROUP BY database_id
ORDER BY [Database];

Figure 9 gives us a sneak peek into the statistics in our specific case. Based on the data, rdsadmin is the busiest database.

Output for Query on Busiest Databases Figure 9: Output for Query on Busiest Databases

 

Check Database Backups on Amazon RDS Instance

Listing 4 checks the last backup of each database and displays the backup size, backup time and other details. The script checks for the past thirty days. It can also be revised to check for databases that have not been backed up in the last 30 days (or 7 days preferably). We can accomplish this with the Common Table Expression (CTE) shown in Listing 5.

Listing 4: Check Database Backup Sets

-- Listing 4: Check Database Backup Sets --
SELECT
    bus.database_name
    ,bus.name
    ,bus.description
    ,bus.first_lsn
    ,bus.last_lsn
    ,round(bus.backup_size/1024,0) as backup_size_MB
    ,bus.server_name
    ,bus.backup_start_date
    ,bus.backup_finish_date
    ,bus.expiration_date
FROM msdb..backupset bus
JOIN msdb..backupmediafamily bmf on bus.media_set_id=bmf.media_set_id
WHERE bus.backup_start_date >= (getdate() - 30);

The result of Listings 4 and 5 are showed in Figures 10 and 11.

Output of Listing 4 - Database Backup Figure 10: Output of Listing 4 - Database Backup

Listing 5: Check Databases Not Backed Up

-- Listing 5: Check Databases Not Backed Up --
WITH backupcheck AS (
SELECT
    bus.database_name
    ,bus.name
    ,bus.description
    ,bus.first_lsn
    ,bus.last_lsn
    ,round(bus.backup_size/1024,0) as backup_size_MB
    ,bus.server_name
    ,bus.backup_start_date
    ,bus.backup_finish_date
    ,bus.expiration_date
FROM msdb..backupset bus
JOIN msdb..backupmediafamily bmf on bus.media_set_id=bmf.media_set_id
WHERE bus.backup_start_date >= (getdate() - 30)
)
SELECT * from sys.databases
WHERE name NOT IN (select database_name from backupcheck);
Output of Listing 5 - Databases Not Backed Up (Tempdb) Figure 11: Output of Listing 5 - Databases Not Backed Up

Check Largest Tables and their Sizes

Listing 6 gives us the largest tables in a specific database. The script leverages three joins on four system tables to achieve this result – sys.tables, sys.indexes, sys.partitions, and sys.allocation_units. As earlier mentioned, the column list from these tables can be return by invoking the system stored procedure sp_columns. Extensive documentation of the tables and their columns is also available in SQL Server documentation.

One might be interested in knowing the largest tables in a database to determine where to focus performance tuning effort or how to handle data archiving (by using table partitioning for example). Figure 10 shows us the output of this query in our example.

Listing 6: Check Size of Largest Tables

-- Listing 6: Check Size of Largest Tables
USE rdsadmin
GO
-- Get the space used by table TableName
SELECT
    TABL.name AS table_name,
    INDX.name AS index_name,
    SUM(PART.rows) AS rows_count,
    SUM(ALOC.total_pages) AS total_pages,
    SUM(ALOC.used_pages) AS used_pages,
    SUM(ALOC.data_pages) AS data_pages,
    (SUM(ALOC.total_pages)*8/1024) AS total_space_MB,
    (SUM(ALOC.used_pages)*8/1024) AS used_space_MB,
    (SUM(ALOC.data_pages)*8/1024) AS data_space_MB
FROM sys.tables AS TABL
INNER JOIN sys.indexes AS INDX
ON TABL.object_id = INDX.object_id
INNER JOIN sys.partitions AS PART
ON INDX.object_id = PART.object_id
AND INDX.index_id = PART.index_id
INNER JOIN sys.allocation_units AS ALOC
ON PART.partition_id = ALOC.container_id
WHERE INDX.object_id > 255
AND INDX.index_id <= 1
GROUP BY TABL.name,
INDX.object_id,
INDX.index_id,
INDX.name
HAVING (SUM(ALOC.used_pages)*8/1024) > 1
ORDER BY Object_Name(INDX.object_id),
(SUM(ALOC.total_pages)*8/1024) DESC
GO
Output of Listing 6 – Largest Tables in a Database Figure 12: Output of Listing 6 – Largest Tables in a Database

Conclusion

In this article, we have seen just a few checks a data engineer might want to check when presented with a new environment. Just for fun, we have done these checks using raw T-SQL queries. The checks demonstrated in this article are just a small subset of things a Data Engineer might want to know about a new database environment. In addition, there is a plethora of tools in the market that you can used to achieve even more sophisticated results. Obviously, these tools implement similar queries in the background. Here is a short list:

  1. Microsoft’s Utility Control Point
  2. Redgate’s SQL Monitor
  3. dbForge SQL Tools

In exact terms, dbForge SQL Tools offers interesting capabilities relevant for exploring a new database. These include:

  • Get a full record of database changes and find out who committed them
  • Generate lifelike meaningful test data for load testing and integration
  • Run T-SQL unit tests as a part of continuous integration with DevOps
  • And many more

Rate

5 (1)

You rated this post out of 5. Change rating

Share

Share

Rate

5 (1)

You rated this post out of 5. Change rating