Stairway to ScriptDOM

Stairway to ScriptDOM Level 3 - Finding Patterns in the Abstract Syntax Tree

,

In Level 1 of this series, we learned about what ScriptDOM is and what it can do. In Level 2, we learned a few terms around ScriptDOM, such as Abstract Syntax Tree, Tokens and how the parser works. In this article we will explore the Abstract Syntax Tree further, use it to find coding patterns, understand what a visitor class is, and how it can help us find coding patterns more efficiently.

What is an Abstract Syntax Tree?

Generically speaking, an Abstract Syntax Tree is a tree-like representation of the syntax in code. It is needed to ensure that the code is syntactically correct and can be used to find specific patterns in the code. For example, take a stored procedure like the one below.

USE WideWorldImporters;
GO
CREATE PROCEDURE dbo.testproc
AS
BEGIN
       --John Doe 4/12/2022 Removing nolock
       SELECT * FROM [Sales].[OrderLines]
END

An Abstract Syntax Tree for this would be constructed as shown below:

Each one of the blue boxes above would be called a ‘node’ and a single line between them would be called a ‘branch’.  If I were to search above code for ‘nolock’, for example, using the traditional method, I would need to run a DMV based query like this one:

SELECT DISTINCT
       o.name AS Object_Name,
       o.type_desc
FROM sys.sql_modules m
       INNER JOIN
       sys.objects o
         ON m.object_id = o.object_id
WHERE o.name = 'testproc' AND m.definition Like '%[nolock]%';

The result I get would be finding the nolock in the comment, as shown below.

If we were to do this search on the abstract syntax tree, it knows that the single- line comment and what it contains is not what we are looking for, since the tree constructed has specifically marked that node as a comment and not as code. This is the advantage of having the syntax constructed as a tree and each node having its identity, as opposed to searching things in text where every line is considered valid for search.

Exploring the Abstract Syntax Tree to Find SELECT *

Take the simple script shown below. All this does is to parse the statement sent to it, and if that is clean, it constructs the Abstract Syntax Tree.

$ScriptData = @"
SELECT * FROM dbo.customers
"@
Add-Type -Path "C:\ugpresentation\ScriptDom\Microsoft.SqlServer.TransactSql.ScriptDom.dll"
Write-Host "Attempting to parse..." 
$parser = New-Object Microsoft.SqlServer.TransactSql.ScriptDom.TSql150Parser($true)
#object to handle parsing errors if any
$parseErrors = New-Object System.Collections.Generic.List[Microsoft.SqlServer.TransactSql.ScriptDom.ParseError]
#object that handles strings script to parse
$stringReader = New-Object System.IO.StringReader($ScriptData)
#parser creates parsed objects from strings 
$parsedObjects = $parser.Parse($stringReader, [ref]$parseErrors)
if($parseErrors.Count -eq 0) {
        Write-Host "$($parser.GetType().Name): No parsing errors" -ForegroundColor Green
    }
    else {
        Write-Host "$($parser.GetType().Name): $($parseErrors.Count) parsing error(s): $(($parseErrors | ConvertTo-Json))" -ForegroundColor Yellow
    }

Upon running the script, the output is as shown here:

Let us now explore the Abstract Syntax Tree it created from the PowerShell prompt. At the first level, we see the tree as shown here. ‘Batches’ refers to number of batches in the script we passed. In this case, it is one statement and therefore, one batch.

Going down one level further, we see the results below. We see that the batch has a SELECT statement in it.

Going down further into the statement, we see more details. This shows various components of a select statement.

Since the one we chose is relatively simple and only has a ‘queryexpression’, let us look at what that part of the tree contains.

We can see that the ‘selectelements’ node contains a ‘selectstarexpression’, identifying the ‘*’ that is part of the select. That is what we can use to find select * expressions.

If I now rewrote the query to find this, it would be like this one:

$ScriptData = @"
SELECT * FROM dbo.customers
"@
Add-Type -Path "C:\ugpresentation\ScriptDom\Microsoft.SqlServer.TransactSql.ScriptDom.dll"
Write-Host "Attempting to parse..." 
$parser = New-Object Microsoft.SqlServer.TransactSql.ScriptDom.TSql150Parser($true)
#object to handle parsing errors if any
$parseErrors = New-Object System.Collections.Generic.List[Microsoft.SqlServer.TransactSql.ScriptDom.ParseError]
#object that handles strings script to parse
$stringReader = New-Object System.IO.StringReader($ScriptData)
#parser creates parsed objects from strings 
$parsedObjects = $parser.Parse($stringReader, [ref]$parseErrors)
if($parseErrors.Count -eq 0) {
        Write-Host "$($parser.GetType().Name): No parsing errors" -ForegroundColor Green
    }
    else {
        Write-Host "$($parser.GetType().Name): $($parseErrors.Count) parsing error(s): $(($parseErrors | ConvertTo-Json))" -ForegroundColor Yellow
    }
$stmtcount = $parsedObjects.batches[0].statements.count
     for ( $index = 0; $index -lt $stmtcount; $index++)
    {
       $stmtype = $parsedobjects.batches[0].statements[$index].GetType().name
       if ($stmtype -eq 'SelectStatement')
       {
        $selectelements = $parsedobjects.batches[0].statements[$index].queryexpression.selectelements.count
        for ( $element = 0; $element -lt $selectelements; $element++)
        {
            if ($parsedobjects.batches[0].Statements[$index].queryexpression.selectelements[$element].gettype().name -eq 'SelectStarExpression')
            {
                write-host 'SELECT STAR FOUND' -BackgroundColor Red
            }
        }
       } 
    }

Running this, I would get the result shown below.

The problem with writing code in this way is that we have to know how it is structured. In this case, it is a simple one-line query. But a select star could be buried in server places in the code. It could be within a BEGIN..END, within an IF statement, within a CTE, a SELECT for an INSERT and on and on.

To programmatically traverse all those constructs to find it would be time consuming and create some really unwieldy, bulky code. This code, even if one writes it, may not work for a structure that is slightly different. Say someone wrote code without a BEGIN..END and someone else used BEGIN TRY. Our code might not find these examples.

To create code that is more efficient and make it work for any kind of construct – we need to understand what are called VISITOR classes in the ScriptDOM world.

Classes, Methods and Visitor Class

ScriptDOM is a .NET framework-based library. It is therefore important to understand terminology related to using such libraries, such as classes and methods.

A class essentially is a cookie cutter for an object. Classes describe object types, and objects are usable with each instance of a class. A method is declared in a class and causes certain code to be run when called. Since an object is based on same ‘cookie cutter’ class, it will have same properties and methods of the parent class.

A great introduction to these basics can be found here. Since we are using PowerShell for all our examples here, an introduction to usage of classes and methods can be found here.

In context of script above – classes and methods would be as below.

In addition to this there is one more term to understand; it is called a ‘Visitor Class’ and based on the Visitor Design Pattern. A detailed description of this design pattern can be found here.

A ‘visitor’ class is a backdoor way of finding specific patterns. Let’s say we consider SelectStarExpression as the pattern we want to find. If we include the FindSelectStar as a method of the parser class and use it, that may be one way. But then, what if we want to find another pattern, say NOLOCK hint? We then have to add a FindNoLock method. And then what about four part tablenames, or tables without an alias, or a missing index setting, or a missing primary key? On and on.

There are hundreds of anti-patterns to find, and adding methods to the base object would make it really huge and difficult to manage. This is why we have a ‘visitor class’. The visitor class is a backdoor way of declaring a class, The original object (the tsqlfragment) that had to perform the behavior (detect the anti pattern) is now passed to one of the visitor’s methods as an argument, providing the method access to all necessary data contained within the object. The steps to use it are as below.

First, set up the script parameter to accept a T-SQL script being passed.

function Find-LintingErrors
{
       
    [CmdletBinding()]
    param(
          [string] $SelectStatement
    )

Next, add a reference to the library.

try {
Add-Type -Path "C:\ugpresentation\ScriptDom\Microsoft.SqlServer.TransactSql.ScriptDom.dll"

Now we use the parser to parse the script and create a tsqlfragment (abstract syntax tree) as the output.

   $DDLParser = New-Object Microsoft.SqlServer.TransactSql.ScriptDom.TSql150Parser($true)
    $DDLparserErrors = New-Object System.Collections.Generic.List[Microsoft.SqlServer.TransactSql.ScriptDom.ParseError]
    # create a StringReader for the script for parsing
    $stringReader = New-Object System.IO.StringReader($selectstatement)
    # parse the script
    $tSqlFragment = $DDLParser.Parse($stringReader, [ref]$DDLParsererrors)
# raise an exception if any parsing errors occur
    if($DDLParsererrors.Count -gt 0) {
        throw "$($DDLParsererrors.Count) parsing error(s): $(($DDLParsererrors | ConvertTo-Json))"

We now have to pass the abstract syntax tree created by the parser in the variable $tsqlfragment to another function where the visitor class is declared and finds the pattern we are looking to find. In the new function, declare the fragment as a parameter and declare the visitor class. Within the class, specify the ‘fragment’ that is of interest and needs to be found. In this case that is the SELECTSTAREXPRESSION. If that is found, we throw an error.

#Find references to select * in views/stored procs/functions
function Find-SelectStarWithPattern
{
[CmdletBinding()]
param(
$TSQLFragmentForRule
)
    Try
    {
        class VisitorSelectStar: Microsoft.SqlServer.TransactSql.ScriptDom.TSqlConcreteFragmentVisitor 
        { 
{ 
         [void]Visit ([Microsoft.SqlServer.TransactSql.ScriptDom.SelectStarExpression] $fragment) 
            { 
errorline = "WARNING: 'SELECT * found at "
                write-host $errorline $fragment.StartLine ":" $fragment.StartColumn ":" $fragment.FragmentLength -BackgroundColor red
 
}

To invoke the visitor class, we have create an object is created from it first and then call the accept method of the fragment with this object passed in as an argument. This method then invokes the visitor’s Visit method. Each ‘child’ of the fragment (which may mean every reference to selectstar or tablehint) is ‘visited’ and found similarly.

$visitorselectstar = [VisitorSelectStar]::new()
tSqlFragmentforrule.Accept($visitorselectstar) 

The complete functions are shown below.

#Find references to select * in views/stored procs/functions
function Find-SelectStarWithPattern
{
    [CmdletBinding()]
    param(
        $TSQLFragmentForRule
    )
    Try
    {
        class VisitorSelectStar: Microsoft.SqlServer.TransactSql.ScriptDom.TSqlConcreteFragmentVisitor 
        {
            [void]Visit ([Microsoft.SqlServer.TransactSql.ScriptDom.SelectStarExpression] $fragment) 
            {
                $errorline = "WARNING: 'SELECT * found at "
                write-host $errorline $fragment.StartLine ":" $fragment.StartColumn ":" $fragment.FragmentLength -BackgroundColor red
            }
        }
        
      $visitorselectstar = [VisitorSelectStar]::new()
        $tSqlFragmentforrule.Accept($visitorselectstar)
    
   } #end try
catch {
    throw
}
} 

The shell from which this is called is as below.

    function Find-LintingErrors
{
    [CmdletBinding()]
    param(
          [string] $SelectStatement
    )
    try {
Add-Type -Path "C:\ugpresentation\ScriptDom\Microsoft.SqlServer.TransactSql.ScriptDom.dll"
    $DDLParser = New-Object Microsoft.SqlServer.TransactSql.ScriptDom.TSql150Parser($true)
    $DDLparserErrors = New-Object System.Collections.Generic.List[Microsoft.SqlServer.TransactSql.ScriptDom.ParseError]
    # create a StringReader for the script for parsing
    $stringReader = New-Object System.IO.StringReader($selectstatement)
    # parse the script
    $tSqlFragment = $DDLParser.Parse($stringReader, [ref]$DDLParsererrors)
  
   Find-SelectStarWithPattern $tSqlFragment
}
catch {
    throw
}
}

The below procedure, for example, has 2 references to SELECT STAR. Here is how we call the function.

$ScriptData = @"
USE [WideWorldImporters]
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO

CREATE OR ALTER PROCEDURE [Website].[InsertCustomerOrders]
@Orders Website.OrderList READONLY,
@OrderLines Website.OrderLineList READONLY,
@OrdersCreatedByPersonID INT,
@SalespersonPersonID INT
WITH EXECUTE AS OWNER
AS
BEGIN
    SET NOCOUNT ON;
    
    SET XACT_ABORT ON;
    DECLARE @OrdersToGenerate AS TABLE
    (
        OrderReference INT PRIMARY KEY,   -- reference from the application
        OrderID INT
    );
    -- allocate the new order numbers
    INSERT @OrdersToGenerate (OrderReference, OrderID)
    SELECT *
    FROM myserver.mydb.dbo.Orders (NOLOCK);
    SET TRANSACTION ISOLATION LEVEL READ uncommitted;
    BEGIN TRY
        BEGIN TRAN;
        --Comment 1
        INSERT Sales.Orders
            (OrderID, CustomerID, SalespersonPersonID, PickedByPersonID, ContactPersonID, BackorderOrderID, OrderDate,
             ExpectedDeliveryDate, CustomerPurchaseOrderNumber, IsUndersupplyBackordered, Comments, DeliveryInstructions, InternalComments,
             PickingCompletedWhen, LastEditedBy, LastEditedWhen)
        SELECT otg.OrderID, o.CustomerID, @SalespersonPersonID, NULL, o.ContactPersonID, NULL, SYSDATETIME(),
               o.ExpectedDeliveryDate, o.CustomerPurchaseOrderNumber, o.IsUndersupplyBackordered, o.Comments, o.DeliveryInstructions, NULL,
               NULL, @OrdersCreatedByPersonID, SYSDATETIME()
        from dbo.OrdersToGenerate AS otg
        INNER JOIN dbo.Orders AS o
        ON otg.OrderReference = o.OrderReference;
        --Comment 2
        insert Sales.OrderLines
            (OrderID, StockItemID, [Description], PackageTypeID, Quantity, UnitPrice,
             TaxRate, PickedQuantity, PickingCompletedWhen, LastEditedBy, LastEditedWhen)
        SELECT *
        FROM dbo.OrdersToGenerate AS otg
        INNER JOIN OrderLines (nolock) AS ol
        ON otg.OrderReference = ol.OrderReference
INNER JOIN dbo.Orders AS o
ON ol.OrderReference = o.OrderReference
        INNER JOIN Warehouse.StockItems (nolock) AS si
        ON ol.StockItemID = si.StockItemID;
        COMMIT;
    END TRY
    BEGIN CATCH
        IF XACT_STATE() <> 0 ROLLBACK;
        THROW;
        RETURN -1;
    END CATCH;
    RETURN 0;
END;
GO
"@
Find-LintingErrors $ScriptData 

We see the results here, as both occurrences are returned.

Since select star is only an expression, there is no further action needed within the visitor class that finds it other than to throw an error. But more nuanced coding may be needed for other situations, such as finding a nolock, where the visitor class finds the table hint, but one has to programmatically check if the hint is a nolock hint. Another case may be finding a 4-part table reference (a linked server reference) where the visitor class finds the table reference but one has to code further to see if it has 4 parts. Examples of doing both are shown below.

In the below code, we find the table hint in the class ‘visitortablehintref’. In the ‘visit’ method of this class, we check what the type of hint is (could be any type allowed for table hints). Since we are looking for nolock, if it matches ‘nolock’ we throw an error.

#Find no lock hint in views/stored procs/functions
function Find-NoLockHintWithPattern
{
    [CmdletBinding()]
    param(
           $tsqlfragmentforrule
    )
    Try
    {
         class VisitorTableHintRef: Microsoft.SqlServer.TransactSql.ScriptDom.TSqlFragmentVisitor {
        
         [void]Visit ([Microsoft.SqlServer.TransactSql.ScriptDom.TableHint] $fragment) {
            
            $tablehint = $fragment
            if ($tablehint.HintKind -eq 'Nolock')
            {
                $errorline = "WARNING: NOLOCK hint found at "
                write-host $errorline $fragment.StartLine ":" $fragment.StartColumn ":" $fragment.FragmentLength -BackgroundColor Red
            }
        }   
     }
        $visitortablehintref = [VisitorTableHintRef]::new()
        $tSqlFragmentforrule.Accept($visitortablehintref)
} 
catch {
    throw
}
} 

Here is the complete script we can run from the shell.

$ScriptData = @"
USE [WideWorldImporters]
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO

CREATE OR ALTER PROCEDURE [Website].[InsertCustomerOrders]
@Orders Website.OrderList READONLY,
@OrderLines Website.OrderLineList READONLY,
@OrdersCreatedByPersonID INT,
@SalespersonPersonID INT
WITH EXECUTE AS OWNER
AS
BEGIN
    SET NOCOUNT ON;
    
    SET XACT_ABORT ON;
    DECLARE @OrdersToGenerate AS TABLE
    (
        OrderReference INT PRIMARY KEY,   -- reference from the application
        OrderID INT
    );
    -- allocate the new order numbers
    INSERT @OrdersToGenerate (OrderReference, OrderID)
    SELECT *
    FROM myserver.mydb.dbo.Orders (NOLOCK);
    SET TRANSACTION ISOLATION LEVEL READ uncommitted;
    BEGIN TRY
        BEGIN TRAN;
        --Comment 1
        INSERT Sales.Orders
            (OrderID, CustomerID, SalespersonPersonID, PickedByPersonID, ContactPersonID, BackorderOrderID, OrderDate,
             ExpectedDeliveryDate, CustomerPurchaseOrderNumber, IsUndersupplyBackordered, Comments, DeliveryInstructions, InternalComments,
             PickingCompletedWhen, LastEditedBy, LastEditedWhen)
        SELECT otg.OrderID, o.CustomerID, @SalespersonPersonID, NULL, o.ContactPersonID, NULL, SYSDATETIME(),
               o.ExpectedDeliveryDate, o.CustomerPurchaseOrderNumber, o.IsUndersupplyBackordered, o.Comments, o.DeliveryInstructions, NULL,
               NULL, @OrdersCreatedByPersonID, SYSDATETIME()
        from dbo.OrdersToGenerate AS otg
        INNER JOIN dbo.Orders AS o
        ON otg.OrderReference = o.OrderReference;
        --Comment 2
        insert Sales.OrderLines
            (OrderID, StockItemID, [Description], PackageTypeID, Quantity, UnitPrice,
             TaxRate, PickedQuantity, PickingCompletedWhen, LastEditedBy, LastEditedWhen)
        SELECT *
        FROM dbo.OrdersToGenerate AS otg
        INNER JOIN OrderLines (nolock) AS ol
        ON otg.OrderReference = ol.OrderReference
INNER JOIN dbo.Orders AS o
ON ol.OrderReference = o.OrderReference
        INNER JOIN Warehouse.StockItems (nolock) AS si
        ON ol.StockItemID = si.StockItemID;
        COMMIT;
    END TRY
    BEGIN CATCH
        IF XACT_STATE() <> 0 ROLLBACK;
        THROW;
        RETURN -1;
    END CATCH;
    RETURN 0;
END;
GO
"@
Find-LintingErrors $ScriptData 

On running the above code, the visitor class for table hints finds all the instances of nolock hint as below with no additional coding effort. Here is what we see as the result:

In code below we find a linked server (4 part) name using a visitor class. The class actually finds tablereferences and within that, we have the visit method where we check if this tablereference has 4 parts. If yes, we throw an error.

#Find 4 part /linked server references in stored procedures/views/functions 
function Find-4PartNameWithPattern
{
    [CmdletBinding()]
    param(
           $TSqlfragmentforrule
    )
    Try
    {
       class VisitorTableRef: Microsoft.SqlServer.TransactSql.ScriptDom.TSqlFragmentVisitor 
       {

[void]Visit ([Microsoft.SqlServer.TransactSql.ScriptDom.TableReferenceWithAlias] $fragment) 
{
$tablereference = $fragment
$tablename = $tablereference.schemaobject.BaseIdentifier.value 
$servername = $tablereference.schemaobject.ServerIdentifier.value
            
if ($servername -eq $null)
{
}
else
{
$errorline = "WARNING: Table $tablename with linked server $servername found"
write-host $errorline $fragment.StartLine ":" $fragment.StartColumn ":" $fragment.FragmentLength -BackgroundColor red
                                        
}
}  
    }
    
        $visitortableref = [VisitorTableRef]::new()
        $tSqlFragmentforrule.Accept($visitortableref)
       
} #end try
catch {
    throw
}
} 
Find-LintingErrors $ScriptData 

On running above code, the visitor class for table references finds the 4-part table name.

Summary

In this post we went into the structure of the abstract syntax tree, how to walk the tree programmatically and find what we want inside of it..Then we looked at how to define/use a visitor class and find patterns in the tree more efficiently. We also saw three examples of using the visitor class – the select star, the nolock hint and 4 part table name. In the next article we will see more nuanced examples of code to find patterns.

Rate

Share

Share

Rate