SQLServerCentral Article

Update statistics in parallel

,

To update statistics on a whole database, there is the T-SQL command, sp_updatestats. This T-SQL statement works serially on one table at a time, so it is not very fast. Sometimes it would be nice to update statistics on an entire database using not just one, but several threads. The solution is the SSIS package  UPDATE_STATS_PARALLEL.dtsx, which creates dynamically and launches another SSIS package. The second package contains as many independent Execute SQL Tasks as needed to update statistics in parallel. 

The UPDATE_STATS_PARALLEL.dtsx has several variables that are used as input parameters to control the behaviour of the generated SSIS package. These are:

INSTANCE_NAME: name of the SqlServer Instance

DEST_DB: name of the database to perform update statistics on , and of the dynamically generated SSIS package.

NUM_OF_THREADS: maximum number of threads of the generated dtsx

SOURCE_SELECT: The T-SQL statement that generates the list of tables to update statistics on. This must return two columns:  [tname] and [comando]. These are the name of the table and the command to be run on the table, respectively.  Here is one example :

/*
 Update statistics on tables not empty  and with statistics older than 3 days
*/select
  t.name as [tname]
, ' update statistics ['+t.name+']  ' [comando]
 from sys.tables t
  left join sys.stats s 
   on t.object_id = s.object_id  
  join ( select OBJECT_ID 
              , sum(rows) rows 
          from  sys.partitions 
          group by object_id 
          having SUM(rows) >0
       )  pa
    on t.object_id = pa.object_id 
 where (  STATS_DATE(t.object_id,stats_id) is null 
       or datediff(DAY, STATS_DATE(t.object_id,stats_id), GETDATE() ) >=3        )
 group by t.name
 order by 1

The criteria in the where clause can be modified as needed.

Here is picture of the UPDATE_STATS_PARALLEL.dtsx package

The most important part of UPDATE_STATS_PARALLEL.dtsx is the VB script named "Create DTSX". This script creates an SSIS package named  ValueOf(DEST_DB).dtsx. The structure of the code is similar to that shown in this article: Building Packages Programmatically. Another example is this thread: Dynamically create an SSIS Bulk Insert Package

The code used is shown below.

 Public Sub Main()
        Dim packageName As String = Dts.Variables("DEST_DB").Value.ToString
        app = New Application()
        Dts.Events.FireInformation(0, "Update Stats Package", packageName, "", 0, True)
        Dim destDbConnectionString As String = Dts.Variables("DEST_CONNECT").Value.ToString
  
      'Create package
        dataTransferPackage = New Package()
        dataTransferPackage.LocaleID = 1029

      'Create DestConection
        Dim destConn As Microsoft.SqlServer.Dts.Runtime.ConnectionManager = dataTransferPackage.Connections.Add("OLEDB")
        destConn.Name = "DestDB"
        destConn.ConnectionString = destDbConnectionString
        workDestConn = New ServerConnection(Dts.Variables("INSTANCE_NAME").Value.ToString)
        workDestConn.DatabaseName = Dts.Variables("DEST_DB").Value.ToString
        
        Dim tablesDS As DataSet = workDestConn.ExecuteWithResults(Dts.Variables("SOURCE_SELECT").Value.ToString)
        Dim tablesDT As DataTable = tablesDS.Tables(0)
     ' For each row in the dataset, add an ExecuteSQLTask that performs update statistics
        For Each tablesR As DataRow In tablesDT.Rows
            Dim sqlcomandosource As String = tablesR.Item("comando").ToString
            'Dts.Events.FireInformation(0, "Creating components for table", "[" + schemaName + "].[" + tableName + "]", "", 0, True)
            Dim UpdStatsTask As Executable = dataTransferPackage.Executables.Add("STOCK:SQLTask")
            Dim thError As Microsoft.SqlServer.Dts.Runtime.TaskHost = CType(UpdStatsTask, Microsoft.SqlServer.Dts.Runtime.TaskHost)
            thError.Name = "Stats " + tablesR.Item("tname").ToString
            thError.Properties("Connection").SetValue(thError, dataTransferPackage.Connections("DestDB").Name)
            thError.Properties("SqlStatementSourceType").SetValue(thError, 1)
            thError.Properties("SqlStatementSource").SetValue(thError, sqlcomandosource)
            thError.Properties("DelayValidation").SetValue(thError, True)
            ' nnn thError.Properties("TimeOut").SetValue(thError, CUInt(50))
            'Dts.Events.FireInformation(0, "Error task created for table", "[" + schemaName + "].[" + tableName + "]", "", 0, True)
        Next

        dataTransferPackage.MaxConcurrentExecutables = Dts.Variables("NUM_OF_THREADS").Value.ToString
        dataTransferPackage.MaximumErrorCount = 100
        dataTransferPackage.DelayValidation = True

        app.SaveToXml(packageName + ".dtsx", dataTransferPackage, Nothing)
        Dts.TaskResult = ScriptResults.Success
    End Sub

Here is a picture of how the dynamically generated SSIS package might look. The package as many independent Execute SQL Tasks as the rows in the result set from the variable SOURCE_SELECT.

If this generated package runs, it will execute simultaneously up to NUM_OF_THREADS of these tasks. It will be much faster, and also more resource intensive, than the built-in T-SQL command, sp_updatestats.

After creating this package, the master package, UPDATE_STATS_PARALLEL.dtsx, executes the package using an SSIS ExecutePackageTask. The master package then deletes the child package using an SSIS FileSystemTask.

In the attachment there is also a T-SQL stored procedure that can be used to launch UPDATE_STATS_PARALLEL.dtsx using a T-SQL command. This is an optional part of the proposed solution .

Resources

Rate

4.56 (9)

You rated this post out of 5. Change rating

Share

Share

Rate

4.56 (9)

You rated this post out of 5. Change rating