
Code Snippet for an SSIS DataFlow Performance Wiretap


I was recently tweaking a few Data Flows, and settled on a very small but very useful script to help record execution timing inside the flow.

Decomposition Is Time Consuming
You may know that I do recommend the decomposition technique to help diagnose performance problems in SSIS data flows.  That's the technique where you start recording execution times using fewer and fewer components in the data flow in order to get a differential execution time for particular components.  Even though it's not perfect, it does account for misleading effects of the pipeline backpressure mechanism.  (What I suggest later does not.)

Unfortunately, performing a decomposition analysis can take quite a while.  It is time consuming to replace the destination component with a row count, run the flow, record the results, and continue up the flow.
Adding a Performance Wiretap is Easy
Especially when the code is completely copy-and-paste!  I couldn't measure any performance impact from including this script, and didn't expect any.  It doesn't use any columns in the data flow, and the operations it does perform are very lightweight.  You don't even have to change anything in the script itself - the events it logs include the name you give the component in the data flow (which must be unique), so you can easily identify which of several wiretaps is reporting information.

public class ScriptMain : UserComponent
{
  #region Statistic collectors
  private DateTime _firstRowTime;
  private DateTime _lastRowTime;
  private int _numRows;
  #endregion

  #region PreExecute
  // This method is called before any rows are read from any sources
  // We reset all statistics here
  public override void PreExecute()
  {
    base.PreExecute();
    this._firstRowTime = DateTime.MinValue;
    this._lastRowTime = DateTime.MinValue;
    this._numRows = 0;
  }
  #endregion

  #region ProcessInput
  // This is NOT a regular method override you normally see in a script component
  // This override is called once for each input buffer (set of rows) that
  // are passed into this script component. We count rows here, and record
  // first and last "row seen times".
  public override void ProcessInput(int InputID, Microsoft.SqlServer.Dts.Pipeline.PipelineBuffer Buffer)
  {
    base.ProcessInput(InputID, Buffer);
    #region If this buffer has any rows (the last buffer typically does NOT), record times
    if (Buffer.RowCount > 0)
    {
      // Record the time the first non-empty buffer arrived
      if (this._firstRowTime == DateTime.MinValue)
      {
        this._firstRowTime = DateTime.Now;
      }
      // Each non-empty buffer pushes the "last row seen" time forward
      this._lastRowTime = DateTime.Now;
    }
    #endregion
    #region Add the number of rows in this buffer to the row count statistic
    this._numRows += Buffer.RowCount;
    #endregion
  }
  #endregion

  #region PostExecute
  public override void PostExecute()
  {
    bool fireAgain = true;
    base.PostExecute();
    #region Output statistics to package logging, as Information events
    ComponentMetaData.FireInformation(0, "",
      ComponentMetaData.Name + " first row seen at " + this._firstRowTime.ToString("HH:mm:ss.fff"),
      "", 0, ref fireAgain);
    ComponentMetaData.FireInformation(0, "",
      ComponentMetaData.Name + " last row seen at " + this._lastRowTime.ToString("HH:mm:ss.fff"),
      "", 0, ref fireAgain);
    TimeSpan duration = (this._lastRowTime - this._firstRowTime);
    ComponentMetaData.FireInformation(0, "",
      ComponentMetaData.Name + " watched " + this._numRows.ToString("###,###,##0") + " rows pass by in " + duration.TotalMilliseconds.ToString() + "ms",
      "", 0, ref fireAgain);
    #endregion
  }
  #endregion
}

So what does it do?  It records the time the first row passes it, the number of rows that go by, and the time the last row passes it.
How Can This Help Diagnose Performance?
If you want to know which section of your Data Flow takes the longest duration - place some of these scripts at various places in the flow.  After you run your Data Flow, you'll see how long each section takes - keeping in mind their execution times will overlap each other.  (If you use two scripts, the second script's "first row" may very well be seen before the "last row" passes through the first script.)
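If you log these Information events to a text file, pulling the per-section numbers back out takes only a few lines of scripting.  Here's a minimal sketch in Python, assuming the log contains the wiretap messages in the format the script emits (the component name and timestamps shown are hypothetical):

```python
import re
from datetime import datetime

# Matches the wiretap's "<component name> first/last row seen at HH:mm:ss.fff" messages
LINE_RE = re.compile(
    r"(?P<name>.+?) (?P<which>first|last) row seen at (?P<ts>\d{2}:\d{2}:\d{2}\.\d{3})")

def wiretap_durations(log_lines):
    """Return {component name: duration in ms} for every wiretap in the log."""
    seen = {}  # component name -> {"first": datetime, "last": datetime}
    for line in log_lines:
        m = LINE_RE.search(line)
        if not m:
            continue
        ts = datetime.strptime(m.group("ts"), "%H:%M:%S.%f")
        seen.setdefault(m.group("name"), {})[m.group("which")] = ts
    # Only components where both timestamps were logged get a duration
    return {name: (t["last"] - t["first"]).total_seconds() * 1000.0
            for name, t in seen.items() if "first" in t and "last" in t}
```

This won't help with runs that cross midnight, but for a typical test run it turns a page of log events into one number per wiretap.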
If you're wondering what kind of impact a certain change has in your Data Flow - place this script before and after the affected area.  Run your flow before you make the change, and again after.  The results will show you how much impact your change had (but remember it won't account for hidden backpressure effects!)
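Comparing the two runs is then just a per-component subtraction.  A minimal Python sketch, assuming you've already reduced each run's log to a dict mapping each wiretap's name to its measured duration in milliseconds (the names here are hypothetical):

```python
def duration_deltas(before, after):
    """Compare per-wiretap durations (ms) from two runs of the same Data Flow.

    A positive delta means that section got slower after the change.
    Only wiretaps present in both runs are compared.
    """
    return {name: after[name] - before[name]
            for name in before.keys() & after.keys()}

# e.g. the "Lookup section" wiretap went from 2000ms to 1500ms after the change
deltas = duration_deltas({"Lookup section": 2000.0},
                         {"Lookup section": 1500.0})
```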
If you're wondering which of several "branches" of a Data Flow combined with a Union All is the slowest, you can use these wiretaps to instrument each one.
Once again - please don't think this is a replacement for correct use of the decomposition technique.  Component performance in the Data Flow depends heavily on the other components before and after the area you're interested in - and even on flows running in parallel in the same (or another) Data Flow.

