Blog Post

Iterating Over Columns in the SSIS Script Component

,

Just recently a question was posed in the MSDN SSIS Forums about how to create a "generic" script component that would read all columns for each row passed through it in the Data Flow.  The standard impetus for this kind of behaviour is a logging or auditing step in the Data Flow.  Now - my first reaction is that you should probably be looking to code a custom component so that you aren't copying and pasting script code everywhere...
But I know that not everyone is comfortable coding a custom component from scratch - you have to be a code addict like me to find joy in that.  So here's a little "gateway drug" for those of you who are itching for the capability.  Perhaps a little taste of it will expose you to the script's shortcomings, and lead you to build your own lab to cook up a custom component to do this job properly.
Location, Location, Location
The big key here, as mentioned by Darren Green in an earlier post on the subject, is where to put code to do this.  It's not in one of the automatically generated stubs - you'll have to tread into slightly less familiar territory - the ProcessInput method override.  The standard stubs (PreExecute, PostExecute, and Input0_ProcessInputRow) are all useless for our purposes - you can safely delete them or ignore them.  Why?  PreExecute doesn't yet have any information about the buffer, and PostExecute is too late (obviously).  Input0_ProcessInputRow is called once per row... but with a carefully wrapped row object that presents the columns in nicely named properties.  There's no way to access the collection of columns.  Only ProcessInput has access to the buffer in a condition that allows iterating over the columns.
Paste This Code
public override void ProcessInput(int InputID, PipelineBuffer Buffer)
{
  
bool fireAgain = true;
  
ComponentMetaData.FireInformation(0, "",
    
Buffer.ColumnCount.ToString() + " columns",
    
"", 0, ref fireAgain);

  while (Buffer.NextRow())
  
{
    
for (int columnIndex = 0;
      
columnIndex < Buffer.ColumnCount;
      
columnIndex++)
    
{
      
string columnData = null;
      
if (Buffer.IsNull(columnIndex))
      
{
        columnData
= "is NULL";
      
}
      
else
      
{
        BufferColumn columnInfo
= Buffer.GetColumnInfo(columnIndex);
        
switch (columnInfo.DataType)
        
{
          
case DataType.DT_BOOL:
            columnData
= Buffer.GetBoolean(columnIndex).ToString();
            
break;

          case DataType.DT_WSTR:
            columnData
+= Buffer.GetString(columnIndex);
            
break;

          // add code to support more data types here

          default:
            columnData
= "";
            
break;
        
}
      }
      ComponentMetaData.FireInformation
(0, "",
        
"Column " + columnIndex.ToString() + ": " + columnData,
        
"", 0, ref fireAgain);
    
}
  }
  base.ProcessInput
(InputID, Buffer);
}

What's Happening In There?
It doesn't take much explanation - but every little bit helps. 
Initially, a loop has to be constructed to iterate over all the rows we're given.  Do remember that this isn't ALL the rows coming through your Data Flow - it's just one buffer's worth.  This method will get called several times (unless you have very few rows in your flow).
The column then gets checked to see if it's NULL... because NULLs cause quite a problem when you don't expect to see them.
After that, a little inquiry is made to ask for some information about the column.  This bit of code can definitely be optimized out of the loop - it is a burden on the system to ask for it for each and every row!
The reason for getting a little information about the column is apparent in the next block - the switch statement that handles different datatypes differently.  You can extend the code to handle the data types you expect.
Finally, after the loop, don't forget to call the base ProcessInput method.  Why?  Press F12 on that call and you'll see - the base code handles marking the processing as being complete when you've finished seeing all the rows.
Variations On This Theme
The above code isn't the only way to get this done.  You can do away with manually detecting the column data type and simply call ToString on the buffer's indexer - as in:
string columnData = Buffer[columnIndex].ToString();
But do still beware of nulls and other odd results.  Using the "Get" methods specific to the data type do perform faster, and are safer in the long run.

Original post (opens in new tab)

Rate

You rated this post out of 5. Change rating

Share

Share

Rate

You rated this post out of 5. Change rating