Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 

Iterating Over Columns in the SSIS Script Component

Just recently a question was posed in the MSDN SSIS Forums about how to create a "generic" script component that would read all columns for each row passed through it in the Data Flow.  The standard impetus for this kind of behaviour is a logging or auditing step in the Data Flow.  Now - my first reaction is that you should probably be looking to code a custom component so that you aren't copying and pasting script code everywhere...
But I know that not everyone is comfortable coding a custom component from scratch - you have to be a code addict like me to find joy in that.  So here's a little "gateway drug" for those of you who are itching for the capability.  Perhaps a little taste of it will expose you to the script's shortcomings, and lead you to build your own lab to cook up a custom component to do this job properly.
Location, Location, Location
The big key here, as mentioned by Darren Green in an earlier post on the subject, is where to put code to do this.  It's not in one of the automatically generated stubs - you'll have to tread into slightly less familiar territory - the ProcessInput method override.  The standard stubs (PreExecute, PostExecute, and Input0_ProcessInputRow) are all useless for our purposes - you can safely delete them or ignore them.  Why?  PreExecute doesn't yet have any information about the buffer, and PostExecute is too late (obviously).  Input0_ProcessInputRow is called once per row... but with a carefully wrapped row object that presents the columns in nicely named properties.  There's no way to access the collection of columns.  Only ProcessInput has access to the buffer in a condition that allows iterating over the columns.
Paste This Code

{
  
bool fireAgain = true;
  
ComponentMetaData.FireInformation(0, "",
    
Buffer.ColumnCount.ToString() + " columns",
    
"", 0, ref fireAgain);

  
while (Buffer.NextRow())
  
{
    
for (int columnIndex = 0;
      
columnIndex < Buffer.ColumnCount;
      
columnIndex++)
    
{
      
string columnData = null;
      
if (Buffer.IsNull(columnIndex))
      
{
        columnData
= "is NULL";
      
}
      
else
      
{
        BufferColumn columnInfo
= Buffer.GetColumnInfo(columnIndex);
        
switch (columnInfo.DataType)
        
{
          
case DataType.DT_BOOL:
            columnData
= Buffer.GetBoolean(columnIndex).ToString();
            
break;

          
case DataType.DT_WSTR:
            columnData
+= Buffer.GetString(columnIndex);
            
break;

          
// add code to support more data types here

          
default:
            columnData
= "";
            
break;
        
}
      }
      ComponentMetaData.FireInformation
(0, "",
        
"Column " + columnIndex.ToString() + ": " + columnData,
        
"", 0, ref fireAgain);
    
}
  }
  base.ProcessInput
(InputID, Buffer);
}
public override void ProcessInput(int InputID, PipelineBuffer Buffer)
What's Happening In There?
It doesn't take much explanation - but every little bit helps. 
Initially, a loop has to be constructed to iterate over all the rows we're given.  Do remember that this isn't ALL the rows coming through your Data Flow - it's just one buffer's worth.  This method will get called several times (unless you have very few rows in your flow).
The column then gets checked to see if it's NULL... because NULLs cause quite a problem when you don't expect to see them.
After that, a little inquiry is made to ask for some information about the column.  This bit of code can definitely be optimized out of the loop - it is a burden on the system to ask for it for each and every row!
The reason for getting a little information about the column is apparent in the next block - the switch statement that handles different datatypes differently.  You can extend the code to handle the data types you expect.
Finally, after the loop, don't forget to call the base ProcessInput method.  Why?  Press F12 on that call and you'll see - the base code handles marking the processing as being complete when you've finished seeing all the rows.
Variations On This Theme
The above code isn't the only way to get this done.  You can do away with manually detecting the column data type and simply call ToString on the buffer's indexer - as in:
string columnData = Buffer[columnIndex].ToString();
But do still beware of nulls and other odd results.  Using the "Get" methods specific to the data type do perform faster, and are safer in the long run.

Comments

Posted by Steve Jones on 3 May 2011

Nice one. This is a good trick to keep around in my toolbox.

Posted by mudzana on 4 May 2011

Nice

Posted by vishal.gamji on 5 May 2011

Kewl.

Posted by Richard Campbell on 5 May 2011

I have been looking or a way to do this fo a long time. There is a wy to do this with Input0_PrcocessInputRow but can be broken if you make any changes to the input columns after the script has been created.

Posted by Richard Campbell on 5 May 2011

I forgot to ask, can this be done ib SSIS 2005 (in VB script)? I would expect that it would be possible.

Posted by Richard Campbell on 6 May 2011

Having succeeded in implementing the code in VB today without too much trouble, I have a question. I can use this code to get the column values but how do I get the column names?

I am trying to develop a generic script that will consume rows of data and submit them to a Web Service. To do this I need the column names as well as the values.

Any help would be much appreciated.

Posted by tmcdermid on 29 May 2011

@Richard - It won't be easy for you to get the column names.  From the script... I'm not entirely sure it's possible, but it should be.

In the PreExecute method, you can iterate over the ComponentMetaData.InputCollection[0].InputColumnCollection collection.  Each element in there will have a .Name.  Once you've found the name(s) you're looking for, then you'll have to make a call to BufferManager.FindColumnByLineageId, and pass it the LineageID from the column you just found.  Unfortunately, I can't find a way to get at the PipelineComponent instance that is the ScriptComponent.  BufferManager is a property of PipelineComponent, but ScriptComponent doesn't expose it... even though it has to be an instance of PipelineComponent somewhere along a chain of inheritances.

If you're looking to do some dynamic stuff that relates to the column name, you probably have to code a custom component.  It's not terribly hard.  If you're the Richard Campbell based in Vancouver, drop by one of the VanPASS meetings with your code, and within a half hour, I'm sure I'll be able to take your Script code and turn it into a custom component...

Leave a Comment

Please register or log in to leave a comment.