Devin Knight

Devin is a BI consultant at Pragmatic Works Consulting. Previously, he was the technical editor for Professional Microsoft SQL Server 2008 Integration Services and a co-author of Knight's 24-Hour Trainer: Microsoft SQL Server 2008 Integration Services. Devin has spoken at conferences such as PASS and at several SQL Saturday events. He is a contributing member of the Business Intelligence Special Interest Group (SIG) for PASS, where he leads the SSIS Focus Group. Making his home in Jacksonville, FL, Devin is a participating member of the local users’ group (JSSUG).

Answers to Advanced SSIS Interview Questions

 

Last week I wrote a set of Advanced SSIS Interview Questions. Here are the answers I came up with for them. Of course, you may answer some of these differently, and I would love to hear how you would approach them. Feel free to email me your answers.

I’ll skip the demonstrate/whiteboard part of the questions.

1. Demonstrate or whiteboard how you would suggest using configuration files in packages.  Would you consider it a best practice to create a configuration file for each connection manager or one for the entire package?

There should be a separate configuration file for each connection manager in your packages, storing that connection manager's connection string. So if you have six connection managers, you have six config files. You can reuse the same config files across all of your packages that share those connections.

If you have a single config file that stores all of your connection managers, then every package that uses that file must contain all of the connection managers stored in it. This means you may have to put connection managers in a package that it doesn't even need.

2. Demonstrate or whiteboard how checkpoints work in a package.

When checkpoints are enabled on a package and the package fails, SSIS saves the point at which the failure occurred. You can then correct the problem and rerun from the point of failure instead of rerunning the entire package. The obvious benefit is that if you load a million-record file just before the package fails, you don't have to load it again.

3. Demonstrate or whiteboard using a loop in a package so each file in a directory with the .txt extension is loaded into a table.  Before demonstrating this, tell which task/container accomplishes this and which enumerator will be used.  (Big hint on which task/container to use: it requires an enumerator.)

This would require a Foreach Loop using the Foreach File Enumerator. Inside the Foreach Loop Editor, set the directory to loop through and map each file's path to a variable. Next, on the connection manager used to load the files, add an expression to the ConnectionString property that uses the variable populated by the Foreach Loop.

4. Demonstrate or whiteboard how transactions work in a package.

If transactions are enabled on your package and tasks, then when the package fails it will roll back everything that occurred during the package. First make sure MSDTC (Microsoft Distributed Transaction Coordinator) is enabled in Control Panel -> Administrative Tools -> Component Services. Transactions must be enabled not only at the package level but also on each task you want included as part of the transaction. To have the entire package in a transaction, set TransactionOption to Required at the package level and to Supported on each task.

5. If you have a package that runs fine in Business Intelligence Development Studio (BIDS) but fails when running from a SQL Agent job, what would be your first guess on what the problem is?

The account that runs SQL Agent Jobs likely doesn’t have the needed permissions for one of the connections in your package. Either elevate the account permissions or create a proxy account.

To create a proxy account, you first create a credential with the appropriate permissions, then assign that credential to a proxy account. When you configure the job step, you select the newly created proxy account in the Run As drop-down.
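
A rough T-SQL sketch of those two steps; the Windows account, password, and object names below are hypothetical:

USE master;
-- Credential mapped to a Windows account that has the permissions the package needs
CREATE CREDENTIAL SSISFileShareCredential
    WITH IDENTITY = N'DOMAIN\ssis_runtime',
         SECRET   = N'StrongPasswordHere';
GO

USE msdb;
-- Proxy account built on that credential
EXEC dbo.sp_add_proxy
    @proxy_name      = N'SSISPackageProxy',
    @credential_name = N'SSISFileShareCredential',
    @enabled         = 1;

-- Allow the proxy to run SSIS package job steps (subsystem_id 11 = SSIS package execution)
EXEC dbo.sp_grant_proxy_to_subsystem
    @proxy_name   = N'SSISPackageProxy',
    @subsystem_id = 11;
GO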

6. What techniques would you consider to add auditing to your packages?  You're required to log when a package fails and how many rows were extracted and loaded in your sources and destinations.

I like to create a database designated for package auditing. Track the row counts coming from each source and how many rows actually make it to each destination. Keeping row counts and package execution history in one location means you can optionally report off that database.

There are also third party tools that can accomplish this for you (Pragmatic Works BI xPress).
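
One way to wire this up (not the only way) is to capture the counts with Row Count transforms into package variables and write them out with an Execute SQL Task. A rough T-SQL sketch, with hypothetical table and column names:

-- Hypothetical audit table in the dedicated auditing database
CREATE TABLE dbo.PackageAudit
(
    AuditID        INT IDENTITY(1,1) PRIMARY KEY,
    PackageName    NVARCHAR(255) NOT NULL,
    ExecutionStart DATETIME      NOT NULL,
    ExecutionEnd   DATETIME      NULL,
    RowsExtracted  INT           NULL,  -- from a Row Count transform near the source
    RowsLoaded     INT           NULL,  -- from a Row Count transform just before the destination
    PackageStatus  VARCHAR(20)   NULL   -- e.g. 'Success' or 'Failure'
);

-- Statement an Execute SQL Task (OLE DB connection) could run at the end of the package;
-- each ? is mapped to a package variable on the task's Parameter Mapping page
INSERT INTO dbo.PackageAudit
    (PackageName, ExecutionStart, ExecutionEnd, RowsExtracted, RowsLoaded, PackageStatus)
VALUES (?, ?, ?, ?, ?, ?);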

7. What techniques would you consider to add notification to your packages?  You're required to send emails to essential staff members immediately after a package fails.

This could either be set up in SQL Agent when the package runs, or inside the package itself you could add a Send Mail Task to the OnError event handler to notify staff when the package fails.

There are also third party tools that can accomplish this for you (Pragmatic Works BI xPress).
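
If you go the event handler route with Database Mail instead of the Send Mail Task, an Execute SQL Task in the OnError event handler could send the email. A minimal sketch, assuming a Database Mail profile named 'DBA Mail' and a hypothetical recipient list:

EXEC msdb.dbo.sp_send_dbmail
    @profile_name = N'DBA Mail',
    @recipients   = N'etl-team@example.com',
    @subject      = N'SSIS package failure',
    @body         = N'The nightly load package failed. See the audit log for details.';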

8. Demonstrate or whiteboard techniques you would use for CDC (Change Data Capture).  Tell how you would write a package that loads data but first detects whether the data already exists, exists but has changes, or is brand new to the destination.

If for some reason you've avoided using a whiteboard to show your ideas up to this point, make sure you start on this question! For small amounts of data I may use the Slowly Changing Dimension transform.

More often than not the data is too large for such a slow transform. I prefer to do a Lookup on the key of the target table; rows that don't match are obviously new and can be inserted. Rows that do match are either updates or duplicates. Determine which by using a Conditional Split that compares the incoming rows to the target rows. Send the updates to a staging table, and then apply them to the target with an Execute SQL Task.

Explain that writing updates to a staging table instead of updating with the OLE DB Command is much better for performance: the OLE DB Command fires once per row, while the Execute SQL Task applies the updates in a single set-based operation.
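
A rough sketch of the set-based statement the Execute SQL Task might run against that staging table (all table and column names are hypothetical):

-- Apply the staged updates to the target in one set-based operation
UPDATE tgt
SET    tgt.CustomerName = stg.CustomerName,
       tgt.City         = stg.City
FROM   dbo.DimCustomer AS tgt
       INNER JOIN dbo.stgCustomerUpdates AS stg
               ON stg.CustomerKey = tgt.CustomerKey;

-- Clear the staging table for the next load
TRUNCATE TABLE dbo.stgCustomerUpdates;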

9. Explain what breakpoints are and how you would use them.

Breakpoints pause execution of your package. They are a great debugging tool because you can place a breakpoint on a task and have the package pause on the execution event you choose.

One case where I have used breakpoints is when I have a looping container and want to see how my variables change with each pass of the loop. I add the variables to a watch window and set a breakpoint on the container to stop after each iteration of the loop.

Comments

Posted by villersk on 14 June 2009

Question 1: I like to have a separate config for each connection, but with all the records stored in one config table in SQL Server. Very clean that way.

Also nice for a dev-test-prod environment where there is one config file on each server.

Posted by knight_devin@hotmail.com on 14 June 2009

That's a great option as well.  I like it mainly for security reasons.
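
For anyone curious, the table the Package Configurations wizard generates when you pick the SQL Server configuration type looks roughly like this (default names; one ConfigurationFilter per connection manager keeps the entries separate):

CREATE TABLE dbo.[SSIS Configurations]
(
    ConfigurationFilter NVARCHAR(255) NOT NULL,  -- e.g. one filter per connection manager
    ConfiguredValue     NVARCHAR(255) NULL,      -- e.g. the connection string
    PackagePath         NVARCHAR(255) NOT NULL,  -- e.g. \Package.Connections[MyDB].Properties[ConnectionString]
    ConfiguredValueType NVARCHAR(20)  NOT NULL   -- e.g. String
);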

Posted by surya on 17 June 2009

First of all, thanks for your masterpiece.

I too prefer a single config file, for many reasons: all settings are in one place, so when migrating from the development server to the production server it is easy to configure a new package.

I have never worked with checkpoints. Are they something like breakpoints in Visual Studio? How do they work?

Posted by knight_devin@hotmail.com on 17 June 2009

In theory they are like breakpoints.

When you use checkpoints, though, you specify a path where SSIS saves a file recording the point your package has reached. Example file below. There are a couple of other tricks to it; I'll try to get another blog post out about using checkpoints. Jamie Thomson has a great post about it.

blogs.conchango.com/.../SSIS_3A00_-Checkpoint-clarification-_2D00_-Recovering-failed-packages.aspx

<DTS:Checkpoint xmlns:DTS="www.microsoft.com/SqlServer/Dts" DTS:PackageID="{071E8FA0-5B66-40AE-A343-F136E7BA65DE}">
  <DTS:Variables DTS:ContID="{071E8FA0-5B66-40AE-A343-F136E7BA65DE}">
    <DTS:Variable>
      <DTS:Property DTS:Name="Expression"></DTS:Property>
      <DTS:Property DTS:Name="EvaluateAsExpression">0</DTS:Property>
      <DTS:Property DTS:Name="Namespace">User</DTS:Property>
      <DTS:Property DTS:Name="ReadOnly">0</DTS:Property>
      <DTS:Property DTS:Name="RaiseChangedEvent">0</DTS:Property>
      <DTS:Property DTS:Name="IncludeInDebugDump">6789</DTS:Property>
      <DTS:VariableValue DTS:DataType="11">0</DTS:VariableValue>
      <DTS:Property DTS:Name="ObjectName">FileExistsFlag</DTS:Property>
      <DTS:Property DTS:Name="DTSID">{D8A49A3C-CD0C-454C-9DFD-796118434F50}</DTS:Property>
      <DTS:Property DTS:Name="Description"></DTS:Property>
      <DTS:Property DTS:Name="CreationName"></DTS:Property>
    </DTS:Variable>
    <DTS:Variable>
      <DTS:Property DTS:Name="Expression"></DTS:Property>
      <DTS:Property DTS:Name="EvaluateAsExpression">0</DTS:Property>
      <DTS:Property DTS:Name="Namespace">User</DTS:Property>
      <DTS:Property DTS:Name="ReadOnly">0</DTS:Property>
      <DTS:Property DTS:Name="RaiseChangedEvent">0</DTS:Property>
      <DTS:Property DTS:Name="IncludeInDebugDump">2345</DTS:Property>
      <DTS:VariableValue DTS:DataType="8">C:\Test.txt</DTS:VariableValue>
      <DTS:Property DTS:Name="ObjectName">FilePath</DTS:Property>
      <DTS:Property DTS:Name="DTSID">{D13EB993-DEE1-4C62-8D6D-D2E60CA29AA7}</DTS:Property>
      <DTS:Property DTS:Name="Description"></DTS:Property>
      <DTS:Property DTS:Name="CreationName"></DTS:Property>
    </DTS:Variable>
  </DTS:Variables>
  <DTS:Container DTS:ContID="{5A260F2C-C3BF-4C5C-B35B-CFA0598EAE43}" DTS:Result="0" DTS:PrecedenceMap=""/>
</DTS:Checkpoint>

Posted by dbmaster on 18 June 2009

Good information. I see lots of content about what it can do.

I would like to know what file structures are not suitable for SSIS.

Records, or units of information, can be packaged on one row or on several rows joined logically by tags.

Also, details can be repeated (lines of an invoice) while the header (invoice date, amount, billing address) is on the first row of the invoice document record set. There could of course be multiple sets (invoices) per physical file.

Can tasks be parent and child tasks, where the parent peels off the header and the child loops through the details?

Thank You.

Posted by knight_devin@hotmail.com on 19 June 2009

Maybe you can give me a few more details about what you're interested in doing. Feel free to email me with details; I'd love to help. dknight@pragmaticworks.com

Posted by Jay Mitchell on 22 June 2009

Hi there. I was interested in question 1 because I have worked in a scenario before where all production connections were in one config file (SSIS_Connection.config) on which many packages depended. I don't recall ever having problems. What exactly was the issue that arose from this?

Jay

Posted by Jay Mitchell on 22 June 2009

Just to be clear, that one file was for SQL Server connections.

Posted by knight_devin@hotmail.com on 22 June 2009

The only reason I suggest having multiple files, one for each connection manager, is that with the one-file approach you have to add every connection manager referenced in the config file to all of your packages, even if they're not needed.

If you have a connection string in a config file for a connection manager that doesn't exist in the package, the package will error.

If you know every package using the config file has the same connection managers then you should be in good shape to use one file.

Posted by Jay Mitchell on 22 June 2009

SSIS does throw a warning that a connection is not being used; however, all of my packages here at my current workplace reference a single connections config file with no problem.

e.g. Description: The connection "ClickMart_US" is not found. This error is thrown by Connections collection when the specific connection element is not found.  End Error  Error: 2009-06-22 13:00:01.28

Apart from this, the package continues to run OK.

We've just found it easier and a bit tidier to keep all the server connections together and to have a file per environment. Hopefully with that explanation I'd have squeezed through Q1 :)
