Moving Data From Excel to SQL Server – 10 Steps to Follow

SQL Server Integration Services provide a versatile way of reading Excel files into SQL Server. A task like this illustrates the advantages of the graphical approach of SSIS. Andy Brown explains.

You need to create a package to import data from an Excel workbook into a SQL Server table, using SQL Server Integration Services as a tool. What could possibly go wrong? Well … plenty, as it turns out. The aim of this article is to help people to avoid all the pitfalls that I fell into when first learning SSIS.

This article uses SSIS 2012, the latest version at the time of writing, but you won’t notice many differences if you’re using 2008 or 2005. The article assumes that you’re using SQL Server Data Tools – Business Intelligence within Visual Studio 2012 to create SSIS packages: Business Intelligence Development Studio (BIDS) was used until SQL Server 2012. Do Microsoft have a whole department devoted to thinking up misleading names for software?

Our Simple Example

Let’s suppose that you want to import an Excel workbook of purchases into a SQL Server table (you can download this workbook at the bottom of the article):

1808-d89ee1bd-b126-49b4-936d-1bed37932d6

Our purchase ledger workbook, complete with2 unwanted title rows, a useful heading row and a couple of awkward blank rows at the bottom.

You could use Excel to manually delete the top two title rows and bottom two blank rows to make life easier, but this would be kind of cheating (and also kind of pointless, since whatever application produced the Excel workbook of purchases would just recreate the unwanted rows next time you ran it). To quote the words of Caiaphas in the musical Jesus Christ Superstar: “we need a more permanent solution to our problem”.

1808-image2.png

Here’s what we’d like our final SQL Server table to contain: the 5 purchases.

Before you begin, make sure that you’ve closed your Excel workbook down. If you run any SSIS package to import data from an Excel workbook which is open, you will get some horribly misleading error messages.

Step 1 – Create a Project

Before you can play about with data (sorry: extract, transform and load it), you need a project to do it in. Go into SQL Server Data Tools or Visual Studio, then choose to create a new project:

1808-1-17b7680e-f00e-4cef-ac11-768bc4b5b

You may be able to miss out this step if you’ve just gone into SSIS for the first time.

At the top of the dialog box which appears, make sure you create the right sort of project:

1808-image4.png

Choose to create a business intelligence SSIS project.

You can then give your project an exciting name (at least, more exciting than Integration Services Project1, which is what I’ve used!):

1808-image5.png

Choose a name and location for your new package.

SSIS will now create a new project, and also (by default) a new package too, imaginatively called Package.dtsx. There are two ways you can see that this is what’s happened. One is that you can see the package in Solution Explorer:

1808-1-f317090f-0fbb-4b0e-a853-75e2294d7

The new package created on your behalf (if you can’t see Solution Explorer, choose‘View SolutionExplorer’ from the menuto show it).

The other clue that SSIS has created a package for you is that it’s staring you in the face!

1808-image7.png

By default you are put in Control Flow view, which is like a flow diagram showing the sequence in which tasks that you create will execute.

Step 2 – Create a Connection to your SQL Server Database

Before you continue, you need to make sure that you’ve created a connection to your SQL Server database. To do this, first right-click on the ‘Connection Managerspart of Solution Explorer:

1808-1-0e529127-3db0-4363-bea0-27cc9c8f3

Right-click to create a new connection manager.

The most efficient way to link to SQL Server is using an OLEDB connection manager:

1808-1-3f814902-0fcd-4267-a0f8-820dc5624

Choose to add an OLEDB connection manager.

Now click on the ‘New… button to create your new connection manager:

1808-image10.png

Create a new connection manager as above.

Choose your server, authentication method and database on the next screen, then select ‘OK’ twice to see your new connection manager listed:

1808-e9347ed2-14f5-4a17-9dbe-55c952b9e1a

It makes sense to create this connection manager for the entire project, since it’s likely you’ll use the same connection in other packages within the same project.

I haven’t shown any more details about this here for two reasons: the settings will be different on your machine, and anyone reading this article is likely to have created connections many other times in many other software applications!

Step 3 – Create a Table

You can’t import data into a non-existent table, so the next thing we’ll do is to create the table shown below. We could do this manually within SQL Server Management Studio, but we’re aiming for an automated solution which we can run time and time again, so instead we’ll create the table as part of our SSIS package.

1808-f2d7ec33-9d7e-4adb-bdd0-c6b514d46bf

Our table will look something like this: we’ll import the item name, price and quantity, but the purchase id will be generated automatically. As for the total in column E of our spreadsheet – we’ll just choose not to import that, since it can be recreated by multiplying the Price and Quantity columns at any time.

To create the table, first double-click (or click and drag) on the ‘Execute SQL’ task to add a task to the control flow which should be visible on screen (we want to create the shell table within this task):

1808-ab3e59c7-57c0-417c-84d0-2ea610723c6

This task will run some SQL to remove any existing purchases table, and create a new one.

I tend to give my tasks long, descriptive names (geeks may prefer to use shorter meaningless names!):

1808-700ad6ea-a2b4-47cb-b2e7-352df89b28a

The Execute SQL task after renaming it.

You can also add sort-of-comments to packages using something called annotations:

1808-1-74dd2a71-4817-4776-8674-d3026de5b

You can right-click to add an annotation to your package – they appear like post-it notes:

1808-1-29aacd59-008d-4798-89a1-aac2f5281

Anyway, returning to the main story, you can now edit your Execute SQL task:

1808-a4b42d0c-62da-44d8-abf7-fb41b03e6cf

The easiest way to edit any SSIS task is to double-click on its icon, although you can also right-click on the task and choose ‘Edit… as above.

In the dialog box which appears, choose to connect to your database, using the connection manager that you’ve just created at project level:

1808-ca21e765-13c3-456f-b743-e775466f7a3

You can use a project-level connection manager in any package.

You can now enter the SQLStatement property, specifying the SQL that SSIS should run for this task. Here’s what I’ve used for this article.

This will first delete any table called tblPurchase which already exists, and then create a new, empty one. The PurchaseId column is an identity one, which will automatically take the values 1, 2, 3, etc. Here’s what the Excecute SQL task dialog box now looks like:

1808-64b87269-b10b-424d-9193-6196a3d7248

The SQLStatement property as it appears after you paste in the text.

It’s time now to test that this works by running your single-task package:

1808-1-17e88b84-fc60-4862-8221-2e2c1cdd0

Right-click on the package name in Solution Explorer and choose to execute it as shown here. SSIS will save your package automatically before executing it.

If all goes well, you should see this:

1808-1-58fbfe89-a167-4b82-b6ea-3e5f52931

The green tick means things went well!

If your package doesn’t run at this point, you may be trying to run it on a 64-bit computer. The default mode in SSIS on a 64-bit SQL Server installation is 64-bit. In this case, you have to specifically change the mode to run a package. I don’t want to clutter this article up with an explanation of how to do this so please refer to this article for how to do this.

You should now have a table, which you can view in SQL Server Management Studio if you should so wish:

1808-1-0cff8bf1-06ea-4c9b-8edd-418972ef2

The table has no records in – yet!

You now need to stop the package running:

1808-1-904ea390-d904-4f79-9257-cfc118632

Select the menu option (or press the keystroke above) to stop your package running, and wave goodbye to the green ticks for now!

It’s now time to create the data flow tasks – although first we need to create an Excel connection.

Step 4 – Create an Excel Connection

Before you can import data from an Excel workbook, you will need to create a connection to it. You should probably create this connection within your package, as it’s likely that this’ll be a one-off (you won’t need to use the same connection in any other package):

1808-0e68ab05-0c0e-4f5b-9119-724be5dce00

Right-click in the ‘Connection Managers section of your package, and choose to create a new connection.

Note that you could alternatively use the ‘Source Assistant to do this, but I always like to do things explicitly:

1808-1-15dbb8f2-9390-46dc-9424-e983147d7

The source assistant as it appears in the SSIS toolbox – it’s not the Voldemort of wizards, but it’s not the Dumbledore either.

You can now choose to create an Excel connection:

1808-7ee2252f-8501-4c0e-b68f-9548f2f14ac

There are a few other choices!

Browse to your Excel workbook and choose it:

1808-30da342b-a390-447c-bd5a-56867c815d0

Leave theFirst row has column namesoption ticked.

When you select ‘OK‘, you should see your Excel connection:

1808-9c59392a-d477-4d02-89ef-c4939de59bc

You could rename this connection manager, but we’ll leave it as it is.

Step 5 – Create a Data Flow Task

It’s time now to start the real work! We want to add a data flow task to the control flow tab of your package. This new data flow task should import data from the Excel workbook into the SQL Server table (although as we’ll see, things can go awry at this stage).

1808-5599e1b6-4d8b-4555-898f-6d6e2a8d636

Add a ‘Data Flow task to your package, and rename it to say ‘Import data (as above).

You now need to get the two tasks shown to run in sequence; you’ll firstly want to create a new table to hold your purchases, and then import the data into it. To do this, click on the first task and drag its arrow onto the second. This arrow is called a precedence constraint.

1808-a5016e7f-2716-40ea-a1a4-0b888abd13e

Connecting the tasks, so they follow on from each other.

You can now double-click on the data flow task to edit what it does – we’ll spend the rest of this article in the data flow tab of SSIS:

1808-c32b0f58-f395-43f0-ac84-a56b6e04f75

The (as yet empty) Data Flow tab for the Import data task.

Step 6 – Creating the Excel Source

Data has to come from somewhere, and in our case it’s from Excel:

1808-50d10e97-cb67-47cd-a50b-4cc6a79b7ff

Drag an Excel Source from the SSIS toolbox onto your empty data flow window (here we’ve also then renamed it).

You can now double-click on this source to edit it, and tell SSIS where it should get its data from:

1808-cfccf307-b029-4b1e-b994-63a6cf1d926

SSIS will automatically guess the connection manager if you only have one Excel connection manager for this package/project, but you’ll still need to choose the worksheet name (as shown above).

It’s a good idea now to preview your data, by clicking on the ‘Preview…button:

1808-imagesummat.png

We’ve got obvious problems with our first 2 and last 2 rows, but we can solve these by losing any rows for which the first column is null, which we’ll do shortly using a conditional split transform.

It’s a good idea now to rename all of the columns, so that you know what they refer to:

1808-1-93dd8d89-f3eb-4945-bbe2-a6664fbb2

Click on the ‘Columns‘ tab (as shown above), then give the output columns better names, as we’ve done here.

When you select ‘OK‘, you should have an Excel source with no errors shown for it:

1808-bfbec0be-5fae-4e55-9e3f-2c52e1fa3d0

Now to do something with this data!

Step 7 – Removing Rubbish Data

The next thing we want to do is to divert all of the purchases with nulls in to… nowhere, really! To do this, add a conditional split transform to your data flow:

1808-4ab3e953-acf1-451f-b554-af1b321f994

Add a Conditional Split as above (here we’ve renamed it also, to lose the nulls), and direct the output (or “data flow path”, if you want the technically correct name) from the Excel course into it.

You can now double-click on the ‘Conditional Split task to configure it. We’ll set up two flows out of it:

Data where the id column is null will go down a pipe called ‘Case 1 (which we won’t actually connect to anything); while

All other data will flow down a pipe called ‘OK Data’.

Here’s how to set this up:

1808-eefde527-76d6-4335-bd58-3934087777b

Set up an output (called ‘Case 1 by default) which tests the condition that the Id column is null. You can drag the ISNULL function and Id column down into the ‘Condition box to avoid having to type them in.

At the bottom of this dialog box you can type in a name for your default output:

1808-image37.png

Here we’ve called the default output ‘OK Data’.

Step 8 – Piping the ‘OK Data’ into a SQL Server Table

We should be getting near the end of our journey now – all that we should need to do is to send the good data into our purchases table. Here’s how to do this:

1808-c03efd8f-64f6-4ede-a658-a6b93cd57df

Add an OLE DB destination (as shown above) – here we’ve renamed ours asPurchases table.

You can now drag the green arrow from the ‘Lose the nulls transform into the Purchases table destination:

1808-3f928d93-5fdb-435c-a6d6-2088f469339

When you release the arrow, you’ll be asked which output you’re choosing: ‘Case 1 or ‘OK Data’ (the two outputs from the conditional split). Choose ‘OK Data’.

Having mapped data into the purchases table, it’s now time to configure it. Double-click on the Purchases tabledestination to edit it:

1808-e97d626b-fb28-4d97-98f3-d7e0a5da341

Firstly, choose the connection manager to use (although you probably won’t have to do this, as SSIS will assign it automatically if you’ve only got the one), and the table to target.

You can now choose which columns from Excel to map onto which columns in the SQL Server table:

1808-f8e4fe76-ec7d-4b01-9362-d9b52bc4e00

Be warned – the Item column will cause a problem soon… !

Here’s what you’ll be left looking at when you choose OK:

1808-1ac62b0e-77f9-4906-aa51-42acd2a12b9

There’s a problem with the ‘Purchases tabledestination.

If you mouse over the red circle, you’ll see what the problem is:

1808-b8a3d7ae-e6b1-49ed-a9b0-1d317a7be9d

The problem is that Excel uses Unicode data, and we’ve created a varchar column in SQL Server.

Step 9 – Identifying and Solving the Data Conversion Problem

When you’re creating columns in SQL Server, you can use either nvarchar or varchar for variable length strings:

1808-d36e29d4-4db7-419e-bcdc-c16796304fe

The varchar data type uses half the amount of bytes that nvarchar uses, because it can’t store extended characters. We could have used nvarchar and avoided this problem!

The benefit of using Unicode is that it allows you to store international characters: currently over 110,000 different characters from over 100 scripts, according to Wikipedia.

However, we used varchar, so we need to convert our Excel Unicode characters into normal ones. To do this we can use a ‘data conversion task. First, however, we need to break the link we’ve created:

1808-78d762b2-5fdc-4cdc-a2a8-2bc02678ac5

Right-click on the link between the transformation and the destination and delete it.

You can now add in a ‘data conversion task:

1808-f633fdad-d529-4132-937a-566575daf8f

Here we’ve added a ‘Data Conversion task (shown selected on the left), and renamed it to ‘Turn Unicode into varchar. The next thing is to pipe our data into it:

1808-a64bd846-16e8-41e2-8b1c-77270504a1e

Pipe the ‘OK Data’ from the conditional split transform into this further data conversion transform.

You can now double-click on the ‘Turn Unicode into varchar data conversion task, and say what it should do:

1808-a952a7d5-99b0-4ddb-b8bd-ac42b0b5b67

Here we’ve chosen to create a new column called ItemVarchar, which takes the Item column and turns it into a non-Unicode string using the default ANSI code page.

I’ve also changed the length to 50 characters at this point. This will mean that strings longer than 50 characters will be truncated, giving rise to truncation errors. Dealing with these is beyond the scope of this article – for now it’s sufficient to note that none of the purchase descriptions is long enough for our example to cause us any worries.

Nearly there! You can now take the output from this data conversion task and feed it into the Purchases table destination:

1808-fb4ae61c-c239-422f-92c8-d96f13e5baa

We’ve still got an error, as we haven’t redone the column mappings for the destination.

You can now double-click on the Purchases table destination to configure the column mappings:

1808-483097a1-2eb0-4f8b-af39-d89f3cd2dfc

Choose to map the newly derived ItemVarchar column onto the ItemName column in the SQL Server table.

All of your errors should now have disappeared, and you can run your package!

Step 10 – Running the Package

The final step is to import your data by executing the package:

1808-a6239b1d-ff31-4c5e-9228-aabd1d28051

Right-click on the package in Solution Explorer to execute it (wish we’d renamed it …).

Here’s what the data flow should look like:

1808-4b3b4648-b97e-40ac-8a71-8eace2b487a

Yeah! It’s four ticks from SSIS!

You should now have 5 purchases in your tblPurchase table:

1808-9a77d4d5-d1f2-40c4-9631-863385c3e86

OK, it would have been quicker to type them in on this occasion, but you’ve now got a package which you can run every month-end, and which will work whether there are 5 purchases or 500,000.

Conclusion

Integration Services is just one of those software applications which is a joy to use. I hope this has encouraged you to use it to automate moving data around in your company. There’s nothing quite so satisfying as seeing the green ticks appear next to all of the tasks in your packages when you run them!