In my next series of posts, I am going to look into processing files in Azure using the Azure Data Factory. Before processing the files, though, the files first have to get to Azure Blob Storage so that they can be picked up by the Azure Data Factory. In order to copy the files to Azure, I used the Microsoft AzCopy utility and a SQL Server Job. In this post, I'll go into detail about how I made everything happen.
The first thing that needs to be done is to install AzCopy on the SQL Server that is going to run the job. AzCopy is a command-line utility that allows you to copy files into Azure. Information about the syntax can be found here while the download itself can be found here. Once you download the installer and click “Next” a couple of times, you will have AzCopy installed on your server.
The next thing that needs to be done is to create a new Azure Storage Account and Container, which is easy to do using the Azure Preview Portal. Click on “Storage Accounts” on the left hand navigation and the click Add. You will then just have to provide the name, and the new storage account will be created.
After creating the Azure Storage Account, you will be taken to the Storage Account Page with information about the new account. At this point, you are going to want to copy the "Primary blob service endpoint" and paste it into Notepad (or Sublime Text which is just awesome) as we will need it later. We are going to store our files in Azure in Blob Storage. Blob storage is great for storing large amounts of unstructured data, which in this case is the import file. Within Azure Blob Storage, files must exist in Containers. In order to create a Container from the Storage Account Page, click on Blobs then Containers then Add.
From here, you will just have to enter a name for the Container. Remember to leave the Access Type as Private so that others cannot access the files in this container.
One other thing that you are going to need to collect at this point is the Primary Access Key for the container. From the Container Page, click Settings->Keys and copy the Primary Access Key.
Creating the Job
Now that we've installed AzCopy and created an Azure Storage Account and Container, we are ready to create a job using AzCopy to copy the file to Azure Blob Storage. In order to test out AzCopy, we are going to use AzCopy with the following parameters:
|/Source:||The folder of the file that we are copying to Azure Blob Storage. In this case we are using C:Upload|
|/Dest:||The "Primary blob service endpoint" + "/" + container for example https://storagename.blob.core.windows.net/containername|
|/DestKey:||The Primary Access Key that we copied above.|
|/Pattern:||The name of the file that we are going to upload to Azure Blob Storage. In this case we are using CustomerLoad.csv.|
By combining all of these commands we get
We can test this out easily enough using the Microsoft Azure Storage Command Line. We can see that it worked successfully because the “Total files transferred” is 1.
After we've tested that the upload works correctly, we would finally have to create a SQL Job with a cmdExec Job Step with the following as the Command Text:
The cmdExec Job Step will appear as follows:
It is pretty straight forward to use AzCopy to upload files to Azure Blob Storage. In the next couple of blog posts I'm going to have a look at using the Azure Data Factory to process the file.