Data Transfer from Azure Blob Storage to Snowflake Using Pipelines

By: Eduardo Pivaral   |   Updated: 2023-07-18   |   Related: Azure Synapse Analytics


Problem

One of the premises of a modern data warehouse is the ability to interconnect cloud services and to process and analyze data from any provider with ease. Two of the top cloud services right now are Azure Synapse and Snowflake. Whichever one your organization has implemented, if you need to transfer data to or from these services, it is important to know how to interconnect them and move data between them.

Solution

Snowflake is a cloud-native, self-managed service for enterprise-level modern data applications. It is becoming one of the top data services in the world because of its capabilities and ease of use.

Azure Synapse is an enterprise-level analytics service. It provides analytics and big data capabilities using multiple languages and engines, with native integration with other services like Power BI, Databricks, and machine learning and cognitive services.

In a previous tip, we discussed how to pull data from Snowflake into Azure. In this tip, we will walk through two simple examples showing how to copy data from Azure Blob Storage (or any other source, once you understand the basic steps) into Snowflake.

Setting Up the Example

For this example, we assume we have a Synapse Analytics service running, a Snowflake account, and at least one blob storage account.

How to create a Synapse Analytics service is outside the scope of this tip, but you can go to your Azure account at https://portal.azure.com and create a Synapse service:

Azure marketplace to create synapse service

For a Snowflake account, you can go to https://www.snowflake.com and create a new account using the Start for Free button:

Create a new snowflake trial account

For this example, I have some sample CSV files in my Azure Blob storage that we will use to load into Snowflake.

Azure Blob storage list

Also, we will use a SAS token and URL to access the blob storage for this example. You can configure this in the Azure portal under the storage account's Shared access signature settings; for permissions, we will use Read and List:

Configure SAS permissions Storage Account

Example: Load File from Azure to Snowflake Using Snowflake Stages

Setting Up Snowflake Stage

A stage is a place in Snowflake where data files can be stored before loading them into a table. For our first example, manually loading files directly into Snowflake, we must define a stage that points to Azure Blob Storage.

Log in to Snowflake and create a new worksheet if you plan to use the web GUI:

New Snowflake worksheet

First, we need to create a schema and a table if you don't have them yet:

CREATE SCHEMA Staging;
 
------------------------------------
 
DROP TABLE IF EXISTS Staging.DimAccount;
 
CREATE TABLE Staging.DimAccount
(
   AccountKey int NOT NULL,
   ParentAccountKey int NULL,
   AccountCodeAlternateKey int NULL,
   ParentAccountCodeAlternateKey int NULL,
   AccountDescription nvarchar(50) NULL,
   AccountType nvarchar(50) NULL,
   Operator nvarchar(50) NULL,
   CustomMembers nvarchar(300) NULL,
   ValueType nvarchar(50) NULL,
   CustomMemberOptions nvarchar(200) NULL
);

Remember that the table should match the format of the file you plan to use.

Then we create a file format where we will define the CSV properties. Use this link to learn more about Snowflake file formatting: Create File Format.

----------------------------------
CREATE OR REPLACE FILE FORMAT CSV_tips
  TYPE = CSV
  FIELD_DELIMITER = '|'
  SKIP_HEADER = 0
  NULL_IF = ('NULL', 'null')
  EMPTY_FIELD_AS_NULL = true;

In our case, no header and a pipe delimiter.
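
For illustration only, a row in such a file would contain ten pipe-separated values matching the table columns, with no header line (this is a made-up sample, not the actual contents of DimAccount.csv):

1|NULL|1|NULL|Balance Sheet|Assets|+|NULL|Currency|NULL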

Then, and this is one of the most important parts of the example, we set up the stage.

For the URL, we use the location of our blob storage plus the folders we want to retrieve the data from, replacing the "https://" prefix with "azure://".

In my case, this is my URL:

Obtain Blob Storage URL

So, I need to add /CSV/ at the end of it and replace https with azure.
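
In other words, the container URL shown in the portal, https://saeduardo.blob.core.windows.net/data-files, becomes:

azure://saeduardo.blob.core.windows.net/data-files/CSV/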

For the Azure SAS token, use the one we generated earlier (BLOB SAS TOKEN).

And last, for file format, use the one we created in the previous step:

CREATE OR REPLACE STAGE Staging_files
    url='azure://saeduardo.blob.core.windows.net/data-files/CSV/'
    credentials=(azure_sas_token='<your blob SAS token>')
    file_format = CSV_tips;

Once created, and if the permissions and path are ok, you can verify that you can access the files by listing the stage:

list '@Staging_files';
Validate Stage creation
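
If you want to double-check the stage definition itself (the URL and the file format it was created with), you can also describe it:

DESC STAGE Staging_files;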

Once you have validated your access, we can copy the file to the table. First, we validate that the table is empty:

SELECT * FROM Staging.DimAccount LIMIT 500;
Check table is empty

Next, we load the data from the stage to the actual table using the COPY INTO command:

COPY INTO Staging.DimAccount FROM '@Staging_files/DimAccount.csv';
Output of the COPY command

We can see that the copy was successful. We can browse the data by running the SELECT again:

Validate that the data has been copied

Our first example was successful.
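
As a side note, if the folder held several files for the same table, COPY INTO can also take the stage itself plus a file name pattern. A minimal sketch (the pattern and the error option are just examples; adjust them to your files):

COPY INTO Staging.DimAccount
FROM @Staging_files
PATTERN = '.*DimAccount.*[.]csv'
ON_ERROR = 'ABORT_STATEMENT';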

If this meets your needs and you want to schedule this simple approach, you can use TASKS inside Snowflake.
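
A minimal task sketch is shown below; the warehouse name COMPUTE_WH and the daily 2 AM UTC schedule are assumptions, so adjust them to your environment:

-- Scheduled load: re-runs the same COPY INTO once a day at 2 AM UTC
CREATE OR REPLACE TASK Staging_DimAccount_Load
  WAREHOUSE = COMPUTE_WH                -- assumed warehouse name
  SCHEDULE = 'USING CRON 0 2 * * * UTC' -- assumed schedule
AS
  COPY INTO Staging.DimAccount FROM '@Staging_files/DimAccount.csv';

-- Tasks are created in a suspended state; resume it to start the schedule
ALTER TASK Staging_DimAccount_Load RESUME;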

Example: Copying Data from Blob Storage to Snowflake Using Synapse Analytics

We have validated that we have communication from Azure to Snowflake and verified that the data is correct.

What if, as the title of this tip suggests, you want to add the copy task inside a Synapse pipeline?

We need to create the linked services for our blob storage and Snowflake. Check out how to do this in this tip: Data Transfer from Snowflake to Azure Blob using Synapse Analytics.

Note: At the time of this writing, only SAS authentication is supported for a direct copy from Blob Storage to Snowflake. You need to add these permissions on the storage account for this to work from Synapse:

Generate SAS token

Once you have created your linked services, we can create the test pipeline.

Create Linked Services

Creating the Pipeline

In Synapse Studio, go to Integrate, and create a new pipeline:

New Pipeline

Provide a descriptive name:

Name of the pipeline

Add a copy data task and add a meaningful description:

new Copy Data Task

In Source, create a new source dataset pointing to the blob storage linked service, and select the CSV file type:

Configuring CSV source

Create the dataset with the linked service selected, and configure the CSV file format:

Configuring CSV source , file format

TIP: You can use the Detect Format option to help fill in the fields.

For the Sink, select the Snowflake connection, locate your linked service, and select the table where you want to load the data:

Configure SnowFlake Sink

Leave everything else as default and validate/publish your pipeline:

Validate and publish pipeline

Once the pipeline is published, we can test it with a manual trigger. But before that, since we are using the same Snowflake table, we need to truncate it in our worksheet:

Truncate table on snowflake before testing again
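
The statement in the worksheet is a plain truncate, for example:

TRUNCATE TABLE Staging.DimAccount;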

Run a manual trigger or a debug, as you prefer:

Execute Pipeline

If everything is ok, the pipeline will execute successfully:

Execution Successful

Now, we need to validate it again inside Snowflake. We have the data copied:

Snowflake validation
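
If you prefer a quick check over browsing the rows, a simple count against the target table also confirms the load:

SELECT COUNT(*) AS loaded_rows FROM Staging.DimAccount;
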
Next Steps
  • Now that you know how to copy data between Azure and Snowflake using Synapse, you can create more robust pipelines, for example, incremental loads or pipelines with additional business logic and validations.
  • The basic idea was explained using blob storage, but you can easily change it to use any other service (Azure SQL DB, Cosmos DB, etc.).
  • You can check Azure Synapse documentation.
  • You can check Snowflake documentation.
  • Check other Synapse tips.





About the author
Eduardo Pivaral is an MCSA, SQL Server Database Administrator and Developer with over 15 years of experience working in large environments.

This author pledges the content of this article is based on professional experience and not AI generated.


