Export/Import Data using BCP - SQL Server on Linux


Introduction

Loading small sets of data into SQL Server has always been easy on Windows. Moreover, if you like GUI-based tools, there are plenty of options: SSMS, Azure Data Studio, Visual Studio (SSDT), SSIS, Toad for SQL Server, and RapidSQL, to name a few. If you are interested in learning how to use Azure Data Studio, I recommend you look at my previous article.

Now, let's take a step back and imagine for a moment that you have no access to any of these GUI-based tools. To make things more complex, one of your clients is asking you to export/import data from/into a SQL Server instance running on Linux (VM or container).

Luckily for us, the BCP (Bulk Copy Program) utility is available for Linux as part of the command line tools package called mssql-tools.

Installing mssql-tools

The mssql-tools package comes installed by default when using a Docker container image of SQL Server; unfortunately, this is not the case when SQL Server is manually installed on a Linux VM or bare-metal machine. We have to install mssql-tools separately.

I will be using Ubuntu for this article. This is important to note because the installation steps are slightly different if you want to use Red Hat.

The first step is to register the public repository key used by apt to authenticate the packages; then we simply register Microsoft's official repository for Ubuntu. Here are the commands:

[dba mastery] $ curl https://packages.microsoft.com/keys/microsoft.asc | sudo apt-key add -
[dba mastery] $ curl https://packages.microsoft.com/config/ubuntu/16.04/prod.list | sudo tee /etc/apt/sources.list.d/msprod.list

Now that we have the repository ready, we have to update the source list and then proceed with the installation of the mssql-tools package as follows:

[dba mastery] $ sudo apt-get update 
[dba mastery] $ sudo apt-get install mssql-tools

Once the installation is complete, we have sqlcmd and bcp ready to use. Both command-line utilities will be installed under this path: /opt/mssql-tools/bin/. As an additional step, you can add this folder to the $PATH environment variable, but I will skip that part for now.
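If you do want to add the folder to $PATH, a minimal sketch looks like the following, assuming the default installation path and a bash shell:

```shell
# Make the tools available in the current session
export PATH="$PATH:/opt/mssql-tools/bin"

# Persist the change for future bash sessions (adjust for your shell)
echo 'export PATH="$PATH:/opt/mssql-tools/bin"' >> ~/.bashrc

# Verify the folder is now part of PATH
case ":$PATH:" in
  *":/opt/mssql-tools/bin:"*) echo "mssql-tools folder is on PATH" ;;
  *) echo "mssql-tools folder is NOT on PATH" ;;
esac
```

After this, you can invoke sqlcmd and bcp directly instead of typing the full /opt/mssql-tools/bin/ prefix.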

Exploring data

There are multiple sources for CSV files with sample data all over the internet; in this example, I will use a file named "hw_25000.csv". The file in question is composed of biometric information from 25,000 individuals, separated into three columns: ID, height, and weight.

Because I will be using command-line-based tools, I want to make sure my data is clean, so the very first thing I want to do is learn more about my dataset. Checking the first and last ten rows of the file is the easiest way to do a quick check; I will use the head and tail commands to make that happen.

I will check the first ten rows using the "head -10" command followed by the filename:

[dba mastery] $ head -10 hw_25000.csv 
1, 65.78331, 112.9925
2, 71.51521, 136.4873
3, 69.39874, 153.0269
4, 68.2166, 142.3354
5, 67.78781, 144.2971
6, 68.69784, 123.3024
7, 69.80204, 141.4947
8, 70.01472, 136.4623
9, 67.90265, 112.3723
10,66.78236,120.6672

At a glance, the first column corresponds to an ID or some kind of sequential number. The second column displays height data, while the third column holds the weight data. I can confirm I don't have dirty data in the first ten rows.

Let's check the last ten rows of the file now. This works pretty much the same as the "head" command, specifying the number of rows we want to see, but in reverse. That means we will see the last ten rows of the file:

[dba mastery] $ tail -10 hw_25000.csv 
24991, 69.97767, 125.3672
24992, 71.91656, 128.284
24993, 70.96218, 146.1936
24994, 66.19462, 118.7974
24995, 67.21126, 127.6603
24996, 69.50215, 118.0312
24997, 64.54826, 120.1932
24998, 64.69855, 118.2655
24999, 67.52918, 132.2682
25000, 68.87761, 124.8742

This confirms that the first and last ten rows of my file have accurate and clean data. Most important, I was able to verify that the ID, height, and weight structure looks good.
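Head and tail only sample twenty rows between them. As an extra sanity check, a quick awk one-liner can verify that every row in the file has exactly three comma-separated fields. This is a sketch using a small stand-in sample in the same layout, since the full hw_25000.csv is not reproduced here:

```shell
# Small stand-in sample in the same ID, height, weight layout as hw_25000.csv
cat > sample.csv <<'EOF'
1, 65.78331, 112.9925
2, 71.51521, 136.4873
3, 69.39874, 153.0269
EOF

# Count rows that do NOT have exactly three comma-separated fields;
# a clean file should report zero malformed rows
awk -F',' 'NF != 3 { bad++ } END { print bad + 0, "malformed rows" }' sample.csv
```

Running the same awk command against hw_25000.csv would flag any row with missing or extra fields before BCP ever sees it.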

Let's move on to importing some data, using the "hw_25000.csv" source file with BCP.

Pre-requisites

  • Have a database created before the import process
  • Have a target table created before the import process
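The article doesn't show the exact prerequisite T-SQL, so here is a hypothetical sketch that matches the Id, Height, and Weight columns seen in the query output later on; the column data types are assumptions on my part:

```shell
# Write the prerequisite T-SQL to a script file
# (schema is an assumption based on the Id, Height, Weight columns
#  shown in the SELECT output later in this article)
cat > create_biometrics.sql <<'EOF'
CREATE DATABASE Biometrics;
GO
USE Biometrics;
GO
CREATE TABLE HW (
    Id     INT,
    Height VARCHAR(16),
    Weight VARCHAR(16)
);
GO
EOF

# Run it against the instance with sqlcmd, using the same connection
# arguments as the bcp examples below:
# /opt/mssql-tools/bin/sqlcmd -S localhost -U sa -P MyP@ssw0rd# -i create_biometrics.sql
```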

Importing data with BCP

I will provide a basic example of using BCP to import data on Linux. There are many more complex scenarios, for which I strongly recommend checking the BCP utility documentation at Microsoft Docs. The BCP utility requires a few arguments when importing data. Let's take a look at each one of them:

  • -S: The server name or IP address to connect
  • -U: SQL Server user name, this is the login we will use to connect
  • -P: Password of the SQL Server login used for the connection
  • -d: The target database
  • -c: Specifies the operation is performed using a character data type (optional)
  • -t: Specifies the field terminator; it can be a comma or a tab
  • in: Specifies we are importing data into a database

Here is how the BCP command looks when importing data into a table called "HW" that belongs to the "Biometrics" database, using a comma-separated CSV file called hw_25000.csv:

bcp HW in hw_25000.csv -S localhost -U sa -P MyP@ssw0rd# -d Biometrics -c -t ','

Right after executing the command I can see the output on screen:

Starting copy...
1000 rows sent to SQL Server. Total sent: 1000
1000 rows sent to SQL Server. Total sent: 2000
1000 rows sent to SQL Server. Total sent: 3000
1000 rows sent to SQL Server. Total sent: 4000
1000 rows sent to SQL Server. Total sent: 5000
1000 rows sent to SQL Server. Total sent: 6000
1000 rows sent to SQL Server. Total sent: 7000
1000 rows sent to SQL Server. Total sent: 8000
1000 rows sent to SQL Server. Total sent: 9000
1000 rows sent to SQL Server. Total sent: 10000
1000 rows sent to SQL Server. Total sent: 11000
1000 rows sent to SQL Server. Total sent: 12000
1000 rows sent to SQL Server. Total sent: 13000
1000 rows sent to SQL Server. Total sent: 14000
1000 rows sent to SQL Server. Total sent: 15000
1000 rows sent to SQL Server. Total sent: 16000
1000 rows sent to SQL Server. Total sent: 17000
1000 rows sent to SQL Server. Total sent: 18000
1000 rows sent to SQL Server. Total sent: 19000
1000 rows sent to SQL Server. Total sent: 20000
1000 rows sent to SQL Server. Total sent: 21000
1000 rows sent to SQL Server. Total sent: 22000
1000 rows sent to SQL Server. Total sent: 23000
1000 rows sent to SQL Server. Total sent: 24000
1000 rows sent to SQL Server. Total sent: 25000
25000 rows copied.
Network packet size (bytes): 4096
Clock Time (ms.) Total     : 693    Average : (36075.0 rows per sec.)

It took 693 milliseconds to import 25,000 records, not bad at all. I'm using a SQL Server 2017 container with one CPU and two gigabytes of RAM.

We can see from the output that there were no errors; however, I would like to verify the data by running a simple query to check the first ten rows of the HW table:

sqlcmd -S localhost -d Biometrics -U sa -P MyP@ssw0rd# -I -Q "SELECT TOP 10 * FROM HW;"
Id          Height           Weight          
----------- ---------------- ----------------
          1 65.78331         112.9925
          2 71.51521         136.4873
          3 69.39874         153.0269
          4 68.2166          142.3354
          5 67.78781         144.2971
          6 68.69784         123.3024
          7 69.80204         141.4947
          8 70.01472         136.4623
          9 67.90265         112.3723
         10 66.78236         120.6672
(10 rows affected)

These first ten rows match perfectly with the ones I checked using the "head -10" command; this is a really good indicator that my data looks as expected.

Exporting data with BCP

Exporting data is pretty straightforward. In this example, I will "dump" all the data from the table created in the previous section. The arguments are basically the same, with one slight change; we are exporting data (out):

  • out: Specifies we are exporting data from a database into a file

Once again, I will use the same database and table, this time exporting the data into a comma-separated file called hw_bcp_out.csv. Here is how the BCP command looks:

bcp HW out hw_bcp_out.csv -S localhost -U sa -P MyP@ssw0rd# -d Biometrics -c -t ','

Right after executing the command I can see the output on screen:

Starting copy...
1000 rows successfully bulk-copied to host-file. Total received: 1000
1000 rows successfully bulk-copied to host-file. Total received: 2000
1000 rows successfully bulk-copied to host-file. Total received: 3000
1000 rows successfully bulk-copied to host-file. Total received: 4000
1000 rows successfully bulk-copied to host-file. Total received: 5000
1000 rows successfully bulk-copied to host-file. Total received: 6000
1000 rows successfully bulk-copied to host-file. Total received: 7000
1000 rows successfully bulk-copied to host-file. Total received: 8000
1000 rows successfully bulk-copied to host-file. Total received: 9000
1000 rows successfully bulk-copied to host-file. Total received: 10000
1000 rows successfully bulk-copied to host-file. Total received: 11000
1000 rows successfully bulk-copied to host-file. Total received: 12000
1000 rows successfully bulk-copied to host-file. Total received: 13000
1000 rows successfully bulk-copied to host-file. Total received: 14000
1000 rows successfully bulk-copied to host-file. Total received: 15000
1000 rows successfully bulk-copied to host-file. Total received: 16000
1000 rows successfully bulk-copied to host-file. Total received: 17000
1000 rows successfully bulk-copied to host-file. Total received: 18000
1000 rows successfully bulk-copied to host-file. Total received: 19000
1000 rows successfully bulk-copied to host-file. Total received: 20000
1000 rows successfully bulk-copied to host-file. Total received: 21000
1000 rows successfully bulk-copied to host-file. Total received: 22000
1000 rows successfully bulk-copied to host-file. Total received: 23000
1000 rows successfully bulk-copied to host-file. Total received: 24000
1000 rows successfully bulk-copied to host-file. Total received: 25000
25000 rows copied.
Network packet size (bytes): 4096
Clock Time (ms.) Total     : 123    Average : (203252.0 rows per sec.)

Nice! The 25,000 rows were copied into a comma-separated file that contains all the data from the HW table. Let's check my current folder to make sure the hw_bcp_out.csv file exists, using the list command with details (ls -l):

[dba mastery] $ ls -l
total 1212
-rw-r--r-- 1 root root 608306 Dec 13 04:38 hw_25000.csv
-rw-r--r-- 1 root root 608307 Dec 13 05:37 hw_bcp_out.csv

I can see two files: the one I used during the import example (hw_25000.csv) and the one created by BCP out (hw_bcp_out.csv). In case you want to go the extra mile, you can check the first and last ten rows using the "head" and "tail" commands as before, but I will skip that for now.
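Another way to go the extra mile is to compare the exported file against the original. Note the two files may differ slightly in whitespace (the source file has a space after some commas, and the byte counts in the ls output above differ), so a hedged sketch is to normalize the spacing before diffing. The stand-in files below mimic that situation:

```shell
# Stand-in samples: a "source" file with a space after each comma,
# and an "exported" file without it (mimicking a possible bcp out format)
printf '1, 65.78331, 112.9925\n2, 71.51521, 136.4873\n' > source.csv
printf '1,65.78331,112.9925\n2,71.51521,136.4873\n' > exported.csv

# Strip any spaces that follow a comma, then compare the files
sed 's/, */,/g' source.csv > source_normalized.csv
if diff -q source_normalized.csv exported.csv > /dev/null; then
  echo "files match after normalization"
else
  echo "files differ"
fi
```

The same sed-then-diff approach applied to hw_25000.csv and hw_bcp_out.csv would tell you whether the round trip preserved every row.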

Conclusion

The BCP utility is a very powerful cross-platform command-line tool we can use to import/export data into/from a SQL Server instance running in any environment (Windows, Linux, Docker containers).
