Blog Post

Loading All CSV Files with PowerShell

,

I ran across Denis Gobo’s post about working with names from Social Security cards and wanted to play with the dataset. However, rather than use xp_cmdshell, which I might have, I decided to use PowerShell. I need to repeat this with Python, but for now, this worked.

There is a file you can download that has a lot of .txt files, each one with a CSV. You can read about this in more detail in Denis’ post. Essentially there were three fields, and the year of the file in the name. I downloaded and unzipped the files into a folder.

Here was my process:

  1. Loop through the files
  2. Extract the year from the file name
  3. bcp in the file into a staging table
  4. call a proc that takes the year as a parameter and moves all data

With that in mind, here’s how I set things up.

The SQL Server Side

I started by creating two tables.

CREATE TABLE Names

(   FirstName  VARCHAR(500),
     Gender     CHAR(1),
     NameCount  INT,
     YearInFile INT

);

CREATE TABLE Names2

(   FirstName VARCHAR(500),
     Gender    CHAR(1),
     NameCount INT

);

With these in place, I created a procedure:

CREATE PROCEDURE MoveNames @year INT

AS

INSERT Names SELECT FirstName, Gender, NameCount, @year FROM Names2;

TRUNCATE TABLE Names2;

GO

I tested these and they worked fine for my process. I want to load into Names2, then to get the year, I move the data to the other table, adding the year. There are probably better ways, but this worked fine for a relatively small load (few million rows).

PowerShell

With PoSh, I started with a simple loop. I tend to set variables first since this makes this easier to turn into a function.

$sourcefolder = “E:\Downloads\names”

$sourcefiles = Get-ChildItem $sourcefolder -Filter “*.txt”

foreach($file in $sourcefiles ){
   $yearinfile = $file.Name.Substring(3, 4)
   write-host “Loading File:” $file.Name ” year: ” $yearinfile

}

Running this gets me a list of all files along with the  years. That’s what I want, since I’m going to process each file and load it in with bcp. Thanks to Mike Fal for the outline of how I’ll do this.

Next, I set more variables at the top:

$InstanceName = “.\SQL2016”

$DatabaseName = “Sandbox”

$StagingTableName = “Names2”

$StagingProc = “MoveNames”

I’ll use these in this code. I call Invoke-Expression to run bcp and then Invoke-Sqlcmd to run my proc.

$cmd = “bcp ‘$DatabaseName.dbo.[$StagingTableName]’ in ‘$file’ -S’$InstanceName’ -T -c -t’,'”

Invoke-Expression $cmd

Invoke-Sqlcmd -ServerInstance $InstanceName -Database $DatabaseName -Query “$StagingProc $yearinfile”

From here, I put this in a file and ran it from my download location. The result:

2017-06-16 12_41_08-Windows PowerShell ISE

and from SQL Server:

2017-06-16 12_41_50-SQLQuery7.sql - (local)_SQL2016.sandbox (PLATO_Steve (56))_ - Microsoft SQL Serv

Now I can run some queries, and find out where I stand. #26, it turns out.

2017-06-16 12_43_35-SQLQuery7.sql - (local)_SQL2016.sandbox (PLATO_Steve (56))_ - Microsoft SQL Serv

Filed under: Blog Tagged: ETL, powershell, syndicated

Rate

You rated this post out of 5. Change rating

Share

Share

Rate

You rated this post out of 5. Change rating