SQL Clone
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


Database import from text/csv file and many to many relationship


Database import from text/csv file and many to many relationship

Author
Message
zqckfair
zqckfair
SSC Journeyman
SSC Journeyman (80 reputation)SSC Journeyman (80 reputation)SSC Journeyman (80 reputation)SSC Journeyman (80 reputation)SSC Journeyman (80 reputation)SSC Journeyman (80 reputation)SSC Journeyman (80 reputation)SSC Journeyman (80 reputation)

Group: General Forum Members
Points: 80 Visits: 19
I have a Windows SQL Server database with a many-to-many relationship between some articles and categories, the relation is made by an ArticleCategorie table with two columns => [ArticleID] - [CategorieID].

The problem now is that I have a CSV file which look like this (*picture) with two columns article id and article tags (let's say the article id is 5; and that it has 4 different categories which has respectively id 1,4,6 and 7) so here is how the CSV looks like* =>



The best way I found was to manually add to the table all data (articles + categories) like this :



So this is above the final result that I want. Unfortunately I have more than seven hundreds articles, and can't find any faster way to import them, any solutions ? How can I import my data in a fast way? Maybe I could go for a better database design ?
Jeff Moden
Jeff Moden
SSC Guru
SSC Guru (908K reputation)SSC Guru (908K reputation)SSC Guru (908K reputation)SSC Guru (908K reputation)SSC Guru (908K reputation)SSC Guru (908K reputation)SSC Guru (908K reputation)SSC Guru (908K reputation)

Group: General Forum Members
Points: 908705 Visits: 48715
I'm assuming that you've already imported the data into a staging or other table. If you're also having problems with the import, we need to know what is being used for the delimiter between the two "columns" is in the file.

In the meantime, here's a test table. Details are in the comments.


--===== This would simulate the table that you have the data in.
-- I'm assuming that you've imported "the file" into such a table.
-- If you need help with the actual import, we'll need more info about the file.
-- I've simulated 4 rows but this will quickly work for many, many rows.
-- This is also the wy you should have posted your data so people can test
-- their solutions for you. See the first link in my signature line below
-- for more information on that.
SELECT d.ArticleID, d.ArticleCategories
INTO #YourTable
FROM
(
SELECT 5,'1,4,6,7' UNION ALL
SELECT 6,'8,3,2' UNION ALL
SELECT 10,'9,2,1,5,2' UNION ALL
SELECT 1,'4'
) d (ArticleID,ArticleCategories)
;
--===== This just shows what the table above looks like.
SELECT * FROM #Yourtable
;




The solution for this problem is wicked easy if you have a decent, fast splitter. The following code will produce the output you've asked for for all rows. You can store the output in a table or use it as a CTE to join to.

--===== The solution for this is quite easy with the help
-- of a decent "splitter" function
SELECT yt.ArticleID, ArticleCategory = split.Item
FROM #YourTable yt
CROSS APPLY dbo.DelimitedSplit8K(ArticleCategories,',') split
;




That's produces the following output from the given data... just like you wanted.
ArticleID   ArticleCategory
----------- ---------------
5 1
5 4
5 6
5 7
6 8
6 3
6 2
10 9
10 2
10 1
10 5
10 2
1 4

(13 row(s) affected)



You can get the DelimitedSplit8K function from "RESOURCES" links at the end of the following article...
http://www.sqlservercentral.com/articles/Tally+Table/72993/

--Jeff Moden

RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
If you think its expensive to hire a professional to do the job, wait until you hire an amateur. -- Red Adair

When you put the right degree of spin on it, the number 318 is also a glyph that describes the nature of a DBAs job. Wink

Helpful Links:
How to post code problems
How to post performance problems
Forum FAQs
zqckfair
zqckfair
SSC Journeyman
SSC Journeyman (80 reputation)SSC Journeyman (80 reputation)SSC Journeyman (80 reputation)SSC Journeyman (80 reputation)SSC Journeyman (80 reputation)SSC Journeyman (80 reputation)SSC Journeyman (80 reputation)SSC Journeyman (80 reputation)

Group: General Forum Members
Points: 80 Visits: 19

Thank you for this very well explained answer and that useful link.

Sorry for the ambiguity, but unfortunately I'm also having problem with the import, in fact, it is my main problem.

I'm just trying to import a CSV file to the database and getting the result shown on the second image, which is as you can see not a excel sheet (the first one is) but the real sql table, from the visual studio 2012 sql table manager, so that's what I wanted to say, the fastest way that I found to import all my articles was to add them one by one using VS2012 SQL Manager as I don't know how to import the CSV file..)


Here is that CSV file :

Article ID;Tags ID
1500;7,8,9,5
1501;4,3
1502;8,1
1503;8,9
....


Columns are ";" separated and tags id are "," separated.

So if I've to import this CSV file to "MyTable"

Article ID;Tags ID
Jeff;Moden,Zidane,Headbutt
5;1,4,6


This would be the structure expected after the import :


But I do not have a clue how to do it.
Erland Sommarskog
Erland Sommarskog
SSC-Insane
SSC-Insane (21K reputation)SSC-Insane (21K reputation)SSC-Insane (21K reputation)SSC-Insane (21K reputation)SSC-Insane (21K reputation)SSC-Insane (21K reputation)SSC-Insane (21K reputation)SSC-Insane (21K reputation)

Group: General Forum Members
Points: 21718 Visits: 886
The neatest solution is to write a small console-mode program that parses the file, and then sends the comma-separated to a table-valued parameter, which you pass to a stored procedure which inserts the data.

This article on my web site should give you a head start, both how to read the CSV file and to crack that list, although you will need to combine the examples. http://www.sommarskog.se/arrays-in-sql-2008.html

Erland Sommarskog, SQL Server MVP, www.sommarskog.se
zqckfair
zqckfair
SSC Journeyman
SSC Journeyman (80 reputation)SSC Journeyman (80 reputation)SSC Journeyman (80 reputation)SSC Journeyman (80 reputation)SSC Journeyman (80 reputation)SSC Journeyman (80 reputation)SSC Journeyman (80 reputation)SSC Journeyman (80 reputation)

Group: General Forum Members
Points: 80 Visits: 19
Wow at first glance I must admit that it seems a bit hard to me as a beginner, I'll check it out tomorrow thank you for your answer.
Jeff Moden
Jeff Moden
SSC Guru
SSC Guru (908K reputation)SSC Guru (908K reputation)SSC Guru (908K reputation)SSC Guru (908K reputation)SSC Guru (908K reputation)SSC Guru (908K reputation)SSC Guru (908K reputation)SSC Guru (908K reputation)

Group: General Forum Members
Points: 908705 Visits: 48715
zqckfair (7/28/2013)

Thank you for this very well explained answer and that useful link.

Sorry for the ambiguity, but unfortunately I'm also having problem with the import, in fact, it is my main problem.

I'm just trying to import a CSV file to the database and getting the result shown on the second image, which is as you can see not a excel sheet (the first one is) but the real sql table, from the visual studio 2012 sql table manager, so that's what I wanted to say, the fastest way that I found to import all my articles was to add them one by one using VS2012 SQL Manager as I don't know how to import the CSV file..)


Here is that CSV file :

Article ID;Tags ID
1500;7,8,9,5
1501;4,3
1502;8,1
1503;8,9
....


Columns are ";" separated and tags id are "," separated.

So if I've to import this CSV file to "MyTable"

Article ID;Tags ID
Jeff;Moden,Zidane,Headbutt
5;1,4,6


This would be the structure expected after the import :


But I do not have a clue how to do it.


No problem. Erland's excellent article is to die with several great methods with sample code and great explanations.

If you're interested in a "T-SQL Only" solution, we can do that quite simply, as well...

In the following, there's a lot of simple techniques being used. That is, they're "simple" for people that are familiar with them. There's no way that I could explain them all to a "newbie" to a level that I'd like to without writing a book on the subject. With that in mind, I explained the sections at a high level and a mid level. For details on each command, I strongly recommend that you look them up in Books Online (especially the Bulk Insert... there so much more you can do with it including sequestering bad rows in a separate file without aborting the whole run, etc). I also strongly recommend that you do a Google search on "SQL Injection" and "DOS Injection" to see not only what you need to watch out for, but to understand why so many DBAs go bananas at the mere mention of dynamic SQL. The method I used prevents both DOS and SQL Injection.

First things first. You provided two listings as to what a file may contain. I saved them as two separate files in my C:\Temp directory (you can use any drive directory or UNC that SQL Server has access to but I used C:\Temp) for demonstration purposes. I called them "JBMDemo1.csv" and "JBMDemo2.txt" respectively. Notice the differences in extensions. I just wanted to prove that the extension means nothing for these imports. If you want to run the demonstration, as is, you'll need to create these two files (you can easily copy and paste into NOTEPAD to do this).

So, the first file, JBMDemo1.csv, contains the following:

Article ID;Tags ID
1500;7,8,9,5
1501;4,3
1502;8,1
1503;8,9



The second file, JBMDemo2.txt, contains the following:

Article ID;Tags ID
Jeff;Moden,Zidane,Headbutt
5;1,4,6



You also mentioned a "final" table called "MyTable". Here's the code I used to create that table for this demonstration. Note that I had to use VARCHAR for both columns because of the content of the files you provided. If you want to run the demonstration, as is, you'll need to execute the following code to create the "dbo.MyTable" table.


CREATE TABLE dbo.MyTable
(
ArticleID VARCHAR(50),
CategorieID VARCHAR(50)
)
;




Now, here's the stored procedure I created to demo all of this. Running the following code will create a stored procedure called "dbo.ImportArticleCategory". You can call it anything you want but that's what I called it for this demonstration. You'll need to change it to point to the correct table if you put it into production because I pointed it at the demonstration table called "dbo.MyTable", which we created above.

I also did the normal thing that I do with some fairly heavy commenting. Again, being a "newbie", you should lookup the things that I used in Books Online and study them. And be sure to Google for both "DOS Injection" and "SQL Injection". EVERYONE should become keenly aware of what they are and how to prevent them because it's STILL the leading attack vector in the world of software.

I also used your latest column names.

Here's the code for the proc...
 CREATE PROCEDURE dbo.ImportArticleCategory
/******************************************************************************
Purpose:
This stored procedure accepts a full file path, checks it for both DOS and
SQL "Injection", imports the file to a temporary staging table, parses the
category tags, and inserts the results into a table called "MyTable".

Usage Examples:
EXEC dbo.ImportArticleCategory 'C:\Temp\SomeFile.csv';
EXEC dbo.ImportArticleCategory @pFilePath;


Programmer's Notes'
1. Example input file might look like the following. Semi-Colon is the
delimiter. The CSV column will be parsed.

Article ID;Tags ID
1500;7,8,9,5
1501;4,3
1502;8,1
1503;8,9
Jeff;Moden,Zidane,Headbutt
5;1,4,6

1. (continued)The data stored in the "MyTable" table from that file will be
the following:

ArticleID CategorieID
--------- -----------
1500 7
1500 8
1500 9
1500 5
1501 4
1501 3
1502 8
1502 1
1503 8
1503 9
Jeff Moden
Jeff Zidane
Jeff Headbutt
5 1
5 4
5 6

2. If the given file path contains any DOS or SQL Injection, the proc will
only identify that it has completed so that it gives an attacker no hints.
3. If the given file path does not exist, an error to that nature will be
returned.
4. The file path may be either a "drive path" or a "UNC".

Revision History:
Rev 00 - 28 Jul 2013 - Jeff Moden - Initial creation and unit test.
******************************************************************************/
--===== Declare the parameters for this proc.
(@pFilePath VARCHAR(8000))
AS
--=============================================================================
-- Presets
--=============================================================================
--===== Environmental presets
SET NOCOUNT ON; --Suppress the autodisplay of rowcounts to prevent them
--from being interpreted as errors by a GUI.
SET XACT_ABORT ON; --Not required here but a good habit to get into for
--when you use explict transactions to short-circuit
--when an error occurs.

--===== Declare obviously named variable(s)
DECLARE @SQL VARCHAR(8000)
;
--===== Conditionally drop the staging table to make reruns in SSMS Easier.
-- This section may be commented out for production.
IF OBJECT_ID('tempdb..#ImportStaging','U') IS NOT NULL
DROP TABLE #ImportStaging
;
--===== Create the staging table
CREATE TABLE #ImportStaging
(
ArticleID VARCHAR(50),
TagsID VARCHAR(8000)
)
;
--=============================================================================
-- Load the given file into the staging table using the two columns that
-- are separated by the semi-colon.
--=============================================================================
--===== Create the dynamic SQL using the given filepath.
-- The WHERE clause prevents both SQL and DOS injection.
SELECT @SQL = '
BULK INSERT #ImportStaging
FROM ''' + @pFilePath + '''
WITH (
FIELDTERMINATOR = '';'',
FIRSTROW = 2,
TABLOCK
)
;'
WHERE @pFilePath NOT LIKE '%[^a-zA-Z0-9\:. ]%'
;
--===== Execute the dynamic SQL from above to do the actual file load
-- into the staging table
EXEC (@SQL)
;
--=============================================================================
-- Using the splitter function previously shown, split the tags for each
-- ArticleID to produce the list you wanted just like before except that
-- we're inserting the result into "MyTable" (for this example).
--=============================================================================
INSERT INTO dbo.MyTable
(ArticleID, CategorieID)
SELECT s.ArticleID, CategorieID = split.Item
FROM #ImportStaging s
CROSS APPLY dbo.DelimitedSplit8K(s.TagsID,',') split
;
GO




Ok, if you've been following along by running the code I've provided so far, run the following code to verify that the dbo.MyTable is empty.

 SELECT * FROM dbo.MyTable;




That returns the following:

ArticleID CategorieID
--------- -----------

(0 row(s) affected)


Now, to import the first file (C:\Temp\JBMDemo1.csv), just execute the stored procedure with the proper path in quotes, like this...


EXEC dbo.ImportArticleCategory 'C:\Temp\JBMDemo1.csv';




If you run the following code (again), you'll see that the file has been successfully imported into the "MyTable" table.

 SELECT * FROM dbo.MyTable;




Results:
ArticleID CategorieID
--------- -----------
1500 7
1500 8
1500 9
1500 5
1501 4
1501 3
1502 8
1502 1
1503 8
1503 9

(10 row(s) affected)



Now, use the proc to import the next file like this...

 EXEC dbo.ImportArticleCategory 'C:\Temp\JBMDemo2.txt';




When we run the SELECT on MyTable again, we can see that the second file's information has been imported without changing anything previously imported.


SELECT * FROM dbo.MyTable;




Results:
ArticleID CategorieID
--------- -----------
1500 7
1500 8
1500 9
1500 5
1501 4
1501 3
1502 8
1502 1
1503 8
1503 9
Jeff Moden
Jeff Zidane
Jeff Headbutt
5 1
5 4
5 6

(16 row(s) affected)



I'm sure that you have a million questions but, as you can see, this "problem" is really easy to crack with that DelimitedSplit8K splitter and some relatively easy code. Study the things I used in "Books Online" (press the {f1} key in SSMS to get there) and you're well on your way to shaking off the "newbie fur". :-P

Don't forget to Google and study about "DOS Injection" and "SQL Injection".

Let us know if you have more questions about this.

P.S. You could also use "SSIS" for this but someone else will have to cover how to do that because, well, I just use T-SQL to do things like this because it's easier for me. In other words, I've never used "SSIS" for such an easy thing and certainly can't explain how to do it with "SSIS". :-D

--Jeff Moden

RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
If you think its expensive to hire a professional to do the job, wait until you hire an amateur. -- Red Adair

When you put the right degree of spin on it, the number 318 is also a glyph that describes the nature of a DBAs job. Wink

Helpful Links:
How to post code problems
How to post performance problems
Forum FAQs
Erland Sommarskog
Erland Sommarskog
SSC-Insane
SSC-Insane (21K reputation)SSC-Insane (21K reputation)SSC-Insane (21K reputation)SSC-Insane (21K reputation)SSC-Insane (21K reputation)SSC-Insane (21K reputation)SSC-Insane (21K reputation)SSC-Insane (21K reputation)

Group: General Forum Members
Points: 21718 Visits: 886
Only to add what Jeff suggests: there are multiple ways to skin this cat, and which one that fits you depends on a lot things. My suggestion assumes that you already know C# or VB .Net, but since you had Visual Studio in your mix, it seemed like a reasonable suggestion.

A potential problem with Jeff's solution with using BULK INSERT is that the file must be visible from your SQL Server instance. Which is trivial, if you have SQL Server running on your own machine. But if you are working in a corporate environment, this may be more difficult to arrange. You may also encounter permission problems.

If your main interest is in learning, I would suggest that you study both suggestions and try to implement solutions with both. In the end, you may find that one fit into your overall framework better than the other.

Erland Sommarskog, SQL Server MVP, www.sommarskog.se
Jeff Moden
Jeff Moden
SSC Guru
SSC Guru (908K reputation)SSC Guru (908K reputation)SSC Guru (908K reputation)SSC Guru (908K reputation)SSC Guru (908K reputation)SSC Guru (908K reputation)SSC Guru (908K reputation)SSC Guru (908K reputation)

Group: General Forum Members
Points: 908705 Visits: 48715
I absolutely agree with Erland. There are dozens of ways to skin this cat and this is an excellent opportunity to compare the different methods and learn their strengths and weaknesses. It's always good to have multiple options especially if you run into a picky DBA or some picky company standards.

--Jeff Moden

RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
If you think its expensive to hire a professional to do the job, wait until you hire an amateur. -- Red Adair

When you put the right degree of spin on it, the number 318 is also a glyph that describes the nature of a DBAs job. Wink

Helpful Links:
How to post code problems
How to post performance problems
Forum FAQs
zqckfair
zqckfair
SSC Journeyman
SSC Journeyman (80 reputation)SSC Journeyman (80 reputation)SSC Journeyman (80 reputation)SSC Journeyman (80 reputation)SSC Journeyman (80 reputation)SSC Journeyman (80 reputation)SSC Journeyman (80 reputation)SSC Journeyman (80 reputation)

Group: General Forum Members
Points: 80 Visits: 19
Thank you so much for those both constructive answers.
You're right, although I'm beginning I'll try both ways, I'll get back here if I've any problems about that.
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum









































































































































































SQLServerCentral


Search