Database import from a text/CSV file and a many-to-many relationship

  • I have a SQL Server database on Windows with a many-to-many relationship between articles and categories. The relation is made by an ArticleCategorie table with two columns => [ArticleID] - [CategorieID].

    The problem now is that I have a CSV file which looks like this (*picture*), with two columns: article ID and article tags. Let's say the article ID is 5 and that it has 4 different categories with IDs 1, 4, 6, and 7; here is how the CSV looks =>

    The best way I found was to manually add all the data (articles + categories) to the table, like this:

    So the above is the final result that I want. Unfortunately I have more than seven hundred articles and can't find any faster way to import them. Any solutions? How can I import my data quickly? Maybe I should go for a better database design?
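On the design question: a junction table like ArticleCategorie is already the standard way to model a many-to-many relationship, so the design itself is sound. As a hedged sketch only (it assumes parent tables dbo.Article and dbo.Categorie with integer keys, which the post doesn't show), this is how such a table is usually declared, with a composite primary key that also blocks duplicate pairs:

```sql
-- Sketch: assumes dbo.Article and dbo.Categorie already exist with INT keys.
CREATE TABLE dbo.ArticleCategorie
(
    ArticleID   INT NOT NULL,
    CategorieID INT NOT NULL,
    -- Composite key: each (article, category) pair can appear only once.
    CONSTRAINT PK_ArticleCategorie
        PRIMARY KEY (ArticleID, CategorieID),
    CONSTRAINT FK_ArticleCategorie_Article
        FOREIGN KEY (ArticleID)   REFERENCES dbo.Article (ArticleID),
    CONSTRAINT FK_ArticleCategorie_Categorie
        FOREIGN KEY (CategorieID) REFERENCES dbo.Categorie (CategorieID)
);
```

The foreign keys keep orphaned IDs out of the junction table during bulk loads, which matters once seven hundred articles are flowing in from a file.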

  • I'm assuming that you've already imported the data into a staging or other table. If you're also having problems with the import, we need to know what delimiter is used between the two "columns" in the file.

    In the meantime, here's a test table. Details are in the comments.

    --===== This would simulate the table that you have the data in.
         -- I'm assuming that you've imported "the file" into such a table.
         -- If you need help with the actual import, we'll need more info about the file.
         -- I've simulated 4 rows but this will quickly work for many, many rows.
         -- This is also the way you should have posted your data so people can test
         -- their solutions for you. See the first link in my signature line below
         -- for more information on that.
     SELECT d.ArticleID, d.ArticleCategories
       INTO #YourTable
       FROM (
             SELECT  5, '1,4,6,7'   UNION ALL
             SELECT  6, '8,3,2'     UNION ALL
             SELECT 10, '9,2,1,5,2' UNION ALL
             SELECT  1, '4'
            ) d (ArticleID, ArticleCategories)
    ;
    --===== This just shows what the table above looks like.
     SELECT * FROM #YourTable
    ;

    The solution for this problem is wicked easy if you have a decent, fast splitter. The following code produces the output you've asked for, for all rows. You can store the output in a table or use it as a CTE to join to.

    --===== The solution for this is quite easy with the help
         -- of a decent "splitter" function.
     SELECT yt.ArticleID, ArticleCategory = split.Item
       FROM #YourTable yt
      CROSS APPLY dbo.DelimitedSplit8K(yt.ArticleCategories, ',') split
    ;

    That produces the following output from the given data... just like you wanted.

    ArticleID   ArticleCategory
    ----------- ---------------
    5           1
    5           4
    5           6
    5           7
    6           8
    6           3
    6           2
    10          9
    10          2
    10          1
    10          5
    10          2
    1           4

    (13 row(s) affected)

    You can get the DelimitedSplit8K function from the "RESOURCES" links at the end of the following article...

    http://www.sqlservercentral.com/articles/Tally+Table/72993/
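As a side note for readers on a newer platform than this thread assumes: from SQL Server 2016 onward (database compatibility level 130 or higher), the built-in STRING_SPLIT function can play the same role as DelimitedSplit8K in this query. A hedged sketch of the equivalent call:

```sql
-- Requires SQL Server 2016+ and compatibility level 130+.
-- STRING_SPLIT returns its pieces in a column named "value".
SELECT yt.ArticleID, ArticleCategory = split.value
  FROM #YourTable yt
 CROSS APPLY STRING_SPLIT(yt.ArticleCategories, ',') split;
```

One caveat: STRING_SPLIT does not guarantee the order of the returned items (an ordinal column was only added in SQL Server 2022), whereas DelimitedSplit8K returns an item number. For this particular problem order doesn't matter, so either works.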

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thank you for this very well explained answer and that useful link.

    Sorry for the ambiguity, but unfortunately I'm also having problems with the import; in fact, it is my main problem.

    I'm just trying to import a CSV file into the database. The result shown in the second image is not an Excel sheet (the first one is) but the actual SQL table, seen in the Visual Studio 2012 SQL table manager. That's what I meant: the fastest way I found to import all my articles was to add them one by one using the VS2012 SQL manager, because I don't know how to import the CSV file.

    Here is that CSV file :

    Article ID;Tags ID
    1500;7,8,9,5
    1501;4,3
    1502;8,1
    1503;8,9
    ....

    Columns are ";" separated and tags id are "," separated.

    So if I have to import this CSV file into "MyTable":

    Article ID;Tags ID
    Jeff;Moden,Zidane,Headbutt
    5;1,4,6

    This would be the structure expected after the import :

    But I do not have a clue how to do it.

  • The neatest solution is to write a small console-mode program that parses the file and loads the comma-separated values into a table-valued parameter, which you pass to a stored procedure that inserts the data.

    This article on my web site should give you a head start on both reading the CSV file and cracking that list, although you will need to combine the examples. http://www.sommarskog.se/arrays-in-sql-2008.html
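To make the table-valued-parameter idea concrete, here is a minimal sketch of the SQL side only (the type and procedure names are made up for illustration; Erland's article covers the full pattern including the client code):

```sql
-- A table type the client program fills with one (ArticleID, CategorieID)
-- pair per row after splitting each CSV line on ';' and then on ','.
CREATE TYPE dbo.ArticleCategoryList AS TABLE
(
    ArticleID   INT NOT NULL,
    CategorieID INT NOT NULL
);
GO
-- The procedure the console program calls, passing all parsed pairs in one
-- round trip. TVP parameters must be declared READONLY.
CREATE PROCEDURE dbo.InsertArticleCategories
    @Pairs dbo.ArticleCategoryList READONLY
AS
INSERT INTO dbo.ArticleCategorie (ArticleID, CategorieID)
SELECT ArticleID, CategorieID
  FROM @Pairs;
GO
```

On the client side (C# with ADO.NET, for instance), you fill a DataTable with the parsed pairs and pass it as a SqlParameter with SqlDbType.Structured; the whole file then goes to the server as a single set-based insert.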

    Erland Sommarskog, SQL Server MVP, www.sommarskog.se

  • Wow, at first glance I must admit it seems a bit hard for a beginner like me. I'll check it out tomorrow. Thank you for your answer.

  • zqckfair (7/28/2013)



    No problem. Erland's excellent article is to die for, with several great methods, sample code, and great explanations.

    If you're interested in a "T-SQL Only" solution, we can do that quite simply, as well...

    In the following, a lot of simple techniques are being used. That is, they're "simple" for people who are familiar with them. There's no way I could explain them all to a "newbie" to the level I'd like without writing a book on the subject. With that in mind, I explained the sections at a high level and a mid level. For details on each command, I strongly recommend that you look them up in Books Online (especially BULK INSERT... there's so much more you can do with it, including sequestering bad rows in a separate file without aborting the whole run). I also strongly recommend that you do a Google search on "SQL Injection" and "DOS Injection" to see not only what you need to watch out for, but also why so many DBAs go bananas at the mere mention of dynamic SQL. The method I used prevents both DOS and SQL Injection.
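To show the whitelist idea in isolation before it appears inside the procedure below: the path is only allowed to contain characters that legitimately occur in a simple Windows path, so any injected quote or semicolon leaves the variable NULL and nothing executes. A small hedged demo (the PRINT stands in for the real BULK INSERT):

```sql
-- A hostile "path" carrying an injection attempt.
DECLARE @pFilePath VARCHAR(8000) = 'C:\Temp\x.csv''; DROP TABLE dbo.MyTable;--';
DECLARE @SQL VARCHAR(8000);

-- Assignment happens only when the path passes the character whitelist.
SELECT @SQL = 'PRINT ''would BULK INSERT here'';'
 WHERE @pFilePath NOT LIKE '%[^a-zA-Z0-9\:. ]%'; -- quote and ; fail the check

EXEC (@SQL); -- @SQL is still NULL, so nothing runs
```

Note the whitelist is deliberately strict: a path containing an underscore or a hyphen would also be rejected, so widen the character class if your real file names need those characters.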

    First things first. You provided two listings of what a file may contain. For demonstration purposes, I saved them as two separate files in my C:\Temp directory (you can use any drive, directory, or UNC that SQL Server has access to). I called them "JBMDemo1.csv" and "JBMDemo2.txt" respectively. Notice the different extensions; I just wanted to prove that the extension means nothing for these imports. If you want to run the demonstration as is, you'll need to create these two files (you can easily copy and paste into Notepad to do this).

    So, the first file, JBMDemo1.csv, contains the following:

    Article ID;Tags ID
    1500;7,8,9,5
    1501;4,3
    1502;8,1
    1503;8,9

    The second file, JBMDemo2.txt, contains the following:

    Article ID;Tags ID
    Jeff;Moden,Zidane,Headbutt
    5;1,4,6

    You also mentioned a "final" table called "MyTable". Here's the code I used to create that table for this demonstration. Note that I had to use VARCHAR for both columns because of the content of the files you provided. If you want to run the demonstration, as is, you'll need to execute the following code to create the "dbo.MyTable" table.

    CREATE TABLE dbo.MyTable
    (
        ArticleID   VARCHAR(50),
        CategorieID VARCHAR(50)
    )
    ;

    Now, here's the stored procedure I created to demo all of this. Running the following code will create a stored procedure called "dbo.ImportArticleCategory". You can call it anything you want but that's what I called it for this demonstration. You'll need to change it to point to the correct table if you put it into production because I pointed it at the demonstration table called "dbo.MyTable", which we created above.

    I also did the normal thing that I do with some fairly heavy commenting. Again, being a "newbie", you should look up the things that I used in Books Online and study them. And be sure to Google both "DOS Injection" and "SQL Injection". EVERYONE should become keenly aware of what they are and how to prevent them, because injection is STILL the leading attack vector in the world of software.

    I also used your latest column names.

    Here's the code for the proc...

    CREATE PROCEDURE dbo.ImportArticleCategory
    /******************************************************************************
     Purpose:
     This stored procedure accepts a full file path, checks it for both DOS and
     SQL "Injection", imports the file to a temporary staging table, parses the
     category tags, and inserts the results into a table called "MyTable".

     Usage Examples:
     EXEC dbo.ImportArticleCategory 'C:\Temp\SomeFile.csv';
     EXEC dbo.ImportArticleCategory @pFilePath;

     Programmer's Notes:
     1. An example input file might look like the following. Semi-colon is the
        delimiter. The CSV column will be parsed.
            Article ID;Tags ID
            1500;7,8,9,5
            1501;4,3
            1502;8,1
            1503;8,9
            Jeff;Moden,Zidane,Headbutt
            5;1,4,6
        (continued) The data stored in the "MyTable" table from that file will
        be the following:
            ArticleID CategorieID
            --------- -----------
            1500      7
            1500      8
            1500      9
            1500      5
            1501      4
            1501      3
            1502      8
            1502      1
            1503      8
            1503      9
            Jeff      Moden
            Jeff      Zidane
            Jeff      Headbutt
            5         1
            5         4
            5         6
     2. If the given file path contains any DOS or SQL Injection, the proc will
        only indicate that it has completed, so that it gives an attacker no
        hints.
     3. If the given file path does not exist, an error to that nature will be
        returned.
     4. The file path may be either a "drive path" or a "UNC".

     Revision History:
     Rev 00 - 28 Jul 2013 - Jeff Moden - Initial creation and unit test.
    ******************************************************************************/
    --===== Declare the parameters for this proc.
            (@pFilePath VARCHAR(8000))
    AS
    --=============================================================================
    --      Presets
    --=============================================================================
    --===== Environmental presets
        SET NOCOUNT ON;    --Suppress the autodisplay of rowcounts to prevent them
                           --from being interpreted as errors by a GUI.
        SET XACT_ABORT ON; --Not required here but a good habit to get into for
                           --when you use explicit transactions to short-circuit
                           --when an error occurs.
    --===== Declare obviously named variable(s)
    DECLARE @SQL VARCHAR(8000)
    ;
    --===== Conditionally drop the staging table to make reruns in SSMS easier.
         -- This section may be commented out for production.
         IF OBJECT_ID('tempdb..#ImportStaging','U') IS NOT NULL
            DROP TABLE #ImportStaging
    ;
    --===== Create the staging table
     CREATE TABLE #ImportStaging
            (
            ArticleID VARCHAR(50),
            TagsID    VARCHAR(8000)
            )
    ;
    --=============================================================================
    --      Load the given file into the staging table using the two columns that
    --      are separated by the semi-colon.
    --=============================================================================
    --===== Create the dynamic SQL using the given file path.
         -- The WHERE clause prevents both SQL and DOS injection.
     SELECT @SQL = '
     BULK INSERT #ImportStaging
       FROM ''' + @pFilePath + '''
       WITH (
            FIELDTERMINATOR = '';'',
            FIRSTROW        = 2,
            TABLOCK
            )
    ;'
      WHERE @pFilePath NOT LIKE '%[^a-zA-Z0-9\:. ]%'
    ;
    --===== Execute the dynamic SQL from above to do the actual file load
         -- into the staging table
       EXEC (@SQL)
    ;
    --=============================================================================
    --      Using the splitter function previously shown, split the tags for each
    --      ArticleID to produce the list you wanted just like before, except that
    --      we're inserting the result into "MyTable" (for this example).
    --=============================================================================
     INSERT INTO dbo.MyTable
            (ArticleID, CategorieID)
     SELECT s.ArticleID, CategorieID = split.Item
       FROM #ImportStaging s
      CROSS APPLY dbo.DelimitedSplit8K(s.TagsID, ',') split
    ;
    GO

    Ok, if you've been following along by running the code I've provided so far, run the following to verify that dbo.MyTable is empty.

    SELECT * FROM dbo.MyTable;

    That returns the following:

    ArticleID CategorieID
    --------- -----------

    (0 row(s) affected)

    Now, to import the first file (C:\Temp\JBMDemo1.csv), just execute the stored procedure with the proper path in quotes, like this...

    EXEC dbo.ImportArticleCategory 'C:\Temp\JBMDemo1.csv';

    If you run the following code (again), you'll see that the file has been successfully imported into the "MyTable" table.

    SELECT * FROM dbo.MyTable;

    Results:

    ArticleID CategorieID
    --------- -----------
    1500      7
    1500      8
    1500      9
    1500      5
    1501      4
    1501      3
    1502      8
    1502      1
    1503      8
    1503      9

    (10 row(s) affected)

    Now, use the proc to import the next file like this...

    EXEC dbo.ImportArticleCategory 'C:\Temp\JBMDemo2.txt';

    When we run the SELECT on MyTable again, we can see that the second file's information has been imported without changing anything previously imported.

    SELECT * FROM dbo.MyTable;

    Results:

    ArticleID CategorieID
    --------- -----------
    1500      7
    1500      8
    1500      9
    1500      5
    1501      4
    1501      3
    1502      8
    1502      1
    1503      8
    1503      9
    Jeff      Moden
    Jeff      Zidane
    Jeff      Headbutt
    5         1
    5         4
    5         6

    (16 row(s) affected)

    I'm sure that you have a million questions but, as you can see, this "problem" is really easy to crack with that DelimitedSplit8K splitter and some relatively easy code. Study the things I used in "Books Online" (press the {f1} key in SSMS to get there) and you're well on your way to shaking off the "newbie fur". 😛
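One refinement worth studying once the basics click (it is not part of the proc above): the final INSERT adds rows unconditionally, so importing the same file twice stores every pair twice. A hedged sketch of a guard that skips pairs already present in the target table:

```sql
-- Same split-and-insert as in the proc, but only for pairs that
-- aren't already in dbo.MyTable, making re-runs of a file harmless.
INSERT INTO dbo.MyTable
       (ArticleID, CategorieID)
SELECT s.ArticleID, CategorieID = split.Item
  FROM #ImportStaging s
 CROSS APPLY dbo.DelimitedSplit8K(s.TagsID, ',') split
 WHERE NOT EXISTS (SELECT 1
                     FROM dbo.MyTable mt
                    WHERE mt.ArticleID   = s.ArticleID
                      AND mt.CategorieID = split.Item)
;
```

With a proper primary key on (ArticleID, CategorieID), the duplicate insert would fail outright instead; the NOT EXISTS version quietly ignores rows that are already there, which is usually friendlier for repeated file loads.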

    Don't forget to Google and study about "DOS Injection" and "SQL Injection".

    Let us know if you have more questions about this.

    P.S. You could also use "SSIS" for this but someone else will have to cover how to do that because, well, I just use T-SQL to do things like this because it's easier for me. In other words, I've never used "SSIS" for such an easy thing and certainly can't explain how to do it with "SSIS". 😀

    --Jeff Moden



  • Only to add to what Jeff suggests: there are multiple ways to skin this cat, and which one fits you depends on a lot of things. My suggestion assumes that you already know C# or VB .NET, but since you had Visual Studio in the mix, it seemed like a reasonable suggestion.

    A potential problem with Jeff's solution using BULK INSERT is that the file must be visible from your SQL Server instance. That is trivial if SQL Server is running on your own machine, but if you are working in a corporate environment it may be more difficult to arrange. You may also run into permission problems.
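For completeness on that point: the command-line bcp utility reads the file on the machine where bcp itself runs, so it can push a client-side file into the staging table without the server ever needing to see the path. A sketch only (server, database, and table names here are made up for illustration):

```shell
rem -c character mode, -t ";" field terminator, -F 2 skip the header row,
rem -T trusted (Windows) authentication. Names below are hypothetical.
bcp MyDatabase.dbo.ImportStaging in "C:\Temp\JBMDemo1.csv" -S MyServer -T -c -t ";" -F 2
```

After the load, the same DelimitedSplit8K insert shown earlier turns the staged rows into the final (ArticleID, CategorieID) pairs.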

    If your main interest is in learning, I would suggest that you study both suggestions and try to implement solutions with both. In the end, you may find that one fits into your overall framework better than the other.

    Erland Sommarskog, SQL Server MVP, www.sommarskog.se

  • I absolutely agree with Erland. There are dozens of ways to skin this cat and this is an excellent opportunity to compare the different methods and learn their strengths and weaknesses. It's always good to have multiple options especially if you run into a picky DBA or some picky company standards.

    --Jeff Moden



  • Thank you so much for both of those constructive answers.

    You're right. Although I'm a beginner, I'll try both ways, and I'll come back here if I have any problems.
