Transposing Information

  • I'm going to try to simplify this a little.  I have one of our business users coming to me and wanting me to set something up to help them analyze some information they will be getting on a regular basis.  What they have is a flat file with some general information followed by 120 columns of survey-type data.

    So the structure of their flat file looks something like this:

    ID
    Name
    City
    State
    Email
    Q1
    Q2
    Q3
    Q4
    Q5
    Q6
    ...and so on through Q120

    What I want to end up with is 2 tables that resemble the following:

    Table 1
    ID
    Name
    City
    State
    Email

    Table 2
    ID
    Q

    These would obviously be a one-to-many relationship, but I think having Q1 – Q120 in a separate table like this will make the reporting I need to do much easier.
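    Rendered as hypothetical DDL, the two tables might look like this (table names, data types, and the question-number column are all assumptions for illustration):

    ```sql
    -- Hypothetical DDL sketch for the two target tables; types are guesses.
    CREATE TABLE Respondents (
        ID    int PRIMARY KEY,
        Name  varchar(50),
        City  varchar(50),
        State varchar(30),
        Email varchar(50)
    )

    CREATE TABLE Answers (
        ID   int REFERENCES Respondents(ID),  -- many answer rows per respondent
        Q_id int,                             -- question number, 1 through 120
        Q    varchar(10)                      -- the answer value
    )
    ```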

     

    Any suggestions on how to attack this?

    Thanks

     

  • Well, the solution of having a one-to-many relationship looks good, and you suggested it yourself.

    What other info do you need on this?

  • What I'm looking for is the SQL code to actually transpose the data like this.  I've tried a couple of things this morning without much luck, so I thought if anyone has done this before it would help in providing me some direction.

  • Hi,

    So, let's see if I understand: you need a query that does this transformation for you, so that you can put together an INSERT INTO ... SELECT FROM statement?

    As far as I know, there is no way to do this using standard SQL. No query can create the view you need to insert into the target table.

    I know of some ways to do it, maybe I can help if you provide us some details:

    - How many rows are there in the source table?

    - Do you plan to load the flat file into a database table?

    - Is this a run-once problem or will you need to convert this structure continuously?

    cheers,

    paulo.

  • Paulo,

    You are correct, I'm looking for a query to help me do this to get it into relational tables.

    As far as your other questions: the number of rows in the source table could be anywhere from a few to hundreds.  I can load the flat file into a database first (a temporary holding area, if you will).  Finally, this is something we will have to do on a continual basis; it is a file we are receiving from a survey company on a regular basis.

    Thanks,

    Curt

  • You need to "unpivot" the data. This can be done in T-SQL using a UNION ALL query that converts columns to rows.

    eg:

    SELECT Name, City, State, Email, 1 As ID, Q1 As Q
    FROM YourTable
    UNION ALL
    SELECT Name, City, State, Email, 2 As ID, Q2 As Q
    FROM YourTable
    UNION ALL
    ...
    SELECT Name, City, State, Email, 120 As ID, Q120 As Q
    FROM YourTable

    This would produce 120 rows for each Name, City, State & Email. If the columns are named as you suggest (Q1 - Q120), you don't have to hand-code all this; you can write T-SQL against the INFORMATION_SCHEMA views (or syscolumns) to generate the SQL for you.
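    A sketch of what that generation step might look like, querying INFORMATION_SCHEMA.COLUMNS (YourTable is the placeholder name from the example above; the variable-concatenation trick and nvarchar(max) are assumptions that require SQL Server 2005 or later):

    ```sql
    -- Sketch: build the UNION ALL unpivot statement from the column metadata.
    DECLARE @sql nvarchar(max)

    SELECT @sql = ISNULL(@sql + CHAR(13) + 'UNION ALL' + CHAR(13), '')
        + 'SELECT Name, City, State, Email, '
        + SUBSTRING(COLUMN_NAME, 2, 10) + ' As ID, '
        + COLUMN_NAME + ' As Q FROM YourTable'
    FROM INFORMATION_SCHEMA.COLUMNS
    WHERE TABLE_NAME = 'YourTable'
      AND COLUMN_NAME LIKE 'Q%'

    PRINT @sql   -- inspect the generated statement first, then EXEC(@sql)
    ```

    Printing the generated statement before executing it is a cheap sanity check, since any column whose name merely starts with 'Q' would be swept in.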

     

  • Thanks.  Can you get me started on how to do it by writing the T-SQL against the schema?

    You already got the UNION ALL solution. I presume that if the table does not fit in memory it can be slow; this is why I asked about the size.

    Another way to do it, which by the way has the advantage of being more concise and in principle faster to execute, consists of declaring a cursor over the source table and inserting the multiple records into the target table. You can do this in two ways: either hand-code the 120 inserts, or use a nested cursor over syscolumns (usually not a good idea, because you would be locking system tables).

    hope this helps.

    paulo.
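  • For illustration, a minimal sketch of the hand-coded variant of the cursor approach (YourTable, Table2, and the Q_id question-number column are assumed names; only Q1 and Q2 are shown):

    ```sql
    -- Sketch of the cursor approach: one pass over the source table,
    -- one insert per question per row. Repeat the pattern through Q120.
    DECLARE @ID int, @Q1 char(1), @Q2 char(1)

    DECLARE src CURSOR FOR
        SELECT ID, Q1, Q2 FROM YourTable

    OPEN src
    FETCH NEXT FROM src INTO @ID, @Q1, @Q2
    WHILE @@FETCH_STATUS = 0
    BEGIN
        INSERT INTO Table2 (ID, Q_id, Q) VALUES (@ID, 1, @Q1)
        INSERT INTO Table2 (ID, Q_id, Q) VALUES (@ID, 2, @Q2)
        -- ...repeat for Q3 through Q120...
        FETCH NEXT FROM src INTO @ID, @Q1, @Q2
    END
    CLOSE src
    DEALLOCATE src
    ```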

  • Ehlinger,

    As strange as it sounds, if the flat file is VERY consistent, you can actually use BCP or BULK INSERT to transpose the input from the flat file directly into a table.  You will, however, have to make a BCP format file to deal with the unusual format... it would start out looking like this...

    8.0
    125
    1    SQLCHAR    0    500    "\r\n"    1      ID       ""
    2    SQLCHAR    0    500    "\r\n"    2      Name     ""
    3    SQLCHAR    0    500    "\r\n"    3      City     ""
    4    SQLCHAR    0    500    "\r\n"    4      State    ""
    5    SQLCHAR    0    500    "\r\n"    5      Email    ""
    6    SQLCHAR    0    500    "\r\n"    6      Q1       ""
    7    SQLCHAR    0    500    "\r\n"    7      Q2       ""
    8    SQLCHAR    0    500    "\r\n"    8      Q3       ""
    9    SQLCHAR    0    500    "\r\n"    9      Q4       ""
    10   SQLCHAR    0    500    "\r\n"    10     Q5       ""
    11   SQLCHAR    0    500    "\r\n"    11     Q6       ""
    ...
    125  SQLCHAR    0    500    "\r\n"    125    Q120     ""

    Notice that the "column delimiters" are all CrLf's...
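    Such a format file would then be supplied to BULK INSERT (or to bcp with its format-file switch); a sketch, where the table name and file paths are hypothetical:

    ```sql
    -- Hypothetical names throughout: adjust the table, data-file path,
    -- and format-file path to your environment.
    BULK INSERT SurveyStaging
    FROM 'C:\data\survey.txt'
    WITH (FORMATFILE = 'C:\data\survey.fmt')
    ```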

    --Jeff Moden



  • If the transformation during import suggested by Jeff works with your files, it will be the best way to do this. If you have problems, and need to import the data into a staging table first, here is what you could do with it using dynamic SQL.

    Disclaimer: Please understand that most of the time it is best to avoid dynamic SQL and use standard, set-based solutions (like the one with UNION). If you know that you will always be working with a table that has 120 "Q" columns, I would suggest taking the time to write the query using 119 UNIONs. However, if you know that the table will always be of the same general structure, but the number of "Q" columns may vary, it would be handy to have SQL that can be quickly modified or parameterized to run on any number of columns.

    I cannot guarantee that this SQL will perform correctly on your data - it is just an example of how it could be done. It does not take into account any previously inserted rows and supposes that all tables are freshly created and empty. I only used 4 Q columns to keep it simple, but you can use as many as you need (or rather, as many as SQL Server allows). Also, I understand that this action will be performed on a staging table with max. hundreds of rows; for greater amounts of data, I'm not sure what the performance would be.

    --staging table to which you import data from flat file
    CREATE TABLE #import (imp_ID int identity, cust_name varchar(50), City varchar(50), State varchar(30),
        Email varchar(50), Q1 char(1), Q2 char(1), Q3 char(1), Q4 char(1))

    --test values
    INSERT INTO #import VALUES ('Bill','Houston','Texas','bill@company.com','Y','N','1','2')

    --tables to hold results
    CREATE TABLE #contacts (id_contact int, cust_name varchar(50), City varchar(50), State varchar(30),
        Email varchar(50))
    CREATE TABLE #questions (id_contact int, Q_id int, Q char(1))

    --customers in survey and their contact data
    INSERT INTO #contacts (id_contact, cust_name, city, state, email)
    SELECT imp_id, cust_name, city, state, email
    FROM #import

    --bring all Q columns into one to normalize data
    DECLARE @counter int, @sql nvarchar(1024)
    SET @counter = 0
    WHILE @counter < 4
    BEGIN
        SET @counter = @counter + 1
        SET @sql = 'INSERT INTO #questions (id_contact, Q_id, Q)
            SELECT imp_ID, ' + CAST(@counter as varchar(3)) + ', Q' + CAST(@counter as varchar(3))
            + ' FROM #import'
        EXEC sp_executesql @sql
    END

    DROP TABLE #import
    DROP TABLE #questions
    DROP TABLE #contacts

    HTH, Vladan

    EDIT: P.S. To determine the number of Q columns dynamically (assuming no other column begins with 'Q'!), you can use this. It can easily be modified to use non-temporary tables:

    SELECT count(*) FROM tempdb..syscolumns
    WHERE [id] = object_id('tempdb..#import')
    AND LEFT([name],1) = 'Q'

    That means, if you declare a variable, fill it with the count of columns, and use it instead of the fixed value (4 in my example) in the WHILE clause, you can use the same SQL to transform a table with any number of Q columns.
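    Wiring the column count into the loop might look like this (a sketch against the same #import and #questions tables defined above):

    ```sql
    -- Sketch: count the Q columns, then drive the dynamic-SQL loop with that count.
    DECLARE @qcount int, @counter int, @sql nvarchar(1024)

    SELECT @qcount = COUNT(*)
    FROM tempdb..syscolumns
    WHERE [id] = object_id('tempdb..#import')
      AND LEFT([name], 1) = 'Q'

    SET @counter = 0
    WHILE @counter < @qcount
    BEGIN
        SET @counter = @counter + 1
        SET @sql = 'INSERT INTO #questions (id_contact, Q_id, Q)
            SELECT imp_ID, ' + CAST(@counter as varchar(3)) + ', Q' + CAST(@counter as varchar(3))
            + ' FROM #import'
        EXEC sp_executesql @sql
    END
    ```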
