Yet another pivot question

  • Hi everyone,

    I have been searching the everywhere in the hope of finding an answer to a problem that would have been very simple to solve in MS Access, but is a bit more problematic in SQL server. I have a simple two column table, both of which are varchar(1000). Column 1 is called "field" and Column 2 is called "fieldValue"

    The shape of the record set is currently this:

    Field ............... FieldValue

    LastName ......... Davis

    First Name ......... Ken

    Salutation .......... Mr.

    I want to change transform it to this:

    Lastname. ........ FirstName .......... Salutation

    Davis .............. Ken ................... Mr.

    I have a constant number of columns (72) so I do not have a problem with hard coding the column values. I have read many of the posts here, (and elsewhere) but cannot seem to find a solution that fits my specific case. Can anyone point me in the right direction? Many Thanks in advance.

  • You can take a look at cross tabs in this article: http://www.sqlservercentral.com/articles/T-SQL/63681/

    And here's an example:

    SELECT ID

    ,MAX(CASE WHEN field = 'LastName' THEN FieldValue END) AS LastName

    ,MAX(CASE WHEN field = 'FirstName' THEN FieldValue END) AS FirstName

    ,MAX(CASE WHEN field = 'Salutation' THEN FieldValue END) AS Salutation

    --And so on

    FROM MyTable

    GROUP BY ID

    The best option is to get rid of this design and have properly normalized tables. 😉

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • Awesome! Thanks! I wish I could get the stupid thing normalized. I am getting the data from an ETL operation off a webpage application form, so I am pretty much stuck with what I have. The only other question I have is: How do I derive the ID value. I have two columns in the table, that is all.

  • How do I derive the ID value. I have two columns in the table, that is all.

    Ok, that is really bad. Do you get only one person's record in the table at a time? If not, how would you tell them apart? Just by the order of the fields as they come to you?

    And when you say its an ETL operation, what kind? Do you have no control over it?

  • Yes, it is bad. I am pulling the data from MySQL database. I do have a submission ID so I can tell who's who, but the submission id will do nothing for me until I can flatten out the crappy dataset (all of the attributes will have the same submission id, so for right now, it seems irrelevant. I am about to despair though. The snippet below does not flatten out the data (first name and last name land on two different rows, obviously.)

    CRAP

    .

    .

    .

    .

    .

    SELECT

    ROW_NUMBER ( ) OVER (

    ORDER BY

    DERIVEDFIELD ) AS [ROW]

    ,CAST ( DERIVEDFIELD AS VARCHAR ( 255 ) ) AS DERIVEDFIELD

    ,CAST ( FIELDVALUE AS VARCHAR ( 255 ) ) AS FIELDVALUE

    INTO #T3

    FROM

    #T2

    SELECT

    [ROW]

    ,MAX ( CASE

    WHEN DERIVEDFIELD = 'Activities & Achievements' THEN FIELDVALUE

    END ) AS [ACTIVITIES & ACHIEVEMENTS]

    ,MAX ( CASE

    WHEN DERIVEDFIELD = 'Anticipated major' THEN FIELDVALUE

    END ) AS [Anticipated major]

    ,MAX ( CASE

    WHEN DERIVEDFIELD = 'given' THEN FIELDVALUE

    END ) AS [FirstName]

    ,MAX ( CASE

    WHEN DERIVEDFIELD = 'Family' THEN FIELDVALUE

    END ) AS [LastName]

    INTO #T4

    FROM

    #T3

    GROUP BY

    [ROW]

    .

    .

    .

    .

    More Crap

  • DECLARE @Pvt TABLE (Field VARCHAR(20),FieldValue VARCHAR(10))

    INSERT INTO @Pvt VALUES

    ('LastName','Davis'),('First Name','Ken'),('Salutation','Mr.')

    SELECT [LastName],[First Name],[Salutation] FROM

    (SELECT Field,FieldValue FROM @Pvt)e

    PIVOT (MAX(FieldValue) FOR Field IN ([LastName],[First Name],[Salutation])) AS PVT

  • DECLARE @Pvt TABLE (Field VARCHAR(20),FieldValue VARCHAR(10))

    INSERT INTO @Pvt VALUES

    ('LastName','Davis'),('First Name','Ken'),('Salutation','Mr.')

    SELECT [LastName],[First Name],[Salutation] FROM

    (SELECT Field,FieldValue FROM @Pvt)e

    PIVOT (MAX(FieldValue) FOR Field IN ([LastName],[First Name],[Salutation])) AS PVT

  • kmdavisjr (3/14/2014)


    Yes, it is bad. I am pulling the data from MySQL database. I do have a submission ID so I can tell who's who, but the submission id will do nothing for me until I can flatten out the crappy dataset (all of the attributes will have the same submission id, so for right now, it seems irrelevant. I am about to despair though. The snippet below does not flatten out the data (first name and last name land on two different rows, obviously.)

    CRAP

    .

    .

    .

    .

    .

    SELECT

    ROW_NUMBER ( ) OVER (

    ORDER BY

    DERIVEDFIELD ) AS [ROW]

    ,CAST ( DERIVEDFIELD AS VARCHAR ( 255 ) ) AS DERIVEDFIELD

    ,CAST ( FIELDVALUE AS VARCHAR ( 255 ) ) AS FIELDVALUE

    INTO #T3

    FROM

    #T2

    SELECT

    [ROW]

    ,MAX ( CASE

    WHEN DERIVEDFIELD = 'Activities & Achievements' THEN FIELDVALUE

    END ) AS [ACTIVITIES & ACHIEVEMENTS]

    ,MAX ( CASE

    WHEN DERIVEDFIELD = 'Anticipated major' THEN FIELDVALUE

    END ) AS [Anticipated major]

    ,MAX ( CASE

    WHEN DERIVEDFIELD = 'given' THEN FIELDVALUE

    END ) AS [FirstName]

    ,MAX ( CASE

    WHEN DERIVEDFIELD = 'Family' THEN FIELDVALUE

    END ) AS [LastName]

    INTO #T4

    FROM

    #T3

    GROUP BY

    [ROW]

    .

    .

    .

    .

    More Crap

    You cant group by the row. You need to group by the submission id to flatten the data. Then you can either use that id as your unique identifier, generate one, or insert into a table with an identity column

    If you only want one record per file then you can cross apply a single value to all rows and then group by it.

    Its tough to advise you better unless you give us a more detailed sample like a file which tests your requirements and shows your input and a sample output.

  • Nevyn (3/15/2014)


    kmdavisjr (3/14/2014)


    Yes, it is bad. I am pulling the data from MySQL database. I do have a submission ID so I can tell who's who, but the submission id will do nothing for me until I can flatten out the crappy dataset (all of the attributes will have the same submission id, so for right now, it seems irrelevant. I am about to despair though. The snippet below does not flatten out the data (first name and last name land on two different rows, obviously.)

    CRAP

    .

    .

    .

    .

    .

    SELECT

    ROW_NUMBER ( ) OVER (

    ORDER BY

    DERIVEDFIELD ) AS [ROW]

    ,CAST ( DERIVEDFIELD AS VARCHAR ( 255 ) ) AS DERIVEDFIELD

    ,CAST ( FIELDVALUE AS VARCHAR ( 255 ) ) AS FIELDVALUE

    INTO #T3

    FROM

    #T2

    SELECT

    [ROW]

    ,MAX ( CASE

    WHEN DERIVEDFIELD = 'Activities & Achievements' THEN FIELDVALUE

    END ) AS [ACTIVITIES & ACHIEVEMENTS]

    ,MAX ( CASE

    WHEN DERIVEDFIELD = 'Anticipated major' THEN FIELDVALUE

    END ) AS [Anticipated major]

    ,MAX ( CASE

    WHEN DERIVEDFIELD = 'given' THEN FIELDVALUE

    END ) AS [FirstName]

    ,MAX ( CASE

    WHEN DERIVEDFIELD = 'Family' THEN FIELDVALUE

    END ) AS [LastName]

    INTO #T4

    FROM

    #T3

    GROUP BY

    [ROW]

    .

    .

    .

    .

    More Crap

    You cant group by the row. You need to group by the submission id to flatten the data. Then you can either use that id as your unique identifier, generate one, or insert into a table with an identity column

    If you only want one record per file then you can cross apply a single value to all rows and then group by it.

    Its tough to advise you better unless you give us a more detailed sample like a file which tests your requirements and shows your input and a sample output.

    Hi Nevyn

    Thanks. I am pulling the data from a MySQL database (via a linked server in SSMS) that hosts a website created with Drupal. In that database, there are two tables that are important to the task at hand: webform_submissions and webform_submissions_data. I have been searching for a solution involves straight SQL rather than exporting the data via an API, but I do not think the straight SQL solution is possible given the nature of the way Drupal stores the data in these two tables. I would appreciate any suggestions you have. If you need me to flesh out the problem in more detail I would be happy to. For example I can export the two tables with the sample data and save them to csv or another format. Thanks again!

  • I'm pretty sure a solution is possible, but you have to give us table definitions. You can make up the sample data but it needs to have all fields.

    Also, are you trying to transform it in the query to mysql, or are you copying it as it, and then trying to transform it from sql server?

    The basic outline of the solution is that webform_submissions_data should include a submission_id, a field name column, and a field value column. The submission id tells you which fields relate.

    What you want to do is select the submission id, and the max case statements from the example above in the query, and then group by the submission id.

    Here is a quickly thrown together sample. For anything more we need to know what we're dealing with:

    -- Create the sample table

    CREATE TABLE [dbo].[webform_submissions_data](

    [SubmissionId] [int] NOT NULL,

    [FieldName] [varchar](10) NOT NULL,

    [FieldValue] [varchar](25) NULL,

    )

    GO

    -- Add sample data

    INSERT webform_submissions_data (SubmissionId,FieldName,FieldValue)

    SELECT 1,'First Name','John'

    UNION ALL

    SELECT 1,'Last Name','Smith'

    UNION ALL

    SELECT 2,'First Name','Jim'

    UNION ALL

    SELECT 2,'Last Name','Jones'

    -- Create target table

    CREATE TABLE [dbo].[WebformNormalized](

    [SubmissionId] [int] NOT NULL,

    [FirstName] [varchar](25) NULL,

    [LastName] [varchar](25) NULL,

    CONSTRAINT [PK_WebformNormalized] PRIMARY KEY CLUSTERED

    (

    [SubmissionId] ASC

    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

    ) ON [PRIMARY]

    GO

    -- Do the transformation

    INSERT WebformNormalized (SubmissionId,FirstName,LastName)

    SELECT SubmissionId,

    MAX (CASE WHEN FieldName = 'First Name' THEN FieldValue ELSE NULL END) FirstName,

    MAX (CASE WHEN Fieldname = 'Last Name' THEN FieldValue ELSE NULL END) LastName

    FROM webform_submissions_data

    GROUP BY SubmissionId

    You need the submission id because it tells you which rows are related.

  • Nevyn (3/15/2014)


    I'm pretty sure a solution is possible, but you have to give us table definitions. You can make up the sample data but it needs to have all fields.

    Also, are you trying to transform it in the query to mysql, or are you copying it as it, and then trying to transform it from sql server?

    The basic outline of the solution is that webform_submissions_data should include a submission_id, a field name column, and a field value column. The submission id tells you which fields relate.

    What you want to do is select the submission id, and the max case statements from the example above in the query, and then group by the submission id.

    Here is a quickly thrown together sample. For anything more we need to know what we're dealing with:

    -- Create the sample table

    CREATE TABLE [dbo].[webform_submissions_data](

    [SubmissionId] [int] NOT NULL,

    [FieldName] [varchar](10) NOT NULL,

    [FieldValue] [varchar](25) NULL,

    )

    GO

    -- Add sample data

    INSERT webform_submissions_data (SubmissionId,FieldName,FieldValue)

    SELECT 1,'First Name','John'

    UNION ALL

    SELECT 1,'Last Name','Smith'

    UNION ALL

    SELECT 2,'First Name','Jim'

    UNION ALL

    SELECT 2,'Last Name','Jones'

    -- Create target table

    CREATE TABLE [dbo].[WebformNormalized](

    [SubmissionId] [int] NOT NULL,

    [FirstName] [varchar](25) NULL,

    [LastName] [varchar](25) NULL,

    CONSTRAINT [PK_WebformNormalized] PRIMARY KEY CLUSTERED

    (

    [SubmissionId] ASC

    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

    ) ON [PRIMARY]

    GO

    -- Do the transformation

    INSERT WebformNormalized (SubmissionId,FirstName,LastName)

    SELECT SubmissionId,

    MAX (CASE WHEN FieldName = 'First Name' THEN FieldValue ELSE NULL END) FirstName,

    MAX (CASE WHEN Fieldname = 'Last Name' THEN FieldValue ELSE NULL END) LastName

    FROM webform_submissions_data

    GROUP BY SubmissionId

    You need the submission id because it tells you which rows are related.

    Thanks so much for your help. I can expose the submission id. Right now,there is only one submission. What I want to do is flatten the file to get individual rows for each sid. I will then construct a SQL Server table to host the data, and then update or append that table using a SQL Server linked table and and distributed queries. I have pasted the table definitions and have attached sample data. I do have the ability to create views on the MySQL database.

    CREATE TABLE `791650_moc_content`.drupal_Re7RA_webform_submissions (

    sid int(10) UNSIGNED NOT NULL AUTO_INCREMENT COMMENT 'The unique identifier for this submission.',

    nid int(10) UNSIGNED NOT NULL DEFAULT 0 COMMENT 'The node identifier of a webform.',

    uid int(10) UNSIGNED NOT NULL DEFAULT 0 COMMENT 'The id of the user that completed this submission.',

    is_draft tinyint(4) NOT NULL DEFAULT 0 COMMENT 'Is this a draft of the submission?',

    submitted int(11) NOT NULL DEFAULT 0 COMMENT 'Timestamp of when the form was submitted.',

    remote_addr varchar(128) DEFAULT NULL COMMENT 'The IP address of the user that submitted the form.',

    PRIMARY KEY (sid),

    INDEX nid_sid (nid, sid),

    INDEX nid_uid_sid (nid, uid, sid),

    UNIQUE INDEX sid_nid (sid, nid)

    )

    ENGINE = INNODB

    AUTO_INCREMENT = 1339

    AVG_ROW_LENGTH = 79

    CHARACTER SET utf8

    COLLATE utf8_general_ci

    COMMENT = 'Holds general information about submissions outside of...';

    CREATE TABLE `791650_moc_content`.drupal_Re7RA_webform_submissions (

    sid int(10) UNSIGNED NOT NULL AUTO_INCREMENT COMMENT 'The unique identifier for this submission.',

    nid int(10) UNSIGNED NOT NULL DEFAULT 0 COMMENT 'The node identifier of a webform.',

    uid int(10) UNSIGNED NOT NULL DEFAULT 0 COMMENT 'The id of the user that completed this submission.',

    is_draft tinyint(4) NOT NULL DEFAULT 0 COMMENT 'Is this a draft of the submission?',

    submitted int(11) NOT NULL DEFAULT 0 COMMENT 'Timestamp of when the form was submitted.',

    remote_addr varchar(128) DEFAULT NULL COMMENT 'The IP address of the user that submitted the form.',

    PRIMARY KEY (sid),

    INDEX nid_sid (nid, sid),

    INDEX nid_uid_sid (nid, uid, sid),

    UNIQUE INDEX sid_nid (sid, nid)

    )

    ENGINE = INNODB

    AUTO_INCREMENT = 1339

    AVG_ROW_LENGTH = 78

    CHARACTER SET utf8

    COLLATE utf8_general_ci

    COMMENT = 'Holds general information about submissions outside of...';

    CREATE TABLE `791650_moc_content`.drupal_Re7RA_webform_submitted_data (

    nid int(10) UNSIGNED NOT NULL DEFAULT 0 COMMENT 'The node identifier of a webform.',

    sid int(10) UNSIGNED NOT NULL DEFAULT 0 COMMENT 'The unique identifier for this submission.',

    cid smallint(5) UNSIGNED NOT NULL DEFAULT 0 COMMENT 'The identifier for this component within this node, starts at 0 for each node.',

    no varchar(128) NOT NULL DEFAULT '0' COMMENT 'Usually this value is 0, but if a field has multiple values (such as a time or date), it may require multiple rows in the database.',

    data mediumtext NOT NULL COMMENT 'The submitted value of this field, may be serialized for some components.',

    PRIMARY KEY (nid, sid, cid, no),

    INDEX nid (nid),

    INDEX sid_nid (sid, nid)

    )

    ENGINE = INNODB

    AVG_ROW_LENGTH = 123

    CHARACTER SET utf8

    COLLATE utf8_general_ci

    COMMENT = 'Stores all submitted field data for webform submissions.';

    CREATE TABLE `791650_moc_content`.drupal_Re7RA_webform_component (

    nid int(10) UNSIGNED NOT NULL DEFAULT 0 COMMENT 'The node identifier of a webform.',

    cid smallint(5) UNSIGNED NOT NULL DEFAULT 0 COMMENT 'The identifier for this component within this node, starts at 0 for each node.',

    pid smallint(5) UNSIGNED NOT NULL DEFAULT 0 COMMENT 'If this component has a parent fieldset, the cid of that component.',

    form_key varchar(128) DEFAULT NULL COMMENT 'When the form is displayed and processed, this key can be used to reference the results.',

    name varchar(255) DEFAULT NULL COMMENT 'The label for this component.',

    type varchar(16) DEFAULT NULL COMMENT 'The field type of this component (textfield, select, hidden, etc.).',

    value text NOT NULL COMMENT 'The default value of the component when displayed to the end-user.',

    extra text NOT NULL COMMENT 'Additional information unique to the display or processing of this component.',

    required tinyint(4) NOT NULL DEFAULT 0 COMMENT 'Boolean flag for if this component is required.',

    weight smallint(6) NOT NULL DEFAULT 0 COMMENT 'Determines the position of this component in the form.',

    PRIMARY KEY (nid, cid)

    )

    ENGINE = INNODB

    AVG_ROW_LENGTH = 574

    CHARACTER SET utf8

    COLLATE utf8_general_ci

    COMMENT = 'Stores information about components for webform nodes.';

  • I'm not at 100% sure, but it looks to me as if the only tables you need to look at are `791650_moc_content`.drupal_Re7RA_webform_submitted_data and `791650_moc_content`.drupal_Re7RA_webform_component. Maybe you have to select only rows that refer to submissions of relevant data, but roughly speaking you need to join component table with data table on cid equality to produce three separate joined pairs (with a where filter selecting the three component names you want, one name for each pair) and then join those 3 all together on equality of sid, and select the three componet values plus the sid as your result. I think that will deliver what you want.

    Of course that seems to me to be a pretty clear indication that the MYSQL database you are stuck with is pretty horrible (as it needs a 6 way join to do something that simple) probably as a result of using drupal (which doesn't appear to fit the relational model if this sample is anything to go by).

    Tom

  • TomThomson (3/16/2014)


    I'm not at 100% sure, but it looks to me as if the only tables you need to look at are `791650_moc_content`.drupal_Re7RA_webform_submitted_data and `791650_moc_content`.drupal_Re7RA_webform_component. Maybe you have to select only rows that refer to submissions of relevant data, but roughly speaking you need to join component table with data table on cid equality to produce three separate joined pairs (with a where filter selecting the three component names you want, one name for each pair) and then join those 3 all together on equality of sid, and select the three componet values plus the sid as your result. I think that will deliver what you want.

    Of course that seems to me to be a pretty clear indication that the MYSQL database you are stuck with is pretty horrible (as it needs a 6 way join to do something that simple) probably as a result of using drupal (which doesn't appear to fit the relational model if this sample is anything to go by).

    Hi Tom,

    Thanks for your input! I have experimented with derivatives of your idea with no success. Drupal totally ignores relational design, and uses PHP code to stuff data in arrays and parse those arrays for presentation. It is very frustrating to deal with it, (not to mention wasteful). I will probably have to resort to an drupal API to pull the data out using PHP. Drat! :angry:

  • Of course that seems to me to be a pretty clear indication that the MYSQL database you are stuck with is pretty horrible (as it needs a 6 way join to do something that simple) probably as a result of using drupal (which doesn't appear to fit the relational model if this sample is anything to go by).

    Well, lets remember what this is and what its for. Drupal is an open source cms. This particular thing the OP has to support is clearly a component that allows you to dynamically build a webform using only front end tools, as opposed to developing a custom form all the time. Of course the data model for something like that will be messed up and not relational, but then they only really intend it for use when the data needs coming out of those forms are pretty basic.

    To the OP: I assume you only need the data from this one webform? If you need it from all webforms, that becomes an entirely different monster.

    But assuming you just need to do it where nid = 17147, its still not that hard.

    Look at the examples I posted above.

    You are going to select from the data table. You are going to filter where nid = 17147. You are going to group by sid. And for each data field you actually care about, you will need a MAX(CASE statement as above, and then alias that clause with the fieldname as you'd like it to appear.. You don't really need the component table unless you are trying to do this dynamically.

    The only things that really suck are:

    1) With the given name field (cid 5), it is subdivided again by no. So for that instead of having one MAX(CASE based on the cid, you would need one for each of those fields checking for both the cid and the no.

    2) All your data is going to come out as character data, even numbers and dates. So after your import you may want to play with whether they can be converted.

  • Hi Nevyn,

    Yes, I only need it for one form. I am looking over your example again. You are correct: the form is dynamic, so that different fields show up depending on the user's input. (for example a different set of locations and majors is presented depending on the enrollment status). With respect to all of the data coming out as character initially: That is not too big of a problem as I can cast those fields into the correct datatypes before inserting them in the SQL server staging table.

Viewing 15 posts - 1 through 15 (of 15 total)

You must be logged in to reply to this topic. Login to reply