Unpivot okay, but how do I extract one more piece of data from the column?

  • This isn't a huge issue... It's more something I'm trying to learn.

    I have a database table that was designed by someone else, and I'm trying to see if I can normalize the pattern. Here's the dummy table:

    CREATE TABLE [dbo].[BadTox](

    [PatientID] [int] NULL,

    [Cycle] [tinyint] NULL,

    [ALOPECIA] [tinyint] NULL,

    [Causality1] [tinyint] NULL,

    [Relatedness1] [tinyint] NULL,

    [BLOOD] [tinyint] NULL,

    [Causality2] [tinyint] NULL,

    [Relatedness2] [tinyint] NULL,

    [BRAIN] [tinyint] NULL,

    [Causality3] [tinyint] NULL,

    [Relatedness3] [tinyint] NULL

    );

    All the column names in upper case are actually symptom names, and in those columns are values {NULL, 1, 2, 3, 4, 5}

    and they belong in a column, so the normalized structure should be like this:

    CREATE TABLE Symptom (

    PatientID INT NOT NULL,

    Cycle TINYINT NOT NULL,

    SymptomName VARCHAR(20) NOT NULL, -- from the source column *name*

    Grade TINYINT NOT NULL -- from the value in the column with the name in uppercase

    PRIMARY KEY (PatientID, Cycle, SymptomName));

    I can untwist the repeating groups with the code I borrowed from Kenneth Fisher's article [ here ], but the part I'm having a harder time with is grabbing the information that's still left in the column name and integrating it into the solution...

    I can retrieve all the column names that are in uppercase using this:

    DECLARE @db_id int;

    DECLARE @object_id int;

    SET @db_id = DB_ID(N'SCRIDB');

    SET @object_id = OBJECT_ID(N'SCRIDB.dbo.BadTox');

    SELECT name AS column_name

    , column_id AS col_order

    FROM sys.all_columns

    WHERE name = UPPER(name) COLLATE SQL_Latin1_General_CP1_CS_AS

    AND object_id = @object_id;

    but I can't figure out how to work it into this (that I built by mimicking Kenneth Fisher's article...):

    ALTER PROC [dbo].[UnpivotMaxGradeUsingCrossApply]

    AS

    SELECT PatientID

    , Toxicity

    , MAX(Grade) AS MaxGrade

    FROM

    (

    SELECT PatientID

    , Cycle

    , Toxicity

    , 1+ABS(CHECKSUM(NEWID())%5) AS Grade --- this is a stub... the actual value is under the Toxicity column.

    , Causality

    , Relatedness

    FROM (

    SELECT ups.PatientID

    , ups.Cycle

    , CrossApplied.Toxicity

    , CrossApplied.Causality

    , CrossApplied.Relatedness

    FROM UnpivotSymptoms ups

    -- how do I return a resultset of Toxicity Names and build this?

    CROSS APPLY (VALUES (ToxicityName1, Causality1, Relatedness1),

    (ToxicityName2, Causality2, Relatedness2),

    (ToxicityName3, Causality3, Relatedness3))

    CrossApplied (Toxicity, Causality, Relatedness)

    )x ) y

    GROUP BY PatientID, Toxicity

    ORDER BY PatientID, Toxicity;

    The problem is that I need to extract the column names (where ToxicityName[n] would be). I can do that by querying the sys.all_columns view, but I can't figure out how to integrate the two pieces. About the only thing I have even dreamed up is to build the VALUES(...) statements dynamically from the values returned by the system view.

    So how do I get both the value from the ToxicityName[n] column and the column name into my final data query? (I have a feeling this might require dynamic SQL...)

    Thanks!

    Pieter

  • If the structure of BadTox is fixed and is exactly as it is in your post,

    then you needn't anylize system dictionary.

    Just

    SELECT ups.PatientID

    , ups.Cycle

    , CrossApplied.SymptomName

    , CrossApplied.Toxicity

    , CrossApplied.Causality

    , CrossApplied.Relatedness

    FROM [dbo].[BadTox] ups

    CROSS APPLY (VALUES ('ALOPECIA', ALOPECIA, Causality1, Relatedness1),

    ('BLOOD', BLOOD, Causality2, Relatedness2),

    ('BRAIN',BRAIN, Causality3, Relatedness3))

    CrossApplied (SymptomName, Toxicity, Causality, Relatedness)

    Othewise you need dynamic sql plus some more info besides system dictionary which will tell you that column name "BLOOD" has number 2, so it should be associated with

    "Causality2", "Relatedness2". For example you may assume that this number is the row_number in the columns list

    --use <your db name>;

    DECLARE @object_id int;

    SET @object_id = OBJECT_ID(N'dbo.BadTox');

    SELECT name AS column_name

    , column_id AS col_order

    , Causality_N_Relatedness_suffix =row_number() over (order by column_id)

    FROM sys.all_columns

    WHERE name = UPPER(name) COLLATE SQL_Latin1_General_CP1_CS_AS

    AND object_id = @object_id;

    Or it may be some other rule which will help you to associate "BLOOD" with "Causality2", "Relatedness2".

  • The only rule is this:

    1. Symptom names are always in uppercase.

    The next column is usually Causality[n]

    3. The column after that is Relatedness[n].

    So a table could look like this:

    CREATE TABLE BadTox (

    PatientID INT,

    Cycle TINYINT,

    ALOPECIA TINYINT,

    Causality1 TINYINT,

    Relatedness1 TINYINT,

    BONE TINYINT,

    Bone_Specify VARCHAR(50),

    Causality2 TINYINT,

    Relatedness2 TINYINT,

    CARDIAC TINYINT,

    Causality3 TINYINT,

    Relatedness3 TINYINT

    ...)

    There could be a "Specify" after the Symptom column.

    From the looks of it, if I can figure out how to create the list in the VALUES part of the CROSS APPLY, this thing will work.

    Thanks!

  • Try this.

    --USE <your DB name>;

    GO

    IF object_id(N'dbo.BadTox', 'U') IS NOT NULL

    DROP TABLE [dbo].[BadTox];

    GO

    CREATE TABLE [dbo].[BadTox](

    [PatientID] [int] NULL,

    [Cycle] [tinyint] NULL,

    [ALOPECIA] [tinyint] NULL,

    [Causality1] [tinyint] NULL,

    [Relatedness1] [tinyint] NULL,

    [BLOOD] [tinyint] NULL,

    [Causality2] [tinyint] NULL,

    [Relatedness2] [tinyint] NULL,

    [BRAIN] [tinyint] NULL,

    [Causality3] [tinyint] NULL,

    [Relatedness3] [tinyint] NULL

    );

    INSERT [dbo].[BadTox] VALUES

    (100,110, 1, 2, 3, 11, 12, 13, 21, 22, 23)

    ,(100,111, 2, 3, 4, 12, 13, 14, 22, 23, 24)

    ,(200,111, 3, 4, 5, 13, 14, 15, 23, 24, 25);

    DECLARE @object_id INT;

    SET @object_id = object_id(N'dbo.BadTox');

    DECLARE @cmd VARCHAR(8000) = 'SELECT ups.PatientID

    , ups.Cycle

    , CrossApplied.SymptomName

    , CrossApplied.Toxicity

    , CrossApplied.Causality

    , CrossApplied.Relatedness

    FROM [dbo].[BadTox] ups

    CROSS APPLY (VALUES ';

    SET @cmd += STUFF(

    (SELECT ',('''+name+''','+ name

    +', Causality'+

    CAST(ROW_NUMBER() OVER (ORDER BY column_id) AS VARCHAR(2))

    +', Relatedness'+

    CAST(ROW_NUMBER() OVER (ORDER BY column_id) AS VARCHAR(2))

    + ')'

    --, Causality_N_Relatedness_suffix =row_number() over (order by column_id)

    FROM sys.all_columns

    WHERE name = UPPER(name) COLLATE SQL_Latin1_General_CP1_CS_AS

    AND object_id = @object_id

    FOR XML PATH('') )

    ,1,1,'');

    SET @cmd += ') CrossApplied (SymptomName, Toxicity, Causality, Relatedness);';

    EXEC (@cmd);

    The assumption is that Causality and Relatedness columns names follow Causality[n], Relatedness[n] pattern.

    Hope it helps.

  • Taking x_Specify columns into account

    --USE <your DB name>;

    GO

    IF object_id(N'dbo.BadTox', 'U') IS NOT NULL

    DROP TABLE [dbo].[BadTox];

    GO

    CREATE TABLE [dbo].[BadTox](

    [PatientID] [int] NULL,

    [Cycle] [tinyint] NULL,

    [ALOPECIA] [tinyint] NULL,

    [Causality1] [tinyint] NULL,

    [Relatedness1] [tinyint] NULL,

    BONE TINYINT,

    Bone_Specify VARCHAR(50),

    [Causality2] [tinyint] NULL,

    [Relatedness2] [tinyint] NULL,

    [BRAIN] [tinyint] NULL,

    [Causality3] [tinyint] NULL,

    [Relatedness3] [tinyint] NULL

    );

    INSERT [dbo].[BadTox] VALUES

    (100,110, 1, 2, 3, 11, 'qwe', 12, 13, 21, 22, 23)

    ,(100,111, 2, 3, 4, 12, 'asd', 13, 14, 22, 23, 24)

    ,(200,111, 3, 4, 5, 13, NULL, 14, 15, 23, 24, 25);

    DECLARE @object_id INT;

    SET @object_id = object_id(N'dbo.BadTox');

    DECLARE @cmd VARCHAR(8000) = 'SELECT ups.PatientID

    , ups.Cycle

    , CrossApplied.SymptomName

    , CrossApplied.Specify

    , CrossApplied.Toxicity

    , CrossApplied.Causality

    , CrossApplied.Relatedness

    FROM [dbo].[BadTox] ups

    CROSS APPLY (VALUES ';

    SET @cmd += STUFF(

    (SELECT ',('''+ca.name+''','+ ca.name

    +',' + isnull(cb.name,'NULL')

    +', Causality'+

    CAST(ROW_NUMBER() OVER (ORDER BY ca.column_id) AS VARCHAR(2))

    +', Relatedness'+

    CAST(ROW_NUMBER() OVER (ORDER BY ca.column_id) AS VARCHAR(2))

    + ')'

    FROM sys.all_columns ca

    LEFT JOIN sys.all_columns cb

    ON UPPER(cb.name) = UPPER(ca.name)+ '_SPECIFY' AND cb.object_id = @object_id

    WHERE ca.name = UPPER(ca.name) COLLATE SQL_Latin1_General_CP1_CS_AS

    AND ca.object_id = @object_id

    FOR XML PATH('') )

    ,1,1,'');

    SET @cmd += ') CrossApplied (SymptomName, Toxicity, Specify, Causality, Relatedness);';

    EXEC (@cmd);

  • Super cool!! That should fix it. Definitely a solution I will have to keep in my bag of tricks.

    Thanks!

    Pieter

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply