Normalize data using Pivot WITHOUT using Aggregate function

  • I've searched the forums here (which usually have a wealth of knowledge), but have been unable to find an answer to my problem. Below is the layout of my table as it is currently:

    Below is how I want the table to turn out:

    TABLE CURRENTLY:

    UniqueIDColumnNameText

    1ID123146

    2SubjectAreaMath

    3LocalCourseCode000A1

    4SchoolID123

    5SourceHub

    6CourseTitleAlgebra I

    7ID94503

    8SubjectAreaScience

    9LocalCourseCode9037

    10SchoolID46

    11SourceAnnex

    12CourseTitleBiology II

    WHAT I WANT:

    IDSubjectAreaLocalCourseCodeSchoolIDSourceCourseTitle

    123146Math000A1123HubAlgebra I

    94503Science903746AnnexBiology II

    I am currently using the PIVOT function in SQL Server 2008. However, using this function still requires the use of an aggregate, which kills what I'm trying to do. I am currently using min(text) for ColumnName in (<ColumnNames>).

    However, that does not produce correct results. I would like to use the PIVOT function because of speed (original table has appx 900,000 rows).

    Any help would be much appreciated!

  • I'd like to take a crack at this. Can you put your data into an actual text format (not an image) so I can try it out? (It'd be helpful if you can put your data into a table variable that mimics your sample data.)

    Just curious -- what is your PIVOT actually giving you? If I'm not mistaken, I don't think you can run a PIVOT without using an aggregate. I messed around with it a while back, and couldn't get it to work without the aggregate. I ended up setting up a "dummy" aggregate to get it to work.

    Hopefully someone who knows PIVOT better than me can answer that better.

    +--------------------------------------------------------------------------------------+
    Check out my blog at https://pianorayk.wordpress.com/

  • Please see my original post for code to copy/paste.

    I also setup a "dummy" aggregate to see if I could get that to work -- I used min(text) for columnname. However, using that aggregate with the data which I'm presenting produces problems:

    1. How to determine min(text) when it is possible for the TEXT of a particular column to contain both Numeric and Alphabetic data?

    2. In my example data, for the ID column, we know that 95403 is lower than 123146. However, when doing Min(text), I will get 123146 returned, since the lowest of the first character of the number (1 vs 9) is 1.

    These are just a smattering of the issues which I am running into.

  • Thanks . . . actually, what I was looking for was this (I'm posting this here for anyone else who wants to try this problem):

    declare @school table ([ID] int, ColumnName varchar(15), [Text] varchar(50))

    insert into @school values (1, 'ID', '123146')

    insert into @school values (2, 'SubjectArea', 'Math')

    insert into @school values (3, 'LocalCourseCode', '000A1')

    insert into @school values (4, 'SchoolID', '123')

    insert into @school values (5, 'Source', 'Hub')

    insert into @school values (6, 'CourseTitle', 'Algebra I')

    insert into @school values (7, 'ID', '94503')

    insert into @school values (8, 'SubjectArea', 'Science')

    insert into @school values (9, 'LocalCourseCode', '9037')

    insert into @school values (10, 'SchoolID', '46')

    insert into @school values (11, 'Source', 'Annex')

    insert into @school values (12, 'CourseTitle', 'Biology II')

    select * from @school

    Just for your future reference -- if you post/present your data like this, you'll probably get faster responses! 🙂 (Basically, help us help you -- the less work you make for everyone, the sooner people will respond!)

    I guessed on the field types -- let me know if they look correct!

    Let me mess around with this a little bit. If I come up with anything useful, I'll post again!

    +--------------------------------------------------------------------------------------+
    Check out my blog at https://pianorayk.wordpress.com/

  • Hi Justin

    This might sound daft, but bear with me...

    How do you know that UniqueID 4 is the school for ID 123146, and not UniqueID 10? Or to put it ab

    another way, is each block of 6 consecutive UniqueID's destined to become a single row in your output?


    [font="Arial"]Low-hanging fruit picker and defender of the moggies[/font]

    For better assistance in answering your questions, please read this[/url].


    Understanding and using APPLY, (I)[/url] and (II)[/url] Paul White[/url]

    Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]

  • Yes, that is correct. Each block of 6 is destined to become a singular row in the final table.

  • Justin James (12/29/2009)


    Yes, that is correct. Each block of 6 is destined to become a singular row in the final table.

    Is there a missing column which would uniquely identify each block of six rows? Don't worry if there isn't because there's a workaround, but it would help a little.


    [font="Arial"]Low-hanging fruit picker and defender of the moggies[/font]

    For better assistance in answering your questions, please read this[/url].


    Understanding and using APPLY, (I)[/url] and (II)[/url] Paul White[/url]

    Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]

  • There is not. I was thinking of using ROW_NUMBER(), but was unsure how to partition by each block of 6 rows.

  • Ray K (12/29/2009)


    Just for your future reference -- if you post/present your data like this, you'll probably get faster responses! 🙂 (Basically, help us help you -- the less work you make for everyone, the sooner people will respond!)

    Ray is sooooo right;

    once the data is in a consumable format, I jumped at trying my hand at this; if you can, always give us the tables and inserts...

    here's my version, and my example results; note the last column is off by one row, and i'm working on that, but this is getting fairly close to the desired results:

    the same results:

    REC ID SubjectArea LocalCourseCode SchoolID Source CourseTitle

    ----------- ------------ ---------------- --------- ------- ------------

    0 123146 Math 000A1 123 Hub NULL

    1 94503 Science 9037 46 Annex Algebra I

    2 NULL NULL NULL NULL NULL Biology II

    the test code

    declare @school table ([ID] int, ColumnName varchar(15), [Text] varchar(50))

    insert into @school values (1, 'ID', '123146')

    insert into @school values (2, 'SubjectArea', 'Math')

    insert into @school values (3, 'LocalCourseCode', '000A1')

    insert into @school values (4, 'SchoolID', '123')

    insert into @school values (5, 'Source', 'Hub')

    insert into @school values (6, 'CourseTitle', 'Algebra I')

    insert into @school values (7, 'ID', '94503')

    insert into @school values (8, 'SubjectArea', 'Science')

    insert into @school values (9, 'LocalCourseCode', '9037')

    insert into @school values (10, 'SchoolID', '46')

    insert into @school values (11, 'Source', 'Annex')

    insert into @school values (12, 'CourseTitle', 'Biology II')

    --select * from @school

    SELECT REC,

    [1] AS ID,

    [2] AS SubjectArea,

    [3] AS LocalCourseCode,

    [4] AS SchoolID,

    [5] AS Source,

    [0] AS CourseTitle

    FROM (SELECT [ID]/ 6 As REC,[ID]% 6 AS GRP,[Text] from @school) As TheSource

    PIVOT

    (

    min([Text])

    FOR

    GRP --each value

    IN ( [0], [1], [2], [3], [4], [5],[6])

    ) AS PIVOTALIAS

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • Justin James (12/29/2009)


    There is not. I was thinking of using ROW_NUMBER(), but was unsure how to partition by each block of 6 rows.

    Like this:

    ;WITH CTE1 AS (

    SELECT ((UniqueID-1)/6)+1 AS CourseBlock, [ID], ColumnName, [Text]

    FROM YourTable

    ),

    CTE2 AS (

    SELECT CourseBlock,

    ROW_NUMBER() OVER(PARTITION BY CourseBlock ORDER BY [ID]) AS CourseElement,

    [ID], ColumnName, [Text]

    FROM CTE1

    )

    SELECT * FROM CTE2

    Edit: Currently unable to test, eval version expired during vacation


    [font="Arial"]Low-hanging fruit picker and defender of the moggies[/font]

    For better assistance in answering your questions, please read this[/url].


    Understanding and using APPLY, (I)[/url] and (II)[/url] Paul White[/url]

    Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]

  • Thank you for the help Lowell. I'll make sure to include the appropriate SQL in my further posts.

    One question I have about your SQL:

    select ID/6 as REC, [ID]% 6 as GRP, TEXT from @school

    What if the number of rows in each group is odd....say, 7? I tried changing the value to an odd number, and the results were skewed.

  • lobbymuncher (12/29/2009)


    ;WITH CTE1 AS (

    SELECT ((UniqueID-1)/6)+1 AS CourseBlock, [ID], ColumnName, [Text]

    FROM YourTable

    ),

    CTE2 AS (

    SELECT CourseBlock,

    ROW_NUMBER() OVER(PARTITION BY CourseBlock ORDER BY [ID]) AS CourseElement,

    [ID], ColumnName, [Text]

    FROM CTE1

    )

    SELECT * FROM CTE2

    Ummm . . . this is really convoluted (and total overkill, IMHO). Wouldn't it be better to just do this (or something similar):

    select cast ([ID]/6 as int) from

    Don't yet have an answer to your original question. I'm starting to think that a CTE (rather than a PIVOT) might be the way to go.

    Any other suggestions from the peanut gallery?

    +--------------------------------------------------------------------------------------+
    Check out my blog at https://pianorayk.wordpress.com/

  • Justin James (12/29/2009)


    Thank you for the help Lowell. I'll make sure to include the appropriate SQL in my further posts.

    One question I have about your SQL:

    select ID/6 as REC, [ID]% 6 as GRP, TEXT from @school

    What if the number of rows in each group is odd....say, 7? I tried changing the value to an odd number, and the results were skewed.

    I'm assuming that the number of rows is alwasy exactly 6 items...if you have some groups that are 7, and others that are 6, we need a different strategy. The sample data is currently 6 items to a group.

    All I'm doing is a different technique to get the same results that lobbymuncher posted.

    you said that each group of 6 entries were a record...so i'm using integer division of ID / 6 to get a group for 6 items, and integer modulous to get 6 unique values from ID % 6

    any number % 6 will return one of the following values: 1,2,3,4,5 or 0, which is what i was trying to use to group the values.

    still not quite the way i had hoped it would work, but it is really close....

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • Lowell (12/29/2009)


    Justin James (12/29/2009)


    Thank you for the help Lowell. I'll make sure to include the appropriate SQL in my further posts.

    One question I have about your SQL:

    select ID/6 as REC, [ID]% 6 as GRP, TEXT from @school

    What if the number of rows in each group is odd....say, 7? I tried changing the value to an odd number, and the results were skewed.

    I'm assuming that the number of rows is alwasy exactly 6 items...if you have some groups that are 7, and others that are 6, we need a different strategy. The sample data is currently 6 items to a group.

    All I'm doing is a different technique to get the same results that lobbymuncher posted.

    you said that each group of 6 entries were a record...so i'm using integer division of ID / 6 to get a group for 6 items, and integer modulous to get 6 unique values from ID % 6

    any number % 6 will return one of the following values: 1,2,3,4,5 or 0, which is what i was trying to use to group the values.

    still not quite the way i had hoped it would work, but it is really close....

    As I looked through my data further (what I provided was just a subset), I see that the grouping of information change throughout it. :(. However, on the BRIGHT side (!), there is a way to tell when a new group occurs. A new group occurs whenever the value in the TEXT column is [DW].[DimClass].

    That being said, here is the new test code:

    declare @school table ([ID] int, ColumnName varchar(15), [Text] varchar(50))

    --Group 1

    insert into @school values (0, 'DestinationTableName', '[DW].[DimClass]')

    insert into @school values (1, 'ID', '123146')

    insert into @school values (2, 'SubjectArea', 'Math')

    insert into @school values (3, 'LocalCourseCode', '000A1')

    insert into @school values (4, 'SchoolID', '123')

    insert into @school values (5, 'Source', 'Hub')

    insert into @school values (6, 'CourseTitle', 'Algebra I')

    --Group 2

    insert into @school values (7, 'DestinationTableName', '[DW].[DimClass]')

    insert into @school values (8, 'ID', '94503')

    insert into @school values (9, 'SubjectArea', 'Science')

    insert into @school values (10, 'LocalCourseCode', '9037')

    insert into @school values (12, 'SchoolID', '46')

    insert into @school values (12, 'Source', 'Annex')

    insert into @school values (13, 'CourseTitle', 'Biology II')

    insert into @school values (14, 'TeacherID', '58398')

    insert into @school values (15, 'Credits Possible', '1.0')

    --Group 3

    insert into @school values (16, 'DestinationTableName', '[DW].[DimClass]')

    insert into @school values (17, 'ID', '84023')

    insert into @school values (18, 'Source', 'Annex')

    insert into @school values (19, 'CourseTitle', 'Physics I')

    select * from @school

  • Justin James (12/29/2009)


    As I looked through my data further (what I provided was just a subset), I see that the grouping of information change throughout it. 🙁

    Argh! My model just got blown out of the water!

    Also, I got truncation errors when I ran the sample data code. I bumped the varchar(15) up to varchar(25), and that seemed to fix it.

    +--------------------------------------------------------------------------------------+
    Check out my blog at https://pianorayk.wordpress.com/

Viewing 15 posts - 1 through 15 (of 50 total)

You must be logged in to reply to this topic. Login to reply