Looking for an optimal solution

  • Preface:
    I've been working on a particular problem past the threshold of time before I should ask for second opinions.  I've come up with a couple of working solutions however I have that nagging feeling that I might be overlooking an obvious and less complicated alternative.  If interested I'll be happy to share these solutions afterwards but I want to see what other people come up with without being influenced by my methods.  Anyway see below:

    This is my starting dataset:

    declare @test-2 table (studentnumber int, seq int, subj nvarchar(50), scorea int, scoreb int, scorec int)

    --first student
    insert into @test-2
    values (123, 2, 'subject1', 2, 5, null), (123, 3, 'subject2', 20, 13, null), (123, 1, 'subject3', 23, 48, 39)

    --second student
    insert into @test-2
    values (345, 2, 'subject1', 20, 50, null), (345, 3, 'subject2', 200, 130, null), (345, 1, 'subject3', 230, 480, 390)

    this is my desired output:

    each student has its set of subject scores as columns on each row clustered by subject with the column clusters ordered by the seq value in the initial data (i.e. the first 3 score columns - 23, 48, 39  for studentnumber 123 - are the scorea/b/c values for subject3 and are the first score columns listed because subject3 has seq value of 1).

    Thanks in advance for all considerations and suggestions

    Regards,
    MF

  • My solution is to use the right tool for the job.  In this case it's a reporting tool such as SSRS.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • What are you trying to do?

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • Hey all -

    Thanks for the interest.  Unfortunately, my scenario is rigidly as described as far as input and output are concerned.  This is the format the data has been delivered to me and my company's reporting solutions (which is ultimately what this logic will be feeding into) are all proprietary and developed in-house.  My solution has to be limited to TSQL options as third party reporting utilities are not on the table.  

    Thanks again for your time
    --MF

  • Could you provide actual test data, as CREATE TABLE and INSERT statements, rather than just a picture of data.  We can't write code against a picture .🙂

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • What have you tried?  Can you post that code?

    A few questions.
    Will there always be only 3 scores for a course for a student? In other words, there will be one row per student per course.

    Michael L John
    If you assassinate a DBA, would you pull a trigger?
    To properly post on a forum:
    http://www.sqlservercentral.com/articles/61537/

  • flownominal16 - Tuesday, December 12, 2017 1:58 PM

    Hey all -

    Thanks for the interest.  Unfortunately, my scenario is rigidly as described as far as input and output are concerned.  This is the format the data has been delivered to me and my company's reporting solutions (which is ultimately what this logic will be feeding into) are all proprietary and developed in-house.  My solution has to be limited to TSQL options as third party reporting utilities are not on the table.  

    Thanks again for your time
    --MF

    Ok, I understand what's going on here... 

    First, for sample data I'm going to use a temp table so that I can include an index which will aid the performance of my solution. The sample data:

    if object_id('tempdb..#test') is not null drop table #test
    create table #test
    (
    studentnumber int    not null,
    seq    int    not null,
    subj    nvarchar(50) not null,
    scorea   int    null,
    scoreb   int    null,
    scorec   int    null,
    constraint pk_test primary key clustered (studentnumber, seq) -- the index you want (nonclusted is better)
    );

    insert into #test
    values (123, 2, 'subject1', 2, 5, null), (123, 3, 'subject2', 20, 13, null),
       (123, 1, 'subject3', 23, 48, 39),(345, 2, 'subject1', 20, 50, null),
       (345, 3, 'subject2', 200, 130, null), (345, 1, 'subject3', 230, 480, 390);

    Note that the primary key creates a clustered index on studentnumber and seq.

    The solution is an alternative to PIVOT that I learned in this article. The solution:

    select
    t.studentnumber,
    scorea1 = max(case seq when 1 then scorea end),
    scoreb1 = max(case seq when 1 then scoreb end),
    scorec1 = max(case seq when 1 then scorec end),
    scorea2 = max(case seq when 2 then scorea end),
    scoreb2 = max(case seq when 2 then scoreb end),
    scorec2 = max(case seq when 2 then scorec end),
    scorea3 = max(case seq when 3 then scorea end),
    scoreb3 = max(case seq when 3 then scoreb end),
    scorec3 = max(case seq when 3 then scorec end)
    from #test t
    group by t.studentnumber;

    Note that, if you examine the execution plan, the aforementioned clustered index prevents a sort created by the GROUP BY clause. 

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • ScottPletcher - Tuesday, December 12, 2017 2:50 PM

    Could you provide actual test data, as CREATE TABLE and INSERT statements, rather than just a picture of data.  We can't write code against a picture .🙂

    Yup sorry, it was at the top of the original post but definitely not a great format.  Let me try again:
    if exists (
        select [object_id]
        from sys.tables
        where [object_id] = object_id('dbo.TestData')
    )
    begin
        drop table dbo.TestData
    end

    create table TestData (
        StudentNumber int,
        Seq int,
        Subj nvarchar(50),
        ScoreA int,
        ScoreB int,
        ScoreC int
    )

    insert into TestData (
        StudentNumber,
        Seq,
        Subj,
        ScoreA,
        ScoreB,
        ScoreC
    )
    values
    --student1
    (123, 2, 'subject1', 2, 5, null), (123, 3, 'subject2', 20, 13, null), (123, 1, 'subject3', 23, 48, 39),
    --student2
    (345, 2, 'subject1', 20, 50, null), (345, 3, 'subject2', 200, 130, null), (345, 1, 'subject3', 230, 480, 390)

    Michael L John - Tuesday, December 12, 2017 2:51 PM

    What have you tried?  Can you post that code?

    A few questions.
    Will there always be only 3 scores for a course for a student? In other words, there will be one row per student per course.

    To answer you're question - yes, the number of score columns per subject will be static.  Here's one example of a working solution I have (which feels unnecessarily complicated) - just tack this code underneath the test data bit right above ^:
    select *
    from TestData

    declare @XML xml;
    with distinctcandidates as (
        select distinct StudentNumber from TestData
    )
    select @XML = (
        select dc.StudentNumber as [@StudentID],
             (
                select Subj as [@subjname],
                     scorea,
                     scoreb,
                     scorec
                from TestData
                where StudentNumber = dc.StudentNumber
                order by seq
                for xml path ('Subject'), type
             ) as Subjects
        from distinctcandidates dc
        for xml path('Candidate'), root ('Candidates')
    )

    select @xml

    select XCol.value('../@StudentID', 'int') as studentnumber,
        scorea1 = XCol.value('Subject[1]/scorea[1]','int'),
        scoreb1 = XCol.value('Subject[1]/scoreb[1]','int'),
        scorec1 = XCol.value('Subject[1]/scorec[1]','int'),
        scorea2 = XCol.value('Subject[2]/scorea[1]','int'),
        scoreb2 = XCol.value('Subject[2]/scoreb[1]','int'),
        scorec2 = XCol.value('Subject[2]/scorec[1]','int'),
        scorea3 = XCol.value('Subject[3]/scorea[1]','int'),
        scoreb3 = XCol.value('Subject[3]/scoreb[1]','int'),
        scorec3 = XCol.value('Subject[3]/scorec[1]','int')
    from @XML.nodes('Candidates/Candidate/Subjects') AS XTbl(XCol)

    Thanks for the continued interest

    --MF

  • flownominal16 - Tuesday, December 12, 2017 3:08 PM

    ScottPletcher - Tuesday, December 12, 2017 2:50 PM

    Could you provide actual test data, as CREATE TABLE and INSERT statements, rather than just a picture of data.  We can't write code against a picture .🙂

    Yup sorry, it was at the top of the original post but definitely not a great format.  Let me try again:
    if exists (
        select [object_id]
        from sys.tables
        where [object_id] = object_id('dbo.TestData')
    )
    begin
        drop table dbo.TestData
    end

    create table TestData (
        StudentNumber int,
        Seq int,
        Subj nvarchar(50),
        ScoreA int,
        ScoreB int,
        ScoreC int
    )

    insert into TestData (
        StudentNumber,
        Seq,
        Subj,
        ScoreA,
        ScoreB,
        ScoreC
    )
    values
    --student1
    (123, 2, 'subject1', 2, 5, null), (123, 3, 'subject2', 20, 13, null), (123, 1, 'subject3', 23, 48, 39),
    --student2
    (345, 2, 'subject1', 20, 50, null), (345, 3, 'subject2', 200, 130, null), (345, 1, 'subject3', 230, 480, 390)

    Michael L John - Tuesday, December 12, 2017 2:51 PM

    What have you tried?  Can you post that code?

    A few questions.
    Will there always be only 3 scores for a course for a student? In other words, there will be one row per student per course.

    To answer you're question - yes, the number of score columns per subject will be static.  Here's one example of a working solution I have (which feels unnecessarily complicated) - just tack this code underneath the test data bit right above ^:
    select *
    from TestData

    declare @XML xml;
    with distinctcandidates as (
        select distinct StudentNumber from TestData
    )
    select @XML = (
        select dc.StudentNumber as [@StudentID],
             (
                select Subj as [@subjname],
                     scorea,
                     scoreb,
                     scorec
                from TestData
                where StudentNumber = dc.StudentNumber
                order by seq
                for xml path ('Subject'), type
             ) as Subjects
        from distinctcandidates dc
        for xml path('Candidate'), root ('Candidates')
    )

    select @xml

    select XCol.value('../@StudentID', 'int') as studentnumber,
        scorea1 = XCol.value('Subject[1]/scorea[1]','int'),
        scoreb1 = XCol.value('Subject[1]/scoreb[1]','int'),
        scorec1 = XCol.value('Subject[1]/scorec[1]','int'),
        scorea2 = XCol.value('Subject[2]/scorea[1]','int'),
        scoreb2 = XCol.value('Subject[2]/scoreb[1]','int'),
        scorec2 = XCol.value('Subject[2]/scorec[1]','int'),
        scorea3 = XCol.value('Subject[3]/scorea[1]','int'),
        scoreb3 = XCol.value('Subject[3]/scoreb[1]','int'),
        scorec3 = XCol.value('Subject[3]/scorec[1]','int')
    from @XML.nodes('Candidates/Candidate/Subjects') AS XTbl(XCol)

    Thanks for the continued interest

    --MF

    What I posted above will give you what you are looking for.

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • Alan.B - Tuesday, December 12, 2017 3:01 PM

    constraint pk_test primary key clustered (studentnumber, seq) -- the index you want (nonclusted is better)

    Note that, if you examine the execution plan, the aforementioned clustered index prevents a sort created by the GROUP BY clause. 

    @alan:  I'm assuming a typo here.  If not, I'm curious why you would create a clustered PK, yet state that nonclustered is better.

  • DesNorton - Tuesday, December 12, 2017 11:09 PM

    Alan.B - Tuesday, December 12, 2017 3:01 PM

    constraint pk_test primary key clustered (studentnumber, seq) -- the index you want (nonclusted is better)

    Note that, if you examine the execution plan, the aforementioned clustered index prevents a sort created by the GROUP BY clause. 

    @alan:  I'm assuming a typo here.  If not, I'm curious why you would create a clustered PK, yet state that nonclustered is better.

    Not a typo, I was trying to keep it simple. A nonclustered index would be better. Unique, covering and/or filtered even better.

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • Fells like a job for PIVOT to me.
    You say that the number of tests is fixed at three, but not the number of subjects so there could be more than 9 score columns.  If this is the case then you have to create the pivot table SQL Dynamically by reading the distinct list of subjects and concatenating them together (use a CURSOR or STUFF..FOR XML) and then injecting the column list into the PIVOT SQL statement which you then EXECecute.  You will find plenty examples of dynamic pivot.

    NOTE that this is a REALLY BAD idea if you are passing the results set on to another application layer as the dataset structure cannot be determined at compile time.

  • Or do you have Excel as an option:

    output the data as
    StudentID, Subject, Test#, Score

    Use this as a data set for a pivot table
    Student ID goes into rows
    Subject then Test# goes into Columns
    Score goes into Values

    You can fetch the data from the database directly using an embedded query (security permitting) and use VBA to redefine the pivot data and refresh the pivot cache

  • aaron.reese - Wednesday, December 13, 2017 7:50 AM

    Fells like a job for PIVOT to me.
    You say that the number of tests is fixed at three, but not the number of subjects so there could be more than 9 score columns. If this is the case then you have to create the pivot table SQL Dynamically by reading the distinct list of subjects and concatenating them together (use a CURSOR or STUFF..FOR XML) and then injecting the column list into the PIVOT SQL statement which you then EXECecute. You will find plenty examples of dynamic pivot.

    NOTE that this is a REALLY BAD idea if you are passing the results set on to another application layer as the dataset structure cannot be determined at compile time. 

    Or do you have Excel as an option:

    output the data as
    StudentID, Subject, Test#, Score

    Use this as a data set for a pivot table
    Student ID goes into rows
    Subject then Test# goes into Columns
    Score goes into Values

    You can fetch the data from the database directly using an embedded query (security permitting) and use VBA to redefine the pivot data and refresh the pivot cache

    No need for PIVOT, Dynamic SQL, Excel, VBA, Concatenation or any of this. What I posted provides what the OP needs. I even included a link to an article which explains why the cross tab technique I used is arguably better than PIVOT.

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • flownominal16 - Tuesday, December 12, 2017 1:12 PM

    Preface:
    I've been working on a particular problem past the threshold of time before I should ask for second opinions.  I've come up with a couple of working solutions however I have that nagging feeling that I might be overlooking an obvious and less complicated alternative.  If interested I'll be happy to share these solutions afterwards but I want to see what other people come up with without being influenced by my methods.  Anyway see below:

    This is my starting dataset:

    declare @test-2 table (studentnumber int, seq int, subj nvarchar(50), scorea int, scoreb int, scorec int)

    --first student
    insert into @test-2
    values (123, 2, 'subject1', 2, 5, null), (123, 3, 'subject2', 20, 13, null), (123, 1, 'subject3', 23, 48, 39)

    --second student
    insert into @test-2
    values (345, 2, 'subject1', 20, 50, null), (345, 3, 'subject2', 200, 130, null), (345, 1, 'subject3', 230, 480, 390)

    this is my desired output:

    each student has its set of subject scores as columns on each row clustered by subject with the column clusters ordered by the seq value in the initial data (i.e. the first 3 score columns - 23, 48, 39  for studentnumber 123 - are the scorea/b/c values for subject3 and are the first score columns listed because subject3 has seq value of 1).

    Thanks in advance for all considerations and suggestions

    Regards,
    MF

    Alan Burstein posted the best solution for this (CROSS TAB), IMHO.  My question would be, will there always be a max of only 3 subjects or could there be more?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 1 through 14 (of 14 total)

You must be logged in to reply to this topic. Login to reply