Query Help Needed

  • I've one scenario:

    create table dbo.T_Master(StudentId int, StudentName varchar(50), StudentAge int)

    go

    create table dbo.T_Detail(StudentId int, ColX varchar(20), ColY varchar(20), Subject varchar(20), Marks int)

    go

    Then I populated the data like this:

    insert into dbo.T_Master select 1001, 'James', 14

    insert into dbo.T_Master select 1002, 'Raghav', 15

    insert into dbo.T_Master select 1003, 'Maxx', 14

    insert into dbo.T_Master select 1004, 'Aaron', 14

    insert into dbo.T_Master select 1005, 'Jamie', 15

    insert into dbo.T_Detail select 1001, 'ColXdata A', 'ColYdata A', 'English', 60

    insert into dbo.T_Detail select 1001, 'ColXdata A', 'ColYdata A', 'Maths', 80

    insert into dbo.T_Detail select 1001, 'ColXdata A', 'ColYdata A', 'Science', 45

    insert into dbo.T_Detail select 1002, 'ColXdata B', 'ColYdata B', 'English', 65

    insert into dbo.T_Detail select 1002, 'ColXdata B', 'ColYdata B', 'Maths', 75

    insert into dbo.T_Detail select 1002, 'ColXdata B', 'ColYdata B', 'Science', 60

    insert into dbo.T_Detail select 1003, 'ColXdata C', 'ColYdata C', 'English', 55

    insert into dbo.T_Detail select 1003, 'ColXdata C', 'ColYdata C', 'Maths', 90

    insert into dbo.T_Detail select 1003, 'ColXdata C', 'ColYdata C', 'Science', 65

    insert into dbo.T_Detail select 1004, 'ColXdata D', 'ColYdata D', 'English', 70

    insert into dbo.T_Detail select 1004, 'ColXdata D', 'ColYdata D', 'Maths', 60

    insert into dbo.T_Detail select 1004, 'ColXdata D', 'ColYdata D', 'Science', 60

    insert into dbo.T_Detail select 1005, 'ColXdata E', 'ColYdata E', 'English', 75

    insert into dbo.T_Detail select 1005, 'ColXdata E', 'ColYdata E', 'Maths', 55

    insert into dbo.T_Detail select 1005, 'ColXdata E', 'ColYdata E', 'Science', 80

    Now I want to have the data by joining above two tables like this:

    A.StudentIdA.StudentNameA.StudentAgeB.ColXB.ColYSubject1Marks1Subject2Marks2Subject3Marks3

    1001James14ColXdata AColYdata AEnglish60Maths80Science45

    I am trying but not getting success. Also please consider that this is a parallel demo scenario created for student scenario. In actual, both the tables contain huge number of records (in millions).

  • Thanks for the ddl and sample data!!! What you are describing is a cross tab or pivot table. I generally use the cross tab approach because I find the syntax easier to read and the performance is more consistent. There are two different types of cross tabs. There is the static cross tab which you use when you know how many sets you have. The other is the dynamic cross tab which you use when you don't know how many sets you have.

    I put together one version of doing this with a static cross tab.

    with SortedData as

    (

    select m.StudentId, m.StudentName, m.StudentAge, d.ColX, d.ColY, d.Subject, d.Marks, ROW_NUMBER() over(partition by m.StudentID order by d.Subject) as RowNum

    from dbo.T_Master m

    join dbo.T_Detail d on m.StudentID = d.StudentId

    )

    select StudentId, StudentName, StudentAge,

    MAX(case when RowNum = 1 then Subject end) as Subject1,

    MAX(case when RowNum = 1 then Marks end) as Marks1,

    MAX(case when RowNum = 2 then Subject end) as Subject2,

    MAX(case when RowNum = 2 then Marks end) as Marks2,

    MAX(case when RowNum = 3 then Subject end) as Subject3,

    MAX(case when RowNum = 3 then Marks end) as Marks3

    from SortedData s

    group by StudentId, StudentName, StudentAge

    If you need the dynamic version I will leave that one up to you to start working on. You can read more about both types of cross tabs by following the links in my signature.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Sean;

    Interesting solution... I had something much like yours getting the same results the only difference is I split out the 3 different subjects into separate CTEs then joining them back much like you did in your query. One thing you forgot (and so did I) were the ColX & ColY columns.

    To solve that it can be done this way:

    with SortedData as

    (

    select m.StudentId, m.StudentName, m.StudentAge, d.ColX, d.ColY, d.Subject, d.Marks, ROW_NUMBER() over(partition by m.StudentID order by d.Subject) as RowNum

    from @T_Master m

    join @T_Detail d on m.StudentID = d.StudentId

    )

    select StudentId, StudentName, StudentAge,

    MAX(case when RowNum = 1 then ColX end) as ColX,

    MAX(case when RowNum = 1 then ColY end) as ColY,

    MAX(case when RowNum = 1 then Subject end) as Subject1,

    MAX(case when RowNum = 1 then Marks end) as Marks1,

    MAX(case when RowNum = 2 then Subject end) as Subject2,

    MAX(case when RowNum = 2 then Marks end) as Marks2,

    MAX(case when RowNum = 3 then Subject end) as Subject3,

    MAX(case when RowNum = 3 then Marks end) as Marks3

    from SortedData s

    group by StudentId, StudentName, StudentAge

    Kurt

    Kurt W. Zimmerman
    SR DBA
    Lefrak Organization
    New York, NY

    http://www.linkedin.com/in/kurtwzimmerman

  • Kurt W. Zimmerman (12/20/2013)


    Sean;

    Interesting solution... I had something much like yours getting the same results the only difference is I split out the 3 different subjects into separate CTEs then joining them back much like you did in your query. One thing you forgot (and so did I) were the ColX & ColY columns.

    To solve that it can be done this way:

    with SortedData as

    (

    select m.StudentId, m.StudentName, m.StudentAge, d.ColX, d.ColY, d.Subject, d.Marks, ROW_NUMBER() over(partition by m.StudentID order by d.Subject) as RowNum

    from @T_Master m

    join @T_Detail d on m.StudentID = d.StudentId

    )

    select StudentId, StudentName, StudentAge,

    MAX(case when RowNum = 1 then ColX end) as ColX,

    MAX(case when RowNum = 1 then ColY end) as ColY,

    MAX(case when RowNum = 1 then Subject end) as Subject1,

    MAX(case when RowNum = 1 then Marks end) as Marks1,

    MAX(case when RowNum = 2 then Subject end) as Subject2,

    MAX(case when RowNum = 2 then Marks end) as Marks2,

    MAX(case when RowNum = 3 then Subject end) as Subject3,

    MAX(case when RowNum = 3 then Marks end) as Marks3

    from SortedData s

    group by StudentId, StudentName, StudentAge

    Kurt

    I included ColX and ColY in my output. I wasn't quite sure what to do with those actually because in the OP stated output they were only listed once. I just decided to use MAX and figure I would let the OP figure out what to do with that one.

    You should post your solution. It may be we can come up with something between us better than either of us could do solo.

    --EDIT--

    I just looked at my post again and I didn't have those two columns in there. I had them in the SortedData cte but not in the final result. 😉

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Here is my solution:

    with details_English_cte (StudentID, Subject, Marks) as (

    select StudentId

    ,Subject

    ,Marks

    from dbo.T_Detail

    where Subject = 'English'

    ),

    details_maths_cte (StudentID, Subject, Marks) as (

    select StudentId

    ,Subject

    ,Marks

    from dbo.T_Detail

    where Subject = 'Maths'

    ),

    details_Science_cte (StudentID, Subject, Marks) as (

    select StudentId

    ,Subject

    ,Marks

    from dbo.T_Detail

    where Subject = 'Science'

    ),

    ColXY_cte (StudentID, Colx, Coly) as (

    select distinct StudentID

    ,ColX

    ,ColY

    from dbo.T_Detail

    )

    select mstr.StudentId

    , mstr.StudentName

    , mstr.StudentAge

    , ColXY.Colx

    , ColXY.Coly

    , English.Subject as Subject1

    , English.Marks as Marks1

    , Maths.Subject as Subject2

    , Maths.Marks as Marks2

    , Science.Subject as Subject3

    , Science.Marks as Marks3

    from dbo.T_Master mstr inner join

    ColXY_cte ColXY on mstr.StudentID = ColXY.StudentID inner join

    details_English_cte English on mstr.StudentId = English.StudentID inner join

    details_maths_cte Maths on mstr.StudentId = Maths.StudentID inner join

    details_Science_cte Science on mstr.StudentId = Science.StudentID;

    Kurt W. Zimmerman
    SR DBA
    Lefrak Organization
    New York, NY

    http://www.linkedin.com/in/kurtwzimmerman

  • Thanks Sean/Kurt for the wonderful query.

    One update to the requirement and one query.

    Modification is that there might be some students who have only one or two or all of the three subjects (No of subjkects may vary). In that case, will it work too ? ColX & ColY will have same values so no issues with that.

    Query is performance wise. I've created this scenario only as example. In reality it will be required to fetch data from the join where both the tables has around 10-20 billion records. In that case, will we need to tune this query ? Though i'll check at my end as well.

  • sqlnaive (12/20/2013)


    Thanks Sean/Kurt for the wonderful query.

    One update to the requirement and one query.

    Modification is that there might be some students who have only one or two or all of the three subjects (No of subjkects may vary). In that case, will it work too ? ColX & ColY will have same values so no issues with that.

    Query is performance wise. I've created this scenario only as example. In reality it will be required to fetch data from the join where both the tables has around 10-20 billion records. In that case, will we need to tune this query ? Though i'll check at my end as well.

    The one I posted will handle any number of t_Detail values and there is no requirement to have 3.

    Try this.

    insert T_Master

    select 1006, 'Sean', 45

    insert T_Detail

    select 1006, 'ColX', 'ColY', 'Writing', 33

    Then rerun the query I posted. It will just return NULL for the second and third groups.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Sean Lange (12/20/2013)


    sqlnaive (12/20/2013)


    Thanks Sean/Kurt for the wonderful query.

    One update to the requirement and one query.

    Modification is that there might be some students who have only one or two or all of the three subjects (No of subjkects may vary). In that case, will it work too ? ColX & ColY will have same values so no issues with that.

    Query is performance wise. I've created this scenario only as example. In reality it will be required to fetch data from the join where both the tables has around 10-20 billion records. In that case, will we need to tune this query ? Though i'll check at my end as well.

    The one I posted will handle any number of t_Detail values and there is no requirement to have 3.

    Try this.

    insert T_Master

    select 1006, 'Sean', 45

    insert T_Detail

    select 1006, 'ColX', 'ColY', 'Writing', 33

    Then rerun the query I posted. It will just return NULL for the second and third groups.

    It is important to understand the scope of your data so that you can determine what solution will be the one that works. If you know there will be a finite number of subjects then either solution will work. If the subjects vary then developing a more robust solution, such as Sean has pointed out is the direction you need to go.

    These are the kinds of nuances you will learn to ask for as you are given requirements to develop solutions. Often times you may develop a solution thinking it is the correct solution only to find out that someone failed to mention something that may be subtle to them but change the entire design of your solution. Happens all the time.

    More experienced developers know the right questions to ask to get as much if not all of the requirements to develop a solution.

    As Sean and I have pointed out there are multiple ways to solve a single solution. You will see that both solutions gives you the correct answer and yet are developed in a way that from a developers standpoint is somewhat different.

    Once you have a solution it is important to insure the solution will work in accordance to the requirements. The solution will need to be tested to insure the end result is what is to be expected. Quite often a solution will have to be reworked because the end result is not what is expected. That is the cycle of development & unit testing and quality control which is commonplace in any development shop.

    All the best and good luck.

    Kurt

    Kurt W. Zimmerman
    SR DBA
    Lefrak Organization
    New York, NY

    http://www.linkedin.com/in/kurtwzimmerman

  • The solution provided looks good. However in my actual scenario, there are around 15-20 columns in master table (all have to be included) and around 30 columns in details (of which 15 have to be included). Both the tables contain around 10-20 million records. I'm doing a normal join like this:

    insert into table3

    (col1, col2....coln)

    select a.StudentId, a.col1, a.col2... a.col15,

    b.col1 as col16, b.col2 as col17.... b.col15 as col30,

    b.subject as Subject1, b.marks as sub1_marks,

    c.subject as Subject2, c.marks as sub2_marks,

    d.subject as Subject3, d.marks as sub3_marks

    from dbo.t_master a

    left outer join dbo.t_detail b on a.StudentId = b.StudentId and b.Subject = 'English'

    left outer join dbo.t_detail c on a.StudentId = c.StudentId and c.Subject = 'Maths'

    left outer join dbo.t_detail d on a.StudentId = d.StudentId and d.Subject = 'Science'

    Is it possible to do it in efficient way ? With the number of records i have in both master and details table, it's taking too much time and the query you provided is taking around 15-20 mins more than this query. Any help would be appreciated. We have clustered indexes on Studentid on both tables.

  • sqlnaive (12/23/2013)


    The solution provided looks good. However in my actual scenario, there are around 15-20 columns in master table (all have to be included) and around 30 columns in details (of which 15 have to be included). Both the tables contain around 10-20 million records. I'm doing a normal join like this:

    insert into table3

    (col1, col2....coln)

    select a.StudentId, a.col1, a.col2... a.col15,

    b.col1 as col16, b.col2 as col17.... b.col15 as col30,

    b.subject as Subject1, b.marks as sub1_marks,

    c.subject as Subject2, c.marks as sub2_marks,

    d.subject as Subject3, d.marks as sub3_marks

    from dbo.t_master a

    left outer join dbo.t_detail b on a.StudentId = b.StudentId and b.Subject = 'English'

    left outer join dbo.t_detail c on a.StudentId = c.StudentId and c.Subject = 'Maths'

    left outer join dbo.t_detail d on a.StudentId = d.StudentId and d.Subject = 'Science'

    Is it possible to do it in efficient way ? With the number of records i have in both master and details table, it's taking too much time and the query you provided is taking around 15-20 mins more than this query. Any help would be appreciated. We have clustered indexes on Studentid on both tables.

    Did you try the approach I posted? With that amount of data it is more likely to perform better because you don't have so many table scans. Read the two articles I suggested about cross tabs in my signature.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • SELECT T1.StudentID,StudentName, StudentAge,ColX,Coly,

    MAX(CASE WHEN Subject = 'English' THEN 'English' ELSE '' END) AS Subject1,

    MAX(CASE WHEN Subject = 'English' THEN Marks ELSE 0 END) AS Marks1,

    MAX(CASE WHEN Subject = 'English' THEN 'Maths' ELSE '' END) AS Subject2,

    MAX(CASE WHEN Subject = 'Maths' THEN Marks ELSE 0 END) AS Marks2,

    MAX(CASE WHEN Subject = 'English' THEN 'Science' ELSE '' END) AS Subject3,

    MAX(CASE WHEN Subject = 'Science' THEN Marks ELSE 0 END) AS Marks3

    FROM dbo.T_Master T1

    INNER JOIN dbo.T_Detail T2

    ON T1.StudentID = T2.StudentID

    GROUP BY T1.StudentID,StudentName, StudentAge,ColX,Coly

    Regards,
    Mitesh OSwal
    +918698619998

  • Hi Sean/Kurt, Highly appreciate your help on this. It was a good query. However some conditions changed and there was some extra requirements and I lost focus in between so created new thread instead of going on with this one.

    The other thread is here:

    http://www.sqlservercentral.com/Forums/Topic1528153-392-2.aspx?Update=1

    Kindly provide your valuable feedback there.

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply