T-SQL Challenge #1

What are TSQL Challenges?

TSQL Challenges aim at helping people enhance their SET based query writing skills by presenting real life TSQL challenges. TSQL Challenges encourages people to think differently and come up with better solutions than what they are used do. It also provides a platform to see the solutions of other experts around and learn from their techniques.

TSQL Challenges is managed by ‘TSQL Challenge Committee’, a group of SQL Server Experts from around the world. Many of them are SQL Server MVPs, some are SQL Server consultants and others are developers working with SQL Server for a number of years.

What is this new series of articles all about?

This is the first of a series of articles that the TSQL Challenge Team brings to the SQL Server community. In this series, we will look at a number of TSQL Challenges and review the different ways of solving the given challenge.

TSQL Challenge #1

TSQL Challenge #1 presents a simple but interesting problem. TSQL Challenges are all about solving a given TSQL problem using a SET based query. Most of the times, the task is to format a set of data from one shape to another using a single SET operation. The Challenge in TSQL Challenge #1 is to write a query that takes data from three tables into a given shape. Here are the three tables.

Table A

code        aname
----------- ----------
1           Cat
2           Dog
3           Bird

Table B

code        bname
----------- ----------
1           aaa
1           bbb
2           ccc
2           ddd

Table C

code        cname
----------- ----------
1           xxx
1           yyy
1           zzz
2           www

The task is to write a query that produces the following output from the above tables. The query should run on SQL Server 2000 as well as on SQL Server 2005.

Output

code        aname      bname      cname
----------- ---------- ---------- ----------
1           Cat        aaa        xxx
1           Cat        bbb        yyy
1           Cat        NULL       zzz
2           Dog        ccc        www
2           Dog        ddd        NULL
3           Bird       NULL       NULL

Sample Data

DECLARE @a TABLE (code INT, aname VARCHAR(10))
INSERT INTO @a(code, aname) SELECT 1,'Cat'
INSERT INTO @a(code, aname) SELECT 2,'Dog'
INSERT INTO @a(code, aname) SELECT 3,'Bird'
DECLARE @b TABLE (code INT, bname VARCHAR(10))
INSERT INTO @b(code, bname) SELECT 1,'aaa'
INSERT INTO @b(code, bname) SELECT 1,'bbb'
INSERT INTO @b(code, bname) SELECT 2,'ccc'
INSERT INTO @b(code, bname) SELECT 2,'ddd'
DECLARE @c TABLE (code INT, cname VARCHAR(10))
INSERT INTO @c(code, cname) SELECT 1,'xxx'
INSERT INTO @c(code, cname) SELECT 1,'yyy'
INSERT INTO @c(code, cname) SELECT 1,'zzz'
INSERT INTO @c(code, cname) SELECT 2,'www'

The Challenge

Is there really a challenge? Well, there is. At first glance one might think that a simple query joining the three tables will do the trick. However, that is not true. If you join the three tables it will produce a number of duplicate rows because the ‘code’ column in Table B and Table C are not unique.

A join between @b and @c will produce duplicate rows. The "code" column in @b contains 2 records having value "1". Similarly the "code" column in @c contains 3 records having value "1". If you try to join them with a FULL JOIN you will receive 12 records. But our query should produce only 3 rows. Similarly, our query should produce only 2 records for rows having "2" in the column "code". The total number of rows we are looking for is 5 where as a FULL JOIN between @b and @c currently produces 16 records. See this example:

SELECT * FROM @b b
FULL JOIN @c c ON b.code = b.code
/*
code        bname      code        cname
----------- ---------- ----------- ----------
1           aaa        1           xxx
1           bbb        1           xxx
2           ccc        1           xxx
2           ddd        1           xxx
1           aaa        1           yyy
1           bbb        1           yyy
2           ccc        1           yyy
2           ddd        1           yyy
1           aaa        1           zzz
1           bbb        1           zzz
2           ccc        1           zzz
2           ddd        1           zzz
1           aaa        2           www
1           bbb        2           www
2           ccc        2           www
2           ddd        2           www
*/

Now it looks like a challenge! isn’t it?

Solving the Challenge

How do we remove those duplicate rows? Well, the join that we apply should do the following. It should take the first row in @b with value "1" and join it with the first row in @c having "1". Then it should take the second row in @b having "1" and join it with the second row in @c having value "1". It should then take the third row in @c having value "1". This does not have a matching row in @b, so it will return the information from @c and will return NULL from @b. If we can apply such a join, we can receive back exactly 3 records.

To join the rows in the manner described above, we need to generate a sequence number for each row. The sequence number should reset for each distinct value in the "code" column. So the trick here is to generate this sequence number.

SQL Server 2005 introduced ROW_NUMBER() that can be used to generate a sequence number. Here is an example:

SELECT
    bname,
    code,
    ROW_NUMBER() OVER(PARTITION BY code ORDER BY code) AS row
FROM @b 
/*
bname      code        row
---------- ----------- --------------------
aaa        1           1
bbb        1           2
ccc        2           1
ddd        2           2
*/

It is little trickier in SQL Server 2000. In SQL Server 2000, this can be achieved by doing a self join and counting the number of records smaller than equal to the current row. This trick is demonstrated in the below example.

SELECT
    b1.bname,
    b1.code,
    COUNT(*) AS row
FROM @b b1
INNER JOIN @b b2 ON 
    b1.code = b2.code
    AND b2.bname <= b1.bname
GROUP BY 
    b1.bname,
    b1.code
/*
bname      code        row
---------- ----------- -----------
aaa        1           1
bbb        1           2
ccc        2           1
ddd        2           2
*/

This trick is going to help us solve the problem. Using this trick, we can join @b and @c on Code + row and it will make each row unique and will eliminate the duplicate values from the results. A join of @b and @c using this method will produce the following result.

SELECT * FROM (
    SELECT
        b1.bname,
        b1.code,
        COUNT(*) AS row
    FROM @b b1
    INNER JOIN @b b2 ON 
        b1.code = b2.code
        AND b2.bname <= b1.bname
    GROUP BY 
        b1.bname,
        b1.code
) b FULL JOIN (    
    SELECT
        c1.cname,
        c1.code,
        COUNT(*) AS row
    FROM @c c1
    INNER JOIN @c c2 ON 
        c1.code = c2.code
        AND c2.cname <= c1.cname
    GROUP BY 
        c1.cname,
        c1.code
) c ON b.Code = c.Code AND b.row = c.row
/*
bname      code        row         cname      code        row
---------- ----------- ----------- ---------- ----------- -----------
ccc        2           1           www        2           1
aaa        1           1           xxx        1           1
bbb        1           2           yyy        1           2
NULL       NULL        NULL        zzz        1           3
ddd        2           2           NULL       NULL        NULL
*/

This leads us to the final solution. The final result can be achieved by simply joining this result with @a.

SELECT
    a.*,
    b.bname,
    c.cname
FROM @a a
LEFT JOIN (
    (
        SELECT
            b1.bname,
            b1.code,
            COUNT(*) AS row
        FROM @b b1
        JOIN @b b2 ON 
            b1.code = b2.code
            AND b2.bname <= b1.bname
        GROUP BY 
            b1.bname,
            b1.code
    ) b
    FULL JOIN
    (
        SELECT
            c1.cname,
            c1.code,
            COUNT(*) AS row
        FROM @c c1
        JOIN @c c2 ON 
            c1.code = c2.code
            AND c2.cname <= c1.cname
        GROUP BY 
            c1.cname,
            c1.code
    ) c ON 
        b.code = c.code
        AND b.row = c.row
) ON a.code IN (b.code, c.code)               
/*
code        aname      bname      cname
----------- ---------- ---------- ----------
1           Cat        aaa        xxx
1           Cat        bbb        yyy
1           Cat        NULL       zzz
2           Dog        ccc        www
2           Dog        ddd        NULL
3           Bird       NULL       NULL
*/

Here is the SQL Server 2005 version of the code, which is much simpler.

SELECT
    a.*,
    b.bname,
    c.cname
FROM @a a
LEFT JOIN (
    (
        SELECT
            bname,
            code,
            ROW_NUMBER() OVER(PARTITION BY code order by code) AS row
        FROM @b
    ) b
    FULL JOIN
    (
        SELECT
            cname,
            code,
            ROW_NUMBER() OVER(PARTITION BY code order by code) AS row
        FROM @c
    ) c ON 
        b.code = c.code
        AND b.row = c.row
) ON a.code IN (b.code, c.code)               
/*
code        aname      bname      cname
----------- ---------- ---------- ----------
1           Cat        aaa        xxx
1           Cat        bbb        yyy
1           Cat        NULL       zzz
2           Dog        ccc        www
2           Dog        ddd        NULL
3           Bird       NULL       NULL
*/

Winning Solutions

We just saw a quick tutorial that shows how to solve TSQL Challenge #1. The solution I presented above is certainly one of the simplest, but not one of the best in terms of performance. I choose a simple solution for the purpose of this tutorial. Now it is time to see a few elegant solutions from some of the TOP SQL Server guys in the SQL Server community.

Solution by Adam Machanic

Adam Machanic is a Boston-based independent database consultant, writer, and speaker.He is a Microsoft Most Valuable Professional (MVP) for SQL Server, Microsoft Certified IT Professional (MCITP), and a member of the INETA North American Speakers.

SELECT
    a.*,
    b.bname,
    c.cname
FROM @a a
LEFT OUTER JOIN
(
    (
        SELECT
            b1.bname,
            b1.code,
            count(*) AS row
        FROM @b b1
        JOIN @b b2 ON 
            b1.code = b2.code
            AND b2.bname <= b1.bname
        GROUP BY 
            b1.bname,
            b1.code
    ) b
    FULL OUTER JOIN
    (
        SELECT
            c1.cname,
            c1.code,
            COUNT(*) AS row
        FROM @c c1
        JOIN @c c2 ON 
            c1.code = c2.code
            AND c2.cname <= c1.cname
        GROUP BY 
            c1.cname,
            c1.code
    ) c ON 
        b.code = c.code
        AND b.row = c.row
) ON
    a.code IN (b.code, c.code)

Solution by Ashish Gilhotra

Ashish Gilhotra a Web Developer from Jaipur, Rajasthan. He is an MCAD in .net with Sql Server.

SELECT T1.code, T1.aname, D.bname, D.cname
FROM @a T1
LEFT JOIN
(
    SELECT 
        COALESCE(D2.code, D3.code) AS code
        ,D2.bname
        ,D3.cname
    FROM
    (
        SELECT 
            e.bname,
            e.code, 
            (
                SELECT COUNT(*) 
                FROM @b e2 
                WHERE e2.bname <= e.bname
            ) AS rownumber
        FROM @b e
    ) D2
        FULL JOIN
        (
            SELECT 
                e.cname,
                e.code, 
                replace(
                    (
                        SELECT COUNT(*) 
                        FROM @c e2 
                        WHERE e2.cname <= e.cname
                    )-1,0,4
                ) AS rownumber
            FROM @c e    
        ) D3
            ON D2.code = D3.code
                AND D2.rownumber = D3.rownumber
) D
    ON T1.code = D.code

Solution by Leonid Koyfman

Leonid Koyfman is a Senior Developer with Razorfish ( San Francisco). He is in software development over 10 years. His focus is data visualization and delivering reporting solutions from various data sources, but spending most of the time with SQL Server.

SELECT
    a.code,
    aname,
    bname,
    cname
FROM @a a 
LEFT JOIN (       
    SELECT 
        COALESCE(t1.code,t2.code) AS code,
        bname,
        cname
    FROM(       
        SELECT
            code,
            bname,
            (
                SELECT 
                    count(9) 
                FROM @b b2 
                WHERE b1.code=b2.code 
                    and b1.bname>=b2.bname
            ) AS ord
        FROM @b b1
    )t1 
    FULL OUTER JOIN(      
        SELECT
            code,
            cname,
            (
                SELECT 
                    count(9)
                FROM @c c2 
                WHERE c1.code=c2.code 
                    and c1.cname>=c2.cname
            ) AS ord
        FROM @c c1
    )t2 on t1.code=t2.code and t1.ord=t2.ord
)t ON a.code=t.code
ORDER BY a.code

Download solutions

All the solutions listed above are available for download at codeplex.

Inspired? Write a better solution or take the next challenge!

If you have got a better idea and come up with a different solution, please let me know. I invite you to share your ideas in the discussion thread below. If your solution is as elegant or better than the solutions already listed above, we can publish them in the codeplex project.

If TSQL Challenges interests you, take the next challenge or try to play with one of the existing challenges.

About the Author

Jacob Sebastian is a SQL Server MVP and is working with SQL Server for over 11 years. He is a Moderator of MSDN / Technet Forums, Regional Mentor for PASS Asia and a regular columnist at SQLServerCentral. He is the president of Ahmedabad SQL Server User Group and a regular speaker in the UG events. Jacob is the author of The Art of XSD - SQL Server XML SChema Collections, a contributing author in "SQL Server 2008 Bible". He blogs at http://beyondrelational.com/blogs/jacob/.

Rate

Share

Categories

Share

Rate

T-SQL Challenge #1

Rate

Share

Categories

Share

Rate

Related content

A Normalization Primer

Reaching the Outer Limits

Working with email addresses in SQL Server!

Querying Responses

Storing Checkbox Responses as Integers