UNION Question

  • I can't believe I'm having to ask this, but I am noticing some incredibly odd behavior with a UNION that I'm using and I'm hoping someone can help me understand why.

    I have a table with about 420,000 records in it--and it only has one datetime field. Let's call it calendar.

    SELECT * FROM calendar --420,000 records


    SELECT *
    FROM calendar
    UNION
    SELECT *
    FROM calendar

    --939 records

    How is that even possible? I must be missing something.

    Thanks,
    Mike

    Mike Scalise, PMP
    https://www.michaelscalise.com

  • Mike Scalise - Wednesday, July 19, 2017 12:07 PM

    I can't believe I'm having to ask this, but I am noticing some incredibly odd behavior with a UNION that I'm using and I'm hoping someone can help me understand why.

    I have a table with about 420,000 records in it--and it only has one datetime field. Let's call it calendar.

    SELECT * FROM calendar --420,000 records


    SELECT *
    FROM calendar
    UNION
    SELECT *
    FROM calendar

    --939 records

    How is that even possible? I must be missing something.

    Thanks,
    Mike

    Union eliminates dupes, basically like doing a select distinct, and union all does not. Maybe try union all and see what you get.

    Sue

  • Sue_H - Wednesday, July 19, 2017 12:24 PM

    Mike Scalise - Wednesday, July 19, 2017 12:07 PM

    I can't believe I'm having to ask this, but I am noticing some incredibly odd behavior with a UNION that I'm using and I'm hoping someone can help me understand why.

    I have a table with about 420,000 records in it--and it only has one datetime field. Let's call it calendar.

    SELECT * FROM calendar --420,000 records


    SELECT *
    FROM calendar
    UNION
    SELECT *
    FROM calendar

    --939 records

    How is that even possible? I must be missing something.

    Thanks,
    Mike

    Union eliminates dupes, basically like doing a select distinct, and union all does not. Maybe try union all and see what you get.

    Sue

    Sue,

    Thanks for the response. I know--I've been a database developer for nearly 10 years and have used UNION (and UNION ALL) on many occasions, but I've never seen it return results like that.

    When I use UNION ALL, I get 840,000 records, as expected.

    Any other ideas?

    Thanks,

    Mike

    Mike Scalise, PMP
    https://www.michaelscalise.com

  • based on what I believe you have said so far in this post...is the following a representative example? 

    SELECT TOP 420000
    TransDate = Dateadd(dd, Abs(Checksum(Newid()))%Datediff(dd, '2015', '2017'), '2015')
    INTO   #TransData
    FROM   sys.all_columns ac1
    CROSS JOIN sys.all_columns ac2

    SELECT * FROM #Transdata
    UNION
    SELECT * FROM #Transdata

    SELECT * FROM #Transdata
    UNION ALL
    SELECT * FROM #Transdata

    ________________________________________________________________
    you can lead a user to data....but you cannot make them think
    and remember....every day is a school day

  • J Livingston SQL - Wednesday, July 19, 2017 12:55 PM

    based on what I believe you have said so far in this post...is the following a representative example? 

    SELECT TOP 420000
    TransDate = Dateadd(dd, Abs(Checksum(Newid()))%Datediff(dd, '2015', '2017'), '2015')
    INTO   #TransData
    FROM   sys.all_columns ac1
    CROSS JOIN sys.all_columns ac2

    SELECT * FROM #Transdata
    UNION
    SELECT * FROM #Transdata

    SELECT * FROM #Transdata
    UNION ALL
    SELECT * FROM #Transdata

    Yes that is actually a perfect depiction of what's going on. In the case of the UNION in your example, I get 731 rows back. Do you have an explanation for why it's happening?

    Thanks in advance,

    Mike

    Mike Scalise, PMP
    https://www.michaelscalise.com

  • Mike Scalise - Wednesday, July 19, 2017 1:00 PM

    J Livingston SQL - Wednesday, July 19, 2017 12:55 PM

    based on what I believe you have said so far in this post...is the following a representative example? 

    SELECT TOP 420000
    TransDate = Dateadd(dd, Abs(Checksum(Newid()))%Datediff(dd, '2015', '2017'), '2015')
    INTO   #TransData
    FROM   sys.all_columns ac1
    CROSS JOIN sys.all_columns ac2

    SELECT * FROM #Transdata
    UNION
    SELECT * FROM #Transdata

    SELECT * FROM #Transdata
    UNION ALL
    SELECT * FROM #Transdata

    Yes that is actually a perfect depiction of what's going on. In the case of the UNION in your example, I get 731 rows back. Do you have an explanation for why it's happening?

    Thanks in advance,

    Mike

    Yep...as Sue said earlier "Union eliminates dupes, basically like doing a select distinct, and union all does not"

    ________________________________________________________________
    you can lead a user to data....but you cannot make them think
    and remember....every day is a school day

  • J Livingston SQL - Wednesday, July 19, 2017 1:05 PM

    Mike Scalise - Wednesday, July 19, 2017 1:00 PM

    J Livingston SQL - Wednesday, July 19, 2017 12:55 PM

    based on what I believe you have said so far in this post...is the following a representative example? 

    SELECT TOP 420000
    TransDate = Dateadd(dd, Abs(Checksum(Newid()))%Datediff(dd, '2015', '2017'), '2015')
    INTO   #TransData
    FROM   sys.all_columns ac1
    CROSS JOIN sys.all_columns ac2

    SELECT * FROM #Transdata
    UNION
    SELECT * FROM #Transdata

    SELECT * FROM #Transdata
    UNION ALL
    SELECT * FROM #Transdata

    Yes that is actually a perfect depiction of what's going on. In the case of the UNION in your example, I get 731 rows back. Do you have an explanation for why it's happening?

    Thanks in advance,

    Mike

    Yep...as Sue said earlier "Union eliminates dupes, basically like doing a select distinct, and union all does not"

    To go further UNION throws a distinct around the entire statement.  Since you're unioning the same table you are effectively just doing this, SELECT DISTINCT * FROM #Transdata

  • I have to ask why you are performing this query

    SELECT *
    FROM calendar
    UNION
    SELECT *
    FROM calendar

    are you by chance trying to obfuscate an actual real world query?

    ________________________________________________________________
    you can lead a user to data....but you cannot make them think
    and remember....every day is a school day

  • I can't believe I missed that. I'm just not operating on all cylinders today.

    I hadn't done a distinct or group by on the table(s) before using the union so it didn't dawn on me that I had duplicates in each individual table.

    I know I used two of the exact same table but originally I was comparing results from two different ones and was just testing a theory by using the same one twice.

    Thanks again both of you for pointing that out.

    Mike Scalise, PMP
    https://www.michaelscalise.com

  • Mike Scalise - Wednesday, July 19, 2017 1:35 PM

    I can't believe I missed that. I'm just not operating on all cylinders today. I hadn't done a distinct or group by on the table(s) before using the union so it didn't dawn on me that I had duplicates in each individual table. I know I used two of the exact same table but originally I was comparing results from two different ones and was just testing a theory by using the same one twice. Thanks again both of you for pointing that out.

    Mike...we all have days like this !
    Glad to have been some help....

    ________________________________________________________________
    you can lead a user to data....but you cannot make them think
    and remember....every day is a school day

  • Might be a good time to add a useful primary key to your table. 🙂


    "If I had been drinking out of that toilet, I might have been killed." -Ace Ventura

  • I could see this becoming a QOTD 🙂

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply