Finding duplicated values in multiple columns (Not the one we all know of)

  • This isn't specifically related to SQL Server 2017 but more a general T-SQL question. I couldn't find the right place for this question.

    You may though this is another "duplicated" question that can be solved with Group By and Count > 1 but it's not.

    I have been googling for days and pulling my hair for past a couple of days. I hope I can get some guidance from you guys.

    Let's say there are 5 columns in a table. (ID, V1 to V4)

    I would like to find rows with duplicated values with any other rows in any of the columns (V1 to 4).

    Here's a temporary table I would create for this.

    Declare @TheTable table (ID int, V1 varchar(1), V2 varchar(2), V3 varchar(1), V4 varchar(1))
    Insert into @TheTable values (1, 'A', NULL, NULL, NULL)
    Insert into @TheTable values (2, 'B', NULL, 'A', NULL)
    Insert into @TheTable values (3, 'C', 'B', NULL, NULL)
    Insert into @TheTable values (4, 'D', NULL, NULL, 'F')
    Insert into @TheTable values (5, NULL, NULL, NULL, NULL)
    Insert into @TheTable values (6, 'E', NULL, 'B', NULL)

    /* Expected Results (All the rows containing dupcliated values in any of the columns)
    IDV1V2V3V4
    1ANULLNULLNULL
    2BNULLANULL
    3CBNULLNULL
    6ENULLBNULL

    Only row 4, 5 would be valid rows.
    */

    Any help would be very appreciated!! Thanks 🙂

  • Why do you have 4 columns that, in theory, contain potential duplicates and lots of nulls? Either I'm missing something, or this is a nonsensical question.

  • This is one way, not sure if it is the best way:

    Declare @TheTable table (ID int, V1 varchar(1), V2 varchar(2), V3 varchar(1), V4 varchar(1)); -- End all your statements with a semicolon
    Insert into @TheTable values (1, 'A', NULL, NULL, NULL);
    Insert into @TheTable values (2, 'B', NULL, 'A', NULL);
    Insert into @TheTable values (3, 'C', 'B', NULL, NULL);
    Insert into @TheTable values (4, 'D', NULL, NULL, 'F');
    Insert into @TheTable values (5, NULL, NULL, NULL, NULL);
    Insert into @TheTable values (6, 'E', NULL, 'B', NULL);
    /* Expected Results (All the rows containing dupcliated values in any of the columns)
    IDV1V2V3V4
    1ANULLNULLNULL
    2BNULLANULL
    3CBNULLNULL
    6ENULLBNULL
    Only row 4, 5 would be valid rows.
    */

    with BaseData as (
    select
    tt.ID
    , oa1.DataCol
    , cnt = case when DataCol is null then 1 else count(*) over (partition by DataCol)end
    from
    @TheTable as tt
    outer apply (select DataCol from (values (tt.V1),(tt.V2),(tt.V3),(tt.V4))dt(DataCol))oa1(DataCol)
    )
    select
    tt.*
    from
    @TheTable as tt
    where
    exists(select 1 from Basedata as bd where tt.ID = bd.ID group by bd.ID having max(bd.cnt) > 1);
  • Declare @TheTable table (ID int, V1 varchar(1), V2 varchar(2), V3 varchar(1), V4 varchar(1))
    Insert into @TheTable values (1, 'A', NULL, NULL, NULL)
    Insert into @TheTable values (2, 'B', NULL, 'A', NULL)
    Insert into @TheTable values (3, 'C', 'B', NULL, NULL)
    Insert into @TheTable values (4, 'D', NULL, NULL, 'F')
    Insert into @TheTable values (5, NULL, NULL, NULL, NULL)
    Insert into @TheTable values (6, 'E', NULL, 'B', NULL)

    ;with cte as
    (
    select *
    from @TheTable t
    cross apply(values (t.V1),(t.V2),(t.V3),(t.V4)) u(col)
    )
    select DISTINCT a.ID, a.V1, a.V2, a.V3, a.V4
    from cte a
    where exists(select *
    from cte b
    where b.ID <> a.ID
    and b.col = a.col)
  • pietlinden wrote:

    Why do you have 4 columns that, in theory, contain potential duplicates and lots of nulls? Either I'm missing something, or this is a nonsensical question.

    I understand this might be kind of unusual. However, there could a case where a user can put any values into 4 extra custom columns whatever they want to put. And at some point, they want to clean up duplicated values or pull some sort of report based on the duplicated values among those extra columns.

    Thanks!

  • Thanks! Lynn and Jonathan.

  • Lynn-

    I'm really fascinated by your solution. I don't think I've come across that concept of outer joining to the FROM table using the VALUES constructor. Is that functionally similar to doing an UNPIVOT in this case?


    "If I had been drinking out of that toilet, I might have been killed." -Ace Ventura

  • autoexcrement wrote:

    Lynn- I'm really fascinated by your solution. I don't think I've come across that concept of outer joining to the FROM table using the VALUES constructor. Is that functionally similar to doing an UNPIVOT in this case?

    Thinking about it, I think it is.  It is making multiple rows from multiple columns.

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply