Compare comma separated values

  • Hi,
       let's say I have this:
    DROP TABLE IF EXISTS #CSV
    CREATE TABLE #CSV
    (ID INT NOT NULL,
    LIST VARCHAR(30) NOT NULL
    )

    INSERT INTO #CSV
    (
     ID,
     LIST
    )
    VALUES
    ( 1,
      '2,3,5' -- LIST - varchar(30)
     ),
     (2,
     '3,5,2')
        As you can see, in the column list I have 5,3,2 and 3,5,2. For program design, those columns are the same. Question: how can I determine that? I have the idea that I should order comma separated values and then compare but I don't know how 🙂
         Thanks for any help.

    Mauricio

  • Hi Mauricio,

    How about the approach below:

    DROP TABLE IF EXISTS #CSV
    CREATE TABLE #CSV
    (ID INT NOT NULL,
    LIST VARCHAR(30) NOT NULL
    )

    INSERT INTO #CSV
    (
    ID,
    LIST
    )
    VALUES
    ( 1,
    '2,3,6' -- LIST - varchar(30)
    ),
    (2,
    '3,5,2')

    DECLARE @FirstValue VARCHAR(MAX)
    DECLARE @SecondValue VARCHAR(MAX)
    DECLARE @DiffTable TABLE(DifferenceValue VARCHAR(MAX))

    SELECT @FirstValue = LIST FROM #CSV WHERE ID = 1
    SELECT @SecondValue = LIST FROM #CSV WHERE ID = 2

    INSERT @DiffTable(DifferenceValue)
    SELECT * FROM string_split(@FirstValue, ',') EXCEPT
    SELECT * FROM string_split(@SecondValue, ',')

    IF EXISTS(SELECT NULL FROM @DiffTable)
        PRINT 'Different'
    ELSE
        PRINT 'Same'

  • Or, somewhat more scalably:
    SELECT
         c.ID
    ,    c.LIST
    ,    r.Reconcat
    FROM #CSV c
    CROSS APPLY (
        SELECT ',' + s.value
        FROM #CSV c
        CROSS APPLY STRING_SPLIT(c.LIST,',') s
        ORDER BY s.value
        FOR XML PATH('')
        ) r(Reconcat)

    String splitters other than STRING_SPLIT are available and may perform better and offer more flexibility.  You may want to investigate if you're going to run this on large data sets.

    John

  • I like your solution, John, mainly because it's more generic. However, I was expecting to have an ordered list like 2,3,5 in the reconcat column instead of ,2,2,3,3,5,5. I'm trying to figure it out.
    Thanks!

  • Yes, my mistake.  You were getting a concatenated list of every value in the table rather than every value in a particular string!  Try this:

    SELECT
         c1.ID
    ,    c1.LIST
    ,    r.Reconcat
    FROM #CSV c1
    CROSS APPLY (
        SELECT ',' + s.value
        FROM #CSV c
        CROSS APPLY STRING_SPLIT(c.LIST,',') s
        WHERE c.ID = c1.ID
        ORDER BY s.value
        FOR XML PATH('')
        ) r(Reconcat)

    John

  • Much better, you're improving 😉
    Thanks, John!!!!

  • Mauricio_ - Thursday, June 7, 2018 3:49 AM

    Hi,
       let's say I have this:
    DROP TABLE IF EXISTS #CSV
    CREATE TABLE #CSV
    (ID INT NOT NULL,
    LIST VARCHAR(30) NOT NULL
    )

    INSERT INTO #CSV
    (
     ID,
     LIST
    )
    VALUES
    ( 1,
      '2,3,5' -- LIST - varchar(30)
     ),
     (2,
     '3,5,2')
        As you can see, in the column list I have 5,3,2 and 3,5,2. For program design, those columns are the same. Question: how can I determine that? I have the idea that I should order comma separated values and then compare but I don't know how 🙂
         Thanks for any help.

    Mauricio

    Congratulations! You've just discovered why Dr. Codd only allowed scalar values in the relational model. There are even more reasons, that gets into a little bit of math and set theory. There is no way to agree upon equality or comparisons (or Theta operations to use the math terms) complex structures, such as lists.

    Please post DDL and follow ANSI/ISO standards when asking for help. 

  • Hi Joe,
        the values in the column, those comma separated, are as a result of a recursive CTE. Maybe  the original CTE is wrong and I've shouldn't get those results but that's what I have and what I needed to sort out.

  • Mauricio_ - Saturday, June 9, 2018 8:21 AM

    Hi Joe,
        the values in the column, those comma separated, are as a result of a recursive CTE. Maybe  the original CTE is wrong and I've shouldn't get those results but that's what I have and what I needed to sort out.

    Actually, the phrase CTE stands for "common table expression" and this is not a table at all. We need a key, by definition, again, there are so many reasons that CVS are not allowed in RDBMS. But let's go ahead and ignore everything the Dr. Codd taught us the last 40 years.

    Just decide which of these are equal to {1, 2, 3}, and why

    {1, 2, 3} identical order and elements

    {1, 2, 2, 3, 3, 3} Have the same elements, same order but different cardinality

    {3, 3, 1, 2, 2, 3} Have the same elements, same order but different cardinality

    {3, 2, 1} {2, 1, 3} Have same elements, same cardinality but different order

    Now define >, >=, <>, etc. and justify them. I really like the not equal operations, because some list are "more not equal" than others because they contain a different number of matching element. Lists can behave like sets, but that of course defeats the whole purpose of having ordering in them, doesn't it?

    I would rewrite your your query so that produces a real table, with a key, and then compare the results using standard set operators that are part of the SQL language. To save you the trouble of having to invent her own system of mathematics 🙁

    Please post DDL and follow ANSI/ISO standards when asking for help. 

  • Ok, let's start from the beginning. Let's say I have a hotel with rooms. There are rooms for 2 persons, rooms for 3 persons and rooms for 5 persons. For 2 persons there are 2 rooms and the same for 3 persons (2 rooms). With my CTE I've got the combinations for a group of 10 people. So I have 2,3,5, 2,3,5, 3,2,5, 3,2,5 and others. There is no difference between the rooms for 2 persons so, in fact, if my combination is 2a, 3a, 5 or 2b, 3a, 5, I don't care, it's the same combination for me. I don't know the english term for this but, in Spanish, I called this combination, a set of elements where the order doesn't matter.
    So, as a result of my recursive CTE I've got a table "similar" to the one I posted in the first message and therefore, my question.
    The problem can be a little more complicated if, for example, the rooms for 2 persons have different prices, but that's another story.
    I'm not trying to redefine maths or relational database theory, my questions was more simple. If you prefer, forget about comma and separated the rooms by - so the combinations will be 2-3-5 or 3-2-5. In other words, I just need to separate those values, get an ordered string and see if the string it's the same.

  • Mauricio_ - Saturday, June 9, 2018 12:37 PM

    Ok, let's start from the beginning. Let's say I have a hotel with rooms. There are rooms for 2 persons, rooms for 3 persons and rooms for 5 persons. For 2 persons there are 2 rooms and the same for 3 persons (2 rooms). With my CTE I've got the combinations for a group of 10 people. So I have 2,3,5, 2,3,5, 3,2,5, 3,2,5 and others. There is no difference between the rooms for 2 persons so, in fact, if my combination is 2a, 3a, 5 or 2b, 3a, 5, I don't care, it's the same combination for me. I don't know the english term for this but, in Spanish, I called this combination, a set of elements where the order doesn't matter.
    So, as a result of my recursive CTE I've got a table "similar" to the one I posted in the first message and therefore, my question.
    The problem can be a little more complicated if, for example, the rooms for 2 persons have different prices, but that's another story.
    I'm not trying to redefine maths or relational database theory, my questions was more simple. If you prefer, forget about comma and separated the rooms by - so the combinations will be 2-3-5 or 3-2-5. In other words, I just need to separate those values, get an ordered string and see if the string it's the same.

    Small piece of advice.... don't waste your time engaging Mr Celko in a discussion related to set theory or RDBMS principles.  You have a real-world problem to solve, and Mr. Celko has no idea how to handle reality when it differs from the reality his beloved standards exist to try and create.   You probably didn't design whatever mess you have, but you do have to deal with the consequences, and therefore need real help, not sky-high ideals and a drum-beater with nothing better to do than criticize.

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • Thanks for the advice, Steve. Actually I was not trying to start a discussion and, to be honest, I was a bit surprised with his response.
    I understand that I'm not following standards and, maybe, there is another solution that I'm not aware. For the moment, the recursive CTE was the only I could find.
    Thanks again.

  • Mauricio_ - Monday, June 11, 2018 6:50 AM

    Thanks for the advice, Steve. Actually I was not trying to start a discussion and, to be honest, I was a bit surprised with his response.
    I understand that I'm not following standards and, maybe, there is another solution that I'm not aware. For the moment, the recursive CTE was the only I could find.
    Thanks again.

    And it's not necessarily a disaster.   The problem Joe has is that his online persona is just a little too much.   Over the top, so to speak.   You can usually create a CSV formatted column using FOR XML PATH(''), and may or may not need a CTE or even recursion, for that matter.   It just depends on the nature of the data available to you.   If you have the time, feel free to expand on the details of what the data tables you have available to you contain, and ideally, table create statements for the relevant tables, along with insert statements for sample data that help illustrate the problem you are trying to solve.   Folks here are usually able to solve problems framed that way fairly quickly.    You should also present the expected results based on the sample data, along with the code you've tried so far.

    As to Mr. Celko, your best bet is to stay as far away as you can.   While there are occasional occurrences of him providing useful information, they're pretty rare, and most of his time here is spent criticizing everything that doesn't match an ideal, bashing COBOL, and appearing to be a hypocrite as he recommends things that violate the very standards he so jealously attempts to protect.   According to others here, in person he's allegedly a pretty nice guy, but given his online persona here, and the fact that he acknowledges his curmudgeonly perspective and celebrates it, I'd really rather not ever meet the dude.  Best thing to do is ignore him.

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • Here is an attached file with some sample. It's not the final solution but you can get the idea, I have a table with rooms that belong to an accommodation (hotel). Every room has a price and sometimes, for the same size of room I can have different prices. The maximum number of participants is 20, a way to limit the recursive CTE and, in my example, I want to get the combinations of rooms for 10 people. If the combination, considering the number of persons in each room, is the same as other but prices are different, then I need both. But if the combination and price are the same, I need to delete one.
    As I said before, maybe a CTE is not the best approach so I'm very open to hear others ideas.

  • As your code works, and gets the possible combinations via recursion, the only thing you may have to worry about is performance.   To eliminate your duplicates, however, you need an ordered list, and because you used a CSV format, here's some code that will ORDER your list for you and let the DISTINCT take care of eliminating the dupes.
    IF OBJECT_ID(N'tempdb..#RESULTS', N'U') IS NOT NULL
        BEGIN
        DROP TABLE #RESULTS;
        END;

    IF OBJECT_ID(N'tempdb..#ACCOS', N'U') IS NOT NULL
        BEGIN
        DROP TABLE #ACCOS;
        END;

    IF OBJECT_ID(N'tempdb..#ROOMS', N'U') IS NOT NULL
        BEGIN
        DROP TABLE #ROOMS;
        END;

    CREATE TABLE #ACCOS (
        ACCO_ID INT NOT NULL,
        ACCO_NAME VARCHAR(40) NOT NULL
    );

    CREATE TABLE #ROOMS (
        ACCO_ID INT NOT NULL,
        ROOM_ID INT NOT NULL,
        ROOM_NAME VARCHAR(10) NOT NULL,
        OCCUPANTS TINYINT NOT NULL,
        PRICE DECIMAL(15,4) NOT NULL,
        AVAILABLE BIT NOT NULL
    );

    INSERT INTO #ACCOS (ACCO_ID, ACCO_NAME)
    VALUES    (1,'TEST 1'),
            (2, 'TEST 2'),
            (3, 'TEST 3'),
            (4, 'TEST 4');

    INSERT INTO #ROOMS (ACCO_ID, ROOM_ID, ROOM_NAME, OCCUPANTS, PRICE, AVAILABLE)
    VALUES    (1, 1, '1PKA02', 2, 800, 1),
            (1, 2, '1PKA04', 4, 800, 1),
            (1, 3, '1PKA06', 6, 800, 1),
            (1, 4, '1PKA08', 8, 800, 1),
            (2, 5, '2PKA03', 3, 800, 1),
            (2, 6, '2PKA04', 4, 800, 1),
            (2, 7, '2PKA05', 5, 800, 1),
            (2, 8, '2PKA06', 6, 800, 1),
            (2, 9, '2PKA07', 7, 800, 1),
            (3, 10, '3PKA02', 2, 400, 1),
            (3, 31, '3PKA02', 2, 425, 1),
            (3, 32, '3PKA02', 2, 425, 1),
            (3, 11, '3PKA03', 3, 450, 1),
            (3, 33, '3PKA03', 3, 450, 1),
            (3, 12, '3PKA04', 4, 600, 1),
            (3, 13, '3PKA05', 5, 700, 1),
            (3, 14, '3PKA06', 6, 750, 1),
            (3, 15, '3PKA07', 7, 870, 1),
            (3, 16, '3PKA08', 8, 970, 1),
            (3, 17, '3PKA09', 9, 1100, 1),
            (3, 18, '3PKA010', 10, 1200, 1),
            (4, 19, '4PKA02',2, 500, 1),
            (4, 20, '4PKA03',3, 500, 1),
            (4, 21, '4PKA02',2, 500, 0);

    -- Test data
    DECLARE @ACCO_ID AS int = 3,
            @MAX_CUSTOMERS AS int = 20,
            @OCCUPANTS AS int = 10;

    WITH CTE_COMBINATION AS (

        SELECT
            R.ACCO_ID,
            R.ROOM_ID,
            R.OCCUPANTS,
            CAST(R.OCCUPANTS AS VARCHAR(30)) AS [ROOMS],
            CAST(R.ROOM_ID AS VARCHAR(30)) AS ROOM_IDS,
            CAST(R.PRICE AS DECIMAL(15,4)) AS PRICE
        FROM #ROOMS AS R
        WHERE
            R.ACCO_ID = @ACCO_ID
            AND R.AVAILABLE = 1
        UNION ALL
        SELECT
            R2.ACCO_ID,
            R2.ROOM_ID,
            CC.OCCUPANTS + R2.OCCUPANTS,
            CAST(CC.ROOMS + ', ' + CAST(R2.OCCUPANTS AS VARCHAR(6)) AS VARCHAR(30)),
            CAST(CC.ROOM_IDS + ' - ' + CAST(R2.ROOM_ID AS VARCHAR(6)) AS VARCHAR(30)),
            CAST(CC.PRICE + R2.PRICE AS DECIMAL(15,4)) AS PRICE
        FROM #ROOMS AS R2
            INNER JOIN CTE_COMBINATION AS CC
                ON CC.ACCO_ID = R2.ACCO_ID
        WHERE
            R2.ROOM_ID > CC.ROOM_ID
            AND R2.AVAILABLE = 1
    )
    SELECT DISTINCT
        ORL.ORDERED_ROOM_LIST,
        CC.PRICE
    FROM CTE_COMBINATION AS CC
        CROSS APPLY (
            SELECT STUFF((
                SELECT ', ' + S.Item
                FROM dbo.DelimitedSplit8K(REPLACE(CC.ROOMS, ' ', ''), ',') AS S
                ORDER BY S.Item
                FOR XML PATH('')
                ), 1, 2, '') AS ORDERED_ROOM_LIST
            ) AS ORL
    WHERE CC.OCCUPANTS = @OCCUPANTS
    ORDER BY
        CC.PRICE;

    IF OBJECT_ID(N'tempdb..#RESULTS', N'U') IS NOT NULL
        BEGIN
        DROP TABLE #RESULTS;
        END;

    IF OBJECT_ID(N'tempdb..#ACCOS', N'U') IS NOT NULL
        BEGIN
        DROP TABLE #ACCOS;
        END;

    IF OBJECT_ID(N'tempdb..#CSV', N'U') IS NOT NULL
        BEGIN
        DROP TABLE #CSV;
        END;

    IF OBJECT_ID(N'tempdb..#ROOMS', N'U') IS NOT NULL
        BEGIN
        DROP TABLE #ROOMS;
        END;
    GO

    The string splitting function dbo.DelimitedSplit8K is located here:

    http://www.sqlservercentral.com/articles/72993/

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

Viewing 15 posts - 1 through 15 (of 18 total)

You must be logged in to reply to this topic. Login to reply