Need unique number for groups of records

  • Hi,

    I'm trying to do the following and haven't been able to figure it out.

    Say there's a table with these records:

    Col1 Col2 Col3

    a b c

    a b c

    a b d

    e f g

    e f g

    I want to generate a number that represents the groups of columns like this:

    Col1 Col2 Col3 MyNumber

    a b c 1

    a b c 1

    a b d 2

    e f g 3

    e f g 3

    So that each grouping gets its own identifier. I've tried this:

SELECT Col1, Col2, Col3,
       row_number() OVER (PARTITION BY Col1, Col2, Col3
                          ORDER BY Col1, Col2, Col3) AS MyNumber
FROM MyTable

    But I get this:

    Col1 Col2 Col3 MyNumber

    a b c 1

    a b c 2

    a b d 1

    e f g 1

    e f g 2

    See my problem? Any input greatly appreciated!!!!

    Rebecca

  • Rebecca

    Here's one way: select distinct columns from the table, use row_number() on that, then join back to the table.
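A minimal sketch of that approach, untested and assuming the table and column names from the original post:

    SELECT t.Col1, t.Col2, t.Col3, d.MyNumber
    FROM MyTable AS t
    JOIN (
        SELECT Col1, Col2, Col3,
               ROW_NUMBER() OVER (ORDER BY Col1, Col2, Col3) AS MyNumber
        FROM (SELECT DISTINCT Col1, Col2, Col3 FROM MyTable) AS x
    ) AS d
        ON d.Col1 = t.Col1 AND d.Col2 = t.Col2 AND d.Col3 = t.Col3;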

    John

  • Try the following:

    RANK() OVER (ORDER BY Col1, Col2, Col3)

    Notice it's using rank() instead of row_number() and no partition by.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • Eric M Russell (8/6/2014)


    Try the following:

    RANK() OVER (ORDER BY Col1, Col2, Col3)

    Notice it's using rank() instead of row_number() and no partition by.

    May actually want to use DENSE_RANK() instead.
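For the sample data above, a quick sketch (untested) of the difference between the two:

    SELECT Col1, Col2, Col3,
           RANK()       OVER (ORDER BY Col1, Col2, Col3) AS RankNo,   -- 1, 1, 3, 4, 4 (gaps after ties)
           DENSE_RANK() OVER (ORDER BY Col1, Col2, Col3) AS MyNumber  -- 1, 1, 2, 3, 3 (no gaps)
    FROM MyTable;

Because the groups contain duplicate rows, RANK() leaves gaps after ties, so DENSE_RANK() is what gives the consecutive group numbers the original post asked for.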

  • You guys rock! Thank you so much! I was not aware of Rank() or Dense_Rank() but it was Dense_Rank() that did the trick. I'm so happy to know about these functions - very useful!

SELECT *, DENSE_RANK() OVER (ORDER BY Col1, Col2, Col3) AS [My Number] FROM MyTable

  • CELKO (8/19/2014)


    This is called a hash function. You can Google it or get a book on freshman computing, or data structures.

Out of curiosity, isn't this a slight contradiction on your part? If the functionality is already provided as standard, why should one reinvent the wheel?

😎

  • Eirikur Eiriksson (8/25/2014)


    CELKO (8/19/2014)


    This is called a hash function. You can Google it or get a book on freshman computing, or data structures.

Out of curiosity, isn't this a slight contradiction on your part? If the functionality is already provided as standard, why should one reinvent the wheel?

😎

Who needs to reinvent the wheel? SQL Server has the HashBytes function that can return the MD2, MD4, MD5, SHA, or SHA1 hash of a given input.

😀
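For illustration only, a sketch of HASHBYTES against the sample columns (the delimiter and column names are assumptions from the thread, and this is not an endorsement of hashing for this particular problem):

    SELECT Col1, Col2, Col3,
           HASHBYTES('MD5', CONCAT(Col1, '|', Col2, '|', Col3)) AS GroupHash  -- same varbinary value for identical (Col1, Col2, Col3) tuples
    FROM MyTable;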

  • CELKO (8/19/2014)


    >> Say there's a table with these records: <<

    You missed every fundamental concept you need here. Rows are not records. Your narrative (we post DDL instead) is useless; it has duplicates, so it is not a table.

Your trouble, Joe, is that you think in tables with columns and rows instead of in terms of relations (in the relational calculus sense, not the set theory sense) with attributes and maps. Just look at what a record is in a language with a proper abstract type system and you'll see that it's a map (in the mathematical function theory sense) from a finite list of attribute names to a matching list of types, which is exactly what the thing you insist on calling a "row" is.

Then you compound your error by insisting that a table can't contain duplicates; since, for example, what a projection returns is a derived relation, you have to accept that some derived relations (and hence some derived tables) can contain duplicates, and the fact that a base relation can't doesn't affect the fact that a derived relation can.

    I realise that you are trying to encourage conformity to the standard, but stating nonsense is not a successful or even a sensible means of doing that.

    >>I want to generate a number that represents the groups of columns like this: <<

This is called a hash function. You can Google it or get a book on freshman computing, or data structures. Frankly, with your narrative, I would set UNIQUE (c1, c2, c3) and add a "degree of duplication" (see Codd's second version of the Relational Model).

Why oh why oh why shouldn't this be done using the easy mechanism provided in T-SQL instead of by reinventing the wheel? Especially a far-from-round wheel with nasty pointy corners that is utterly inappropriate in this context: while a hash function will allocate the same value to equal elements, there is no guarantee at all that it will allocate different values to different elements, so a hash function could fail to deliver the required effect even with only 2 distinct values, whereas DENSE_RANK() will not fail unless the number of distinct values is 2^63 or greater.

    And while I'm asking "why", why don't you conform to this website's quote tag norm instead of doing something completely different?

    Tom

  • TomThomson (8/25/2014)


    CELKO (8/19/2014)


    >> ...

    ...

    And while I'm asking "why", why don't you conform to this website's quote tag norm instead of doing something completely different?

Could it be that this site's quote tag norm isn't ANSI Standard? If it isn't ANSI Standard it isn't good enough for Mr. Celko, I guess. 😀

  • I am definitely no expert, but in my (25-year) career I see a big difference between those who write books about coding versus those of us in the trenches who actually have to make things work. Under pressure, with no time to luxuriate in the philosophy of Best Practices. Ms. Pettis' suggestion of Dense_Rank() solved my problem perfectly and it was easy and intuitive to use.

  • rwaring 96203 (8/25/2014)


    I am definitely no expert, but in my (25-year) career I see a big difference between those who write books about coding versus those of us in the trenches who actually have to make things work. Under pressure, with no time to luxuriate in the philosophy of Best Practices. Ms. Pettis' suggestion of Dense_Rank() solved my problem perfectly and it was easy and intuitive to use.

    +1

😎

  • rwaring 96203 (8/25/2014)


    I am definitely no expert, but in my (25-year) career I see a big difference between those who write books about coding versus those of us in the trenches who actually have to make things work. Under pressure, with no time to luxuriate in the philosophy of Best Practices. Ms. Pettis' suggestion of Dense_Rank() solved my problem perfectly and it was easy and intuitive to use.

    <* cough *>

    rwaring 96203 (8/25/2014)


    I am definitely no expert, but in my (25-year) career I see a big difference between those who write books about coding versus those of us in the trenches who actually have to make things work. Under pressure, with no time to luxuriate in the philosophy of Best Practices. Ms. Mr. Pettis' suggestion of Dense_Rank() solved my problem perfectly and it was easy and intuitive to use.

  • Lynn Pettis (8/25/2014)


    rwaring 96203 (8/25/2014)


    I am definitely no expert, but in my (25-year) career I see a big difference between those who write books about coding versus those of us in the trenches who actually have to make things work. Under pressure, with no time to luxuriate in the philosophy of Best Practices. Ms. Pettis' suggestion of Dense_Rank() solved my problem perfectly and it was easy and intuitive to use.

    <* cough *>

    rwaring 96203 (8/25/2014)


    I am definitely no expert, but in my (25-year) career I see a big difference between those who write books about coding versus those of us in the trenches who actually have to make things work. Under pressure, with no time to luxuriate in the philosophy of Best Practices. Ms. Mr. Pettis' suggestion of Dense_Rank() solved my problem perfectly and it was easy and intuitive to use.

Got me kind of worried for a minute :w00t:

😎

  • Sorry about that!!
