Porcess Range IDs

  • Hello everyone,

    DECLARE @MyTable TABLE

    (

    RowID bigint,

    RowKey int,

    Locale varchar(10),

    Descr nvarchar(1000)

    )

    INSERT INTO @MyTable (RowID,RowKey,Locale,Descr)

    SELECT '1','5','en','Sample Description 1' UNION ALL

    SELECT '2','5','de','Sample Description 2' UNION ALL

    SELECT '3','4','en','Sample Description 3' UNION ALL

    SELECT '4','3','en','Sample Description 4' UNION ALL

    SELECT '5','2','en','Sample Description 5' UNION ALL

    SELECT '6','1','en','Sample Description 6' UNION ALL

    SELECT '7','1','de','Sample Description 7' UNION ALL

    SELECT '8','2','en','Sample Description 8' UNION ALL

    SELECT '9','3','en','Sample Description 9' UNION ALL

    SELECT '10','5','en','Sample Description 10' UNION ALL

    SELECT '11','4','en','Sample Description 11' UNION ALL

    SELECT '12','3','en','Sample Description 12' UNION ALL

    SELECT '13','2','en','Sample Description 13' UNION ALL

    SELECT '14','1','en','Sample Description 14' UNION ALL

    SELECT '15','3','de','Sample Description 15'

    SELECT * FROM @MyTable

    In the above data, How do I get the range of RowIDs in such a way that there will be no duplicate keys(RowKey) in a certain RowID range?

    I need to check both the [RowKey] and [Locale] columns combination in getting the range.

    So the above data shall output/result the following range:

    RowID

    8

    12

    16

    So the range from 1-8 will cover only the following range of data with no duplicate based on the column

    combination [RowKey] & [Locale]:

    RowID RowKey Locale Descr

    15enSample Description 1

    2 5deSample Description 2

    34enSample Description 3

    43enSample Description 4

    52enSample Description 5

    61enSample Description 6

    71deSample Description 7

    then the range of 8-12 will cover only the following range of data:

    RowID RowKey Locale Descr

    8 2enSample Description 8

    9 3enSample Description 9

    105enSample Description 10

    114enSample Description 11

    then the range of 12-16 will cover only the following range of data:

    RowID RowKey Locale Descr

    123enSample Description 12

    13 2enSample Description 13

    141enSample Description 14

    153deSample Description 15

    You may be wondering why there is RowID 16 in the output, the reason is that I need to get the MAX(RowID)+1 to cover all the range needed.

    Anyone who knows how to get these range ids?

    thank you very much for your help.

    - dans

  • I'm afraid I don't understand the relationship between RowID and RowKey. How do you determine the range in which no RowKey can be duplicate?

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • RowIDs are just for row counters on which to return the range result. What I need to look at is the combination of the two columns [RowKey] and [Locale] as what I've mentioned above. If there is no RowKey & Locale can be duplicate then the range result is

    RowID

    16

    which means that the range is from 1-16 since there's no duplicates are found within this range.

    thanks a lot for your time and help.

  • So, what you're trying to find is how many in a row have no duplicates? Is that correct?

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Actually no, I'm trying to find the interleaving rows of unique data (trying to look at the combination of RowKey and Locale column). In the above sample, you can see that the range result doesn't have duplicates for every range of RowID's.

    So from 1-8 range of RowID's which has the data below:

    First Range

    RowID RowKey Locale Descr

    15enSample Description 1

    2 5deSample Description 2

    34enSample Description 3

    43enSample Description 4

    52enSample Description 5

    61enSample Description 6

    71deSample Description 7

    each of RowKey & Locale column data combination are unique(meaning no duplicates), RowID 8 is not included on the first range since it has the same (RowKey & Locale) data with RowID 5 which is within the first range already.

    So, RowID 8 will belong to the second range, as shown below:

    Second Range

    RowID RowKey Locale Descr

    8 2enSample Description 8

    9 3enSample Description 9

    105enSample Description 10

    114enSample Description 11

    Since RowID 12 (RowKey & Locale) data is the same with RowID 9, it won't be included in the second range 8-12 instead on the third range which is 12-16:

    Third Range

    RowID RowKey Locale Descr

    123enSample Description 12

    13 2enSample Description 13

    141enSample Description 14

    153deSample Description 15

    Hope this will help enlighten on what I am about to achieve.

    Thank you for your help.

    -dans

  • dsegalles80 (9/2/2008)


    Actually no, I'm trying to find the interleaving rows of unique data

    Are you trying to receive multiple resultsets....

    RowID RowKey Locale Descr

    153deSample Description 15

    Also from reading this wouldn't RowID 15 be part of the first set.

  • Use the Row_Number() function, partition by location and row key, to find the duplicates. Then select from one to the next, not including the next duplicate. That should give you what you want.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • To parackson:

    RowID 15 shouldn't be part of first set since this should be in sequence and this is a stored procedure which returns directly the result (range of row ids)

    To GSquared:

    Thank you for the idea about the partition but if can give me the sample source code in where it can output the correct result, then I would really appreciate it.

    It's something like this:

    SELECT RowID,RowKey,

    row_number() OVER (PARTITION BY RowKey,Locale ORDER BY RowID) AS DupRank

    FROM @MyTable

    which will give you the result with ranking for every duplicate, but how would I select/traverse from one to the next, not including the next duplicate?

    thanks so much for the help.

    - dans

  • Yes, that use of Row_Number will give you the ones that are duplicates.

    What you have to do at that point is find the first one where the row number is higher than 1, and end your set there. Well, one row before that (you can use your row ID for that part).

    Then start at that one, and re-run the duplicate check.

    This will have to be a recursive function of some sort.

    It's not going to perform well, but SQL isn't designed to deal with rows with an inherent, complex sequentiality since relational databases are inherently non-ordered.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • I see. What we are dealing here is a huge data like about 2 Million or more with possible pattern as to my sample. Aside from this, I'm also looking for the optimize possible solution to generate the exact result on this. Thank you for your inputs GSquared though it is much appreciated if code samples are displayed here to have a clearer view on what you mean.

    I really appreciate your help.

    -dans

  • Hi dans,

    You might try something like in the attached example.

    I have not spent too much time verifying it, so it might contain a few errors.

    But the concept might be helpful.

    I have tested it with 5000 copies of the original data you posted, and the query ran for approximately 1:30 minutes for 75000 RowIDs and 15000 Ranges.

    Since the ranges were very small in the example, you might be able to achieve better results

    with your real table if there alre also bigger ranges.

    You might also try to play around with the amount by which the OptRows is increased and decreased on each test.

    Best Regards,

    Chris Büttner

  • I gotta ask... why do you need to do this? What is the purpose of this?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • hmm... interesting, how about having a counter that resets to 1 when it encounters a duplicate... i'm not sure if this is applicable/achievable using partitioning/ranking,

    my thought would be

    RowIDRowKeyCounter

    151

    251

    342

    433

    524

    611

    711

    Then selecting all Counter with value of 1

    RowIDRowKeyCounter

    151

    251

    611

    711

  • jj, the problem with that is it has to be a duplicate, since the last duplicate.

    Getting all the duplicates is easy. Getting the duplicates since the last duplicate is a pain.

    It means that, if row 8 is a duplicate of row 3, and row 10 is a duplicate of row 5, then row 10 doesn't count as a duplicate, because 5 is before 8.

    I don't see a way around recursive processing. A function could take a "start row" input parameter, finds the first duplicate, return the ID, and then call itself with that as the new start point. That could give you a list of the start and end of each set.

    Of course, recursion is RBAR's slightly less ugly cousin, but I don't see a way around it in this case.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • GSquared (9/4/2008)


    jj, the problem with that is it has to be a duplicate, since the last duplicate.

    Getting all the duplicates is easy. Getting the duplicates since the last duplicate is a pain.

    It means that, if row 8 is a duplicate of row 3, and row 10 is a duplicate of row 5, then row 10 doesn't count as a duplicate, because 5 is before 8.

    I don't see a way around recursive processing. A function could take a "start row" input parameter, finds the first duplicate, return the ID, and then call itself with that as the new start point. That could give you a list of the start and end of each set.

    Of course, recursion is RBAR's slightly less ugly cousin, but I don't see a way around it in this case.

    hmm, sounds like its going to be messy.

    lets try not to go into cursors/loops/recursion.. this is indeed challenging. if not impossible.

    How about, Counter only Resets to 1 for every duplicate key encountered in a range

    RowID RowKey Counter

    1 1 1

    2 2 2

    3 3 3

    4 3 1

    5 2 2

    6 1 3

    7 3 1

    8 2 2

    9 2 1

    10 4 2

    and get all the 1

    RowID RowKey Counter

    1 1 1

    4 3 1

    7 3 1

    9 2 1

Viewing 15 posts - 1 through 15 (of 31 total)

You must be logged in to reply to this topic. Login to reply