Test Data

  • Does anybody have some code to build a table of strings from 1 to 10 characters in length, with no trailing or leading spaces (embedded spaces are allowed) containing all permutations of characters that could be entered by a user on a USA keyboard?

    I have rewritten an internal UDF using a tally table, and need to verify that it returns the same value as the function it was written to replace. (I wanted to eliminate a while loop in the original function.) As I look at what I need, I just can't seem to get my brain on track.

    😎

  • Um...that's a lot of strings.

    I don't have anything pre-made, but something like this is what comes to mind:

    [font="Courier New"]; WITH CharSet (N, Chr)

    AS (

    SELECT N, CHAR(N) FROM dbo.Tally WHERE N BETWEEN 65 AND 90 --Upper case letters

    UNION SELECT N, CHAR(N) FROM dbo.Tally WHERE N BETWEEN 65 AND 90 --Lower Case letters

    UNION SELECT N, CHAR(N) FROM dbo.Tally WHERE N BETWEEN 48 AND 57 --Numbers

    )

    SELECT

    T1.Chr + T2.Chr + T3.Chr

    FROM

    CharSet T1

    CROSS JOIN CharSet T2

    CROSS JOIN CharSet T3[/font]

    You would have to set up the CTE with the entire character set you wanted and add the other seven cross joins.

    If your SQL Server can return all of the possible permutations before it melts or the world ends, please let me know.

  • LOL!! :w00t:

    Thanks. Once I finish working on the second function, I'll start with your code and see what I can come up with. I appreciate the jump start!

    Edit: Anyone happen to know mathematiclly how many strings we may be talking about?

    😎

  • Well, if you used just letters and numbers - say all lowercase you have 35 characters.

    Enter 35^10 in calculator and see what you get...

  • Does this number look about right: 2,758,547,353,515,625?

    That's a heck of a lot of data to ensure that both functions are equivalent.

    😎

  • Lynn Pettis (7/23/2008)


    Does anybody have some code to build a table of strings from 1 to 10 characters in length, with no trailing or leading spaces (embedded spaces are allowed) containing all permutations of characters that could be entered by a user on a USA keyboard?

    Ummm, ... even assuming that we are discounting upper/lower case differences, 5810 is a very large number. :w00t:

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • Michael Earl (7/23/2008)


    Well, if you used just letters and numbers - say all lowercase you have 35 characters.

    Hmmm, I'm pretty sure that would be 36.

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • Lynn Pettis (7/23/2008)


    Does anybody have some code to build a table of strings from 1 to 10 characters in length, with no trailing or leading spaces (embedded spaces are allowed) containing all permutations of characters that could be entered by a user on a USA keyboard?

    I have rewritten an internal UDF using a tally table, and need to verify that it returns the same value as the function it was written to replace. (I wanted to eliminate a while loop in the original function.) As I look at what I need, I just can't seem to get my brain on track.

    😎

    You know me, Lynn... I've just gotta ask... what on Earth are you going to use this for?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Four years ago, my predecessors predecessor wrote 2 scaler UDF's, one to encrypt and one to decrypt a string using the same algorithm that the SIS system uses on-line. It is loop-based to access each character one at a time. Not to bad considering it only has to work with strings upto 10 characters long. Of course, what if things change and it jumps to 128 or 256.

    The purpose of these functions is to allow us to import and export the data outside of the application. Therefore, I have to be absolutely positive that what I wrote mirrors what she wrote in all cases. It would be nice if I knew how to prove it identical mathematically, but I'm not really sure how to go about that either.

    I have run tests based on data in the system, but that isn't necessarily good enough. There could be a subtle difference between them that only manifests itself in very specific cases. The only other way I know, is brute force, check everything. Of course, this may not be feasible either. I do think the more testing I do, the more confident I'l be with my rewrite.

    I am also looking at eliminating a cursor in a stored proc. They loop through the data to be entered one record at a time. Not bad when you only have a few hundred rows, but certainly not scaleable and that is what I am after, especially since they didn't need to use a cursor for this update to begin with.

    Anything else you would like to know? I'd be happy to tell you more if need be.

    😎

  • Ah... I get it now... you want to test/compare every combination of 1 to 10 characters (36 characters total) of the old UDF against the new UDF. (3611)-1 is 131,621,703,842,267,135 or about 131.6 QUADRILLION... Heh... I don't think there's enough time left in the current decade to test all of those combinations unless you happen to have a Cray in your back pocket. 😛 I'm thinking you'll need to be happy with an extreme desk check comparison of the code and just a couple of million comparisons.

    I'm also nervous for you... unless someone happens to have some extreme crypto experience (I don't), I wouldn't trust the homegrown algorithm unless it were certified by an outside crypto org.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • A) That is why I am trying to come up with a subset that is large enough to satisfy my testing. If after a few million rows of test data are checked, and both return the same data for all attempts, good enough.

    B) It isn't that strong, but until we move to something else, like LDAP, it is the best we can do for now. At least it isn't our Personnel or Finance systems.

    C) That is why I didn't publish the code for all to see. With no clue of where to start, it at least makes it a little harder to crack unless you are adamant about doing so.

    Plus, it was fun getting something done using the Tally table instead of a loop. It was worth the exercise for the brain (as opposed to the physical exercise last week, 16 soccer games over 5 days from U11 to U19 boys and girls).

    😎

  • So, do you have a grip on generating the test data or do you still need a bit o' help on that?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • rbarryyoung - Yeah, 36.

    Thanks for pointing out the fact that I forgot about the zero.

    Sorry, I was mid-laugh thinking about how many combinations there would really be.

  • Michael Earl (7/24/2008)


    rbarryyoung - Yeah, 36.

    Thanks for pointing out the fact that I forgot about the zero.

    Don't worry about it, western civilization got along without it for several thousand years... 🙂

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • Jeff Moden (7/24/2008)


    So, do you have a grip on generating the test data or do you still need a bit o' help on that?

    With thanks to Michael Earl, yes I got a grip on the test data. Generated over 80 million test records and after almost 3 hours, both functions returned identical values.

    I feel confident in my changes. Now, the question is how does everyone else here feel about the changes.

    Thank you for the help.

    😎

Viewing 15 posts - 1 through 15 (of 17 total)

You must be logged in to reply to this topic. Login to reply