random integer number scalar function

  • rbarryyoung (11/29/2008)


    Huhn, the bug happens on my last remaining SQL 2000 server also. MAXDOP has no effect on it.

    Is it a multi CPU box you're working on? Also, which SP are you guys using? I'm using sp3A... never had the hair to go to 4 because I heard about some performance issues.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Heh... well never mind... all of the sudden, I'm getting the same bug and I only have a single CPU...

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Isn't that strange? It's the GROUP BY loosing it's mind and it doesn't happen in 2k5. Not sure what to do about it yet.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • The only thing that I can figure out is that the optimizer in 2k5 resolves the sub-query before using it... the one in 2k seems to be trying to resolve it on the fly and makes mistakes because Number is indeterminate. I've never run into this problem before because I've either always put such things into a Temp table first or simply not done a Group By on the random number.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Great timing Chris, I was just looking for something that would allow me to generate random numbers.

    Jeff, based on your code for random test data, I had been playing with newid() and now I took another look at the checksum().

    Question:

    What I'm trying to do is show folks how to generate test data that we (corporate we) can share across state units, without revealing HIPAA protected information (member names/ids/dates of service, etc). If I do this to generate fake names and dates:

    SELECT TOP 10 member.fullname,

    'member'+convert(varchar(25),abs(checksum(member.fullname))) AS testMember,

    member.dob,

    dateadd(dd,abs(checksum(member.dob))%(5-1+1) +1,member.dob) AS testDate

    FROM member

    can someone reverse the checksum somehow and read the correct data?

    ---------------------------------------------------------
    How best to post your question[/url]
    How to post performance problems[/url]
    Tally Table:What it is and how it replaces a loop[/url]

    "stewsterl 80804 (10/16/2009)I guess when you stop and try to understand the solution provided you not only learn, but save yourself some headaches when you need to make any slight changes."

  • The CHECKSUM() is not cryptographically safe, however, if you use it as a one-way hash, it shoudl be unreversible, unless the crackers already have a complete list of the source values.

    IE, CHECKSUM(fullname) should not be reversible, unless they already know what all of your "fullname" values could be, and in that context, there is no safe way to hash short text with limited possible values without a variable & secret key.

    If you want more protection than CHECKSUM, then you might want to try HASHBYTES().

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • Thanks, and understood. I think this will be fine for our purposes, and we're mostly on 2000 at the moment, so hashbytes() won't help me until we upgrade.

    Thanks again,

    Jon

    ---------------------------------------------------------
    How best to post your question[/url]
    How to post performance problems[/url]
    Tally Table:What it is and how it replaces a loop[/url]

    "stewsterl 80804 (10/16/2009)I guess when you stop and try to understand the solution provided you not only learn, but save yourself some headaches when you need to make any slight changes."

  • Glad I could help.

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • Thought you might like to see what I did, any suggestions welcome on making it better. Tried to KISS for the end user of this script.

    Enjoy!

    ****Edit - I should mention that I'm NOT generating test data for the actual SELECT that's in here because, well,

    1) not sending PHI is the whole point of this, and 2) the SELECT is meant to be overwritten by the user anyway (you, in this case) *****

    /*

    Title: 'Generate De-identified Test Data.sql'

    Description:the purpose of this script is to demonstrate how to generate test data

    so that help can be sought enterprise-wide, from users who may not have

    access to the actual data source. Values entered into the variables

    will change dates, although zeroes can be used to retain date information

    if necessary

    There are five steps to use this (marked as such below):

    STEP ONE - initialize variables as needed to name tables, limit results and move dates around

    STEP TWO - enter your query, adding 'INTO ##myTemp' as noted and fully naming tables

    STEP THREE - copy your table.field names into the STEP THREE area (automatically grabs datatypes and lengths)

    STEP FOUR - Execute this script and copy/paste first the Message tab, then the Results tab into your post/email

    STEP FIVE - Save a copy of the results in case you want to tie the new values back to the original in your data

    Known flaws:n/a

    Author: Jon Crawford

    Healthcare Data Analyst

    Revision History:

    DateChanges

    -------------

    12/12/2008'initial implementation'

    */

    SET NOCOUNT ON

    USE TempDB -- DO NOT CHANGE THIS

    -- =================================

    -- DECLARE variables

    -- =================================

    DECLARE @shiftYearsLower int,-- lower bound of random years added to dates

    @shiftYearsUpper int,-- upper bound of random years added to dates

    @shiftMonthsLower int,-- lower bound of random months added to dates

    @shiftMonthsUpper int,-- upper bound of random months added to dates

    @shiftDaysLower int,-- lower bound of random days added to dates

    @shiftDaysUpper int,-- upper bound of random days added to dates

    @sq char(1), -- holds the value of a single quote to simplify our dynamic SQL

    @limit varchar(25), -- holds the number of rows you want to generate

    @columns varchar(8000), -- variable to hold the columns that we want

    @createTableSQL varchar(8000), -- variable to hold the dynamic SQL that will generate our table

    @mySelectList varchar(8000), -- variable to hold the dynamic SQL that represents your SELECT in the real world

    @myInsertList varchar(8000), -- @myInsertList holds the dynamic SQL for the INSERT statements you need

    -- if your inserts are too long for this,

    -- then my advice is to rethink your inserts, you're passing test data. Make it fit.

    @myTableData varchar(8000), -- holds the data that you will be copy/pasting

    @myTempTableName varchar(255) -- holds the name of your current output table (set to #temp)

    -- ===============================================

    -- Initialize variables - DON'T CHANGE THESE ONES

    -- ===============================================

    -- we have to set all to Empty String ('') because we're adding their value to themselves,

    --and if we don't, the first value is undefined, which wrecks the whole thing

    SET @columns = ''

    SET @createTableSQL = ''

    SET @mySelectList = ''

    SET @myInsertList = ''

    SET @myTableData = ''

    SET @sq = ''''

    -- ======================================================================================

    -- STEP ONE: Initialize variables - CHANGE THESE AS NEEDED

    -- ======================================================================================

    -- change @myTempTableName to whatever you want your table name to be that you will be posting

    -- *** NOTE *** you'll have to run this script once for each temp table that you want to generate

    SET @myTempTableName = '#temp'

    -- set @limit to 'TOP x' where x is the number of rows you want to generate

    -- or set to '' if you don't want to limit the results

    -- (but remember your statements need to fit within an 8000 character variable)

    SET @limit = 'TOP 10'

    -- the following shift days/months/years in all date fields either a set number or a random number in a boundary

    SET @shiftYearsLower = 10 -- lower bound of random years added to dates

    SET @shiftYearsUpper = 10 -- upper bound of random years added to dates

    SET @shiftMonthsLower = 1 -- lower bound of random months added to dates

    SET @shiftMonthsUpper = 1 -- upper bound of random months added to dates

    SET @shiftDaysLower = 0 -- lower bound of random days added to dates

    SET @shiftDaysUpper = 0 -- upper bound of random days added to dates

    -- =========================

    -- END OF STEP ONE

    -- =========================

    IF OBJECT_ID('Tempdb..##myTemp') IS NOT NULL BEGIN DROP TABLE ##myTemp END

    -- ======================================================================================

    -- STEP TWO:

    -- Add your SELECT statement below, this is a sample based on member and claim startdate

    -- that shows both the original and modified data

    -- SELECT INTO a global temp table (allows for complex joins, calculated fields, etc),

    -- so you'll want to alias all columns that are modified in any way (e.g. column AS alias)

    -- ======================================================================================

    SELECT TOP 10 rtrim(member.fullname) AS member,

    'member'+convert(varchar(25),abs(checksum(member.fullname))) AS testMember,

    rtrim(provider.fullname) AS provider,

    'provider'+convert(varchar(25),abs(checksum(provider.fullname))) AS testProvider,

    rtrim(claim.claimid) AS claimid,

    10000000000+abs(checksum(claim.claimid)) AS testClaim,

    claim.startdate,

    dateadd(yyyy,abs(checksum(claim.startdate))%(@shiftYearsUpper - @shiftYearsLower+1)+@shiftYearsLower,

    dateadd(mm,abs(checksum(claim.startdate))%(@shiftMonthsUpper - @shiftMonthsLower+1)+@shiftMonthsLower,

    dateadd(dd,abs(checksum(claim.startdate))%(@shiftDaysUpper-@shiftDaysLower+1)+@shiftDaysLower,claim.startdate)

    )

    ) AS testClaimStartdate

    -- ======================

    -- Add this to your query after your SELECT clause

    INTO ##myTemp

    -- FULLY NAME YOUR TABLES below along with aliases

    -- e.g. JOIN someDB.dbo.tablename tablename ON blah blah blah

    -- ^db ^author ^table ^alias ^JOIN criteria

    -- ======================

    FROM someDB.dbo.member member

    JOIN someDB.dbo.claim claim on member.memid = claim.memid

    JOIN someDB.dbo.provider provider ON claim.provid = provider.provid

    ORDER BY member.memid

    -- =========================

    -- END OF STEP TWO

    -- =========================

    -- GENERATE INSERT STATEMENTS TO HELP OTHERS POPULATE #temp WITH TEST DATA

    SELECT @columns = @columns + char(9) + cols.name + ' ' + systypes.name

    + CASE WHEN systypes.name IN ('char','varchar') THEN '('+convert(varchar(4),cols.length)+')' ELSE '' END

    + ',' + char(13),

    @mySelectList = @mySelectList + cols.name + ', ',

    @myInsertList = @myInsertList + @sq + ',' + @sq + char(13)

    +

    CASE

    WHEN systypes.name IN ('char','varchar','datetime','smalldatetime')

    THEN '+' +@sq+@sq+@sq+@sq+ ' + '

    --

    +CASE

    WHEN cols.name IN ('memid','carriermemid','fullname','enrollid','member')

    THEN

    @sq+'member'+@sq+'+convert(varchar(25),abs(checksum(ISNULL('+objs.name+'.'+cols.name+','+@sq+@sq+')))) + '

    WHEN cols.name IN ('provid','affiliateid','affiliationid','fedid','provider')

    THEN

    @sq+'provider'+@sq+'+convert(varchar(25),abs(checksum(ISNULL('+objs.name+'.'+cols.name+','+@sq+@sq+')))) + '

    WHEN cols.name IN ('claimid','claim','encounter')

    THEN 'convert(varchar(11),10000000000+abs(checksum('+objs.name+'.'+cols.name+','+@sq+@sq+'))) + '

    WHEN systypes.name IN ('datetime','smalldatetime')

    THEN 'convert(varchar,'+objs.name+'.'+cols.name+',20) + '

    ELSE

    'convert(varchar(25),ISNULL('+objs.name+'.'+cols.name+','+@sq+@sq+')) + '

    END

    + @sq+@sq+@sq+@sq

    ELSE '+ CONVERT(varchar(255),ISNULL(' + objs.name+'.'+cols.name+',0))'

    END

    + ' + '

    FROM dbo.sysobjects AS objs

    INNER JOIN dbo.syscolumns AS cols ON cols.id = objs.id

    INNER JOIN dbo.systypes systypes ON cols.xtype = systypes.xusertype

    WHERE objs.name+'.'+cols.name IN (

    -- ===========================================================================================================

    -- STEP THREE:

    -- change these to your table.field names

    -- (field names should match your fields/aliases in the SELECT above)

    -- ===========================================================================================================

    '##myTemp.member',

    --'##myTemp.testMember',

    '##myTemp.provider',

    --'##myTemp.testProvider',

    '##myTemp.claimid',

    --'##myTemp.testClaim',

    '##myTemp.startdate'--,

    --'##myTemp.testClaimStartDate'

    -- ====================================================================

    -- STEP FOUR: Now Execute the query for each temp table and

    --copy/paste from the Messages tab first and then

    -- the Results tab into your post/email

    -- ====================================================================

    )-- belongs to the above query

    -- Create temp table generation code

    SET @createTableSQL = '

    IF object_id(''Tempdb..'+@myTempTableName+''') IS NOT NULL

    BEGIN DROP TABLE '+@myTempTableName+' END

    CREATE TABLE '+@myTempTableName+'

    (iRow int identity(1,1), -- identity column for primary key

    ' + substring(@columns,1,len(@columns)-2) + ')

    --===== Add a Primary Key to maximize performance

    IF OBJECT_ID(''Tempdb..'+@myTempTableName+''') IS NULL

    BEGIN

    ALTER TABLE '+@myTempTableName+'

    ADD CONSTRAINT PK_'+@myTempTableName+'_iRow

    PRIMARY KEY CLUSTERED (iRow)

    WITH FILLFACTOR = 100

    END'

    PRINT @createTableSQL -- Created code will show up in Messages tab of QA, copy/paste into your email/posted question

    PRINT '-- Insert test data into table' + char(13) + '-------------------------------------------------------' + char(13)

    -- Created code will show up in Results tab of QA, copy/paste into your email/posted question

    SET @myTableData = 'SELECT ' + @limit + ' ' + @sq + 'INSERT INTO '+@myTempTableName+' (' + substring(@mySelectList,1,len(@mySelectList)-1) + ') '

    + 'VALUES (' + @sq + substring(@myInsertList,4,len(@myInsertList)) + @sq +')' + @sq

    + ' AS [Copy and Paste these AFTER the text from the Messages tab]'

    + char(13)+ ' FROM ##myTemp'

    --PRINT @myTableData -- uncommenting this will show you what the dynamic SQL generated from the above looks like

    EXEC(@myTableData) -- execute the dynamic SQL we just generated, to create the INSERT statements

    -- =======================================================================================================

    -- STEP FIVE - Save the data from this SELECT as a reference in case you want to tie back to the original

    -- =======================================================================================================

    SELECT 'See below, member and provider data ' AS [Example of how the data is changed],

    'is converted to memberX, providerX;' AS [.],

    ' claims are changed to other numbers;' AS [..],

    ' and dates are moved randomly' AS [...]

    SELECT * FROM ##myTemp

    ---------------------------------------------------------
    How best to post your question[/url]
    How to post performance problems[/url]
    Tally Table:What it is and how it replaces a loop[/url]

    "stewsterl 80804 (10/16/2009)I guess when you stop and try to understand the solution provided you not only learn, but save yourself some headaches when you need to make any slight changes."

  • Neat! Thanks for sharing, Jeff.

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • RBarryYoung (12/17/2008)


    Neat! Thanks for sharing, Jeff.

    Jeff?

    Glad to share, Harold

    ---------------------------------------------------------
    How best to post your question[/url]
    How to post performance problems[/url]
    Tally Table:What it is and how it replaces a loop[/url]

    "stewsterl 80804 (10/16/2009)I guess when you stop and try to understand the solution provided you not only learn, but save yourself some headaches when you need to make any slight changes."

  • Sorry Jon, I misread earlier.

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

Viewing 12 posts - 16 through 26 (of 26 total)

You must be logged in to reply to this topic. Login to reply