Script to create realistic generic financial data

  • Hi SSC,

    This morning I was doing some work from home on some work transcribing a financial algorithm into SQL, when my VPN client failed. Vigilant, I decided to make a table of fake financial data so that I could continue my work on a proof of concept without having access to the real data. I thought I'd share my script with you all, perhaps get some critique, but hopefully help someone who might need to make a similar large set of data.

    The key points I wanted out of the data were to have multiple funds, randomized data which looks like it realistically hovers around the same price (i.e. not 1000000 one day and -99927 the next), and of course, no loops. Please note, I included a temporary tally table in the script, but that's not really the point of this post; I just wanted everyone to be able to run this script right out of the box. If you're interested in building a tally table, there are lots of good articles[/url] out there. Also, while I've done this all with temporary objects, I'd probably recommend persisting this data somewhere if you plan to use it repeatedly like I do. Feel free to replace any of the temp objects with real ones.

    The process I went through is to first build a skeleton of fund identifiers and dates (msdates to be precise). To that skeleton, I then set the first occurence in the series to a random float between 1 and 1000, so that each fund wasn't exactly the same size. Then, using a quirky update, I added a much smaller random float to the previous value to give it the impression of moving around randomly, but still staying in the same ballpark as the last data point. While I'd be hesitant to use a quirky update in production code, for something like this, I think it works really well. Here's the script:

    /***************************************

    This script creates a table for generic fund data (symbol, date and price) and then populates it with semi-realistic random data

    By that I mean it's not just all over the place, it actually will trend a bit.

    select top 10000 *

    from #GenericFundData

    where TradingDate between 20000 and 20005

    ***************************************/

    use AdventureWorks

    go

    create table #Tally

    (

    num int primary key clustered

    )

    create table #GenericFundData

    (

    FundID varchar(20) not null,

    TradingDate int not null,

    Value decimal(38,6) null

    primary key clustered (FundID, TradingDate)

    )

    go

    --make a small tally table for use in this script

    select top 50000 row_number() over (order by (select null))

    from sys.objects a

    cross join sys.objects b

    cross join sys.objects c

    declare

    @RandomIncrement float, --Each row, this gets set to a small number which can be used to augment the price from the previous day

    @CurrentValue float, --pricing value on a given date

    @FundID varchar(20) --Fund id, used as an anchor in the quirky update to force a clustered index update

    ;with datecte as --Set a time frame of 20000 to 50000. Arbirtraily chosen

    (

    select TradingDate = num

    from #Tally

    where num between 20000 and 50000

    ), fundcte as --create 50 fake symbols. I appended A just to make them look more symbol-ish.

    (

    select top 50 FundID = concat(num, 'A')

    from #Tally

    order by num asc

    )

    insert into #GenericFundData

    (

    FundID,

    TradingDate

    )

    select

    FundID = f.FundID,

    TradingDate = d.TradingDate

    from datecte d

    cross join fundcte f

    --For the earliest data point, set a seed value of a random number between effectively 1 and 1001

    update a

    set Value = ((abs(checksum(newid())) % 1000) + 1)* rand()

    from #GenericFundData a

    inner join (select FundID, MinTradingDate = min(TradingDate)

    from #GenericFundData

    group by FundID) b

    on a.FundID = b.FundID

    and a.TradingDate = b.MinTradingDate

    --Quirky update. Increments the seed value set in the previous statement by a small amount so that the random amounts look to be sort of trending rather than just all over the place.

    update #GenericFundData with (tablockx)

    set @RandomIncrement = ((checksum(newid()) % 10) + 1) * rand(),

    @CurrentValue = case when TradingDate = 20000 then Value

    else @CurrentValue + @RandomIncrement

    end,

    Value = @CurrentValue,

    @FundID = @FundID --Anchor: forces a clustered index update so the quirky update works in order.

    option (maxdop 1)

    Cheers!

    Executive Junior Cowboy Developer, Esq.[/url]

  • Welcome to the world of SSC although I see you have around 40 posts, so better late than never.

    Jeff Moden has a couple of good articles on setting up randomized test data.

    http://www.sqlservercentral.com/articles/Data+Generation/87901/

    http://www.sqlservercentral.com/articles/Test+Data/88964/

    I see that you applied some of those techniques and my only criticism is that you probably don't really need to multiply by RAND().

    Of course, if you're looking to seed your data with random numbers that are not uniform, here's my take on that:

    http://www.sqlservercentral.com/articles/SQL+Uniform+Random+Numbers/91103/

    One other comment. You could probably have done without the QU by simply putting your random number value into a CROSS APPLY. You can then use that value to generate your increment, all within the original SQL statement (maybe).


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • Gabe T. (11/22/2012)


    @FundID = @FundID --Anchor: forces a clustered index update so the quirky update works in order.

    That won't act as an anchor because you have a variable = variable. It probably needs to be @FundID = FundID. Haven't checked the rest of the query.

    As a sidebar, this is all very interesting. You should write an article about it.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 3 posts - 1 through 2 (of 2 total)

You must be logged in to reply to this topic. Login to reply