Create 100GB database with dummy data...

  • Hi,

    I am looking to perform some DBCC CHECKDB tests, and for many reasons I cannot use a copy of the current prod database, but the best I can aim for (yes, work with the tools ya got!) is to create a 100GB+ database that will reside on the new hardware.

    I have had a look around and not found much; are there any scripts out there that will pump oodles of data into an empty db?

    Many thanks

    qh

    [font="Tahoma"]Who looks outside, dreams; who looks inside, awakes. – Carl Jung.[/font]
  • I do not have my script handy at this second, but I know what it does in general:

    Create a table with several INT, CHAR, and DATETIME columns.

    Use a loop to generate 1,000 initial rows.

    For your purposes, the actual data can be anything.

    Use a loop that copies the table back into itself.

    Each pass will double the row count, so at some point you will want to cap the number of rows inserted per pass.

    I can quickly get into the hundreds of millions of rows, and if you are not in a hurry you can let it run overnight to get TBs of dummy rows.
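    A minimal sketch of that approach in T-SQL (table and column names are just placeholders; the CHAR padding and the target row count are assumptions you would tune to hit your size goal):

    ```sql
    -- Seed table: a few INT, CHAR, and DATETIME columns; any data will do.
    CREATE TABLE dbo.DummyData (
        Id    INT IDENTITY(1,1) PRIMARY KEY,
        Num1  INT,
        Num2  INT,
        Pad   CHAR(500),          -- fixed-width padding inflates the row size
        Stamp DATETIME
    );

    -- 1,000 initial rows.
    DECLARE @i INT;
    SET @i = 0;
    WHILE @i < 1000
    BEGIN
        INSERT INTO dbo.DummyData (Num1, Num2, Pad, Stamp)
        VALUES (@i, @i * 2, REPLICATE('x', 500), GETDATE());
        SET @i = @i + 1;
    END;

    -- Copy the table back into itself until big enough; at roughly 520 bytes
    -- per row, ~200 million rows lands in the 100GB range. TOP caps each pass
    -- so later doublings don't blow out the transaction log in one go.
    WHILE (SELECT COUNT_BIG(*) FROM dbo.DummyData) < 200000000
    BEGIN
        INSERT INTO dbo.DummyData (Num1, Num2, Pad, Stamp)
        SELECT TOP (10000000) Num1, Num2, Pad, Stamp
        FROM dbo.DummyData;
    END;
    ```

    With SIMPLE recovery on the test database, the log stays manageable between passes.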

  • quackhandle1975 (3/4/2014)


    Hi,

    I am looking to perform some DBCC CHECKDB tests, and for many reasons I cannot use a copy of the current prod database, but the best I can aim for (yes, work with the tools ya got!) is to create a 100GB+ database that will reside on the new hardware.

    I have had a look around and not found much; are there any scripts out there that will pump oodles of data into an empty db?

    Many thanks

    qh

    DBCC CHECKDB tells you specific details about a specific database. Running it on a different database, with different data, would only show you how long it runs; it would not help you evaluate the health of your real production database.

    You could use a Tally table to build ten thousand tables and insert lots of random data into them, but that wouldn't do much more than waste time and disk space.

    Can you at least run DBCC CHECKDB WITH PHYSICAL_ONLY on the production db?

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • Yes, it doesn't help that way. You need to run it on the actual DB or a copy.

    --

    SQLBuddy

  • Lowell (3/4/2014)


    quackhandle1975 (3/4/2014)


    Hi,

    I am looking to perform some DBCC CHECKDB tests and for many reasons, I cannot copy a use a copy of current prod database, but the best I can aim for (yes, work with the tools ya got!) is to create a 100GB+ database that will reside on the new hardware.

    I have had a look around and not found much, any scripts out there that will pump oodles of data into an empty db?

    Many thanks

    qh

    DBCC CHECKDB tells you specific details about a specific database. Running it on a different database, with different data, would only show you how long it runs; it would not help you evaluate the health of your real production database.

    You could use a Tally table to build ten thousand tables and insert lots of random data into them, but that wouldn't do much more than waste time and disk space.

    Can you at least run DBCC CHECKDB WITH PHYSICAL_ONLY on the production db?

    Agreed. You could create many tables and run it up to 100 GB, but it would only be good for performance benchmarking of DBCC itself. Creating large data sets for testing is easy enough, but it wouldn't tell you anything about your production database.

    It sounds like a frustrating endeavor to have to check a production database without being able to back up and restore a copy of it. I don't envy you your task. What are you trying to find out?

  • Hi,

    Many thanks for all your replies and yes I do agree that running CHECKDB on a dummy database is not the same as running it in production and my results would be skewed. Unfortunately the client production database contains credit card data that is subject to PCI compliance.

    Cheers

    qh

    [font="Tahoma"]Who looks outside, dreams; who looks inside, awakes. – Carl Jung.[/font]
  • quackhandle1975 (3/4/2014)


    Hi,

    Many thanks for all your replies and yes I do agree that running CHECKDB on a dummy database is not the same as running it in production and my results would be skewed. Unfortunately the client production database contains credit card data that is subject to PCI compliance.

    Cheers

    qh

    I think you can scramble the CC data and restore the DB. The company that I work for has a similar scrambling process, and a special team takes care of it. Could you check if you can do that?

    --

    SQLBuddy

  • quackhandle1975 (3/4/2014)


    Hi,

    Many thanks for all your replies and yes I do agree that running CHECKDB on a dummy database is not the same as running it in production and my results would be skewed. Unfortunately the client production database contains credit card data that is subject to PCI compliance.

    Cheers

    qh

    Is it acceptable to the "powers that be" if you were to cleanse a copy of production? For example, you could write your SQL to restore a recent full backup of your database into a test environment and then, in the same script, overwrite all the credit card numbers with dummy data. There's got to be some card number that survives the rules check but is still representative of what's in production. Also, the 3- or 4-digit security code on the back (the CVV) could be set to a random number, and the name could be scrambled.

    If you have a dev copy of production that developers develop against, there must be some sort of cleansing process defined...or at least I hope there is.
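    A cleansing step along those lines might look like this. This is only a sketch: the table and column names are hypothetical, and `4111111111111111` is the well-known Visa test number, which passes a Luhn check while clearly not being real card data.

    ```sql
    -- Run against the restored test copy only, never against production.
    -- dbo.Customers, CardNumber, SecurityCode, CardholderName, and CustomerId
    -- are placeholder names; substitute the real schema.
    UPDATE dbo.Customers
    SET CardNumber     = '4111111111111111',
        -- random 3-digit code, zero-padded
        SecurityCode   = RIGHT('000' + CAST(ABS(CHECKSUM(NEWID())) % 1000 AS VARCHAR(4)), 3),
        CardholderName = 'TEST CARDHOLDER ' + CAST(CustomerId AS VARCHAR(10));
    ```

    Whether this satisfies PCI scope rules is a question for the compliance team, but it keeps row counts, data types, and index shapes identical to production, which is what matters for a representative CHECKDB run.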

  • sqlbuddy123 (3/4/2014)


    I think you can scramble the CC data and restore the DB. The company that I work has a similar scrambling process and a special team takes care of that. Could you check if you can do that ?

    --

    SQLBuddy

    Thanks sqlbuddy, unfortunately access to production is limited (even for me!) and it is 600GB in size. Ideally I would like to run CHECKDB with and without PHYSICAL_ONLY and see what difference there is in time to complete. I guess I will have to wait until a copy of production is copied to the new 2008 R2 environment and run it then.

    qh

    [font="Tahoma"]Who looks outside, dreams; who looks inside, awakes. – Carl Jung.[/font]
  • Thanks sqlbuddy, unfortunately access to production is limited (even for me!) and it is 600GB in size. Ideally I would like to run CHECKDB with and without PHYSICAL_ONLY and see what difference there is in time to complete. I guess I will have to wait until a copy of production is copied to the new 2008 R2 environment and run it then.

    qh

    You are welcome. Yeah, the WITH PHYSICAL_ONLY option has less overhead and runs much quicker than a full DBCC CHECKDB, because it skips the logical consistency checks and only validates page structures and allocation.

    Usually it's good for ad hoc checks.
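    A rough way to time the two variants back to back (the database name is a placeholder; DBCC's own output appears on the messages tab):

    ```sql
    -- Full logical + physical check.
    DECLARE @t0 DATETIME2;
    SET @t0 = SYSDATETIME();
    DBCC CHECKDB (N'MyTestDb') WITH NO_INFOMSGS;
    PRINT 'Full CHECKDB:  ' + CAST(DATEDIFF(SECOND, @t0, SYSDATETIME()) AS VARCHAR(10)) + 's';

    -- Physical-only check: page/allocation validation, no logical checks.
    SET @t0 = SYSDATETIME();
    DBCC CHECKDB (N'MyTestDb') WITH PHYSICAL_ONLY, NO_INFOMSGS;
    PRINT 'PHYSICAL_ONLY: ' + CAST(DATEDIFF(SECOND, @t0, SYSDATETIME()) AS VARCHAR(10)) + 's';
    ```

    NO_INFOMSGS suppresses the per-object row-count chatter so any genuine corruption messages stand out.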

    --

    SQLBuddy

  • I don't think you really need the database at all.

    I'd create a job script for the client, and ask the client to add a job or two that runs DBCC CHECKDB (with or without PHYSICAL_ONLY) and emails the results (if bad).

    That way no database needs to be copied and no PCI compliance issue arises; just running the code onsite would be sufficient, I'd think.

    I can point you to a script here, written by regular SSC contributor Gianluca Sartori, that I personally use with hardly any changes:

    http://spaghettidba.com/2011/11/28/email-alert-dbcc-checkdb/

    Scripting a job that does that should be fast and easy.
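    A bare-bones version of such a job could be scripted like this (job name, database name, and schedule are all placeholder assumptions; Gianluca's script linked above adds the e-mail alerting on top):

    ```sql
    USE msdb;
    GO
    -- Create the job, its single CHECKDB step, a nightly 2am schedule,
    -- and attach it to the local server.
    EXEC dbo.sp_add_job @job_name = N'Nightly DBCC CHECKDB';

    EXEC dbo.sp_add_jobstep
        @job_name  = N'Nightly DBCC CHECKDB',
        @step_name = N'Run CHECKDB',
        @subsystem = N'TSQL',
        @command   = N'DBCC CHECKDB (N''ProdDb'') WITH NO_INFOMSGS, ALL_ERRORMSGS;';

    EXEC dbo.sp_add_jobschedule
        @job_name          = N'Nightly DBCC CHECKDB',
        @name              = N'Nightly 2am',
        @freq_type         = 4,        -- daily
        @freq_interval     = 1,
        @active_start_time = 20000;    -- HHMMSS: 02:00:00

    EXEC dbo.sp_add_jobserver @job_name = N'Nightly DBCC CHECKDB';
    ```

    The client runs this once onsite and the data never leaves their environment.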

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • Lowell (3/4/2014)


    I don't think you really need the database at all.

    I'd create a job script for the client, and ask the client to add a job or two that runs DBCC CHECKDB (with or without PHYSICAL_ONLY) and emails the results (if bad).

    That way no database needs to be copied and no PCI compliance issue arises; just running the code onsite would be sufficient, I'd think.

    Hi Lowell,

    Thanks for your suggestions; I know what you're saying, but maybe I didn't explain myself clearly. The 600GB prod database is to be migrated from 2005 to 2008 R2, and whilst I am waiting for the new 2008 R2 environment to be built I do have a 2008 R2 instance where I can test stuff out. I initially surmised that this would be okay to use for running DBCC checks; however, I am bound by the fact that even if I do create a 2008 DB, I am not running CHECKDB on the proper restored (2005) backup.

    Cheers

    qh

    [font="Tahoma"]Who looks outside, dreams; who looks inside, awakes. – Carl Jung.[/font]

