One Million Tables

  • Comments posted to this topic are about the item One Million Tables

  • Part of the extra time is taken by poor indexing of SQL Server's metadata. If you do an existence check on a table in the 1 million table database, it will take significantly longer than the same check in the 1,000 table database. I have reported this to Microsoft in the past and they came back with the stock answer that it is by design. I was unaware that it also takes longer to insert data in a database with 1 million tables compared to 1,000 tables. Thanks for this, very interesting.
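
    An existence check of the kind described here is something like the following (just a sketch; the table name is a placeholder following the thread's mytableN naming, and either form has to go through the metadata):

    -- existence check via OBJECT_ID
    IF OBJECT_ID(N'dbo.mytable42', N'U') IS NOT NULL
        PRINT 'table exists'

    -- equivalent check against the catalog views
    IF EXISTS (SELECT 1 FROM sys.tables
               WHERE name = N'mytable42' AND schema_id = SCHEMA_ID(N'dbo'))
        PRINT 'table exists'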

  • Heh... I advise people against trying to display a million rows even in the Grid, and it has no meta-data to resolve.  I wouldn't expect the Explorer window to deal with a million of anything, since it not only displays graphically but also has to resolve the hierarchical meta-data. 😀

    Interesting article, though.  Thanks, Steve.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden - Thursday, May 10, 2018 5:03 AM

    Heh... I advise people against trying to display a million rows even in the Grid, and it has no meta-data to resolve.  I wouldn't expect the Explorer window to deal with a million of anything, since it not only displays graphically but also has to resolve the hierarchical meta-data. 😀

    Interesting article, though.  Thanks, Steve.

    I try to do the same, but when you have tables with billions of records, it's often hard. The biggest threat is running out of client memory and crashing your system, though. I've had cases where machines have completely locked up, forcing hard restarts. 😉

    I will say that in the MPP world with Azure SQL Data Warehouse, the chance of ending up with more tables than you would want is higher, because MPP often forces you to duplicate fact tables on different hashing keys: only one column per hash is supported for distributing data across the 60 hidden databases. In those instances I can certainly see reaching a high table count. Hopefully that will change when they add support for more than one key.
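
    For context, the duplication looks roughly like this in Azure SQL Data Warehouse DDL (a sketch only; the table and column names are made up for illustration):

    -- Same fact data loaded twice, distributed on two different hash keys,
    -- because only one column per table can be the hash distribution key.
    CREATE TABLE dbo.FactSales_ByCustomer
    WITH (DISTRIBUTION = HASH(CustomerKey), CLUSTERED COLUMNSTORE INDEX)
    AS SELECT * FROM dbo.FactSales_Stage;

    CREATE TABLE dbo.FactSales_ByProduct
    WITH (DISTRIBUTION = HASH(ProductKey), CLUSTERED COLUMNSTORE INDEX)
    AS SELECT * FROM dbo.FactSales_Stage;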

  • So I made a variant using T-SQL running in Management Studio for the insert and only tracked the milliseconds taken by the insert. I got bored waiting for the table creation, so my one million database has about 340,000 tables.


    begin transaction

    -- 999 iterations: pick a random table each time and time only the insert
    declare @loops int
    declare @i int
    declare @start datetime
    declare @end datetime
    declare @taken int          -- running total of elapsed milliseconds
    declare @number int
    declare @query nvarchar(200)

    set @loops = 1000
    set @i = 1
    set @taken = 0

    while (@i < @loops)
    begin
        -- random table number between 1 and 1000
        set @number = ABS(CHECKSUM(NEWID())) % 1000 + 1

        -- one row: the table number plus a 1,000-character string
        set @query = 'insert mytable' + cast(@number as nvarchar(10))
                   + ' select ' + cast(@number as nvarchar(10))
                   + ', replicate(CHAR(65), 1000)'

        -- time only the execution of the dynamic insert
        set @start = getdate()
        EXECUTE sp_executesql @query
        set @end = getdate()

        set @taken = @taken + DATEDIFF(ms, @start, @end)
        set @i = @i + 1
    end

    select @taken

    rollback


    Like you, I didn't run this as a true performance benchmark, just to get a vague idea of any differences. I ran each version a few times against each database; these are rough numbers (total insert milliseconds per run):

                   Min    Max    Most common
        1K DB       20    600    25-45
        340K DB    650    900    750-800

    I'm thinking a lot of the inserts in the 1K DB took under 1 ms, which skews the result a lot.

    EDIT:

    A quick change to datetime2 and nanosecond checks had the 340K database returning roughly 1,000,000,000 and the 1K database anything between 59,999,900 and 653,333,300.
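
    The datetime2 version was along these lines (a rough sketch rather than the exact script; note that DATEDIFF with NANOSECOND returns an int and overflows once an interval passes roughly 2.1 seconds, so each measured insert has to stay short, or DATEDIFF_BIG can be used on SQL Server 2016 and later):

    declare @start2 datetime2(7)
    declare @end2 datetime2(7)
    declare @taken_ns bigint

    set @taken_ns = 0

    -- inside the loop, replacing the getdate()/DATEDIFF(ms, ...) pair:
    set @start2 = SYSDATETIME()
    EXECUTE sp_executesql @query
    set @end2 = SYSDATETIME()
    set @taken_ns = @taken_ns + DATEDIFF(NANOSECOND, @start2, @end2)

    select @taken_ns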

  • But can SQL Server support a single table spread across 1 million partitions and files?

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • You asked about PowerShell: "Note: If I'm testing poorly with PoSh, let me know. I'd be curious why there would be a difference if PoSh is slowed with Get-Random somehow. "
    I think your method for measuring elapsed time and the Write-Host cmdlet are having a much larger impact on your performance.  I ran four tests, which I'll attach, but first the summaries.

    Run 1 - Removed SQL insert, wrapped original test in Measure-Command statement.  TotalSeconds : 4.4825437

    Run 2 - Removed SQL insert, removed Get-Date/ New-TimeSpan method for performance measurement.  TotalSeconds : 3.6304238

    Run 3 - Removed SQL insert, removed Get-Date/ New-TimeSpan method for performance measurement, commented out write-host.  TotalSeconds : 0.392731

    Run 4 - Removed SQL insert, removed Get-Date/ New-TimeSpan method for performance measurement, commented out write-host, increased loops to 1 million.  TotalSeconds : 38.7010968

  • I'm no expert in the SQL PoSh module, but does the Invoke-Sqlcmd cmdlet open and then close a connection to the database on each call?
    Is that how you wish to test the inserts?
    How about creating a SqlConnection object, opening the connection, and only closing it after all the inserts are completed?
    No time to do this myself right now :(.

  • I'm guessing query plan caching will have had some impact. In the case where you run 10,000 iterations across the one thousand tables, the SQL you generate on each iteration will be found in the plan cache, because only a maximum of 1,000 different queries is possible.

    In the case where you have one million tables, each of the 10,000 iterations will likely generate a SQL query that has never been executed before and is not in the plan cache (a quick way to check this is sketched below).

    It would be interesting to see the results if you leave the max random value at 1000 for the one million table test...
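
    A rough way to check this after a run is to count the cached ad hoc plans (a sketch; the LIKE filter assumes the mytableN naming used by the test inserts):

    -- one cached ad hoc plan per distinct INSERT text; usecounts shows reuse
    SELECT cp.objtype,
           COUNT(*) AS cached_plans,
           SUM(CAST(cp.usecounts AS bigint)) AS total_uses
    FROM sys.dm_exec_cached_plans AS cp
    CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
    WHERE st.text LIKE 'insert mytable%'
    GROUP BY cp.objtype;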

  • That 2 billion covers all objects, including stored procedures, views, tables, functions, etc.

    sys.objects has a foreign key on schema_id to sys.schemas, so the 2 billion covers all schemas combined; it is not 2 billion per schema.
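
    A quick way to see that relationship is a per-schema object count across the whole database (a sketch using the standard catalog views):

    SELECT s.name AS schema_name, COUNT(*) AS object_count
    FROM sys.objects AS o
    JOIN sys.schemas AS s ON s.schema_id = o.schema_id
    GROUP BY s.name
    ORDER BY object_count DESC;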

    -----------------------------------------------------------------------------------------------------------

    "Ya can't make an omelette without breaking just a few eggs" 😉

  • qbrt - Thursday, May 10, 2018 10:44 AM

    I'm no expert in the SQL PoSh module, but does the Invoke-Sqlcmd cmdlet open and then close a connection to the database on each call?
    Is that how you wish to test the inserts?
    How about creating a SqlConnection object, opening the connection, and only closing it after all the inserts are completed?
    No time to do this myself right now :(.

    This might do that, but part of what I was interested in was the load from multiple clients. I'm not trying to make the inserts efficient here, but to simulate different clients and see whether the number of tables makes a difference.

  • christianb - Thursday, May 10, 2018 3:29 PM

    You have too much time on your hands 🙂

    Perhaps, but this took me almost 2 months to fit in around other stuff. It was only slightly interesting.

  • keith.gilbert-1064677 - Thursday, May 10, 2018 1:28 AM

    Part of the extra time is taken by poor indexing of SQL Server's metadata. If you do an existence check on a table in the 1 million table database, it will take significantly longer than the same check in the 1,000 table database. I have reported this to Microsoft in the past and they came back with the stock answer that it is by design. I was unaware that it also takes longer to insert data in a database with 1 million tables compared to 1,000 tables. Thanks for this, very interesting.

    That's interesting. I hadn't considered indexing the metadata.

  • andrew 13724 - Thursday, May 10, 2018 6:38 AM

    I'm thinking a lot of the inserts in the 1K DB took under 1 ms, which skews the result a lot.

    EDIT:

    A quick change to datetime2 and nanosecond checks had the 340K database returning roughly 1,000,000,000 and the 1K database anything between 59,999,900 and 653,333,300.

    Interesting.
