• jctrelfa (1/6/2009)


    Please forgive my ignorance - but this seems like a lot of extra overhead:

    DECLARE @max_id INT

    SET @max_id = (SELECT max(contactid) FROM [Adventureworks].[Person].[Contact])

    INSERT INTO Contacts

    SELECT TOP 1000000 --<<<--THE NUMBER OF ROWS!

    FirstName = (SELECT TOP 1 FirstName

    FROM adventureworks.person.contact

    WHERE ContactID = ABS(CONVERT(BINARY(6),NEWID()) % @max_id) + 1),

    LastName = (SELECT TOP 1 LastName

    FROM adventureworks.person.contact

    WHERE ContactID = ABS(CONVERT(BINARY(6),NEWID()) % @max_id) + 1)

    FROM

    Master.dbo.SysColumns t1,

    Master.dbo.SysColumns t2

    GO

    Is there a downside to a single update/select without all the additional subqueries and functions you have in the article?

    INSERT INTO Contacts (FirstName, LastName)

    SELECT TOP 100000

    FirstName,

    LastName

    FROM [Adventureworks].[Person].[Contact]

    Maybe there's something that I just don't understand (I'm relatively new to this stuff)

    Your right.. the insert code does use a lot of additional overhead, but for good reason. The code randomizes first and last names from the AdventureWorks database, based on first and last names that are preexisting. This way we can generate a lot more random data, thus making indexing better and the data more accurate. The method you proposed will only dump the number of records from adventureworks, as they exist which is less than 50,000 rows.