jctrelfa (1/6/2009)
Please forgive my ignorance - but this seems like a lot of extra overhead:
DECLARE @max_id INT
SET @max_id = (SELECT max(contactid) FROM [Adventureworks].[Person].[Contact])
INSERT INTO Contacts
SELECT TOP 1000000 --<<<--THE NUMBER OF ROWS!
FirstName = (SELECT TOP 1 FirstName
FROM adventureworks.person.contact
WHERE ContactID = ABS(CONVERT(BINARY(6),NEWID()) % @max_id) + 1),
LastName = (SELECT TOP 1 LastName
FROM adventureworks.person.contact
WHERE ContactID = ABS(CONVERT(BINARY(6),NEWID()) % @max_id) + 1)
FROM
Master.dbo.SysColumns t1,
Master.dbo.SysColumns t2
GO
Is there a downside to a single update/select without all the additional subqueries and functions you have in the article?
INSERT INTO Contacts (FirstName, LastName)
SELECT TOP 100000
FirstName,
LastName
FROM [Adventureworks].[Person].[Contact]
Maybe there's something that I just don't understand (I'm relatively new to this stuff)
Your right.. the insert code does use a lot of additional overhead, but for good reason. The code randomizes first and last names from the AdventureWorks database, based on first and last names that are preexisting. This way we can generate a lot more random data, thus making indexing better and the data more accurate. The method you proposed will only dump the number of records from adventureworks, as they exist which is less than 50,000 rows.