halifaxdal (9/3/2015)
With Luis' help, I just imported a huge data file into my local server/db on my laptop as well as into a remote server (by the way, the built-in import/export wizard seems a little faster than bcp; both finished in less than 30 minutes, which is amazing), and that started an unexpected journey. Here is the first stop: the table has 40,110,117 rows. That's right, 40 million.
What I want to do is maybe some pre-analysis first, and then generate some staging tables for later use.
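For example, I could start by checking how many distinct values each of the grouping columns has, and how many distinct combinations there are overall, with something like the sketch below (the table and column names are the ones from the query further down, and this still has to scan the whole table once):

-- Rough pre-analysis: distinct values per grouping column
SELECT COUNT(DISTINCT Network_Group_Name) AS DistinctNetworks,
       COUNT(DISTINCT OS)                 AS DistinctOS,
       COUNT(DISTINCT Vuln_Name)          AS DistinctVulns
FROM   [IP360_Test].[dbo].z_SIVMS_currentVulns;

-- How many groups the aggregate would actually produce
SELECT COUNT(*) AS DistinctCombinations
FROM  (SELECT DISTINCT Network_Group_Name, OS, Vuln_Name
       FROM   [IP360_Test].[dbo].z_SIVMS_currentVulns) AS d;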
The table has only 8 columns; please see the screenshot.
There is a primary key, ID, defined on the table.
I want to get the average of host_score grouped by the other columns: network, OS, IP, Vuln_Name.
The query below might work eventually, but I cancelled it after waiting 10 minutes.
SELECT AVG(host_score) AS AvgScore, Network_Group_Name, OS, Vuln_Name, recordsWritten
FROM [IP360_Test].[dbo].z_SIVMS_currentVulns
GROUP BY Network_Group_Name, OS, Vuln_Name, recordsWritten
As you can see, those columns are all varchar, most of them 128+ characters, except for IP, which is 15 characters.
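Would a covering index on the grouping columns help here? I'm thinking of something like the sketch below (assuming the combined key of those varchar columns stays under the index key size limit), so the aggregate can read from an ordered index instead of sorting the whole table, but building it on 40 million rows will itself take a while, so I have not tried it yet:

CREATE NONCLUSTERED INDEX IX_currentVulns_GroupCols  -- hypothetical index name
ON dbo.z_SIVMS_currentVulns (Network_Group_Name, OS, Vuln_Name, recordsWritten)
INCLUDE (host_score);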
I hope someone who has worked with data of this size can share their experience here; I basically cannot do anything with it right now.
Since "RecordsWritten" is likely different for each instance of any single Network_Group_Name, you'll probably end up with a 40-million-row return that simply duplicates the individual values of Host_Score, and that's the reason it's taking too long. If you were to SUM or AVG "RecordsWritten" instead of grouping by it, things might be a bit more snappy. Of course, I can't actually tell because I don't know enough about the data itself.
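For example, if "RecordsWritten" doesn't actually need to be one of the grouping columns, something like the following might do it (just a sketch; the staging table name is made up, and it assumes "RecordsWritten" is numeric so it can be summed). It would collapse the result down to one row per Network/OS/Vuln combination and drop it into a staging table like you mentioned:

SELECT  Network_Group_Name,
        OS,
        Vuln_Name,
        AVG(host_score)     AS AvgScore,
        SUM(recordsWritten) AS TotalRecordsWritten
INTO    dbo.z_SIVMS_VulnSummary  -- hypothetical staging table
FROM    [IP360_Test].[dbo].z_SIVMS_currentVulns
GROUP BY Network_Group_Name, OS, Vuln_Name;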
--Jeff Moden
Change is inevitable... Change for the better is not.