• halifaxdal (9/3/2015)


    With Luis' help, I just imported a huge data file into my local server/db on my laptop as well as a remote server (btw: The built-in import/export seems a little bit faster than bcp, both finished in less than 30 minutes which is amazing) and started some unexpected journey, here is the first stop:

    The table has 40110117 rows, that's right, 40 millions.

    What I want is maybe I should do some pre-analysis first and generates some staging tables for later use.

    The table has only 8 columns, please see the screenshot

    There is a primary key ID set for the table.

    I want to get average(host_score) grouped by the other columns: network, OS, IP, Vuln_name

    This query might work eventually, but I cancelled it after 10 minutes waiting.

    SELECT Avg (host_score) as AvgScore, Network_Group_Name, OS, Vuln_Name, recordsWritten

    FROM [IP360_Test].[dbo].z_SIVMS_currentVulns

    Group By Network_Group_Name, OS, Vuln_Name, recordsWritten

    As you can see those columns are all varchar and with 128+ characters except for IP with 15 characters.

    I hope someone who ever work on this size of data can share your experience here, I basically cannot do anything on it now.

    Since "RecordsWritten" is likely different for each instance of any single Network_Group_Name, you'll probably end up with a 40 million row return that simply duplicates the individual values of Host_Score and that's the reason it's taking to long. If you were to SUM or AVG "RecordsWritten", things might be a bit more snappy. Of course, I can't actually tell because I don't know enough about the data itself.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)