Distributed Replay Issue.

  • I set up a Distributed Replay Controller and Client and collected 20GB of trace data. When running DReplay preprocess, I receive the following error:

    2018-01-25 12:03:37:061 OPERATIONAL [Preprocess]          Preprocess pass 2 of 2 in progress.

    2018-01-25 12:07:44:659 CRITICAL    [Preprocess]          [0xC8010009(1)] Failed to get an event.

    2018-01-25 12:07:44:690 CRITICAL     [ControllerService]  Controller failed with an unexpected error.


    However, when I reduce the amount of data to preprocess to 4GB, it completes successfully, as shown below:

    2018-01-25 12:33:18:871 OPERATIONAL [Preprocess]          Preprocess pass 1 of 2 in progress.

    2018-01-25 12:36:13:016 OPERATIONAL [Preprocess]          Preprocess pass 1 of 2 completed.

    2018-01-25 12:36:13:016 OPERATIONAL [Preprocess]          5257693 events processed in total.

    2018-01-25 12:36:13:016 OPERATIONAL [Preprocess]          Elapsed time: 0 day(s), 0 hour(s), 2 minute(s), 54 second(s).

    2018-01-25 12:36:13:016 OPERATIONAL [Preprocess]          Preprocess pass 2 of 2 in progress.

    2018-01-25 12:38:42:817 OPERATIONAL [Preprocess]          Preprocess pass 2 of 2 completed.

    2018-01-25 12:38:42:817 OPERATIONAL [Preprocess]          5257686 events processed in total.

    2018-01-25 12:38:42:817 OPERATIONAL [Preprocess]          Elapsed time: 0 day(s), 0 hour(s), 2 minute(s), 29 second(s).

    2018-01-25 12:38:42:896 OPERATIONAL [Preprocess]          25845 replayable events written to intermediate file(s) in [d:\replay\Controller].


    Are there any size limitations, or any timeouts that I need to configure, for it to preprocess the 20GB of data that I have collected?

    Any help or advice is welcome. Thank you in anticipation.
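
The counts in the successful 4GB run can be pulled out of the log with a short script. This is only a sketch: the `summarize` helper is hypothetical, the sample text is abridged from the log lines quoted above, and the regular expressions assume the message wording shown there.

```python
import re

# Abridged from the preprocess log quoted in the post above.
LOG = """\
2018-01-25 12:36:13:016 OPERATIONAL [Preprocess] 5257693 events processed in total.
2018-01-25 12:38:42:817 OPERATIONAL [Preprocess] 5257686 events processed in total.
2018-01-25 12:38:42:896 OPERATIONAL [Preprocess] 25845 replayable events written to intermediate file(s)
"""

def summarize(log_text):
    """Return (events processed per pass, replayable event counts) found in the log."""
    processed = [int(n) for n in re.findall(r"(\d+)\s*events processed in total", log_text)]
    replayable = [int(n) for n in re.findall(r"(\d+)\s*replayable events written", log_text)]
    return processed, replayable

processed, replayable = summarize(LOG)
print("events processed per pass:", processed)
print("replayable events:", replayable)
if processed and replayable:
    # Share of pass-2 events that ended up in the intermediate file.
    print(f"replayable share: {replayable[-1] / processed[-1]:.2%}")
```

For this run the replayable share comes out below 1% of the events processed, which may be worth keeping in mind when comparing against a trace taken with a different event selection.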

  • I've run it on more than 4GB of data. It could be that one of the files is damaged and unreadable. I can't really tell from the error.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
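
One way to test the damaged-file theory above is to preprocess each rollover file on its own and note which ones fail. The following is a minimal sketch, not a tested procedure: `preprocess_command` and `find_failing_files` are hypothetical helpers, the `-i` and `-d` options are the input-file and controller-working-directory options of `dreplay preprocess`, and treating a non-zero exit code as failure is an assumption (the CRITICAL lines in the controller log are the more reliable signal). Each run overwrites the intermediate file in the working directory, and a rollover file preprocessed in isolation may behave differently from the same file inside the full set.

```python
import subprocess
from pathlib import Path

def preprocess_command(trace_file, working_dir, exe="dreplay"):
    """Build a `dreplay preprocess` command line for a single trace file."""
    return [exe, "preprocess", "-i", str(trace_file), "-d", str(working_dir)]

def find_failing_files(trace_dir, working_dir, run=subprocess.run):
    """Preprocess each .trc file separately and return the names that fail.

    Assumption: a non-zero exit code means preprocessing failed.
    `run` is injectable so the loop can be exercised without dreplay installed.
    """
    failures = []
    for trace_file in sorted(Path(trace_dir).glob("*.trc")):
        result = run(preprocess_command(trace_file, working_dir))
        if result.returncode != 0:
            failures.append(trace_file.name)
    return failures

# Hypothetical usage; the trace directory is illustrative only:
# print(find_failing_files(r"d:\replay\traces", r"d:\replay\Controller"))
```
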
  • Hi Gail,

    Thank you for your reply. What is strange is that I can open the trace file in question in Profiler without any issues; it only fails in the preprocess stage.
    It's one of our busy servers, from which I have captured a terabyte of server-side trace files in the form of 1GB rollover files.

    Thank you for your guidance once again.
    Regards
    Kailash.

  • (Environment Details - 1 SQL Server, 1 Replay Controller & 16 Replay Client Servers)

    Quick update -
    We took a new trace using the TSQL_Replay template as suggested, without the additional events selected previously, and were able to preprocess it successfully. However, when running the replay in synchronous mode we encountered the "CRITICAL [ClientReplay] Active connections exceed 8192, connection 138936 is waiting" issue, which seems to be a built-in restriction within the Replay controller. It took 4 hours to replay the events for a 40-minute trace, so I wonder whether the results would be inaccurate.

    I restored the database and reran the replay in stress mode after tweaking a few settings in the DReplay.exe.Replay.Config file. The replay completed within 58 minutes, but the overall pass rate was 62.28%. I assume this is because the SQL Server is being hammered: I see locking issues, and some queries could be timing out.

    Distributed Replay was recommended to us as a tool for right-sizing tests; however, we are struggling to get relevant data out of it. Has anyone used this tool successfully and obtained credible data? If so, could you please share your experience?

    Any suggestions/help welcome.
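
To put the two runs side by side, the replay durations reported above can be compared with the 40-minute capture window. This is plain arithmetic on the figures from the post (4 hours, 58 minutes, 62.28%), which are themselves approximate; the `slowdown` helper is a hypothetical name.

```python
def slowdown(replay_minutes, capture_minutes):
    """Ratio of replay duration to the original capture duration."""
    return replay_minutes / capture_minutes

CAPTURE_MINUTES = 40
runs = {"synchronous": 4 * 60, "stress": 58}  # replay durations in minutes

for mode, minutes in runs.items():
    print(f"{mode}: {slowdown(minutes, CAPTURE_MINUTES):.2f}x the capture duration")

pass_rate = 62.28  # percent, stress-mode run
print(f"events not passing in stress mode: {100 - pass_rate:.2f}%")
```

Neither run finished within the original 40 minutes, so the timings from both reflect a workload that was stretched relative to production.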
