SQL Server 2008 Management Data Warehouse problem

  • Hi All,

    We set up the SQL Server 2008 Management Data Warehouse (MDW) a month ago, and it worked correctly for some time. Our configuration is two 2-node SQL Server 2008 failover clusters (4 machines in total):

    Cluster1:

    Node 1 : Instance A

    Node 2 : Instance B + Instance C

    Cluster2:

    Node 1 : Instance D

    Node 2 : Stand by

    So there are 4 instances in total. With that configuration, our problem is:

    MDW collection sets work well on all instances except for the "Server Activity" collection set on instances B and C; the Server Activity collection set does not work on that node.

    When I looked at the "view log", the main error was:

    The thread "ExecMasterPackage" has timed out 3600 seconds after being signaled to stop.

    This happens on the upload step of the collection (it's a cached collection).

    Using dbo.sp_syscollector_update_collection_set, I set logging_level to 2 for the collection set and dug deeper into the error.
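
    For reference, a minimal T-SQL sketch of that logging change. It assumes the collection set still has its default name, "Server Activity", and it stops the set first because, as far as I know, only the schedule and description can be changed while a collection set is running:

    ```sql
    USE msdb;
    GO

    -- Assumption: the system collection set still uses its default name.
    EXEC dbo.sp_syscollector_stop_collection_set
        @name = N'Server Activity';

    -- 2 is the most detailed of the three logging levels (0, 1, 2).
    EXEC dbo.sp_syscollector_update_collection_set
        @name = N'Server Activity',
        @logging_level = 2;

    EXEC dbo.sp_syscollector_start_collection_set
        @name = N'Server Activity';
    GO
    ```

    Setting the level back to 1 afterwards keeps the execution log from growing faster than necessary.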

    In the performance counters part there is a warning (not an error) that is repeated multiple times:

    The output column "collection_time" (703) on output "Raw File Source Output" (40) and component "RAW - Read current tempoary storage - purpose - obtain counter paths" (36) is not subsequently used in the Data Flow task. Removing this unused output column can increase Data Flow task performance.

    As far as I can see, the problem is that the job cannot finish within one hour, and then this error occurs on the upload step of the collection (it's a cached collection):

    The thread "ExecMasterPackage" has timed out 3600 seconds after being signaled to stop.
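
    The same failures can also be read from msdb instead of the log viewer. This is only a sketch: it assumes the default collection set name and simply lists the most recent log rows that carry a failure message.

    ```sql
    -- Most recent data collector log entries with a failure message
    -- for the "Server Activity" collection set (default name assumed).
    SELECT TOP (20)
           l.log_id,
           cs.name AS collection_set,
           l.start_time,
           l.finish_time,
           l.status,
           l.failure_message
    FROM msdb.dbo.syscollector_execution_log AS l
    JOIN msdb.dbo.syscollector_collection_sets AS cs
         ON cs.collection_set_id = l.collection_set_id
    WHERE cs.name = N'Server Activity'
      AND l.failure_message IS NOT NULL
    ORDER BY l.start_time DESC;
    ```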

    - All instances work with the same MDW.

    - I have set up a new MDW, but the problem is the same.

    - I have tried different upload schedules for Server Activity, up to 6 hours, but the error still occurs and the job fails.

    Any ideas would be appreciated.

    Thanks

    Serter

  • I'm also experiencing a similar issue. Anyone know how to fix this?

  • Is there any solution to this? I am facing the same issue. Any pointers would be very helpful.

  • SP1 resolved some of the errors for me. Now there is an arithmetic overflow error; from what I found searching the internet, SP1 CU8 is said to be the cure, but I haven't applied the CU yet.

  • I had a similar issue in my environment. I restarted each collection set separately. After that, the job ran successfully and performance data was collected in the warehouse database.
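
    A sketch of restarting collection sets one at a time from T-SQL rather than from the Management Studio context menu. The names below are the default system collection set names; adjust them if yours differ.

    ```sql
    USE msdb;
    GO

    -- See which collection sets exist and whether they are running.
    SELECT collection_set_id, name, is_running
    FROM dbo.syscollector_collection_sets;

    -- Restart one collection set (default name assumed).
    EXEC dbo.sp_syscollector_stop_collection_set
        @name = N'Server Activity';
    EXEC dbo.sp_syscollector_start_collection_set
        @name = N'Server Activity';
    GO

    -- Repeat the stop/start pair for the other sets, for example
    -- N'Disk Usage' and N'Query Statistics'.
    ```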

  • I am also getting this timeout error sporadically. Keen to get this resolved. Will keep investigating & advise if I find a fix.

  • I am facing a similar issue, but only for the Server Activity collection set. When the upload job runs for more than 1 hour, it fails with the error message below:

    The thread "ExecMasterPackage" has timed out 3600 seconds after being signaled to stop.

    I have also applied SP2 on the server.

  • I am now trying to have the upload job for Server Activity run 3 times a day, every 8 hours. I have also set the retry attempts and retry interval in the job to 1 and 5 respectively. Let's see how it goes; I will post an update within 1 day.
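
    The retry settings described above could be applied in T-SQL roughly as follows. This is a sketch under assumptions: the default collection set name is used to find the upload job, and @step_id = 1 is a guess at which step performs the upload, so check msdb.dbo.sysjobsteps for your job before running it.

    ```sql
    USE msdb;
    GO

    DECLARE @job_name sysname;

    -- Find the Agent job that uploads the "Server Activity" collection set.
    SELECT @job_name = j.name
    FROM dbo.syscollector_collection_sets AS cs
    JOIN dbo.sysjobs AS j
         ON j.job_id = cs.upload_job_id
    WHERE cs.name = N'Server Activity';

    -- Assumption: step 1 is the step to retry; verify in dbo.sysjobsteps.
    EXEC dbo.sp_update_jobstep
        @job_name       = @job_name,
        @step_id        = 1,
        @retry_attempts = 1,   -- retry once
        @retry_interval = 5;   -- wait 5 minutes between attempts
    GO
    ```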

  • I have made a lot of changes to fix this issue, but I think I finally [serendipitously] figured out how to resolve it yesterday.

    I actually did two things yesterday. 1. I capped SQL Server memory so that 2GB were free for the OS. 2. I turned off all data collectors and added them back one at a time. There is one more important note: I was monitoring the server that the Utility was running on in addition to six other servers. I have not turned data collection back on for that instance.
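
    For change 1, a minimal sketch of capping SQL Server memory. The value is hypothetical: it assumes a server with 16 GB of RAM, so 16384 MB - 2048 MB = 14336 MB leaves 2 GB for the OS.

    ```sql
    -- 'max server memory (MB)' is an advanced option.
    EXEC sp_configure 'show advanced options', 1;
    RECONFIGURE;

    -- Hypothetical value for a 16 GB server: 16384 - 2048 = 14336 MB.
    EXEC sp_configure 'max server memory (MB)', 14336;
    RECONFIGURE;
    ```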

    I think it was change 2 that actually made the difference, but I include both for your information. Since that time, I have only had one "The thread "ExecMasterPackage" has timed out" error, on the busiest server. Before, I was getting them ALL the time on several servers.

    I suspect the problem is that these failures cause the log on the utility server to back up until it can no longer work through the backlog. Our utility server is light on horsepower, since this is a relatively new technology. I think "the Utility" [which is a totally dumb name] requires quite a bit of IO throughput in order to avoid problems. I suspect I made the contention worse by monitoring the local system. I also suspect that the problem is related to DCEXEC executions and stops.

    Good luck. Hope this helps!

  • I noticed this was happening on one of my SQL instances.

    In order to fix the problem I followed these steps:

    1) right-click Data Collection and select Disable Data Collection (this disables the SQL Agent jobs)

    2) stop the SQL Server Agent service

    3) open Task Manager and kill all DCEXEC.exe processes (I think these were hung)

    4) start the SQL Server Agent service

    5) right-click Data Collection and select Enable Data Collection

    After doing this, all pending cache files were processed and new ones were being created again.
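
    Steps 1 and 5 above also have T-SQL equivalents, sketched here for anyone scripting the fix. Steps 2 to 4 (restarting SQL Server Agent and killing DCEXEC.exe) happen at the operating-system level and are only marked as a comment.

    ```sql
    USE msdb;
    GO

    -- Step 1: disable the data collector.
    EXEC dbo.sp_syscollector_disable_collector;
    GO

    -- Steps 2-4: outside T-SQL, stop the SQL Server Agent service,
    -- end any remaining DCEXEC.exe processes, then start the Agent again.

    -- Step 5: enable the data collector.
    EXEC dbo.sp_syscollector_enable_collector;
    GO
    ```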

  • Fixed my failures. Thank you!
