• I have a DB with 5 jobs that send about 300 emails on the 15th of each month. They would fail occasionally, and it drove me nuts trying to figure out why. What I did was check the return value from xp_sendmail within my stored procedure and, on failure, call exec master..xp_logevent to log the failure with information from the point of failure.

    This won't solve your problem, but it might give you an idea of how to pin it down.
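A rough sketch of that pattern, in case it helps. The recipient, subject, body, and the error number 60000 are placeholders I made up for illustration (xp_logevent needs a user-defined error number above 50000); the real procedure would pull these from its own data.

```sql
-- Sketch only: variable values and error number 60000 are illustrative assumptions.
DECLARE @rc int, @msg varchar(1000)
DECLARE @recipient varchar(255), @subject varchar(255), @body varchar(4000)

SELECT @recipient = 'someone@example.com',
       @subject   = 'Monthly statement',
       @body      = 'Hello'

-- xp_sendmail returns 0 on success and 1 on failure.
EXEC @rc = master..xp_sendmail
     @recipients = @recipient,
     @subject    = @subject,
     @message    = @body

IF @rc <> 0
BEGIN
    -- Record enough context to tell which send failed.
    SET @msg = 'xp_sendmail failed (return code ' + CONVERT(varchar(10), @rc)
             + ') for recipient ' + @recipient
    EXEC master..xp_logevent 60000, @msg, 'ERROR'
END
```

The logged message ends up in the SQL Server error log and the Windows application log, so after the 15th you can search there for the failures and see which recipient each one was for.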