I would review the logs to see why it can't execute that stored procedure, and possibly find a maintenance window where you can let sp_replrestart run for a long period of time. I expect it will complete at some point; it just has a lot of work to do, and starting and then stopping it is causing problems.
But step 1 would be to review ALL logs to determine why it is failing. Chances are one (or more) of the logs will point you in the right direction. When I say all logs, I mean all of the SQL logs on the source and the destination, as well as the Windows logs on both the source and destination.
Generally, when errors come up, reviewing logs is my first step. And checking all of the logs is important. It may be that the replication log is only logging the little bit that you shared ("the process could not execute sp_repldone/sp_replcounters"), but it could be that the SQL error log on the destination server is logging a "login failed" type message. Or Windows may have a log entry about a NIC failure at the same time replication is failing.
So, I would start by reviewing the logs and looking at what they say at the time that replication is failing and see if you can find any commonality in them. There will be a lot of fluff for sure, but finding that one log that points you to the solution is rewarding!
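If it helps, here is a rough Python sketch of that "what else was logged at the same time" check. It assumes you have already exported the entries from each log into (timestamp, source, message) tuples. The function name entries_near, the source labels, and every sample entry below are made up for illustration; they are not real log output.

```python
from datetime import datetime, timedelta


def entries_near(entries, failure_time, window_minutes=5):
    """Return entries logged within window_minutes of failure_time, oldest first."""
    window = timedelta(minutes=window_minutes)
    nearby = [e for e in entries if abs(e[0] - failure_time) <= window]
    return sorted(nearby, key=lambda e: e[0])


# Hypothetical sample data: (timestamp, log source, message).
sample_entries = [
    (datetime(2024, 1, 1, 2, 14), "windows-source", "NIC link down"),
    (datetime(2024, 1, 1, 2, 15), "replication",
     "The process could not execute sp_repldone/sp_replcounters"),
    (datetime(2024, 1, 1, 2, 15, 30), "sql-destination", "Login failed"),
    (datetime(2024, 1, 1, 9, 0), "sql-source", "Backup completed"),
]

# The time replication failed (also hypothetical).
failure_time = datetime(2024, 1, 1, 2, 15)

# Print everything any log recorded close to the failure.
for ts, source, message in entries_near(sample_entries, failure_time):
    print(ts.isoformat(), source, message)
```

Widening or narrowing window_minutes is the knob to play with: too wide and you are back to reading fluff, too narrow and you may miss the event that caused the failure.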
The above is all just my opinion on what you should do.
As with all advice you find on a random internet forum, you shouldn't blindly follow it. Always test on a test server to see if there are negative side effects before making changes to live!