December 31, 2010 at 10:18 am
I would like to preface this with I dont really post here (and I should) but I do find useful information here all the time. I figured I would share this story of some odd behaviors while debugging a minor issue the other day.
Sometimes debugging can cause more issues than expected. I recently migrated to a new virtualized distributor in order to free up some rack space at our datacenter. Our old distributor did not have much load on it and it was a dell 2850 taking up two U on our rack. It was a better use of space to make an image on a VM host and mount it there since, in all honestly, a distributor doesn’t really do much work. When I migrated to this new server the scripting feature for creating new publications and subscribers started to error. So like any good admin I set forth to debug the issue and find the culprit.
Looking at the error, it reported that executing sp_MSget_agent_names on the publisher was throwing an error due to multiple results on a subquery were returned. I scripted the sp out and added a few “Print “ + @variablename for all sp_executesql statements. I placed them before the actual exec so that I could see the call before it caused an error. I found that when this stored procedure is called on the publisher, it calls the same sp through the repl_distributor linked server that is built by replication. Up to this point my issue was multiple results due to a subquery. As soon as I added my trouble shooting I got a different error: this was kinda odd, but I decided to solve the issue on the distributor first.
The error ended up just being multiple log reader records for one log reader in the distributor db. Removing the duplicate records fixed the local call of sp_MSget_agent_names on the distributor box. But that is not very important to the point of this little story.
As a result of my troubleshooting I was getting a different error. The new error was stating that repl_distributor was not set up for RPC. I thought that this was a bit odd. So I checked the linked server on the publisher and the distributor (since replication creates one on both servers). Checking on the server option tab, both had data access, RPC and RPC Out set as true. By now I was confused and calling my computer a liar.
I really don’t like a lot of the management tools and the wizards that management studio encourages you to use because it is too much of a black box for my liking. So running sp_helpserver on my publisher and distributor showed me that the status of this linked server showed the exact same as the server option of the tab. Doing some googleing on this issue didn’t really help me much. Checking on settings such as running sp_configure on both servers and making sure each is set up for remote data access was a dead end as both were. Checking out services and making sure that Distributed Transaction Coordinator was in-fact running, which they were. Making sure that MSDTC was enabled with the proper credentials on the servers, they were.
By this time I was running a stored procedure hopelessly getting an error that was not correct. Imagine yourself getting an error saying the text you typed is not using black, but you can see in your editor, you have selected “black”. So naturally cursing at my monitor, pulling my hair out and giving a computer the finger, I took a few minutes to gather my composure.
Still thinking that things are still being a black box and hidden from me, I decide to look at the actual system table sysservers. There were two interesting columns on the system table that were not being displayed by sp_helpserver. The first column not being displayed was the “System” column. It looked like a bit column and the repl_distributor record was the only one with the bit turned on. I thought that this was a little odd but, since it was created by replication, it seemed reasonable. The other column I saw was the srvstatus column. It looked like a bitcode column and after a little more googleing I found a translation of what the values were. I found that 1, 64, 128 and the 512 bit were all on and according to this translation table I found they would be the RPC bit is on, the RPC OUT bit is on, the Data Access bit is on and the system bit is on.
I was nearly at another dead end. Most people would have tried to script out the linked servers and rebuild them, but nearing the end of the day, I didn’t want replication to have a chance to break and for me to have to rebuild it by hand without scripting support. Back to google and I tried one last search. I searched for sp_MSget_agent_names, repl_distributor and RPC. I ended up with some very obscure pages at the end of my search. I looked through all of them and nearing the end I found someone who posted a sql file. After reviewing it, it appeared to be a script for rebuilding the master database… I figured it wouldn’t be very useful at all.
Searching through the script for sp_MSget_agent_names, I found the corresponding drop at the beginning of the create but there was one more instance of that name in the script, right after the create. “execute master..sp_MS_marksystemobject sp_MSget_agent_names”. I saw this and asked myself, what the hell is this?? It didn’t show up in books online so I figured it was an undocumented stored procedure from Microsoft. Back to Google I found that apparently the way that management studio knows to put all the system stored procedures isn’t just a list, or a naming convention, there is actually a system flag that needs to be marked on each object.
As I read that, the gears started grinding. System objects… Should be sorted to the system stored procedure folder on the object explorer of managements studio… Trying to execute a system stored procedure through a system linked server… All these things started to have some sort of correlation. I refreshed my stored procedure folder on the object explorer for the master database and guess what? The sp_MSget_agent_names was now listed as a user defined stored procedure! Since I altered the stored procedure Microsoft took it upon them to mark it as a non-system stored procedure. Naturally a non system stored procedure is not allowed to access a system level linked server. So after running the “execute master..sp_MS_marksystemobject sp_MSget_agent_names” to remark this stored procedure as a system stored procedure, the execution worked correctly and the scripting of replication now works flawlessly. Be forewarned! If you alter system stored procedures, you own them and they are not a system stored procedure anymore.
December 31, 2010 at 10:41 am
Interesting story, and good advice.
Viewing 2 posts - 1 through 2 (of 2 total)
You must be logged in to reply to this topic. Login to reply