Linking to SQL Server in one domain from a different domain works but...

Question

Post reply

Linking to SQL Server in one domain from a different domain works but...

MHilsher

SSChasing Mays

Points: 623
More actions
May 23, 2011 at 3:38 pm

#389572

Okay this is sort of complicated so it's going to be a bit long.
This is a fairly simple version of a larger problem for which setting up domain trusts is not a viable solution. I've seen other topics in other places get derailed by the argument "the only right way to do it is setting up a trust relationship between domains so drop everything else and do that". Well regardless of the "theological" argument of there being only one true way and all others should be shunned, that's just not going to happen in a variety of security situations including the one I'm dealing with.
So, The setup...
We've got 3 different DMZ zones that are split across two domains. Two of the zones are 'exposed' to the internet and support connections through HTTP and HTTPS (mostly these). Let's represent these two zones as A and B.
The second type of DMZ zone is the internal zone where the database servers reside. This zone is accessible only via a Winsock connection on port 1433 between one or more web servers in one of the exposed zones and a database server in the internal zone. Let's call this zone C since in our implementation there's currently only one.
As for Active Directory (AD) domains, Zone A and C are in one AD forest and zone B is in a completely different one. There are other servers in zone B connecting to god knows what, I don't have a need to know so I can't say. We already have servers in A connecting to C with no problems and life there is bliss.
The current solution...
We have a recent need to connect a server in Zone B to Zone C. It so happens that the B server is also running SharePoint Foundation 2010 (SPF) as the application layer interface but that's not where the problem lies. Or at least not entirely.
We used a feature in Windows Server 2008, also found in Windows 7 and can be retrofit into earlier OS's, called Credential Manager (CM). Now CM let's you create a trusted store for credentials, certificates and/or username/password pairs and to store and transmit those credentials in an encrypted "secure" form.
With CM you can assign a particular credential to a user and say "Whenever you connect with this IP(in zone C), including an optional port to confine it to a single well known service (WKS), use this login Id(from zone B) instead of your current login Id(from zone C)". This works great, believe it or not, and even works for service accounts if you do a little dancing around it by temporarily enabling them for interactive login, setting up the zone B credential, and then removing the interactive login privilege.
We were even able to install SPF and have it create the databases and such and start the website. It works perfectly for interactive logins like creating a session (in zone C) and using SSMS to connect to the database server (in zone B).
Now the problem...
Okay for non-interactive logins, like executing the SPF site as the SharePoint Administrator, it runs okay for a time period of between 1 to 3 hours.
What we found in the logs...
When it's working we see a event id of 4648, "A login was attempted using explicit credentials", in the web server's Security log. This shows both the local login that is making the request and the account name (but not the password) that has been used on the remote machine. And it shows up as a successful login attempt.
Meanwhile back at the database server, we're seeing a corresponding logon with event Id of 4624, "An account was successfully logged on.". This looks exactly the same as an interactive login on the database server except it's got a login type id of '3' signifying a network logon. These session(s) connect and stay open for around 15 minutes on average and then they eventually logoff. I think the connection management is keeping an open pipe to the database backend but if calls aren't made then the connection closes.
Usually within a few seconds, 20 to 40 seconds on average, of a connection closing on the web server side we see an explicit request of the 4648 type and a corresponding 4624 on the database server. This goes on fat dumb and happy for 1-3 hours.
Then suddenly, on the database server side we see a pair of Audit Failure messages one an event Id of 4776, "Credential Validation", error and one event Id of 4625, "Account failed to log on.", error. In the details for the 4625 we can see that it's attempting to use the account Id from the other domain (zone C) and other errors can be found indicating that a login was attempted from a non-trusted domain (zone C). On the web server side there is not indication at all that a log on attempt has taken place, the 4648 events disappear.
The final piece we have...
At the time we notice the problem if we restart the IIS Web Server the problem disappears for a while. That's the problem in a nutshell. In other forums we're getting a wide variety of unhelpful answers, eg "then don't do that". And, not unsurprisingly, the usual degree of fingerpointing indicating it's always someone else's problem.
Just wanted to throw this out to the combined wisdom of the SSC community and see if anyone has solved a similar problem and, if so, how. We're not committed to Credential Manager as an answer but we do need a secure channel for sending credentials. SQL Server authentication is NOT an option. And we don't have the option of setting up a domain trust.
As I said I've seen other people with a similar problem and the general form of the question is that the Web Server is being hosted in one place and can allow login(s) controlled by that AD domain AND the database is being hosted and controlled by a second group that has a different set of people that have access and that set of login(s) is controlled by a different AD domain. The 'limit of trust' is being able to store a valid encrypted credential for domain B in domain C. Any solution that allows that and keeps working instead of failing without leaving a footprint would be viable. To reiterate (because I've seen 'that's stupid do it by one of your disallowed solutions' in so many other places to other people's posts), SQL authentication and cross-domain trusts are both off the table.
Come Watson the game is afoot...
P.S.
Ah, I was explicit above the operating system version being Windows Server 2008. In fact it is Release 2 of 2K8. AND, the SQL version we're using is SQL Server 2008 also R2. We have tried both Standard Edition and Developer's Edition of SQL Server so it's not a case of Enterprise edition or Datacenter edition doing something that Standard won't. (In fact given the facts I suspect it's a problem on the Web Server side of the equation not SQL server at all!)

Viewing 13 posts - 1 through 12 (of 12 total)

You must be logged in to reply to this topic. Login to reply

K. Brian Kelley SSC Guru Points: 114562 More actions · Answer 1

When you see the audit failures in the SQL Server log, what do you see in the OS' security event log? Same thing?

K. Brian Kelley
@kbriankelley

MHilsher SSChasing Mays Points: 623 More actions · Answer 2

Brian,

Thanks for asking. To Clarify the Event Id(s) given above where from the OS Security log. The Error Message numbers that we get in the SQL log at the time of the failures were:

1)

Error: 17806, Severity: 20, State: 14.

-and-

2)

SSPI handshake failed with error code 0x8009030c, state 14 while establishing a connection with integrated security; the connection has been closed. Reason: AcceptSecurityContext failed. The Windows error code indicates the cause of failure. [CLIENT: xxx.xxx.xxx.xxx].

(The value after "CLIENT:" was a valid IP Address from the webserver machine which I redacted according to my company's policy.)

These are exactly the error I would expect if a user from the C zone attempted to connect to any resource in the B zone as that credential is from an untrusted Domain. What we don't get on the SQL side is any indication of what the database messages are when it WORKS. Only that when it's doing a "explict login", the 4638 events in the webserver's OS security log, that the application layer works and we don't see anything at all in the SQL log.

Michael Valentine Jones SSC Guru Points: 64818 More actions · Answer 3

If you are getting "SSPI handshake failed" errors, have you created SPNs for the database server?

http://www.sqlservercentral.com/articles/Security/65169/

K. Brian Kelley SSC Guru Points: 114562 More actions · Answer 4

Michael Valentine Jones (5/24/2011)
If you are getting "SSPI handshake failed" errors, have you created SPNs for the database server?
http://www.sqlservercentral.com/articles/Security/65169/

SPNs aren't the issue here. Because there are not in the same realm (Microsoft calls this "forest") or in forests with a forest-level trust, Kerberos wouldn't be used here. Everything drops back to NTLM.

K. Brian Kelley
@kbriankelley

K. Brian Kelley SSC Guru Points: 114562 More actions · Answer 5

MHilsher (5/24/2011)
Brian,
Thanks for asking. To Clarify the Event Id(s) given above where from the OS Security log. The Error Message numbers that we get in the SQL log at the time of the failures were:
1)
Error: 17806, Severity: 20, State: 14.
-and-
2)
SSPI handshake failed with error code 0x8009030c, state 14 while establishing a connection with integrated security; the connection has been closed. Reason: AcceptSecurityContext failed. The Windows error code indicates the cause of failure. [CLIENT: xxx.xxx.xxx.xxx].
(The value after "CLIENT:" was a valid IP Address from the webserver machine which I redacted according to my company's policy.)
These are exactly the error I would expect if a user from the C zone attempted to connect to any resource in the B zone as that credential is from an untrusted Domain. What we don't get on the SQL side is any indication of what the database messages are when it WORKS. Only that when it's doing a "explict login", the 4638 events in the webserver's OS security log, that the application layer works and we don't see anything at all in the SQL log.

The error code 0x8009030c means there isn't a trusted connection. So basically, at some point IIS is failing to connect and authenticate properly with the OS where SQL Server resides. Because the login can't be authenticated, SQL Server rejects. As to why it fails after about 3 hours, I'm not familiar enough with Credential Manager to tell you. You might try on an IIS specific forum because this is outside of SQL Server and you would likely get a better answer with folks using Credential Manager more regularly (I don't... we don't allow domains in the DMZ). Failing that, it might be a good idea to open a case with Microsoft Support in case you guys are bumping up against a bug.

K. Brian Kelley
@kbriankelley

Michael Valentine Jones SSC Guru Points: 64818 More actions · Answer 6

Another workaround is to create the same login with the same SID and password in each domain, and use that to connect from the web server to the SQL server.

MHilsher SSChasing Mays Points: 623 More actions · Answer 7

I'm mostly a DBA with some SA duties and I've been assured by our network people that having the two SIDs be the same on both domains is something that A: They don't want us to do, manually setting SIDs; B: You shouldn't do because the SID is a combination of User, Machine and Domain factors to create a unique key (not sure but this is what I was told) and C: If this works it is most assuredly a failure in the Microsoft Security that constitutes a major hole. As in, everyone run in panic scream and shout the world is ending... ( I suspect that's a bit of an overreaction but it's an accurate description of the reaction I got).

Bottom line I can't get access to the external domain to either get or set the SID for a user in that domain. If I can't be allowed to know what it is I can't set mine, and I certainly can't set there SIDs.

I'm going to try and find a suitable Microsoft OS and/or SharePoint forum to post this issue in and I'll write back here with a resolution if I can get one.

K. Brian Kelley SSC Guru Points: 114562 More actions · Answer 8

Don't go the SID route, because they're right. You can, however, create username/passwords on the web server and the SQL Server that are identical. These would be local, not domain accounts. That is one way of getting a trusted connection. It's using pass-through authentication via NTLM.

K. Brian Kelley
@kbriankelley

hudson123 Newbie Points: 2 More actions · Answer 9

This reply has been reported for inappropriate content.

For Linking the SQL Server in one domain from a different domain works, Here your hosting can help you in this case. If your hosting server are strong.

Host Venture Pakistan

as_1234 SSCrazy Points: 2849 More actions · Answer 10

Could it be something in the firewall between the zones? Maybe a max connection time? Or perhaps, for some reason, it's seeing the connection as idle and dropping it?

Sorry if I'm going in the completely wrong direction.

Jeff Moden SSC Guru Points: 1003893 More actions · Answer 11

as_1234 wrote:

Could it be something in the firewall between the zones? Maybe a max connection time? Or perhaps, for some reason, it's seeing the connection as idle and dropping it?
Sorry if I'm going in the completely wrong direction.

It normally doesn't matter to me because new tricks even on very old posts are always welcome (to me, anyways) but, in this case, it does. You're responding to a thread that's 8 years old about an operating system (Windows Server 2008) that's more than a decade old, has serious security issues, and needs to be replaced.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

as_1234 SSCrazy Points: 2849 More actions · Answer 12

Thanks for the correction. Sorry I was reading through active topics and didn't notice that it was only the junk advertising post that was recent. I'll avoid making that mistake again.