Today one of our Always On availability groups (AOAG) crashed. Suddenly we were not able to connect via SSMS to the AOAG listener or to the active primary node.
We got this entry in the ERRORLOG, but I was not able to find anything about it on the internet:
spid143,Unknown,Failed allocate pages: FAIL_PAGE_ALLOCATION 16
(ERRORLOG from the SQL Server that was the primary when the error occurred)
11/10/2023 13:44:30,spid83s,Unknown,A connection timeout has occurred on a previously established connection to availability replica 'srdbpw127' with id [A7B7B4DF-FEEC-48D1-8449-74A496D09E9D]. Either a networking or a firewall issue exists or the availability replica has transitioned to the resolving role.
(ERRORLOG from the SQL Server that was the secondary when the error occurred)
We were able to do a failover, and the AOAG was fine afterwards. But we had to restart the SQL Server service to be able to connect to the SQL Server again.
In Red Gate SQL Monitor we could see that Red Gate was able to collect the Windows counters, but was not able to collect the SQL Server counters.
I have no idea what's wrong:
virtual SQL Server 2022 (up to date)
Windows Server 2022 (up to date)
64 GB RAM
I thought about NUMA problems. Could this be a valid reason for this error?
Thanks a lot,