Proxy accounts failing to authenticate intermittently

  • TangoVictor, any updates on this?

  • Any updates on this?

  • Nothing since the last update. We had another occurrence of the error just a couple days ago. Before that it was a couple weeks.

    The number of errors after patching went down significantly, but they do still pop up from time to time.

    The intermittent nature of this makes troubleshooting and trying different fixes quite difficult. One thing I thought I noticed was that the failing packages all appeared to be SSIS 2016 versions, but the error occurs before anything starts executing.

    Another thought I had just today was to delete and re-create the jobs. All the jobs were moved to the new server using the dbatools Copy-DbaAgentJob cmdlet. In fact, everything was moved to the new server using one of the migration cmdlets. I really don't think this would change anything. I honestly feel like a Windows service is the culprit.

    The version we patched Windows to is:

    Windows Server 2019 Standard

    Version: 1809

    OSBuild: 17763.2565

    If anything changes I'll try to remember to post it here.

  • We were able to capture the error in a HUGE log that we had running to attempt to catch this random issue. We shipped that off to Microsoft support, they have had the file a few weeks and are slogging through to see what they can find. I will also keep this post updated if they find anything meaningful. Thanks 🙂

  • Sounds great. Can I ask how you captured the log?

  • We ran the scripts provided by Microsoft support to generate the logs they need. The high-level procedure is as follows:

    Start the script, reproduce the issue if possible (by running multiple jobs as you have defined), and stop it.

    We should start the Auth script before the issue occurs, so that we can capture the error.

    Please enable Netlogon logging on the SQL Server; you can use the article https://support.microsoft.com/en-us/kb/109626

    Path: HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Netlogon\Parameters

    Value Name: DBFlag

    Value Type: REG_DWORD (if you find a REG_SZ entry for DBFlag, please delete it)

    Value Data: 2080ffff

    Run command net stop netlogon & net start netlogon to enable Netlogon logging. The registry change does not require a reboot; restarting the service starts the logging.

    Please download the attached files and change the extension from .txt to .ps1

    Run the PowerShell script: start-auth

    Reproduce the issue and run the script: stop-auth

    Run command wevtutil epl system c:\system.evtx

    Run command wevtutil epl application c:\application.evtx

    Run command wevtutil epl security c:\security.evtx

    Copy netlogon.log and netlogon.bak from the SQL Server machine

    Run command tasklist /v>tasklistv.txt

    Run command tasklist /svc>tasklistsvc.txt
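
The registry step above can also be prepared as a .reg file instead of being typed by hand. The following is only a sketch under that assumption: the helper name render_dbflag_reg is hypothetical, it just produces text, and it does not touch the registry or restart Netlogon.

```python
# Sketch only: renders .reg file text for the Netlogon DBFlag value described above.
# render_dbflag_reg is a hypothetical helper name; nothing here modifies the registry.

NETLOGON_KEY = (
    r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Netlogon\Parameters"
)


def render_dbflag_reg(value_hex: str = "2080ffff") -> str:
    """Return .reg file text that sets DBFlag as a REG_DWORD."""
    value = int(value_hex, 16)  # raises ValueError if the input is not hexadecimal
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("DBFlag must fit in a 32-bit DWORD")
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{NETLOGON_KEY}]",
        f'"DBFlag"=dword:{value:08x}',
        "",
    ]
    # .reg files conventionally use Windows line endings
    return "\r\n".join(lines)


if __name__ == "__main__":
    print(render_dbflag_reg())
```

Saving the output to a file and importing it would stand in for the manual edit; the net stop / net start step is still needed afterwards.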

  • Maybe also ping the dbatools team with your findings!

    Johan

    Learn to play, play to learn !

    Dont drive faster than your guardian angel can fly ...
    but keeping both feet on the ground wont get you anywhere :w00t:

    - How to post Performance Problems
    - How to post data/code to get the best help

    - How to prevent a sore throat after hours of presenting ppt

    press F1 for solution, press shift+F1 for urgent solution 😀

    Need a bit of Powershell? How about this

    Who am I ? Sometimes this is me but most of the time this is me

  • We are facing a similar issue on our Win 2019 + SQL Server 2019 Prod. We are not seeing it on any other platform. We had a ticket open with Microsoft as well, but so far no fix. We have the latest CU15 applied. The issue seems to be intermittent.

  • Same for us. This error first appeared after we migrated from SQL 2016/Win 2016 to SQL 2019/Win 2019.

    We just did a straight migration using the dbatools migration cmdlets to move everything to the new server. It ran in test for months without issue. Then we deployed and started seeing the errors.

    We patched SQL to CU15 and Windows to the latest: Server 2019 Standard, Version: 1809, OSBuild: 17763.2565

    The number of errors dropped quite a bit, but they do still show up. I'm going to put in a ticket as well to help push the issue.

    If you do run across a solution please let us know and we'll do the same. thx

  • Update:

    No resolution yet, but we are also working with MS support now. Their feeling seems to be that this is a Windows Server issue. We've sent logs to them using their SDP log capture scripts. They suggested a couple of things, such as the service account having been changed through the Service Control Manager or the password having expired, but nothing that would explain an intermittent error.

    Also note we use MSAs for all services. I would be curious what others are using for their service accounts. With MSAs there is a password sync process, and I was thinking that this may be having an effect, although I believe the sync period is monthly.

    Here's the link to the one thing he mentioned which I'm sure everyone has already looked at and ruled out.

    https://support.microsoft.com/en-us/topic/sql-server-agent-jobs-may-fail-after-you-change-the-sql-server-agent-service-startup-account-by-using-the-windows-service-control-manager-5591c47c-c362-f542-e643-5d41f2a4454a

  • TangoVictor

    We also worked with Microsoft for a while. We provided all the logs they asked for. They were asking for netmon and procmon traces as well, which is not very practical since the jobs only fail once in a while.

    We are using a domain user account to run the SQL services and Agent. We are not using MSAs. We change the password every 120 days and restart the services. We have the same setup on other platforms such as Win2012/SQL 2016 and Win2016/SQL 2017 with no issues at all, and we have been running this way for many years.

    The error only comes up on jobs with proxy accounts (Win2019/SQL 2019). We are using multiple proxy accounts for different jobs. There are no errors on the jobs that use the SQL Agent service account. Very strange, since I cannot see a pattern; it is very random.

    1. We granted local admin to the proxy account to eliminate permission issues.
    2. We tried the "Replace a process level token" policy: https://docs.microsoft.com/en-us/windows/security/threat-protection/security-policy-settings/replace-a-process-level-token
    3. We tried to replicate in dev with a couple of jobs.
    4. We added the proxy accounts with master as their default database, to see whether the issue was a null SID being passed.

    None of these worked.

    I am starting to suspect it's a bug in the Win 2019/SQL 2019 combination. I hope Microsoft fixes it soon.

  • Update: this is the final resolution from Microsoft. Hope this is helpful...

    Please see the below resolution:

    Error

    ===============

    1. Step 1 (Execute SSIS Package): Unable to start execution of step 1 (reason: Error authenticating proxy domain\proxyacct, system error: ConnGetProxyPassword). The step failed.

    2. Step 1 (Execute SSIS): Unable to start execution of step 1 (reason: Error authenticating proxy domain\proxyacct, system error: The user name or password is incorrect.). The step failed.

    Summary/Symptom

    ===================

    SQL Server Agent jobs using proxy accounts and credentials may fail intermittently with the error "The user name or password is incorrect" or "ConnGetProxyPassword".

    Cause

    =================

    When multiple jobs are executed at the same time on an SQL Server Agent, the temporary memory buffer that stores the decrypted password can become corrupt.

    Resolution

    =====================

    To mitigate this issue, please stagger the jobs. Even staggering by just a few seconds should be sufficient. Do not execute any jobs on the same schedule as other jobs.
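
For anyone planning the change, here is a minimal sketch of how staggered start times could be worked out. Everything in it is an assumption for illustration: the job names are made up, start times are given as seconds since midnight, and the 5-second gap is just one reading of "a few seconds". It only computes new times; applying them to the Agent schedules (through SSMS or the msdb schedule procedures) is a separate step.

```python
from typing import Dict


def stagger_start_times(schedules: Dict[str, int], gap_seconds: int = 5) -> Dict[str, int]:
    """Return new start times so that no two jobs start within gap_seconds of each other.

    schedules maps a job name to its start time in seconds since midnight.
    Jobs are processed in order of start time (ties broken by name), and each
    job is pushed later only as far as needed. Wrap-around past midnight is
    not handled in this sketch.
    """
    if gap_seconds <= 0:
        raise ValueError("gap_seconds must be positive")
    result: Dict[str, int] = {}
    last_slot = None
    for name, start in sorted(schedules.items(), key=lambda kv: (kv[1], kv[0])):
        # Keep the original time unless it falls too close to the previous job.
        slot = start if last_slot is None else max(start, last_slot + gap_seconds)
        result[name] = slot
        last_slot = slot
    return result


if __name__ == "__main__":
    # Hypothetical jobs: two share 02:00:00, one starts three seconds later.
    jobs = {"LoadSales": 7200, "LoadHR": 7200, "PurgeLogs": 7203}
    for job, start in sorted(stagger_start_times(jobs).items(), key=lambda kv: kv[1]):
        print(f"{job}: {start // 3600:02d}:{start % 3600 // 60:02d}:{start % 60:02d}")
```

Pushing jobs later rather than earlier keeps the first job in each group on its original time, so only the colliding jobs move.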

  • It's a lazy workaround, right.

    It'll have to do for now.

    Thank you for the feedback.

    Johan

  • Makes me wonder why the occurrences dropped after patching.

    I guess if it works, it works. It's an easy, non-impactful change, so worth a shot... we'll start changing jobs today and see how it goes.

    Thank you for posting this! Really appreciate the follow up!

    If I learn anything further will also post it here. The MS support guy said he's included the SSIS and SQL Network team but I've yet to hear from them.

  • Johan Bijnens wrote:

    It's a lazy workaround, right.

    It'll have to do for now.

    Thank you for the feedback.

    I agree. Such a lazy, Band-Aid workaround instead of fixing the corruption issue.

    They should be able to fix it, since we did not see this issue on older versions of SQL + Win.
