AlwaysON Lease Timeout or Expired Error

Recently performed side-by-side migration from SQL Server 2008R2 Standalone databases(30 DB’s) to SQL Server 2016 by building new Windows Serever 2016 operating systems and also configured Alwayson( 2 replicas) for SQL 2016 servers with multiple Availability Groups.

Application team has modified their connection string to connect with ‘ ListnerName ‘ and tested the application connectivity. App team confirmed everything is working as expected and all good.

After 1 week got a call from application team(middle of the night) that they are having connectivity Issues and trouble connecting to databases.

When I logged in and checked the database server that the Primary Availability Groups alwayson Dashboard looks HEATHY and GREEN . All Databases are ONLINE and SYNCHRONIZED.

Checked if any connections are hitting to Databases using sys.sysprocesses and seen Application coonection are coming to database server.

So, informed the same to application team and they replied saying they have restarted their application services and then database connectivity is working fine.

I opened SQL Server Error Log and seen below error messages :

Message

SQL Server hosting availability group ‘XXX’ did not receive a process event signal from the Windows Server Failover Cluster within the lease timeout period.

Message

Error: 19421, Severity: 16, State: 1.

Message

The lease between availability group ‘XXX’ and the Windows Server Failover Cluster has expired. A connectivity issue occurred between the instance of SQL Server and the Windows Server Failover Cluster. To determine whether the availability group is failing over correctly, check the corresponding availability group resource in the Windows Server Failover Cluster.

Message

Error: 19407, Severity: 16, State: 1.

Message

The state of the local availability replica in availability group ‘XXX’ has changed from ‘PRIMARY_NORMAL’ to ‘RESOLVING_NORMAL’. The state changed because the lease between the local availability replica and Windows Server Failover Clustering (WSFC) has expired. For more information, see the SQL Server error log, Windows Server Failover Clustering (WSFC) management console, or WSFC log.

Message

Always On Availability Groups connection with secondary database terminated for primary database ‘XXX’ on the availability replica ‘XXXXX’ with Replica ID: {95d2878a-3fe0-48bf-aa2a-a41805fda786}. This is an informational message only. No user action is required.

Message

Remote harden of transaction ‘user_transaction’ (ID 0x000000002af6b5b4 0001:99ea0a50) started at Apr 17 2021 1:46AM in database ‘XXX’ at LSN (7544210:90297:16) failed.

Message

The availability group database “XXX” is changing roles from “PRIMARY” to “RESOLVING” because the mirroring session or availability group failed over due to role synchronization. This is an informational message only. No user action is required.

Message

State information for database ‘XXX’ – Hardened Lsn: ‘(7544210:90297:1)’ Commit LSN: ‘(7544210:89962:3)’ Commit Time: ‘Apr 17 2021 1:45AM’

Message

Stopped listening on virtual network name ‘ListnerNAme’. No user action is required.

Message

Process ID 250 was killed by an ABORT_AFTER_WAIT = BLOCKERS DDL statement on database_id = 16, object_id = 0.

Message

Always On: Explicitly transitioning the state of the Windows Server Failover Clustering (WSFC) resource that corresponds to availability group ‘XXX’ to Failed. The resource state is not consistent with the availability group state in the instance of SQL Server. The WSFC resource state indicates that the local availability replica is the primary replica, but the local replica is not in the primary role. This is an informational message only. No user action is required.

Message

Always On: The local replica of availability group ‘XXXX’ is preparing to transition to the resolving role in response to a request from the Windows Server Failover Clustering (WSFC) cluster. This is an informational message only. No user action is required.

Message

Unable to access availability database ‘DBNAME’ because the database replica is not in the PRIMARY or SECONDARY role. Connections to an availability database is permitted only when the database replica is in the PRIMARY or SECONDARY role. Try the operation again later.

Message

Always On: The local replica of availability group ‘XXXX’ is preparing to transition to the primary role in response to a request from the Windows Server Failover Clustering (WSFC) cluster. This is an informational message only. No user action is required.

Message

The state of the local availability replica in availability group ‘XXXX’ has changed from ‘RESOLVING_NORMAL’ to ‘PRIMARY_PENDING’. The state changed because the availability group is coming online. For more information, see the SQL Server error log, Windows Server Failover Clustering (WSFC) management console, or WSFC log.

Message

Started listening on virtual network name ‘ListnerName’. No user action is required.

Message

The lease worker of availability group ‘XXXXX’ is now sleeping the excess lease time (164797 ms) supplied during online. This is an informational message only. No user action is required.

Message

The state of the local availability replica in availability group ‘XXXX’ has changed from ‘PRIMARY_PENDING’ to ‘PRIMARY_NORMAL’. The state changed because the local replica has completed processing Online command from Windows Server Failover Clustering (WSFC). For more information, see the SQL Server error log, Windows Server Failover Clustering (WSFC) management console, or WSFC log.

Message

A connection for availability group ‘XXXX’ from availability replica ‘InstanceName’ with id [3DB08082-C6D9-47E6-B53A-CCC8B15C21D2] to ‘AU2106CP0644\WDCP01’ with id [95D2878A-3FE0-48BF-AA2A-A41805FDA786] has been successfully established. This is an informational message only. No user action is required.

From SQL Server Error log , I could see the issue occurred because of Lease Time Out which was happened on Primary replica as it controls the mechanism of leae timeout .
This could occur due to high resource utilization on Database server( CPU 100% Usage), at this time SQL Server may go to hung or not responding state for sometime. This could have been resulted to Lease timout or expired.

To find out CPU Usage follow below steps to set it in Perfmon

After creating Data collection and clicked Start , leave it for running until you get lease timeout error or the issue re-Occurs. Once the issue happened again now stop the data collector, open the log in Performance Monitor and review the values of Processor / % Processor Time counter to see if sustained CPU utilization is detected during the time of the lease timeout.

In SQL Server Agent jobs I can see Optimization job was triggered at that time , when the job started running after few minutes the lease timeout error was occurred, also the optimization job was not completed successfully.

BY this I can say When Optimization job was triggered as per the schedule the CPU usage went high at 100% ,also there are Blockings happened on the Database server we can see SPID killed by SQL Server in errrolog message due to /this the Lease Timeout or Lease Expired error occurred .

By default the Lease Timeout Value is ‘ 20000 ‘ if the issue re surfacing again and again we can Increase the Lease Timout Value from Windows Server Failover Cluster( WSFC ).

Open ‘ cluadim ‘ –> click on Roles in Left Pane–>Now On Right Pane click Availability Group–>Down you will seeing Resource TAB,click on it —>Right click on ListnerName –> click Properties–>Again on Properties TAB—> change the Lease Timeout value here —> then click OK .

But the changing Lease Timeout Value will not resolve your issue , it is only temporary Issue. Need to find out why blockings had occurred , also try changing the Optimization job schedule and will see.

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.