Node Quarantine State Issue on WSFC 2016 with Availability Groups (Windows Server 2016 + SQL Server 2016)

We encountered an odd issue in our environment, where we had Always On Availability Groups configured on Windows Server 2016 and SQL Server 2016.
While operating system patching was in progress and we were working on AG failover and failback, I ran into an intriguing scenario. After simulating some network loss situations, the AG dashboard no longer showed one of the nodes as healthy. SQL Server was up and running, but the Availability Groups were in the Resolving state. I quickly checked WSFC and noticed that one of the nodes was reporting an error and was not operational. I tried to bring the cluster node back online as quickly as possible using the usual method, but it didn’t work. A brief glance at the cluster event log revealed the following error message:

The status of the node in WSFC is also Quarantined. This is a fascinating topic!
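The same status can be checked from PowerShell instead of Failover Cluster Manager. The following is a minimal sketch, assuming it is run on a cluster node with the FailoverClusters module installed; the expectation that the quarantined node shows up in the StatusInformation column is based on Windows Server 2016 behaviour and was not part of the original troubleshooting.

```powershell
# Requires the FailoverClusters module (installed with the Failover Clustering feature).
Import-Module FailoverClusters

# Membership state of every node, plus the extra status field
# that Windows Server 2016 uses for values such as Quarantined.
Get-ClusterNode | Format-Table Name, State, StatusInformation -AutoSize

# Cluster-wide quarantine settings: how many failures trigger quarantine,
# and how long (in seconds) a node stays quarantined.
Get-Cluster | Format-List Name, QuarantineThreshold, QuarantineDuration
```

QuarantineDuration is reported in seconds, so comparing it with the time the node entered quarantine gives a rough idea of when the node would be allowed to rejoin on its own.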

What happens to my availability group if the node does not automatically rejoin the cluster until 02:03:26, as in my case? As shown in the screenshot below, the quarantined cluster node results in a disconnected availability replica and a synchronization issue.

We see the corresponding error number 10060 with the message: An error occurred while receiving data: ‘10060(A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.)’. There is no explicit notification about the quarantine state in the SQL Server error log. I received the following sample message in the SQL Server error log from the secondary replica:

SQL Server is waiting for the cluster node to start and rejoin the WSFC. In summary, as long as the node is quarantined, the availability group health state will remain unchanged. The node will not rejoin the cluster automatically until the quarantine period ends, which may be a good thing until the underlying issue on the affected cluster node is resolved. Fortunately, according to the Microsoft whitepaper, we do not have to wait for the quarantine period to end.
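To see what the replicas themselves report during this period, the availability group DMVs can be queried from PowerShell. This is only a sketch under a few assumptions: the SqlServer module (which provides Invoke-Sqlcmd) is installed, and 'SQLNODE1' is a placeholder instance name, not one taken from this environment.

```powershell
# Assumes the SqlServer module is available; it provides Invoke-Sqlcmd.
Import-Module SqlServer

# Replica role, connection state, sync health and the last connection error
# as recorded by SQL Server for each availability replica.
$query = @"
SELECT ar.replica_server_name,
       ars.role_desc,
       ars.connected_state_desc,
       ars.synchronization_health_desc,
       ars.last_connect_error_number,
       ars.last_connect_error_description
FROM sys.dm_hadr_availability_replica_states AS ars
JOIN sys.availability_replicas AS ar
  ON ar.replica_id = ars.replica_id;
"@

# 'SQLNODE1' is a hypothetical instance name; replace it with your own.
Invoke-Sqlcmd -ServerInstance 'SQLNODE1' -Query $query | Format-Table -AutoSize
```

Run against the primary replica, the query returns a row for every replica; a secondary replica only reports on itself. The last_connect_error_number column is where a timeout such as 10060 would be expected to appear.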

To release the node from the quarantined state, run the following PowerShell command:

Start-ClusterNode -ClearQuarantine

That’s it! Check the cluster’s health status now: the node has been restored to its previous state.
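When the command is run from a different machine than the quarantined node, the node can be named explicitly and its state verified afterwards. A small sketch, assuming a placeholder node name 'NODE2' that is not from the original environment:

```powershell
Import-Module FailoverClusters

# 'NODE2' is a hypothetical name for the quarantined node.
$node = 'NODE2'

# Clear the quarantine flag and start the cluster service on that node.
Start-ClusterNode -Name $node -ClearQuarantine

# Confirm the result: State should read Up once the node has rejoined.
Get-ClusterNode -Name $node | Format-Table Name, State, StatusInformation -AutoSize
```

It is worth resolving the underlying network or patching issue first; otherwise the node may simply hit the quarantine threshold again.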
