Node Quarantine State Issue on WSFC 2016 with Availability Groups (Windows Server 2016 + SQL Server 2016)

We encountered an odd issue in our environment, where Always On Availability Groups are configured on Windows Server 2016 and SQL Server 2016.
While operating system patching was in progress and we were working on AG failover and failback, I ran into an intriguing scenario. After simulating some network loss situations, the AG dashboard no longer showed one of the nodes as healthy. SQL Server was up and running, but the availability groups were in the RESOLVING state. I quickly checked WSFC and saw that one of the nodes was reporting an error and was not operational. I tried to bring the cluster node back online the usual way, but it didn’t work. A brief look at the cluster event log revealed the following error message:

The status of the node in WSFC also shows as Quarantined. This is a fascinating topic!
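The same status can also be checked from PowerShell instead of Failover Cluster Manager. The following is a minimal sketch, assuming the FailoverClusters module is available on the machine where it runs; the destination folder for the cluster log is a placeholder and should already exist.

```powershell
# Requires the FailoverClusters module (installed with the Failover Clustering feature/tools).
Import-Module FailoverClusters

# List every node with its state. On Windows Server 2016 the StatusInformation
# column reports values such as Normal, Isolated, or Quarantined.
Get-ClusterNode | Format-Table Name, State, StatusInformation -AutoSize

# Optionally collect the cluster log for the last 60 minutes from all nodes.
# 'C:\Temp\ClusterLogs' is a placeholder path, not one from the original environment.
Get-ClusterLog -Destination 'C:\Temp\ClusterLogs' -TimeSpan 60 -UseLocalTime
```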

What would happen to my availability group if the node does not automatically rejoin the cluster until 02:03:26, as in my case? As the screenshot below shows, the quarantined cluster node results in a disconnected availability replica and a synchronization issue.
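The replica state shown on the dashboard can also be queried directly. This is a sketch only, assuming the SqlServer PowerShell module is installed and that it is run against the current primary replica; the instance name SQLNODE1 is hypothetical.

```powershell
# Requires the SqlServer module, which provides Invoke-Sqlcmd.
Import-Module SqlServer

# Connection state and synchronization health of each replica,
# taken from the availability group DMVs.
$query = @"
SELECT ar.replica_server_name,
       rs.role_desc,
       rs.connected_state_desc,
       rs.synchronization_health_desc
FROM sys.dm_hadr_availability_replica_states AS rs
JOIN sys.availability_replicas AS ar
    ON ar.replica_id = rs.replica_id;
"@

# 'SQLNODE1' is a placeholder for the primary replica's instance name.
Invoke-Sqlcmd -ServerInstance 'SQLNODE1' -Query $query | Format-Table -AutoSize
```

While the node is quarantined, the replica hosted on it would be expected to appear as DISCONNECTED with a NOT_HEALTHY synchronization state.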

The SQL Server error log shows the corresponding error 10060 with the message: An error occurred while receiving data: ‘10060(A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.)’. There is no explicit mention of the quarantine state in the SQL Server error log. I found the following sample message in the SQL Server error log on the secondary replica:

SQL Server is waiting for the cluster node to start and rejoin the WSFC. In short, as long as the node stays quarantined, the availability group health will remain in this degraded state. The node will rejoin the cluster automatically once the quarantine period ends, which may be a good thing, since it gives us time to resolve the underlying issue on the affected cluster node. Fortunately, according to the Microsoft whitepaper, we do not have to wait for the quarantine period to end.
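How long a node stays quarantined, and how many failures trigger quarantine, are controlled by cluster-level properties. The following sketch reads them; it assumes the FailoverClusters module and is run on a cluster node. As far as I know, the Windows Server 2016 defaults are 7200 seconds for QuarantineDuration and 3 failures for QuarantineThreshold, but verify the values in your own cluster.

```powershell
Import-Module FailoverClusters

# QuarantineDuration  : seconds a quarantined node is kept out of the cluster.
# QuarantineThreshold : number of failures within an hour before a node is quarantined.
Get-Cluster | Format-List Name, QuarantineDuration, QuarantineThreshold
```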

To release the node from the quarantined state, run the following PowerShell command:

Start-ClusterNode -ClearQuarantine

That’s it! Check the cluster’s health status again: the node has been restored to its normal state.
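When running the command remotely or from another node, the target node can be named explicitly, and the result verified right away. A minimal sketch, where SQLNODE2 is a hypothetical node name standing in for the quarantined node:

```powershell
Import-Module FailoverClusters

# 'SQLNODE2' is a placeholder for the quarantined node's name.
$node = 'SQLNODE2'

# Start the cluster service on the node and clear its quarantine state.
Start-ClusterNode -Name $node -ClearQuarantine

# Confirm that the node is back up and no longer reported as quarantined.
Get-ClusterNode -Name $node | Format-Table Name, State, StatusInformation -AutoSize
```

Clearing quarantine only makes sense once the underlying network or node problem has been addressed; otherwise the node may simply be quarantined again.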

Author: Sri

