Patching Issue – Fixing of Missing Windows Installer Cache Files(.MSI & .MSP) which occurs during SQL Server Update

When you are installing a Microsoft SQL Server service pack or a cumulative update, sometimes we will encounter the error messages with Windows Installer Cache problems. The Windows Installer Cache are located at c:\windows\installer folder which stores important files for applications installed using the Windows Installer technology and should not be deleted. The problem occurs when we are performing these actions such as uninstall, repair, or update SQL Server

Note : If the Cache installer is compromised then you will not see any immediate problems

.The issue occurs when the cache files are deleted from the location c:\windows\installer.

The problem occurs when the Windows Installer database file (.msi) and the Windows Installer patch file (.msp) is missing from the Windows Installer cache folder. The Windows Installer cache is located in the following folder:

        C:\Windows\installer

After doing SQL Patching changes continuously for so many servers without any Issue yesterday when Implementing SQL Patching on one of the Critical Server finally the day has arrived and faced Missing of MSI & MSP error issue

When I run SQL Server 2008R2 ServicePack 3 Initial Setup wizard itself not getting opened and throwing below error :

As per the Error I have fixed the Issue of Missing File by Copying the file .MSI which was mentioned in above to INSTALLER Folder  from SQL Media and started running Service Pack 3 again , still having issue and got other error as below :

I have fixed this error as well in the same way which I did as above and Initiated Service Pack 3 again now this time the Initial setup wizard was opened and the setup was started

After clicking the Update button in few minutes an error was popped up saying another MSI file was missing as it is not the best way to fix these MSI and MSP errors  also not sure how many files are Missing.

So, I have Cancelled the Setup and checked for Missing Files using FindSQLInstalls.vbs script
which helped me to find missing of MSI & MSP files before we start the ServicePack 3 Setup.

Script was found in MICROSOFT site  https://support.microsoft.com/en-gb/help/969052/how-to-restore-the-missing-windows-installer-cache-files-and-resolve-p#%2Fen-gb%2Fkb%2F969052

Copied the script to Server ‘ XXX ‘and saved at Location E:\MSI\FindSQLInstalls.vbs ..Ran the script from Command Prompt by redirecting the directory from Drive C to E:\MSI

                                          FindSQLInstalls.vbs E:\MSI\output.txt

Note : This script needs Elevated Administrative Privileges , Open the Command Prompt with Run As Administrator

The output file contains all the Missing Files of MSI & MSP which needs to fix before staring the SP setup

Open the output.txt file and search with key word  ” DOES NOT   “

Below Screenshot for finding out of MSP file

Below Screenshot for finding out MSI File

Copied the Location( COPY “Path”) from the txt file and kept in a side which is Highlighted in above screen shots

Now opened the Media file and copied the MSI file to C:\Windows\Installer Folder and Renamed the MSI File to Cached name which is shown in Error

Again searched with key word  ” DOES NOT ” till the end of the txt file and found total 25 Missing Files

Fixed all Missing files and ran the Script again in Command Prompt  FindSQLInstalls.vbs E:\MSI\output.txt to check that no files are missing before starting the SP installation.

Opened the Output file and searched for key word ” DOES NOT   ”  this time I didn’t find any Missing File which Needs to take Action

So started running SP3 but after Fixing the MSI and MSP files it asked for Server RESTART

I requested windows team to reboot,once server is came UP Initiated SP3 , this time patch was completed successfully.

After SP completion it was asked for RESTART of Server again and once the Server is Up I  started Installing Security Update and this time again some new Issue I faced which is shown below :

The location which is showing in above error was not exist on Drive H  …So, I have clicked browse button and given the correct path of the File sqlncli.msi and clicked OK

But popped up with error not a valid one

So, Clicked ok and Cancel but the setup was Running and Finally the Security Update was Successfully Applied

The Security Update KB article is Reflecting KB4057113 and all looks Good .

NOTE :

1. The SQL Media\Service Pack \ Security Update \ Cumulative Update any patch or ISO file we need to EXTRACT Manually first to a Specific location or folder and then run the installation by clicking on “SETUP.exe” file which is Best Practice for Installation

2 .If we run the Package directly without Manual extraction the Software will extract  Automatically to Some UN-known Location and will start the Setup Once that the server is REBOOTED sometimes that UN-known location will not be exist on the server and next time when we run any New SP\CU\SU it will check for Previously Extracted location but unfortunately that location is not available on Server it will throw an error and the installation will Failed.

Node Quarantine state Issue on WSFC 2016 with Availability groups( Win 2016 + SQL 2016 )

Recently in our environment we saw weird issue where we have Windows Server 2016 + SQL Server 2016 configured with AlwaysON Availability Groups.
I noticed an interesting scenario when operating System patching is going on and We are working on doing AG failover and failback. After simulating some network outage scenarios, I was not able to see the AG dashboard healthy for one of node .I see SQL is up and running good but the Availability Groups are in Resolving State , immediately i have checked in WSFC and saw one of the Node is showing Error and is not UP.So, tried to bring back my cluster node online immediately by using traditional way but it didn’t came up. A quick look at the cluster event log led me to notice some error message as shown below :

Also in WSFC the Node status showing as Quarantined…Interesting Issue!!

Here, the Node will not automatically join the cluster until 02:03:26 in my case and what will happen to my availability group? Well no surprise here, the quarantined cluster node means an availability replica disconnected and a synchronization issue as shown in below in Screen shot:

We notice a corresponding error number 100060 with the message An error occurred while receiving data: ‘10060(A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.)’. There is no specific message from SQL Server error log about quarantine state. From the secondary replica, I got the following sample message into the SQL Server error log:

SQL Server is waiting for the cluster node to start and rejoin the WSFC. In short, overall the quarantined node is active so the availability health state will not change . As pet the error the node will join automatically to cluster , it may be a good thing until you don’t fix the related issue on the concerned cluster node. Fortunately, as stated by Microsoft document, this is not mandatory for us to wait for the quarantined period to finish.

We can use the following PowerShell command to come out of quarantined State of Node :

Start-ClusterNode -Clearquarantine

Lets go! Check the cluster health state now and the Node was got back to normal

Upgrade/migrate SQL Server 2014 Availability group(s) running on Windows Server 2012 R2 to SQL Server 2019 running on Windows Server 2016

What we are performing today ?

To upgrade/migrate (side-side) SQL Server 2014 Availability group(s) running on Windows Server 2012 R2 to SQL Server 2019 running on Win 2016 with the least amount of downtime.

My Current Environment Setup :

I have a 2 node Windows Server 2012 R2 Failover cluster hosting SQL Server 2014 Always On Availability Group with synchronous commit mode. I have a listener configured for my applications to connect. These 2 replicas are running SQL 2014 with Latest builds.

Replica(Node) Details are below :

W12SQL2014A
W12SQL2014B

Above are my two replicas(Nodes) which are running with WindowsServer2012R2+SQL 2014.

Our main Agend is
I have SQL 2014 AG running on Windows Server 2012R2 which needs to be upgraded/migrated to SQL Server 2019 running on Windows Server 2016 with a very minimal downtime and no configuration changes for the Application teams.

Note : Here I am not performing In-place upgrade.Which requires more downtime.So, let’s assume In-place upgrades are not allowed.

High Level Implementation Plan :

. Get ready with 2 new builds of Windows Server 2016 (W16SQL2019A & W16SQL2019B)

. Install SQL Server 2019 on both win 2016 servers

. Add W16SQL2019A and W16SQL2019B nodes to the same windows cluster

. Enable Alwayson HADR Feature in SQL 2019

. Add the 2 nodes as replicas in SQL Server Alwayson Availability Group

. Join the Databases .

. Check the Dashboard of AG and make sure all looks healthy and Green

. At the change window of final cutover date/time, failover SQL Server 2014 to SQL Server 2019 and remove the old replicas( SQL Server 2014) from AG.

. And in Windows Server 2012R2 EVICT both nodes from the Cluster

Important things be a taken into Consideration after Successful configuration

Adding Win2016 nodes to existing Win2012R2 Same cluster is Leveraging MIXED MODE. Please don’t run this mixed mode WSFC for a long period. Microsoft may not support if you stay in mixed mode for more than 4 weeks.This blog is only to perform rolling upgrades to make your systems are really highly available. Try to wrap up the entire process in a day or two and make sure to be done with it.

Sometimes after adding the new replicas the Databases synchronization will not start automatically. you need to Join each databases Manually to get it SYNCHRONIZED.

Change the failover mode to manual for all replicas just to make sure cluster has no control over failing over my AG automatically.

At the final cut over before Failover the AG make sure to change Availability Mode to SYNCHRONOUS COMMIT mode and then proceed with your failover of AG from W12SQL2014A( Which is a Primary Replica) to W16SQL2019A.

At this point, W16SQL2019A took over the primary role of all your databases participating in your AG have been upgraded to SQL 2019 and the other SQL 2019(W16SQL2019B ) Instance will also be in sync

But the two SQL 2014 Instances will be in unhealthy state, In fact those databases will become inaccessible at this time, since Logs cannot be shipped from higher(2019) to lower(2014) version.

That mean the Databases will not be SYNCHRONIZED on SQL 2014 Instances

I did a Cleanup by removing both SQL 2014 Instances from AG as replicas and EVICTED the W12SQL2014 Nodes from WSFC.