Troubleshoot SQL Connectivity Issues

During daily operations with SQL instances, DBA and application teams frequently encounter SQL connectivity issues while connecting from applications. In this blog, I cover a few checklist steps to troubleshoot SQL connectivity issues.

When a client connects to SQL Server, it goes through authentication and authorization stages. If the connectivity issue is caused by authorization, the client is able to establish a physical connection with SQL Server, but the failure occurs while authorizing with SQL Server, for example login errors caused by permission issues. In that case, the SQL Server error log is the starting point for further troubleshooting.

A very important step in addressing SQL connectivity issues is isolating which component is causing the issue. Broadly, we can classify connectivity issues into the categories below:

  1. Network issue
  2. SQL Server configuration issue
  3. Firewall issue
  4. Client driver issue
  5. Application configuration issue
  6. Authentication and logon issue

Some of the common error messages thrown by the application while connecting to SQL Server are:

Error 1:

Test connection failed because of an error in initializing provider. [DBNETLIB] [ConnectionOpen (Connect ()).]SQL Server does not exist or access denied.

Error 2:

[Microsoft] [SQL Server Native Client 11.0]SQL Server Network Interfaces: Error Locating Server/Instance Specified [xFFFFFFFF].

[Microsoft] [SQL Server Native Client 11.0] Login timeout expired

[Microsoft] [SQL Server Native Client 11.0] A network-related or instance-specific error has occurred while establishing a connection to SQL Server. Server is not found or not accessible. Check if instance name is correct and if SQL Server is configured to allow remote connections. For more information, see SQL Server Books Online.

Error 3:

An error has occurred while establishing a connection to the server. (provider: Named Pipes Provider, error: 40 – Could not open a connection to SQL Server) (Microsoft SQL Server, Error: 5)

An error has occurred while establishing a connection to the server.  When connecting to SQL Server 2005, this failure may be caused by the fact that under the default settings SQL Server does not allow remote connections. (provider: Named Pipes Provider, error: 40 – Could not open a connection to SQL Server) (Microsoft SQL Server, Error: 1326)

Error 4:

[Microsoft] [SQL Server Native Client 10.0] TCP Provider: An existing connection was forcibly closed by the remote host.

CHECKLIST STEPS:

1. Get the details of the servers involved in the connectivity.

  • How many servers are involved?
  • Is the application a web-based or thick client application?
  • Is the client connecting to an application server which then connects to SQL Server?
  • Is the connectivity issue happening on only one client box, or are multiple clients unable to connect?
  • Collect the server names and IP addresses of all the servers involved.

2. Check whether the SQL Server machine is reachable. A simple test is to perform a network ping:

Ping <SQLServername> or Ping <IP address of SQL Server>

3. If the ping itself fails, the client is not able to locate the SQL Server machine, and we need the network team's assistance to debug this further.

4. Local connectivity on the SQL Server box works over the Shared Memory protocol. The next step is to check the protocols enabled on SQL Server and ensure that the TCP/IP and Named Pipes protocols are enabled for remote connections.

5. Check whether the client is able to reach the SQL Server port. We can use the telnet tool to test whether the SQL Server port is accepting incoming connections.

For example, if the SQL Server host name is SQLprod and the port on which SQL is listening is 1433:
telnet sqlprod 1433
If the client box is not able to reach the port SQL is listening on, telnet reports that it could not open a connection to the host on that port.
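Where telnet is not installed on the client, the same TCP reachability test can be scripted. The sketch below is only an illustration: the function name `tcp_port_open` and the 3-second timeout are assumptions, not part of any SQL Server tooling.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

For the example above, `tcp_port_open("sqlprod", 1433)` would return False in the same situations where the telnet test fails. Like telnet, this only exercises TCP; it says nothing about UDP port 1434.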

6. If the telnet test fails as well, this indicates that some device (such as a firewall) is blocking the port. To verify whether the port is blocked, we can use the PortQueryUI tool.
PortqueryUI tool can be downloaded from: https://www.microsoft.com/enin/download/details.aspx?id=24009

For example, run the query with the SQL hostname aoindia and the SQL port 1433. If you see the output as FILTERED, this indicates that the port is blocked.

The next step is to check with the Windows/networking team to allow the port. Ensure that inbound and outbound rules are created to allow the connection for the SQL port at the operating system level.

7. Once the port is allowed, run the PortQueryUI program again and ensure that the status shows as LISTENING.

The telnet tool can only check whether a TCP port is blocked, whereas the PortQueryUI tool can check both TCP and UDP ports.

8. Collect the connection string details of the application. If the application is web based, the web.config file will generally have the connection string details. Gather the following details:
  a. Data Source
  b. Initial Catalog
  c. Integrated Security
  d. Provider
For example:
Provider=SQLNCLI10; Server=myServerName\theInstanceName;Database=myDataBase; Trusted_Connection=yes;
In this example, Server, Database, and Trusted_Connection are the equivalent keywords for Data Source, Initial Catalog, and Integrated Security. The important details to study in the connection string are the type of authentication used (Windows or SQL), the provider used (OLE DB or ODBC), and the Data Source field.
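To pull these fields out of a connection string quickly, a small parser can help. This is a minimal sketch, not a full connection-string grammar: the helper names `parse_connection_string` and `summarize` are hypothetical, quoted values containing `;` or `=` are not handled, and the list of keyword synonyms is an assumption covering only the common cases.

```python
def parse_connection_string(conn_str: str) -> dict:
    """Split a 'key=value;key=value' connection string into a dict with lower-cased keys."""
    settings = {}
    for part in conn_str.split(";"):
        if "=" not in part:
            continue  # skip empty segments such as the one after a trailing ';'
        key, value = part.split("=", 1)
        settings[key.strip().lower()] = value.strip()
    return settings


def summarize(settings: dict) -> dict:
    """Pull out the fields the checklist asks for, accepting common synonyms."""
    def first(*keys: str) -> str:
        return next((settings[k] for k in keys if k in settings), "")

    trusted = first("integrated security", "trusted_connection").lower()
    return {
        "provider": first("provider", "driver"),
        "data_source": first("data source", "server"),
        "database": first("initial catalog", "database"),
        # Assumption: any of these values means Windows authentication.
        "authentication": "Windows" if trusted in ("yes", "true", "sspi") else "SQL",
    }


example = r"Provider=SQLNCLI10; Server=myServerName\theInstanceName;Database=myDataBase; Trusted_Connection=yes;"
print(summarize(parse_connection_string(example)))
```

For the example string, the summary reports the SQLNCLI10 provider, the named instance as the data source, and Windows authentication.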

9. To isolate whether the issue is caused by the application, a simple test is to connect to the SQL instance using SQL Server Management Studio (SSMS) from the client box. In most scenarios, however, SSMS may not be installed on the client machines. In this case, we can create a Universal Data Link (UDL) file to test the connectivity.
On the client machine, create a text document and change its extension to .udl.
For example: test.udl
Open the file and, based on the provider used by the application (Microsoft OLE DB Provider for SQL Server or SQL Server Native Client), select the matching provider from the provider list and click Next.

Enter the SQL Server name and default database, and click Test Connection.

If the connection succeeds through the UDL file, this indicates that the connection string is incorrectly formed in the application.

Try to simulate the connectivity issue using the same settings mentioned in the application's connection string and check whether you are able to reproduce the issue.
If the connectivity works from the UDL file, check with the application team whether the connection string can be replaced with the one created by the UDL file. [Open the UDL file in Notepad to get the generated connection string.]
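Instead of copying the string out of Notepad, it can be extracted with a short script. The sketch below assumes the usual UDL layout (an `[oledb]` header, a `;` comment line, then the connection string) and that the file is saved as UTF-16; both are assumptions to verify against your own file, and the function names are hypothetical.

```python
from pathlib import Path


def extract_init_string(udl_text: str) -> str:
    """Return the connection string from the text of a .udl file."""
    for line in udl_text.splitlines():
        line = line.strip()
        # Skip blanks, the [oledb] section header, and ';' comment lines.
        if not line or line.startswith("[") or line.startswith(";"):
            continue
        return line
    return ""


def read_udl(path: str) -> str:
    # Assumption: the file is UTF-16 encoded, as UDL files are normally saved on Windows.
    return extract_init_string(Path(path).read_text(encoding="utf-16"))


# Illustrative file content; the provider and server name are placeholders.
sample = (
    "[oledb]\n"
    "; Everything after this line is an OLE DB initstring\n"
    "Provider=SQLNCLI10;Integrated Security=SSPI;Data Source=sqlprod\n"
)
print(extract_init_string(sample))
```

Calling `read_udl("test.udl")` on the client would return the string to share with the application team.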

10. If the issue is specific to the client provider, check whether a different provider can be used to connect to SQL. From the UDL file, select different SQL providers and check whether the connectivity works.
For example, if the issue is specific to Microsoft OLE DB Provider for SQL Server, check whether SQL Server Native Client can be used.

11. If the application connection string uses a data source name (commonly known as DSN), create a user DSN using the ODBC Data Source Administrator tool (odbcad32) and check the connectivity.

To launch the ODBC Data Source Administrator tool in 32-bit mode: C:\Windows\SysWOW64\odbcad32.exe

12. On the client machine, the last SQL Server connection entry is stored at the registry location below: HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\MSSQLServer\Client

Delete the LastConnect entries listed for the SQL Server in question. [Ensure that a registry backup is taken before making any modifications.]

13. Check whether there are client-side aliases present using the SQL Server Client Network Utility (cliconfg.exe).

In the Alias tab, check whether there are any orphaned alias entries. Take a backup of the aliases present and delete any aliases created for the SQL Server instance we are troubleshooting.

To launch the SQL Client configuration tool in 32-bit mode: C:\Windows\SysWOW64\cliconfg.exe

14. Test the connectivity by forcing a specific protocol, such as TCP (tcp:<servername>) or Named Pipes (np:<servername>).

If the connectivity works when the protocol is forced, check with the application team whether the connection string can be modified to force the protocol in the Data Source field.

To check whether an established connection to SQL is using TCP or Named Pipes, execute the query below:

select session_id, net_transport from sys.dm_exec_connections

If the connection string cannot be modified and the connectivity works by forcing a protocol, check whether a client-side alias can be created.
Example: if the connectivity works only on the Named Pipes protocol, create a Named Pipes alias using the SQL Server Client Network Utility.

15. Check whether you are able to connect to the instance by forcing the SQL port, that is, by specifying the server name in the form <servername>,<port>.

If the connectivity works while forcing the port number, this indicates an issue with SQL Browser. SQL Browser is responsible for resolving the instance name to the port registered on SQL Server. The next option is to check whether the SQL Browser UDP port 1434 is blocked and to troubleshoot the browser issues.

16. Check whether the connectivity works by specifying the IP address of the SQL Server instead of the hostname. For example:

If the IP address of the machine hosting SQL is 10.16.17.18 and 5223 is the port on which SQL is listening, specify the server name as 10.16.17.18,5223.

If the connectivity works by hardcoding the IP address in the server name, then as a temporary resolution, create a hosts file entry mapping the server name to the IP address to avoid DNS name resolution, and involve the DNS team for the name resolution issues.
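The server-name variations used in steps 14 to 16 (protocol prefix, explicit port, IP address) follow one pattern, which the sketch below composes. The helper `build_data_source` is hypothetical and only builds the string; it does not connect to anything.

```python
from typing import Optional


def build_data_source(server: str, port: Optional[int] = None,
                      instance: Optional[str] = None,
                      protocol: Optional[str] = None) -> str:
    """Compose a Data Source value of the form [protocol:]server[\\instance][,port]."""
    if protocol is not None and protocol not in ("tcp", "np", "lpc"):
        raise ValueError(f"unsupported protocol prefix: {protocol}")
    value = server
    if instance:
        value += "\\" + instance
    if port is not None:
        # An explicit port avoids the SQL Browser lookup of the instance port.
        value += f",{port}"
    if protocol:
        value = f"{protocol}:{value}"
    return value


print(build_data_source("sqlprod", protocol="tcp"))   # force TCP
print(build_data_source("sqlprod", protocol="np"))    # force Named Pipes
print(build_data_source("10.16.17.18", port=5223))    # IP address and port
```

Trying each of these values in the UDL test or in the application's connection string shows which layer (protocol, SQL Browser, or DNS) is at fault.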

17. If the connectivity issue is still not resolved after trying the above steps, the next step is to collect network traces using the Network Monitor tool.

I hope the steps above help you troubleshoot SQL connectivity issues.

Troubleshoot SQL High CPU Utilization

The screenshot below summarizes an approach to isolating SQL high CPU issues:

Below are the checklist points that Microsoft recommends:

1. From Windows Task Manager, check the overall CPU utilization. Note the number of logical processors present on the box.

2. From Task Manager, check the CPU utilization of the SQL Server process. Is the SQL CPU constantly above 70%?

3. Gather the following details:

  • How do you typically become aware that CPU is the bottleneck?
  • What is the impact of the problem? Are there particular errors that your user application reports?
  • When did the problem first occur? Are you aware of anything that changed around this time? (Increased workload? Change in table size? App upgrade? SQL upgrade?)
  • Can you make new connections to the server during the problem period?
  • How long did the problem last? Have you been able to do anything that seemed to help resolve the problem?
  • What system-level symptoms have you observed during the problem periods? For example, is the server console slow or unresponsive during the problem periods? Does overall CPU usage increase? If so, what %CPU is observed during the problem? What is the expected %CPU?

4. If the high CPU is caused by a process other than the SQL Server process (sqlservr.exe), engage the team which takes care of that process.

5. Open Perfmon and add the below counters:

Process (sqlservr):
  • % Privileged Time
  • % Processor Time
  • % User Time

Processor:
  • % Privileged Time
  • % Processor Time
  • % User Time

6. If the Processor % Privileged Time is above 25%, engage the Windows team.

% Processor Time = % Privileged Time + % User Time.

7. Confirm that SQL is consuming high CPU on the box by validating the counters below:

Process (sqlservr):
  • % Privileged Time
  • % Processor Time
  • % User Time

Divide the observed value by the number of logical processors to get the CPU utilization of the SQL process.

If (Process (sqlservr) % Privileged Time / number of logical processors) is above 30%, ensure that KB 976700 is applied for Windows 2008 R2.

This step indicates whether SQL Server is causing the high privileged time on the server. If the SQL privileged time is high as per the above calculation, engage the Windows team.
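The division in step 7 can be worked through with a small calculation. The counter readings and the processor count below are hypothetical values chosen for illustration, and the function name is an assumption.

```python
def normalize_process_counter(counter_value: float, logical_processors: int) -> float:
    """Scale a Process(sqlservr) % counter to a 0-100 share of the whole box."""
    if logical_processors <= 0:
        raise ValueError("logical_processors must be positive")
    return counter_value / logical_processors


# Hypothetical Perfmon readings on a box with 8 logical processors.
logical_processors = 8
privileged = normalize_process_counter(320.0, logical_processors)
user = normalize_process_counter(240.0, logical_processors)
processor = privileged + user  # Processor Time = Privileged Time + User Time

print(f"SQL privileged: {privileged:.1f}%  user: {user:.1f}%  total: {processor:.1f}%")
if privileged > 30:
    print("Privileged time is above 30%: check KB 976700 and engage the Windows team")
```

With these sample readings the SQL process accounts for 40% privileged time and 30% user time, so the 30% privileged-time check from the step above is triggered.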

8. Check the sp_configure settings and make sure they follow the best practice recommendations.

Follow KB 2806535 for the Max DOP recommended settings.

9. If you are unable to connect to the SQL instance locally using SSMS, try connecting to the SQL instance using the Dedicated Admin Connection (DAC) by specifying the server name as:

ADMIN:Servername

10. Get the top 10 queries consuming high CPU using the query below:

-- Currently executing requests, highest CPU first
SELECT TOP 10
       s.session_id,
       r.status,
       r.blocking_session_id AS [Blk by],
       r.wait_type,
       r.wait_resource,
       r.wait_time / (1000 * 60) AS [Wait M],           -- wait time in minutes
       r.cpu_time,
       r.logical_reads,
       r.reads,
       r.writes,
       r.total_elapsed_time / (1000 * 60) AS [Elaps M], -- elapsed time in minutes
       -- Text of the statement currently executing within the batch
       SUBSTRING(st.text, (r.statement_start_offset / 2) + 1,
                 ((CASE r.statement_end_offset
                       WHEN -1 THEN DATALENGTH(st.text)
                       ELSE r.statement_end_offset
                   END - r.statement_start_offset) / 2) + 1) AS statement_text,
       -- Qualified object name when the statement belongs to a stored module
       COALESCE(QUOTENAME(DB_NAME(st.dbid)) + N'.' + QUOTENAME(OBJECT_SCHEMA_NAME(st.objectid, st.dbid)) + N'.' +
                QUOTENAME(OBJECT_NAME(st.objectid, st.dbid)), '') AS command_text,
       r.command,
       s.login_name,
       s.host_name,
       s.program_name,
       s.last_request_end_time,
       s.login_time,
       r.open_transaction_count
FROM sys.dm_exec_sessions AS s
JOIN sys.dm_exec_requests AS r
    ON r.session_id = s.session_id
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS st
WHERE r.session_id != @@SPID   -- exclude this monitoring session
ORDER BY r.cpu_time DESC

11. Check the wait type of the queries returned in the above output. If CPU is the major bottleneck, most of the sessions will have the waits below:

  • SOS_SCHEDULER_YIELD
  • CXPACKET
  • THREADPOOL

If most of the queries are waiting on CXPACKET, revisit the sp_configure settings for “max degree of parallelism” and “cost threshold for parallelism” and check whether they are set as per best practice recommendations.
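When the output of step 10 has many rows, tallying the wait_type column makes the "most of the sessions" judgment easier. This is a rough sketch: the sample list stands in for an exported wait_type column and is entirely hypothetical, as is the function name.

```python
from collections import Counter

# The three CPU-related wait types listed in step 11.
CPU_WAITS = {"SOS_SCHEDULER_YIELD", "CXPACKET", "THREADPOOL"}


def cpu_wait_share(wait_types):
    """Return (share of waiting sessions on CPU-related waits, per-wait counts)."""
    waits = [w for w in wait_types if w]  # ignore sessions that are not waiting
    if not waits:
        return 0.0, Counter()
    counts = Counter(w for w in waits if w in CPU_WAITS)
    return sum(counts.values()) / len(waits), counts


# Hypothetical wait_type values copied from the step 10 output.
sample = ["CXPACKET", "SOS_SCHEDULER_YIELD", "CXPACKET", None, "PAGEIOLATCH_SH", "CXPACKET"]
share, counts = cpu_wait_share(sample)
print(f"{share:.0%} of waiting sessions are on CPU-related waits")
print(counts.most_common())
```

In this sample CXPACKET dominates, which would point toward reviewing the parallelism settings mentioned above.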

12. Run the SQL standard report to get the list of top CPU queries:

Right-click the instance and go to Reports > Standard Reports.

Check the top CPU queries shown in the report. Compare the report with the top CPU consuming queries obtained from the step above.

13. Once the top CPU queries are identified, get the list of all the SQL tables involved using the statement_text and command_text columns from the output of step 10.

Check the following:

  • Index fragmentation on the top CPU driving tables
  • Last statistics update information

If the index fragmentation is above 30%, rebuild the index. If the statistics on the table are not up to date, update the statistics.

If only a few tables are responsible for the high CPU in the top CPU queries, share the list of tables with the application team along with the statistics report and the fragmentation report.

Check whether any SELECT queries are causing high CPU, and check with the application team whether they can be stopped temporarily on heavily loaded OLTP servers.

14. Once the database maintenance activity (such as index rebuild and statistics update) is performed, if SQL is still using high CPU, execute the query mentioned in step 10 again.

Check whether the top CPU query has changed. If the query has changed, follow the action mentioned in step 13. If the query is still the same, go to the next step.

15. Collect the estimated execution plan of the top CPU consuming queries using one of the queries below:

Query 1: Get the top CPU consuming session IDs from the output of the query mentioned in step 10.

Collect the plan handle and SQL handle from the query below:

select sql_handle, plan_handle from sys.dm_exec_requests where session_id = <session_id>

Get the text of the query:

-- Replace <sql_handle> with the value obtained from the above query.

select * from sys.dm_exec_sql_text(<sql_handle>)

Get the estimated execution plan of the query:

-- Replace <plan_handle> with the value obtained from the above query.

select * from sys.dm_exec_query_plan(<plan_handle>)

Query 2: The query below captures the total CPU time spent by a query along with the plan handle. The plan handle of the query is needed to get its estimated execution plan.

-- Top 50 cached queries by total CPU (worker) time
select
    highest_cpu_queries.plan_handle,
    highest_cpu_queries.total_worker_time,
    q.dbid,
    q.objectid,
    q.number,
    q.encrypted,
    q.[text]
from
    (select top 50
         qs.plan_handle,
         qs.total_worker_time
     from sys.dm_exec_query_stats qs
     order by qs.total_worker_time desc) as highest_cpu_queries
cross apply sys.dm_exec_sql_text(highest_cpu_queries.plan_handle) as q
order by highest_cpu_queries.total_worker_time desc

16. Share the estimated execution plan obtained with the application team.

  • Check for the operator which has a high cost.
  • Check the indexes used by that operator and the number of rows estimated.
  • Revisit the statistics and indexes on the table reported for the high-cost operator and make sure that there are no stale statistics.
  • Check whether the estimated execution plan recommends any new index. If the plan reports an index recommendation, share the missing index details with the application team.

17. The CONVERT_IMPLICIT function in an execution plan can also result in high CPU utilization of SQL Server.

Review the execution plans of the high CPU consuming queries, look at the operators with high cost, and check whether any CONVERT_IMPLICIT function is called.

In the above screenshot, the CONVERT_IMPLICIT function is implicitly converting the column “NationalIDNumber” to integer, whereas in the table definition it is defined as nvarchar(15). Share the report with the application team and ensure that the data type passed by the application matches the data type stored in the database.

18. Run the missing index query on the database which has reported high CPU and check whether there are any missing index recommendations. Share the index recommendation report with the application team.

19. Tune the top CPU consuming queries with the Database Engine Tuning Advisor to see whether it recommends index or statistics creation.

20. Check for compilations/recompilations in SQL Server:

From Perfmon, capture the counters below:

  • SQL Server: SQL Statistics: Batch Requests/sec: number of SQL batch requests received by the server per second.
  • SQL Server: SQL Statistics: SQL Compilations/sec: number of SQL compilations per second.
  • SQL Server: SQL Statistics: SQL Re-Compilations/sec: number of SQL recompiles per second.

If the recompilation count is high, check for the following:

  • Any schema changes
  • Statistics changes
  • SET option changes in the batch
  • Temporary table changes
  • Stored procedures created with the WITH RECOMPILE option, or statements using the OPTION (RECOMPILE) query hint

From SQL Profiler, add the recompile events (for example, SP:Recompile and SQL:StmtRecompile) and check which stored procedures are getting recompiled frequently.
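To judge whether the recompilation count is "high", it helps to express the three counters relative to each other rather than read them in isolation. The sketch below only computes the ratios; the counter samples are hypothetical, the function name is an assumption, and no threshold is implied by the source.

```python
def compile_ratios(batch_requests: float, compilations: float, recompilations: float) -> dict:
    """Express compilations and recompilations relative to the workload."""
    return {
        # Fraction of batches that needed a compilation.
        "compilations_per_batch": compilations / batch_requests if batch_requests else 0.0,
        # Fraction of compilations that were recompilations.
        "recompilations_per_compilation": recompilations / compilations if compilations else 0.0,
    }


# Hypothetical per-second averages captured from Perfmon.
ratios = compile_ratios(batch_requests=2000, compilations=300, recompilations=90)
for name, value in ratios.items():
    print(f"{name}: {value:.1%}")
```

Tracking these ratios over the problem period, rather than a single sample, shows whether compilation activity rises together with CPU.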

21. Check whether SQL system threads are consuming high CPU:

select * from sys.sysprocesses where cmd like 'LAZY WRITER' or cmd like '%Ghost%' or cmd like 'RESOURCE MONITOR'

  • Ghost cleanup thread: check whether a user deleted a large number of rows.
  • Lazy Writer thread: check whether there is memory pressure on the server.
  • Resource Monitor thread: check whether there is memory pressure on the server.

22. If the top CPU consuming queries have the wait type SQLTRACE_LOCK, check whether there are any traces running on the server using:

select * from sys.traces

23. Collect PSSDIAG data while the high CPU issue is occurring. Refer to KB 830232. Load and analyze the data in the SQL Nexus tool.

24. If the SQL CPU utilization is still high even after implementing the above action plans, increase the CPU capacity on the server.