How to Troubleshoot Slow-Running Queries on SQL Server

Steps to troubleshoot SQL Server queries that are taking too long to execute:

When a user approaches you and says their query is taking too long to complete, as a DBA we first perform all of the basic checks, such as CPU usage, memory usage, blocking, deadlocks, locks, fragmentation, and statistics updates. If the user still experiences slowness after all of these checks, we can follow the steps below to troubleshoot slow-running queries on SQL Server.
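Before going to those steps, a quick first pass for active blocking (one of the basic checks mentioned above) can be done with a DMV query along these lines; this is a minimal sketch, not a full health check:

SELECT r.session_id,
       r.blocking_session_id,
       r.wait_type,
       r.wait_time,     -- milliseconds the request has been waiting
       r.status
FROM sys.dm_exec_requests AS r
WHERE r.blocking_session_id <> 0;   -- only requests currently blocked by another session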

1 Check for index analysis: that is, check whether the correct indexes exist. When a user asks a DBA to look at a slow-running query, we can use the Database Engine Tuning Advisor to analyse the query and provide index recommendations that help improve query performance. This way we can verify whether the right indexes are in place in the database.
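The Database Engine Tuning Advisor is the tool referred to above; as a quick supplementary check, the missing-index DMVs can also be queried. A minimal sketch (the DMV output is only a hint and should not be turned into indexes blindly):

SELECT TOP (10)
       mid.statement AS [table],
       mid.equality_columns,
       mid.inequality_columns,
       mid.included_columns,
       migs.user_seeks,
       migs.avg_user_impact     -- estimated % cost reduction if the index existed
FROM sys.dm_db_missing_index_details AS mid
INNER JOIN sys.dm_db_missing_index_groups AS mig
        ON mig.index_handle = mid.index_handle
INNER JOIN sys.dm_db_missing_index_group_stats AS migs
        ON migs.group_handle = mig.index_group_handle
ORDER BY migs.avg_user_impact * migs.user_seeks DESC;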

2 Check for join hints: HASH, LOOP, MERGE, and REMOTE are the four types of join hints, and SQL Server mostly uses the LOOP, HASH, and MERGE operators. A join hint overrides SQL Server query optimization and prevents the query optimizer from selecting a better execution plan; because of this, optimizer improvements in the current SQL Server version or build may bring no benefit at all, or the hinted plan may even hurt performance.

So, if the query we're looking at contains any hints, try to reduce or remove them and rerun the query to see how it performs.
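For example, a hinted join and the same join with the hint removed might look like this (the table and column names are purely illustrative):

-- With a join hint: the optimizer is forced into a nested loops join
SELECT o.OrderID, c.CustomerName
FROM dbo.Orders AS o
INNER LOOP JOIN dbo.Customers AS c
        ON c.CustomerID = o.CustomerID;

-- Hint removed: the optimizer is free to choose LOOP, HASH, or MERGE
SELECT o.OrderID, c.CustomerName
FROM dbo.Orders AS o
INNER JOIN dbo.Customers AS c
        ON c.CustomerID = o.CustomerID;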

3 Check the execution plan: the query execution plan can be checked in SQL Server Management Studio (or the older SQL Query Analyzer) by selecting Display Estimated Execution Plan from the Query menu in the query window.

4 Check the showplan output: the showplan output provides information about the execution plan SQL Server used for the query. The basic checks we need to make against the execution plan to establish whether it is using a good plan are listed below (a minimal script for capturing the plan follows the list):

–> Index Usage
–> Join Order Usage
–> Join Types
–> Parallel Execution
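A minimal sketch for capturing the plan from a query window (the SELECT is just a stand-in for the slow query being analysed):

-- Estimated plan: the statement is compiled but not executed
SET SHOWPLAN_XML ON;
GO
SELECT name, type_desc FROM sys.objects;   -- stand-in for the slow query
GO
SET SHOWPLAN_XML OFF;
GO

-- Actual plan with runtime statistics: the statement IS executed
SET STATISTICS XML ON;
GO
SELECT name, type_desc FROM sys.objects;   -- stand-in for the slow query
GO
SET STATISTICS XML OFF;
GO

The XML plan returned by either option is where the index usage, join order, join types, and parallel execution checks above are read from.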

Troubleshooting a Full Tempdb

Tempdb Data Files are Full:

I’m using a script that builds a temporary table to fill up my tempdb, and I’m doing some testing on my local instance to see how to fix a tempdb-full issue. For testing purposes, I set initial sizes for the tempdb data files and disabled autogrowth.

I verified the available space on each tempdb file before running the script to build the table, and each one had 99 percent free space:

USE tempdb;
GO

-- get data file space and locations
SELECT b.groupname AS [File Group],
       a.name,
       CONVERT(INT, a.size / 128.000) AS [Currently Allocated Space (MB)],
       CONVERT(INT, FILEPROPERTY(a.name, 'SpaceUsed') / 128.000) AS [Space Used (MB)],
       CONVERT(INT, (a.size - FILEPROPERTY(a.name, 'SpaceUsed')) / 128.000) AS [Available Space (MB)],
       (CONVERT(INT, ROUND((a.size - FILEPROPERTY(a.name, 'SpaceUsed')) / 128.000, 2)) * 100)
           / CONVERT(INT, ROUND(a.size / 128.000, 2)) AS PercentFree
FROM sys.database_files a WITH (NOLOCK)
LEFT OUTER JOIN sysfilegroups b WITH (NOLOCK) ON a.data_space_id = b.groupid
ORDER BY PercentFree;

Please see the screenshot below, showing the above script being run to check the percentage of space available on each tempdb file:

Now I’m using a loop to populate a temporary table with up to 1,000,000,000 rows. The script I used to fill up my tempdb is as follows:

SET NOCOUNT ON
GO

-- Create temp table
CREATE TABLE #tblToFillTempDBtest (col1 nchar(4000));
GO

-- Begin loop to insert records
DECLARE @Loop AS INT = 1

WHILE (@Loop <= 1000000000)
BEGIN
    INSERT INTO #tblToFillTempDBtest VALUES (@Loop)
    SET @Loop += 1
END

I ran the script in SSMS and noticed that my tempdb was rapidly filling up; after a while, it returned an error stating that the tempdb PRIMARY filegroup was full. Please see the error screenshot below:

If we look at the Available Space and PercentFree columns from the first script, we can see that all three tempdb data files now have 0 percent free space. The following is a screenshot:

In this scenario we know the suspect query that is consuming all of the tempdb space and not releasing it; however, in a real-time or production environment we usually do not know the culprit query when a tempdb-full issue occurs.

To determine what is occupying tempdb space, we use the following query, which reports tempdb utilisation across the physical files. Based on the results, we move to the next step, which is to look at the column values for internal objects and the version store:

SELECT SUM(unallocated_extent_page_count) AS [free pages],
       SUM(unallocated_extent_page_count
           + user_object_reserved_page_count
           + internal_object_reserved_page_count
           + mixed_extent_page_count
           + version_store_reserved_page_count) * (8.0 / 1024.0 / 1024.0) AS [Total TempDB SizeInGB],
       SUM(unallocated_extent_page_count * (8.0 / 1024.0 / 1024.0)) AS [Free TempDB SpaceInGB],
       unallocated_extent_page_count,
       user_object_reserved_page_count,
       SUM(version_store_reserved_page_count * (8.0 / 1024.0 / 1024.0)) AS [version_store_GB],
       internal_object_reserved_page_count,
       mixed_extent_page_count
FROM tempdb.sys.dm_db_file_space_usage
--where [FreeTempDBSpaceInGB]>50
GROUP BY unallocated_extent_page_count,
         user_object_reserved_page_count,
         internal_object_reserved_page_count,
         mixed_extent_page_count;

We can conclude from the above screenshot that there is no problem with the version store or internal objects, because the values are relatively low. If the values are excessive, please see the Microsoft documentation article below:

https://docs.microsoft.com/en-us/previous-versions/sql/sql-server-2008-r2/ms176029(v=sql.105)?redirectedfrom=MSDN

To find out who else is using tempdb, I used the query below to identify active requests. If a request has already completed, it will not appear in the results of this query.

;WITH task_space_usage AS (
-- SUM alloc/delloc pages
SELECT session_id,
request_id,
SUM(internal_objects_alloc_page_count) AS alloc_pages,
SUM(internal_objects_dealloc_page_count) AS dealloc_pages
FROM sys.dm_db_task_space_usage WITH (NOLOCK)
WHERE session_id <> @@SPID
GROUP BY session_id, request_id
)
SELECT TSU.session_id,
TSU.alloc_pages * 1.0 / 128 AS [internal object MB space],
TSU.dealloc_pages * 1.0 / 128 AS [internal object dealloc MB space],
EST.text,
-- Extract statement from sql text
ISNULL(
NULLIF(
SUBSTRING(
EST.text,
ERQ.statement_start_offset / 2,
CASE WHEN ERQ.statement_end_offset < ERQ.statement_start_offset THEN 0 ELSE( ERQ.statement_end_offset - ERQ.statement_start_offset ) / 2 END
), ''
), EST.text
) AS [statement text],
EQP.query_plan
FROM task_space_usage AS TSU
INNER JOIN sys.dm_exec_requests ERQ WITH (NOLOCK)
ON TSU.session_id = ERQ.session_id
AND TSU.request_id = ERQ.request_id
OUTER APPLY sys.dm_exec_sql_text(ERQ.sql_handle) AS EST
OUTER APPLY sys.dm_exec_query_plan(ERQ.plan_handle) AS EQP
WHERE EST.text IS NOT NULL OR EQP.query_plan IS NOT NULL
ORDER BY 3 DESC, 5 DESC

The following query displays session space usage; however, if the session has closed, you will not see any results for it:

SELECT DES.session_id AS [SESSION ID],
Db_name(DDSSU.database_id) AS [DATABASE Name],
host_name AS [System Name],
program_name AS [Program Name],
login_name AS [USER Name],
status,
( user_objects_alloc_page_count * 8 ) AS
[SPACE Allocated FOR USER Objects (in KB)],
( user_objects_dealloc_page_count * 8 ) AS
[SPACE Deallocated FOR USER Objects (in KB)],
( internal_objects_alloc_page_count * 8 ) AS
[SPACE Allocated FOR Internal Objects (in KB)],
( internal_objects_dealloc_page_count * 8 ) AS
[SPACE Deallocated FOR Internal Objects (in KB)],
cpu_time AS [CPU TIME (in millisec)],
total_scheduled_time AS
[Total Scheduled TIME (in millisec)],
total_elapsed_time AS
[Elapsed TIME (in millisec)],
( memory_usage * 8 ) AS [Memory USAGE (in KB)],
CASE is_user_process
WHEN 1 THEN 'user session'
WHEN 0 THEN 'system session'
END AS [SESSION Type],
row_count AS [ROW COUNT]
FROM tempdb.sys.dm_db_session_space_usage AS DDSSU
INNER JOIN sys.dm_exec_sessions AS DES
ON DDSSU.session_id = DES.session_id
ORDER BY [space allocated for internal objects (in kb)] DESC

The session ID responsible for the tempdb space utilisation can be found in the above query result:

The session ID, system name (hostname), user name, and the space allocated can all be found in the above screenshot. I took note of session ID 68 and used the script below to find the culprit query:

SELECT TEXT
FROM sys.dm_exec_connections
CROSS APPLY sys.dm_exec_sql_text(most_recent_sql_handle)
WHERE session_id = 68

We can ask the user whether the query identified by the preceding script can be killed; if so, the space will be returned to the tempdb data files.
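If the owner confirms it, killing that session releases its tempdb allocations (68 is the session identified above; substitute the session_id from your own results):

KILL 68;   -- terminates the session and rolls back its open transaction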

Note: the above is a simple method to tackle the issue; however, in a live environment you may encounter additional issues, such as performance problems due to tempdb contention, for which the troubleshooting process is different.

Remove/Delete Tempdb Data Files

Removing Tempdb Data Files:

I have a change activity scheduled tomorrow to remove tempdb data files, so I am testing the same on my local instance.

I have created three tempdb .ndf files along with the primary data file and set the size of each to 1024 MB. Please find the screenshot below:
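A sketch of the kind of ALTER DATABASE commands behind this test setup (the logical names other than tempdev3 and the file paths are illustrative; adjust them to your own layout):

USE [master];
GO
ALTER DATABASE [tempdb] ADD FILE (NAME = N'tempdev2', FILENAME = N'D:\TempDB\tempdev2.ndf', SIZE = 1024MB);
ALTER DATABASE [tempdb] ADD FILE (NAME = N'tempdev3', FILENAME = N'D:\TempDB\tempdev3.ndf', SIZE = 1024MB);
ALTER DATABASE [tempdb] ADD FILE (NAME = N'tempdev4', FILENAME = N'D:\TempDB\tempdev4.ndf', SIZE = 1024MB);
GO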

From the screenshot above we can see that all the data files have 99% free space available. Because the change needs to be done on a production environment, which will not be in the same state as above, I am creating a temporary table and inserting data into it to fill up tempdb. After filling up tempdb we have 0% free space available in all tempdb files; please find the screenshot below:

So, to remove a file from a database we first need to EMPTY the file and then issue the command to remove it.

To EMPTY the file we need to use the shrink command: DBCC SHRINKFILE (YourLogicalFileName, EMPTYFILE);

Here I am trying to remove the tempdb data file ‘tempdev3’. I executed the command below in tempdb:

USE [tempdb];
GO
DBCC SHRINKFILE (tempdev3, EMPTYFILE);
GO

After executing the above command it took some time and then returned an error, as shown in the screenshot below:

I tried restarting the instance and executing the command again, but had no luck and got the same error. I also tried connecting in single-user mode to run the command, but it still returned the same error.

So, I decided to execute some DBCC commands together with the remove-file command, putting it all in the same query window. Please find the screenshot below:
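The screenshot itself is not reproduced here; the batch was along the following lines, combining cache-clearing DBCC commands with the shrink and remove (a sketch, the idea being that cached internal objects in tempdb can keep a file from emptying):

USE [tempdb];
GO
DBCC FREEPROCCACHE;             -- clear the plan cache
DBCC DROPCLEANBUFFERS;          -- drop clean pages from the buffer pool
DBCC FREESYSTEMCACHE ('ALL');   -- clear system caches holding tempdb allocations
GO
DBCC SHRINKFILE (tempdev3, EMPTYFILE);
GO
ALTER DATABASE [tempdb] REMOVE FILE [tempdev3];
GO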

Now the error says the file is not EMPTY, so I tried emptying the file again, but it still failed.

I googled how to remove tempdb files in a production environment and went through quite a few sites; most of them said to take the instance into minimal configuration mode, i.e. start the instance with the /f parameter, connect using SQLCMD, and then try removing the file.

So, I tried that method by stopping the SQL Server services on my local machine from the command prompt; below is the command I used:

NET STOP MSSQLSERVER

After stopping the SQL Server services, I started my default instance using the /f parameter:

As described above, after starting the instance in minimal configuration mode I tried connecting to my local instance with SQLCMD as shown below, but it returned an error while connecting because Named Pipes was in a DISABLED state:

I enabled Named Pipes and restarted the instance for the change to take effect, but still had no luck; I got the error below saying the server is in single-user mode.

I stopped the instance again, started it with the /f parameter, and then tried connecting using SQLCMD.

As you can see in the above screenshot, it connected via SQLCMD after multiple attempts. So I directly executed the statement:

ALTER DATABASE [tempdb] REMOVE FILE [tempdev3]
GO

It did not return any error, so the file appears to have been removed successfully. Note that this time I did not run DBCC SHRINKFILE (tempdev3, EMPTYFILE) first; I ran ALTER DATABASE directly to remove the file. I then took the instance offline and started it again without the /f parameter. After starting the instance, I connected to my default instance and checked the tempdb files, and tempdev3 was gone.
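A quick check after the restart, listing the files tempdb currently has (a minimal sketch):

USE [tempdb];
GO
SELECT name, physical_name, size / 128 AS size_mb   -- size is stored in 8-KB pages
FROM sys.database_files;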

However, the file tempdev3 still exists at the physical location, as shown below:

I then deleted the file at the physical location and it was removed without any issue. After deleting it, the space was also reclaimed on the disk.

Difference between Hotfix, GDR, Cumulative Update, and Service Pack

Whenever a new version of a product is being developed, it is made available for testing to a restricted group of community members and customers. These are referred to as the ALPHA builds of the product.

If the development process continues smoothly and the product becomes more reliable, it is distributed to a larger number of customers. These are referred to as BETA releases (for example, BETA1, BETA2, and so forth).

Pre-releases of SQL Server are now known as Community Technology Previews (CTPs), following a change in terminology by Microsoft. These are available for download from the Microsoft website.

Release Candidate (RC) refers to a product that has reached the final phases of testing prior to release.

After that, the product goes through extensive testing, which confirms that no further changes are required before release. This is known as General Availability (GA).

After all of the testing is completed, the product is handed over for manufacturing of media such as CDs and DVDs. The completed product that reaches customers is referred to as Release to Manufacturing (RTM).

Customers may encounter specific problems with the product. To address them, Microsoft issues hotfixes created by the development team.

When these issues affect an increasing number of customers, Microsoft releases a General Distribution Release (GDR), which contains security fixes.

Both hotfixes and GDRs are designed to help customers with specific issues right away. Many customers find it impractical to apply the growing number of hotfixes and GDRs. As a result, Microsoft developed Cumulative Updates (CUs), which comprise all of the hotfixes. This makes installation easier for all customers.

By gathering all of the above fixes, Microsoft issues a Service Pack (SP) that includes all hotfixes, GDRs, and cumulative updates, as well as some additional features that were not available at the time of General Availability.

Transaction log for Availability database is full due to AVAILABILITY_REPLICA

Error message :

The transaction log for database ‘Database’ is full due to ‘AVAILABILITY_REPLICA’

Error: 9002, Severity: 17, State: 9.

Cause: this happens when transaction modifications made on the primary replica have not yet been replicated to the secondary replica.

Troubleshooting the Log File Full Issue

When the log file grows for the reasons below, the log_reuse_wait_desc column for the database shows AVAILABILITY_REPLICA.
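To confirm that this is what is holding up log truncation, check the log_reuse_wait_desc column for the affected database (a minimal sketch; 'Database' stands in for the name reported in the error message):

SELECT name, log_reuse_wait_desc
FROM sys.databases
WHERE name = N'Database';   -- the availability database from the error message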

Latency in Sending Logged Changes to the Secondary Replicas

Transaction changes made on the primary replica are packaged into log blocks, which are then sent to and hardened in the database log file on the secondary replica. The primary replica cannot overwrite these log blocks in its own log file until they have been sent and hardened to the secondary replica's log file.
If these blocks are not hardened on every secondary replica in the availability group in a timely manner, the logged changes in the database at the primary replica are not truncated, causing the log file to grow.

Redo Latency

A dedicated redo thread on the secondary replica instance applies the contained log records to the corresponding data files after the log blocks have been hardened to the secondary database's log file. The primary replica cannot overwrite these log blocks in its own log file until the redo thread on each secondary replica has applied them.

If the redo operation on any secondary replica cannot keep up with the rate at which log blocks are hardened on that replica, the primary replica's log file will grow. The primary replica can truncate and reuse its own transaction log only up to the point that all secondary replica redo threads have applied.
If there are several secondary replicas, we can use the DMV sys.dm_hadr_database_replica_states and compare the truncation_lsn column to determine which secondary replica is delaying log truncation.
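A sketch of such a check, run on the primary replica and joined to sys.availability_replicas so the lagging secondary can be identified by name:

SELECT ar.replica_server_name,
       DB_NAME(drs.database_id) AS database_name,
       drs.truncation_lsn,        -- log truncation point for this copy of the database
       drs.last_hardened_lsn,
       drs.last_redone_lsn,
       drs.log_send_queue_size,   -- KB of log not yet sent to this replica
       drs.redo_queue_size        -- KB of log hardened but not yet redone
FROM sys.dm_hadr_database_replica_states AS drs
INNER JOIN sys.availability_replicas AS ar
        ON ar.replica_id = drs.replica_id
ORDER BY ar.replica_server_name;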

Workaround

After determining which secondary database is causing the problem, we can try the following approaches to temporarily solve the problem:

  1. Remove the database from the availability group on the secondary replica and then re-join it.
  2. If the redo thread is regularly blocked, set the Readable Secondary option in the Availability Group properties to NO. Once the redo queue has shrunk significantly, change the Readable Secondary option back to YES (see the sketches after this list).
  3. If autogrow is disabled on the log file, enable it and make sure you have enough storage space.
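Sketches of the commands behind options 2 and 3 (the availability group name, replica name, database name, and log file name are illustrative placeholders):

-- Option 2: make the secondary non-readable while the redo queue drains (run on the primary)
ALTER AVAILABILITY GROUP [YourAG]
MODIFY REPLICA ON N'SecondaryServerName'
WITH (SECONDARY_ROLE (ALLOW_CONNECTIONS = NO));

-- change it back once the redo queue has shrunk significantly
ALTER AVAILABILITY GROUP [YourAG]
MODIFY REPLICA ON N'SecondaryServerName'
WITH (SECONDARY_ROLE (ALLOW_CONNECTIONS = ALL));

-- Option 3: re-enable autogrow on the transaction log file
ALTER DATABASE [YourDatabase]
MODIFY FILE (NAME = N'YourDatabase_log', FILEGROWTH = 512MB);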