Monday, September 2, 2013

Identifying slow running queries

This post is for SQL Server Database Administrators who find it difficult, in some situations, to identify slow running queries. DBAs who work in production environments especially may have faced situations where users or production monitoring teams complain that they suddenly see slowness in their systems. "Slowness in the system" is a very generic term; it could be due to various reasons, such as:

  • Network issue
  • Application server issue
  • Database issue

As a DBA, it is your responsibility to make sure everything is clear from the database perspective. Before you start troubleshooting this broad phenomenon, it is necessary to ask a few questions to shed some light on the dark area. The questions below might be useful:

  • In which subsystem do they experience the slowness?
  • Is there a particular time they see the slowness?
  • Has there been any recent system activity, such as a large file upload?
  • Have any recent alerts or notifications been received?
  • Any other question which you might think is useful

By getting the answers to the above questions you might be able to narrow down your investigation to the database server level. After identifying the database servers, you can monitor a couple of things:

  • Check for blocking and waits
  • Check for high CPU
  • Check for memory usage
  • Check for IO
  • Check for slow running queries
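For the blocking check above, a minimal sketch (assuming the VIEW SERVER STATE permission) is to look for requests whose blocking_session_id is non-zero:

```sql
-- Quick check for blocked sessions: a non-zero blocking_session_id means
-- the request is waiting on a resource held by another session.
SELECT r.session_id,
       r.blocking_session_id,
       r.wait_type,
       r.wait_time,      -- milliseconds spent waiting so far
       r.status,
       r.command
FROM sys.dm_exec_requests r
WHERE r.blocking_session_id <> 0;
```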

Slowness of the database server does not always have to mean memory pressure or high CPU. It could just be queries suddenly getting slow. The focus of this post is identifying slow running queries.

The next step is to select a suitable method for this. Most articles suggest running Profiler, but this is not possible in most production environments, which handle millions of transactions per day. It is a viable solution in non-production environments.

There are DMVs which are very useful for gathering information about what is happening on the database server right now, as well as past data. Currently running details can be captured using the following three DMVs:

  1. sys.dm_exec_requests
  2. sys.dm_exec_sessions
  3. sys.dm_exec_connections
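As a sketch, the three DMVs above can be joined on session_id to see who is running what right now (the column list here is trimmed to the essentials):

```sql
-- What is running right now: one row per active request, with the login,
-- client address and the currently executing statement text.
SELECT r.session_id,
       s.login_name,
       s.host_name,
       c.client_net_address,
       r.status,
       r.cpu_time,            -- milliseconds
       r.total_elapsed_time,  -- milliseconds
       t.text AS running_sql
FROM sys.dm_exec_requests r
INNER JOIN sys.dm_exec_sessions s ON r.session_id = s.session_id
INNER JOIN sys.dm_exec_connections c ON r.session_id = c.session_id
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) t
WHERE s.is_user_process = 1;
```

Note that unlike sys.dm_exec_query_stats, the time columns in sys.dm_exec_requests are already in milliseconds.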

However, to identify slow running queries you need to look at how the queries or stored procedures behaved in the past. What are their CPU time, duration, and IO under normal conditions? Only if you know those figures can you compare them with the currently running figures and get some insight. To look at the past data you can query the DMV sys.dm_exec_query_stats. This has really good information which we can use to identify slow running queries. It is necessary to add some intelligence to scripts built on this DMV to find slow running queries from the following three perspectives:

  1. CPU
  2. IO
  3. Elapsed time

I created the three queries below for this purpose and thought of sharing them with the community, so that we can learn how others are doing this and what methods they use in production environments for the same purpose. All the queries are designed to capture activity that occurred in the last 3 minutes.

Script 1 – The query below is useful for identifying slow queries in terms of physical reads. Queries whose last physical reads exceed their average by more than 100 are returned.

select 
d.name AS [DatabaseName]
,sqltext.objectid AS [ObjectID]
,p.name AS [SPName]
,SUBSTRING(sqltext.text, (qs.statement_start_offset/2)+1,
((
CASE qs.statement_end_offset
WHEN -1 THEN DATALENGTH(sqltext.text)
ELSE qs.statement_end_offset
END - qs.statement_start_offset)/2) + 1) AS statement_text
,qs.total_physical_reads/qs.execution_count AS [AvgPhysReads]
,qs.last_physical_reads AS [LastPhysicalReads]
,(qs.last_physical_reads - (qs.total_physical_reads/qs.execution_count)) AS [DiffPhysicalReads]
,qs.total_logical_reads/qs.execution_count AS [AvgLogReads]
,qs.last_logical_reads AS [LastLogReads]
,qs.total_logical_writes/qs.execution_count AS [AvgLogWrites]
,qs.last_logical_writes AS [LastLogWrites]
,ISNULL(qs.execution_count/NULLIF(DATEDIFF(SECOND, qs.creation_time, GETDATE()), 0), 0) AS [Calls/Second] -- NULLIF avoids divide-by-zero for just-cached plans
,(qs.total_worker_time/1000)/qs.execution_count AS [AvgCPUTime MilliSec]
,qs.last_worker_time/1000 AS [LastCPUTime MilliSec]
,(qs.total_elapsed_time/1000)/qs.execution_count AS [AvgElapsedTime MilliSec]
,qs.last_elapsed_time/1000 AS [LastExecTime MilliSec]
,qs.plan_handle
,qs.sql_handle
,sqltext.text AS [SqlText]
from sys.dm_exec_query_stats qs
cross apply sys.dm_exec_sql_text(qs.sql_handle) sqltext
inner join sys.databases d
on sqltext.dbid=d.database_id
left join sys.procedures p
on sqltext.objectid=p.object_id
where
last_execution_time > DATEADD(mi,-3,GETDATE()) -- 3 min before
AND qs.last_physical_reads > (qs.total_physical_reads/qs.execution_count) -- last physical reads > Avg physical reads
AND (qs.last_physical_reads - (qs.total_physical_reads/qs.execution_count)) > 100 -- Diff of physical reads > 100
order by
qs.last_physical_reads desc

Script 2 – The query below is useful for identifying slow queries in terms of CPU. Queries whose last CPU time exceeds their average by more than 3 seconds are captured.

select 
d.name AS [DatabaseName]
,sqltext.objectid AS [ObjectID]
,p.name AS [SPName]
,SUBSTRING(sqltext.text, (qs.statement_start_offset/2)+1,
((
CASE qs.statement_end_offset
WHEN -1 THEN DATALENGTH(sqltext.text)
ELSE qs.statement_end_offset
END - qs.statement_start_offset)/2) + 1) AS statement_text
,ISNULL(qs.execution_count/NULLIF(DATEDIFF(SECOND, qs.creation_time, GETDATE()), 0), 0) AS [Calls/Second] -- NULLIF avoids divide-by-zero for just-cached plans
,(qs.total_worker_time/1000)/qs.execution_count AS [AvgCPUTime MilliSec]
,qs.last_worker_time/1000 AS [LastCPUTime MilliSec]
,((qs.last_worker_time/1000) - ((qs.total_worker_time/1000)/qs.execution_count)) AS [CPUDiff MilliSec]
,qs.total_logical_writes/qs.execution_count AS [AvgLogWrites]
,qs.last_logical_writes AS [LastLogWrites]
,qs.total_logical_reads/qs.execution_count AS [AvgLogReads]
,qs.last_logical_reads AS [LastLogReads]
,qs.total_physical_reads/qs.execution_count AS [AvgPhysReads]
,qs.last_physical_reads AS [LastPhysicalReads]
,(qs.total_elapsed_time/1000)/qs.execution_count AS [AvgElapsedTime MilliSec]
,qs.last_elapsed_time/1000 AS [LastExecTime MilliSec]
,qs.plan_handle
,qs.sql_handle
,sqltext.text AS [SqlText]
from sys.dm_exec_query_stats qs
cross apply sys.dm_exec_sql_text(qs.sql_handle) sqltext
inner join sys.databases d
on sqltext.dbid=d.database_id
left join sys.procedures p
on sqltext.objectid=p.object_id
where
last_execution_time > DATEADD(mi,-3,GETDATE()) -- 3 min before
AND (qs.last_worker_time/1000) > ((qs.total_worker_time/1000)/qs.execution_count) -- last CPU time > Avg CPU time
AND ((qs.last_worker_time/1000) - ((qs.total_worker_time/1000)/qs.execution_count)) > 3000 -- Last CPU time is greater than 3 sec than Avg CPU
order by
qs.last_worker_time desc

Script 3 – The query below is useful for identifying slow queries in terms of elapsed time. Queries whose last elapsed time exceeds their average by more than 3 seconds are captured.

select 
d.name AS [DatabaseName]
,sqltext.objectid AS [ObjectID]
,p.name AS [SPName]
,SUBSTRING(sqltext.text, (qs.statement_start_offset/2)+1,
((
CASE qs.statement_end_offset
WHEN -1 THEN DATALENGTH(sqltext.text)
ELSE qs.statement_end_offset
END - qs.statement_start_offset)/2) + 1) AS statement_text
,(qs.total_elapsed_time/1000)/qs.execution_count AS [AvgElapsedTime MilliSec]
,qs.last_elapsed_time/1000 AS [LastElapsedTime MilliSec]
,(qs.last_elapsed_time/1000) - ((qs.total_elapsed_time/1000)/qs.execution_count) AS [DiffElapsedTime]
,qs.total_physical_reads/qs.execution_count AS [AvgPhysReads]
,qs.last_physical_reads AS [LastPhysicalReads]
,qs.total_logical_reads/qs.execution_count AS [AvgLogReads]
,qs.last_logical_reads AS [LastLogReads]
,qs.total_logical_writes/qs.execution_count AS [AvgLogWrites]
,qs.last_logical_writes AS [LastLogWrites]
,ISNULL(qs.execution_count/NULLIF(DATEDIFF(SECOND, qs.creation_time, GETDATE()), 0), 0) AS [Calls/Second] -- NULLIF avoids divide-by-zero for just-cached plans
,(qs.total_worker_time/1000)/qs.execution_count AS [AvgCPUTime MilliSec]
,qs.last_worker_time/1000 AS [LastCPUTime MilliSec]
,qs.plan_handle
,qs.sql_handle
,sqltext.text AS [SqlText]
from sys.dm_exec_query_stats qs
cross apply sys.dm_exec_sql_text(qs.sql_handle) sqltext
inner join sys.databases d
on sqltext.dbid=d.database_id
left join sys.procedures p
on sqltext.objectid=p.object_id
where
last_execution_time > DATEADD(mi,-3,GETDATE()) -- 3 min before
AND (qs.last_elapsed_time/1000) > ((qs.total_elapsed_time/1000)/qs.execution_count) -- Last elapsed time > Avg elapsed time
AND (qs.last_elapsed_time/1000) - ((qs.total_elapsed_time/1000)/qs.execution_count) > 3000 -- Elapsed time increased by 3 sec
order by
[DiffElapsedTime] desc

It is better to customize the above queries to suit your environment. You can look at the WHERE and ORDER BY clauses and modify them accordingly; they are self-explanatory. You can also include additional information from the DMV which you think is useful.
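For example, here is one hedged variation of the WHERE clause of the elapsed-time script, with the window widened and the scope narrowed to a single database (the database name and threshold are placeholders to replace with your own values):

```sql
-- Widen the window to the last 10 minutes, restrict to one database,
-- and raise the elapsed-time threshold to 5 seconds
WHERE qs.last_execution_time > DATEADD(mi, -10, GETDATE())
  AND d.name = 'YourDatabase'  -- placeholder: your database name
  AND (qs.last_elapsed_time/1000)
      - ((qs.total_elapsed_time/1000)/qs.execution_count) > 5000  -- 5 sec
```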


References

Checklist for Analyzing Slow-Running Queries
sys.dm_exec_query_stats

Comments:

  1. Great post.
    Congratulations.
  2. (LinkedIn)
    Easy to understand with useful scripts.
    Short but useful documentation for a topic I am beginning to study.
    I will keep this article among my favorites and I am sure to use it in a short time to help posters on my favorite forums.
    By Patrick Lambin
  3. Hi, good script selection; it also highlights the need to tackle the query performance problem from different angles. As a DBA, having a "system" is useful, although you may need to be flexible. Find a query performance workflow here: http://www.sqlserver-dba.com/2012/11/sql-server-how-to-troubleshoot-a-slow-running-query.html
  4. (LinkedIn)
    Very useful
    By Sadheera Vithanage
  5. really useful...
    By Anupam Singh
    (http://www.linkedin.com/groupAnswers?viewQuestionAndAnswers=&discussionID=270209600&gid=125418&commentID=162032034&trk=view_disc&fromEmail=&ut=1TgRYcSSrKBBU1)
  7. Have a similar problem with MySQL 5.5
    Hardik Tandel
    http://www.linkedin.com/groupAnswers?viewQuestionAndAnswers=&discussionID=270209603&gid=45183&commentID=167220414&trk=view_disc&fromEmail=&ut=3C-J_c_KS155Y1
  8. Decent article Susanthab, but correct me if I'm wrong, aren't the time conversions to seconds incorrect? The time columns in dm_exec_query_stats are reported in microseconds, not milliseconds. Refer to: http://technet.microsoft.com/en-us/library/ms189741(v=sql.110).aspx
