DBAs rarely use the full potential of sys.dm_exec_query_stats. It's common to see queries that pull the most expensive statements according to one of the stats in the current cache, and those are worth having. However, if you grab snapshots of this information on a schedule, you get a level of detail that will really make a difference in your job. Capturing it every 4 hours is probably enough, and here's what it does for you.
First, consider how most DBAs use this DMV: looking at the most expensive queries according to a specific metric. It works the same for all the metrics, so we'll say we're interested in physical reads in this case because our PLE isn't what we're hoping it could be. You've probably seen queries that hit sys.dm_exec_query_stats directly, similar to the example on MSDN for this DMV. That gives you everything in the cache right now, which could mix queries that recompile every hour with queries that have sat in the cache since the server was rebooted two weeks ago. So when you're trying to figure out how to improve your PLE during business hours, you're seeing whatever happens to be cached at the moment, not what ran in the time window you're trying to improve.
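If you haven't seen them, they look something like this. This is only a minimal sketch along the lines of the MSDN example, and it only sees what's in the cache at the moment you run it:

--Top 10 statements by physical reads, from whatever happens to be in the cache right now
SELECT TOP 10 qs.total_physical_reads
    , qs.execution_count
    , SUBSTRING(qt.text, qs.statement_start_offset/2 + 1,
        (CASE WHEN qs.statement_end_offset = -1
            THEN DATALENGTH(qt.text)
            ELSE qs.statement_end_offset
        END - qs.statement_start_offset)/2 + 1) AS statement_text
FROM sys.dm_exec_query_stats qs
    CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) qt
ORDER BY qs.total_physical_reads DESC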
Now look at it from the point of view that you capture this information every 4 hours. You can get the total number of physical reads grouped by sql_handle where the interval_end time is in the last month, on weekdays, between 9:00 AM and 5:00 PM; a sketch of that query is below. This is both a more complete and a more accurate picture of what you care about, and you're more likely to be focusing on the queries you should be tuning from the start.
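Here's what that could look like against the QueryStats and QueryText tables created later in this post. The weekday math assumes the default DATEFIRST setting, and the aggregation happens before the join because you can't group by an nvarchar(max) column:

--Physical reads during business hours on weekdays over the last month
SELECT TOP 10 qt.DatabaseName
    , qt.QueryText
    , agg.total_physical_reads
    , agg.execution_count
FROM (SELECT qs.sql_handle
            , SUM(qs.total_physical_reads) AS total_physical_reads
            , SUM(qs.execution_count) AS execution_count
        FROM QueryStats qs
        WHERE qs.interval_end >= DATEADD(MONTH, -1, GETDATE())
            AND DATEPART(WEEKDAY, qs.interval_end) BETWEEN 2 AND 6 --Monday through Friday with default DATEFIRST
            AND DATEPART(HOUR, qs.interval_end) BETWEEN 9 AND 17 --intervals ending roughly 9:00 AM to 5:00 PM
        GROUP BY qs.sql_handle) agg
    INNER JOIN QueryText qt ON agg.sql_handle = qt.sql_handle
ORDER BY agg.total_physical_reads DESC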
Once you do start tuning, you want to know what effect you really had, not just rely on the couple of sets of test parameters you used in your NonProd environments. A couple days after your changes make it to production, it's a good idea to verify you had the desired effect: query the data you've saved off for the last three days and for the same days last week.
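As a sketch of that comparison, where @sql_handle is whatever handle you identified while tuning:

--Before/after check: the last 3 days vs. the same stretch last week for one statement
DECLARE @sql_handle varbinary(64) --set this to the sql_handle of the query you tuned

SELECT Period = CASE WHEN qs.interval_end >= DATEADD(DAY, -3, GETDATE()) THEN 'After' ELSE 'Before' END
    , SUM(qs.execution_count) AS execution_count
    , SUM(qs.total_physical_reads) AS total_physical_reads
    , SUM(qs.total_worker_time_ms) AS total_worker_time_ms
FROM QueryStats qs
WHERE qs.sql_handle = @sql_handle
    AND (qs.interval_end >= DATEADD(DAY, -3, GETDATE())
        OR qs.interval_end BETWEEN DATEADD(DAY, -10, GETDATE()) AND DATEADD(DAY, -7, GETDATE()))
GROUP BY CASE WHEN qs.interval_end >= DATEADD(DAY, -3, GETDATE()) THEN 'After' ELSE 'Before' END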
Everything so far just takes the traditional use of this DMV to a new level, but saving the data off adds functionality you didn't have before. It's extremely useful in incident response. Before, you could see if anything was hogging resources, as in my Recently Recompiled Resource Hogs post; now you can take it a step further and verify whether what's running long right now has always run that way or whether something changed. Also, since the plan_handles are saved off, you can even see if a query is getting a different execution plan, which will help you determine whether this was a change in data or a bad execution plan.
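A quick sketch of that check, counting distinct plans per day for the statement you're investigating:

--A new plan_handle showing up right before performance changed points to a plan
--change rather than a change in data
DECLARE @sql_handle varbinary(64) --set this to the sql_handle you're investigating

SELECT CAST(qs.interval_end AS date) AS interval_date
    , COUNT(DISTINCT qs.plan_handle) AS distinct_plans
    , SUM(qs.execution_count) AS execution_count
    , SUM(qs.total_worker_time_ms) AS total_worker_time_ms
FROM QueryStats qs
WHERE qs.sql_handle = @sql_handle
GROUP BY CAST(qs.interval_end AS date)
ORDER BY interval_date

From there you can feed a saved plan_handle to sys.dm_exec_query_plan to pull the actual plan, as long as it's still in cache.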
Not to shoot myself in the foot, but this isn't perfect. If you capture this data every 4 hours starting at 12:00 and a plan recompiles at 3:59, you'll only capture one minute of activity for that interval. Also, a query that can't be cached won't show up here at all; this is why you're also running traces to find the big queries and watching sys.dm_os_performance_counters to know when there are a lot of compiles and recompiles.
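For the compiles, a quick look at the counters is all it takes. Keep in mind these "per second" counters are cumulative, so you need two samples and the difference between them, as I covered in Sys.dm_os_performance_counters Demystified:

--Compilations and recompilations accumulated since startup
SELECT counter_name
    , cntr_value
FROM sys.dm_os_performance_counters
WHERE object_name LIKE '%SQL Statistics%'
    AND counter_name IN ('SQL Compilations/sec', 'SQL Re-Compilations/sec')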
To start capturing this, here are the tables you'll need.
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
SET ANSI_PADDING ON
GO

/*
IF OBJECT_ID('QueryStats') IS NOT NULL BEGIN
    DROP TABLE QueryStats
END

IF OBJECT_ID('QueryStatsLast') IS NOT NULL BEGIN
    DROP TABLE QueryStatsLast
END

IF OBJECT_ID('QueryText') IS NOT NULL BEGIN
    DROP TABLE QueryText
END
--*/

CREATE TABLE [dbo].[QueryStats](
    [interval_start] [smalldatetime] NOT NULL,
    [interval_end] [smalldatetime] NOT NULL,
    [sql_handle] [varbinary](64) NOT NULL,
    [plan_handle] [varbinary](64) NOT NULL,
    [statement_start_offset] [int] NOT NULL,
    [statement_end_offset] [int] NOT NULL,
    [execution_count] [int] NOT NULL,
    [total_elapsed_time_ms] [int] NOT NULL,
    [min_elapsed_time_ms] [int] NOT NULL,
    [max_elapsed_time_ms] [int] NOT NULL,
    [total_worker_time_ms] [int] NOT NULL,
    [min_worker_time_ms] [int] NOT NULL,
    [max_worker_time_ms] [int] NOT NULL,
    [total_logical_reads] [int] NOT NULL,
    [min_logical_reads] [int] NOT NULL,
    [max_logical_reads] [int] NOT NULL,
    [total_physical_reads] [int] NOT NULL,
    [min_physical_reads] [int] NOT NULL,
    [max_physical_reads] [int] NOT NULL,
    [total_logical_writes] [int] NOT NULL,
    [min_logical_writes] [int] NOT NULL,
    [max_logical_writes] [int] NOT NULL,
    [creation_time] [smalldatetime] NOT NULL,
    [last_execution_time] [smalldatetime] NOT NULL
) ON [PRIMARY]
GO

--Wide is bad if you have a lot of nonclustered indexes.
--This is a high-writes, low-reads table, so a single clustered index is probably all I want.
--With interval_end being the first key field, rows are inserted in order, which eliminates mass fragmentation
--  as well as bad write performance.
CREATE UNIQUE CLUSTERED INDEX IX_QueryStats_intervalend_sqlhandle_statementstartoffset_planhandle_U_C ON QueryStats
(
    interval_end
    , sql_handle
    , statement_start_offset
    , plan_handle
)

CREATE TABLE [dbo].[QueryStatsLast](
    [sql_handle] [varbinary](64) NOT NULL,
    [plan_handle] [varbinary](64) NOT NULL,
    [statement_start_offset] [int] NOT NULL,
    [statement_end_offset] [int] NOT NULL,
    [objtype] [nvarchar](20) NOT NULL,
    [execution_count] [bigint] NOT NULL,
    [total_elapsed_time_ms] [bigint] NOT NULL,
    [min_elapsed_time_ms] [bigint] NOT NULL,
    [max_elapsed_time_ms] [bigint] NOT NULL,
    [total_worker_time_ms] [bigint] NOT NULL,
    [min_worker_time_ms] [bigint] NOT NULL,
    [max_worker_time_ms] [bigint] NOT NULL,
    [total_logical_reads] [bigint] NOT NULL,
    [min_logical_reads] [bigint] NOT NULL,
    [max_logical_reads] [bigint] NOT NULL,
    [total_physical_reads] [bigint] NOT NULL,
    [min_physical_reads] [bigint] NOT NULL,
    [max_physical_reads] [bigint] NOT NULL,
    [total_logical_writes] [bigint] NOT NULL,
    [min_logical_writes] [bigint] NOT NULL,
    [max_logical_writes] [bigint] NOT NULL,
    [creation_time] [datetime] NOT NULL,
    [last_execution_time] [datetime] NOT NULL,
    [DateAdded] [datetime] NOT NULL
) ON [PRIMARY]

CREATE UNIQUE CLUSTERED INDEX [IX_QueryStatsLast_sqlhandle_planhandle_statementstartoffset_U_C] ON [dbo].[QueryStatsLast]
(
    [sql_handle] ASC,
    [plan_handle] ASC,
    [statement_start_offset] ASC
) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO

CREATE TABLE [dbo].[QueryText](
    [sql_handle] [varbinary](64) NOT NULL,
    [QueryText] [nvarchar](max) NOT NULL,
    [DatabaseName] [nvarchar](128) NULL,
    [objtype] [nvarchar](20) NULL
) ON [PRIMARY]

CREATE UNIQUE CLUSTERED INDEX [IX_QueryText_sqlhandle_U_C] ON [dbo].[QueryText]
(
    [sql_handle] ASC
) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
Now that the tables are created, here's how I populate them. Set this up to run once every 4 hours to keep the data small during typical operation, and kick it up to once an hour or more when you need to pinpoint an issue.
DECLARE @interval_start smalldatetime
    , @interval_end smalldatetime

IF OBJECT_ID('tempdb..#QS') IS NOT NULL BEGIN
    DROP TABLE #QS
END

CREATE TABLE [dbo].[#QS](
    [sql_handle] [varbinary](64) NOT NULL,
    [plan_handle] [varbinary](64) NOT NULL,
    [statement_start_offset] [int] NOT NULL,
    [statement_end_offset] [int] NOT NULL,
    [objtype] [nvarchar](20) NOT NULL,
    [execution_count] [bigint] NOT NULL,
    [total_elapsed_time_ms] [bigint] NOT NULL,
    [min_elapsed_time_ms] [bigint] NOT NULL,
    [max_elapsed_time_ms] [bigint] NOT NULL,
    [total_worker_time_ms] [bigint] NOT NULL,
    [min_worker_time_ms] [bigint] NOT NULL,
    [max_worker_time_ms] [bigint] NOT NULL,
    [total_logical_reads] [bigint] NOT NULL,
    [min_logical_reads] [bigint] NOT NULL,
    [max_logical_reads] [bigint] NOT NULL,
    [total_physical_reads] [bigint] NOT NULL,
    [min_physical_reads] [bigint] NOT NULL,
    [max_physical_reads] [bigint] NOT NULL,
    [total_logical_writes] [bigint] NOT NULL,
    [min_logical_writes] [bigint] NOT NULL,
    [max_logical_writes] [bigint] NOT NULL,
    [creation_time] [datetime] NOT NULL,
    [last_execution_time] [datetime] NOT NULL,
    [DateAdded] [datetime] NOT NULL
)

/*
--This sounded like a great idea, but it just slowed it down several seconds.
CREATE UNIQUE CLUSTERED INDEX TempQS_Cluster ON #QS
(
    sql_handle
    , plan_handle
    , statement_start_offset
)
*/

--The Cached Plans Object Type is in here in case you want to treat ad-hoc or prepared statements differently
INSERT INTO #QS
SELECT qs.sql_handle
    , qs.plan_handle
    , qs.statement_start_offset
    , qs.statement_end_offset
    , cp.objtype
    , qs.execution_count
    , total_elapsed_time_ms = qs.total_elapsed_time/1000
    , min_elapsed_time_ms = qs.min_elapsed_time/1000
    , max_elapsed_time_ms = qs.max_elapsed_time/1000
    , total_worker_time_ms = qs.total_worker_time/1000
    , min_worker_time_ms = qs.min_worker_time/1000
    , max_worker_time_ms = qs.max_worker_time/1000
    , qs.total_logical_reads
    , qs.min_logical_reads
    , qs.max_logical_reads
    , qs.total_physical_reads
    , qs.min_physical_reads
    , qs.max_physical_reads
    , qs.total_logical_writes
    , qs.min_logical_writes
    , qs.max_logical_writes
    , qs.creation_time
    , qs.last_execution_time
    , DateAdded = getDate()
FROM sys.dm_exec_query_stats AS qs
    INNER JOIN sys.dm_exec_cached_plans cp ON qs.plan_handle = cp.plan_handle
--WHERE cp.objtype NOT IN ('Adhoc')

INSERT INTO QueryText (sql_handle, QueryText, DatabaseName, objtype)
SELECT QS.sql_handle
    , QueryText = qt.text
    , DatabaseName = DB_NAME(max(qt.dbid))
    , max(QS.objtype)
FROM (SELECT #QS.sql_handle
            , #QS.objtype
        FROM #QS
            LEFT JOIN QueryText QST ON #QS.sql_handle = QST.sql_handle
        WHERE QST.sql_handle IS NULL) QS
    CROSS APPLY sys.dm_exec_sql_text(QS.sql_handle) qt
GROUP BY QS.sql_handle
    , qt.text

--All the values are the same in each table
SELECT TOP 1 @interval_start = dateadded FROM QueryStatsLast
SELECT TOP 1 @interval_end = dateadded FROM #QS

IF @interval_start IS NULL BEGIN
    SELECT @interval_start = create_date FROM sys.databases WHERE name = 'tempdb'
END

INSERT INTO QueryStats (interval_start
    , interval_end
    , sql_handle
    , plan_handle
    , statement_start_offset
    , statement_end_offset
    , execution_count
    , total_elapsed_time_ms
    , min_elapsed_time_ms
    , max_elapsed_time_ms
    , total_worker_time_ms
    , min_worker_time_ms
    , max_worker_time_ms
    , total_logical_reads
    , min_logical_reads
    , max_logical_reads
    , total_physical_reads
    , min_physical_reads
    , max_physical_reads
    , total_logical_writes
    , min_logical_writes
    , max_logical_writes
    , creation_time
    , last_execution_time)
SELECT @interval_start
    , @interval_end
    , qs.sql_handle
    , qs.plan_handle
    , qs.statement_start_offset
    , qs.statement_end_offset
    , qs.execution_count - ISNULL(qsl.execution_count, 0)
    , qs.total_elapsed_time_ms - ISNULL(qsl.total_elapsed_time_ms, 0)
    , qs.min_elapsed_time_ms
    , qs.max_elapsed_time_ms
    , qs.total_worker_time_ms - ISNULL(qsl.total_worker_time_ms, 0)
    , qs.min_worker_time_ms
    , qs.max_worker_time_ms
    , qs.total_logical_reads - ISNULL(qsl.total_logical_reads, 0)
    , qs.min_logical_reads
    , qs.max_logical_reads
    , qs.total_physical_reads - ISNULL(qsl.total_physical_reads, 0)
    , qs.min_physical_reads
    , qs.max_physical_reads
    , qs.total_logical_writes - ISNULL(qsl.total_logical_writes, 0)
    , qs.min_logical_writes
    , qs.max_logical_writes
    , qs.creation_time
    , qs.last_execution_time
FROM #QS qs
    LEFT OUTER JOIN QueryStatsLast qsl ON qs.sql_handle = qsl.sql_handle
        AND qs.plan_handle = qsl.plan_handle
        AND qs.statement_start_offset = qsl.statement_start_offset
        AND qs.creation_time = qsl.creation_time
WHERE qs.execution_count - ISNULL(qsl.execution_count, 0) > 0 --Only keep records for when it was executed in that interval

TRUNCATE TABLE QueryStatsLast

INSERT INTO QueryStatsLast
SELECT sql_handle
    , plan_handle
    , statement_start_offset
    , statement_end_offset
    , objtype
    , execution_count
    , total_elapsed_time_ms
    , min_elapsed_time_ms
    , max_elapsed_time_ms
    , total_worker_time_ms
    , min_worker_time_ms
    , max_worker_time_ms
    , total_logical_reads
    , min_logical_reads
    , max_logical_reads
    , total_physical_reads
    , min_physical_reads
    , max_physical_reads
    , total_logical_writes
    , min_logical_writes
    , max_logical_writes
    , creation_time
    , last_execution_time
    , DateAdded
FROM #QS
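Once a few intervals have been captured, reading the results is just a join back to QueryText. Here's a quick sanity-check sketch for the most recent interval:

--What ran in the most recent interval, heaviest physical reads first
SELECT TOP 20 qs.interval_start
    , qs.interval_end
    , qs.execution_count
    , qs.total_physical_reads
    , qs.total_worker_time_ms
    , qt.DatabaseName
    , qt.QueryText
FROM QueryStats qs
    INNER JOIN QueryText qt ON qs.sql_handle = qt.sql_handle
WHERE qs.interval_end = (SELECT MAX(interval_end) FROM QueryStats)
ORDER BY qs.total_physical_reads DESC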
Related articles
- SQL Server Running Slow (simplesqlserver.com)
- Recently Recompiled Resource Hogs (simplesqlserver.com)
- Running Processes (simplesqlserver.com)
- Sys.dm_os_performance_counters Demystified (simplesqlserver.com)
- Fixing Page Life Expectancy (PLE) (simplesqlserver.com)
This is great. I'm saving the information for future investigation. We do have Quest (Dell) Spotlight saving SQL performance information, along with a job I set up to capture sp_whoisactive to a table every 4 minutes. Need to finish Grant Fritchey's execution plan book first.
Spotlight may do the query stats for you, but even if it does, it may be difficult to report on. This is a fairly lightweight process, but be warned that you may be duplicating work.
Also, there may be better ways to handle what you're doing with capturing sp_whoisactive. Think about why you're capturing it and whether there is a better way to see the same information. If you're looking for long-running queries, you can use traces or extended events that capture everything that goes over a certain threshold; a sketch of that follows. If you're looking for blocking, you can capture just the blocking.
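For the long-running case, here's a minimal extended events sketch (SQL Server 2012+ syntax; the session name, file name, and 5-second threshold are just examples):

--Capture completed statements that ran longer than 5 seconds
--duration is measured in microseconds
CREATE EVENT SESSION LongRunningQueries ON SERVER
ADD EVENT sqlserver.sql_batch_completed (
    ACTION (sqlserver.sql_text, sqlserver.database_name)
    WHERE duration > 5000000),
ADD EVENT sqlserver.rpc_completed (
    ACTION (sqlserver.sql_text, sqlserver.database_name)
    WHERE duration > 5000000)
ADD TARGET package0.event_file (SET filename = N'LongRunningQueries')

ALTER EVENT SESSION LongRunningQueries ON SERVER STATE = START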
There's no problem capturing this; it's just a matter of making sure you capture the right things. Think about what problems you want to solve, then about the methods you can use to get that information. Many people think about what they want to capture first, so they grab what they want instead of what they need, which may be the case here.
Also, you mentioned in another comment that you have several Sr DBAs. Take advantage of them excessively. The more you know, the easier their jobs get, so now you have smart people motivated to train you for free. Abuse that situation!!!
This was modified to change the name of the table QueryStatsText to QueryText, making it more usable for other purposes.
Also, there was a slight bug when two queries had the same sql_handle and differed only by object type. It turns out this is possible if you have a prepared and an ad-hoc query with exactly the same text. Who knew? Anyways, it now groups by sql_handle and text to make sure it's unique, and grabs the max object type (no reason for max, just pick one) if there are multiples. Dirty fix, but it's a fix. Let me know if you have anything cleaner.
Hi Steve,
Just stumbled upon this now… In the query stats and the proc stats, I've been having issues with the execution counts being way off. A procedure that gets executed once an hour at most was reporting that it ran over 1,000 times. I've confirmed the inaccuracies via a trace, but I can't figure out a way around it or a way to get a better number.