The dmv sys.dm_os_performance_counters is awesome, if you can understand it. This is how I make it easy to read and use. Here are the values I watch and why I watch them. My list isn’t going to be perfect and you’re not going to agree with it 100%, and I’m ok with that. First, there is no perfect. Second, if we agree 100% then one of us is just mindlessly following the other which isn’t good.
- Cache Hit Ratio – I ignore this value, but I still monitor it. I will, ideally, never be the only DBA on a team again, and everyone seems to think this value is cool.
- Page Life Exp – My favorite! When you read a page from disk into memory how many seconds will it stay there? Just don’t use the outdated “300″ rule or your disks will catch on fire!!!
- Page Lookups/Sec – How many pages are read from memory.
- Page Reads/Sec – How many pages are read from disk.
- Page Writes/Sec – How many pages are written to disk.
- Lazy Writes/sec – How many pages are written to disk outside of a checkpoint due to memory pressure.
- Batch Requests/sec – How busy is the server?
- Trans/sec – How busy is the server?
- Total Server Memory – How much memory SQL Server is currently using. Typically ramps up to Target value and PLE is low as it ramps up since new pages are in memory dropping the average.
- Target Server Memory – How much memory SQL Server is allowed to use. Should be the same as the max memory setting, but memory pressure can cause this to decrease.
- Memory Grants Pending – How many processes aren’t able to get enough memory to run. Should be 0, always 0, if not then find out why.
- Deadlocks – How many deadlocks are we getting. Most apps handle deadlocks gracefully, but they still lose time doing it. If this number starts going up, start looking into it.
- SQL Compilations/sec – This is a hidden performance killer! Some queries can’t be cached so they’re compiled every time they’re run. I’ve seen this with a query being run once a second and a big server was running slower than my laptop. It’s normal for things to compile throughout the day, it’s not normal for this number to be 10x higher than before that last upgrade.
- SQL Re-Compliations/sec – Same goes here. The counters aren’t that much different.
If you know a little about this DMV then you know these values are cryptic. There’s several ways this data is stored and it has to be retrieved differently for each type to be useful. Then many of these are cumulative since the server was restarted, which isn’t going to help too much. Even worse, MSDN failed us on this one and figuring out this DMV required help outside of that site. Now for the good news, the script below will take care of all of that for you and leave you with some easy reading with values you can filter to the time periods you care about. If you want to add some of your own counters then just follow my lead on one that has the same cntr_type, or you can go to Rabin’s blog post that I learned from.
IF object_id('tempdb..#OSPC') IS NOT NULL BEGIN DROP TABLE #OSPC END DECLARE @FirstCollectionTime DateTime , @SecondCollectionTime DateTime , @NumberOfSeconds Int , @BatchRequests Float , @LazyWrites Float , @Deadlocks BigInt , @PageLookups Float , @PageReads Float , @PageWrites Float , @SQLCompilations Float , @SQLRecompilations Float , @Transactions Float DECLARE @CounterPrefix NVARCHAR(30) SET @CounterPrefix = CASE WHEN @@SERVICENAME = 'MSSQLSERVER' THEN 'SQLServer:' ELSE 'MSSQL$' + @@SERVICENAME + ':' END --Grab the current values from dm_os_performance_counters --Doesn't do anything by instance or database because this is good enough and works unaltered in all envirornments SELECT counter_name, cntr_value--, cntr_type --I considered dynamically doing each counter type, but decided manual was better in this case INTO #OSPC FROM sys.dm_os_performance_counters WHERE object_name like @CounterPrefix + '%' AND instance_name IN ('', '_Total') AND counter_name IN ( N'Batch Requests/sec' , N'Buffer cache hit ratio' , N'Buffer cache hit ratio base' , N'Free Pages' , N'Lazy Writes/sec' , N'Memory Grants Pending' , N'Number of Deadlocks/sec' , N'Page life expectancy' , N'Page Lookups/Sec' , N'Page Reads/Sec' , N'Page Writes/Sec' , N'SQL Compilations/sec' , N'SQL Re-Compilations/sec' , N'Target Server Memory (KB)' , N'Total Server Memory (KB)' , N'Transactions/sec') --Just collected the second batch in the query above SELECT @SecondCollectionTime = GetDate() --Grab the most recent values, if they are appropriate (no reboot since grabbing them last) SELECT @FirstCollectionTime = DateAdded , @BatchRequests = BatchRequests , @LazyWrites = LazyWrites , @Deadlocks = Deadlocks , @PageLookups = PageLookups , @PageReads = PageReads , @PageWrites = PageWrites , @SQLCompilations = SQLCompilations , @SQLRecompilations = SQLRecompilations , @Transactions = Transactions FROM OSPerfCountersLast WHERE DateAdded > (SELECT create_date FROM sys.databases WHERE name = 'TempDB') --If there was a reboot then all these values would have been 0 at the time the server came online (AKA: TempDB's create date) SELECT @FirstCollectionTime = ISNULL(@FirstCollectionTime, (SELECT create_date FROM sys.databases WHERE name = 'TempDB')) , @BatchRequests = ISNULL(@BatchRequests, 0) , @LazyWrites = ISNULL(@LazyWrites, 0) , @Deadlocks = ISNULL(@Deadlocks, 0) , @PageLookups = ISNULL(@PageLookups, 0) , @PageReads = ISNULL(@PageReads, 0) , @PageWrites = ISNULL(@PageWrites, 0) , @SQLCompilations = ISNULL(@SQLCompilations, 0) , @SQLRecompilations = ISNULL(@SQLRecompilations, 0) , @Transactions = ISNULL(@Transactions, 0) SELECT @NumberOfSeconds = DATEDIFF(ss, @FirstCollectionTime, @SecondCollectionTime) --I put these in alphabetical order by counter_name, not column name. It looks a bit odd, but makes sense to me --Deadlocks are odd here. I keep track of the number of deadlocks in the time period, not average number of deadlocks per second. --AKA, I keep track of things the way I would refer to them when I talk to someone. "We had 2 deadlocks in the last 5 minutes", not "We averaged .00002 deadlocks per second there" INSERT INTO OSPerfCounters (DateAdded, Batch_Requests_Sec, Cache_Hit_Ratio, Free_Pages, Lazy_Writes_Sec, Memory_Grants_Pending , Deadlocks, Page_Life_Exp, Page_Lookups_Sec, Page_Reads_Sec, Page_Writes_Sec, SQL_Compilations_Sec, SQL_Recompilations_Sec , ServerMemoryTarget_KB, ServerMemoryTotal_KB, Transactions_Sec) SELECT @SecondCollectionTime , Batch_Request_Sec = ((SELECT cntr_value FROM #OSPC WHERE counter_name = N'Batch Requests/sec') - @BatchRequests) / @NumberOfSeconds , Cache_Hit_Ratio = (SELECT cntr_value FROM #OSPC WHERE counter_name = N'Buffer cache hit ratio')/(SELECT cntr_value FROM #OSPC WHERE counter_name = N'Buffer cache hit ratio base') , Free_Pages = (SELECT cntr_value FROM #OSPC WHERE counter_name =N'Free pages') , Lazy_Writes_Sec = ((SELECT cntr_value FROM #OSPC WHERE counter_name = N'Lazy Writes/sec') - @LazyWrites) / @NumberOfSeconds , Memory_Grants_Pending = (SELECT cntr_value FROM #OSPC WHERE counter_name = N'Memory Grants Pending') , Deadlocks = ((SELECT cntr_value FROM #OSPC WHERE counter_name = N'Number of Deadlocks/sec') - @Deadlocks) , Page_Life_Exp = (SELECT cntr_value FROM #OSPC WHERE counter_name = N'Page life expectancy') , Page_Lookups_Sec = ((SELECT cntr_value FROM #OSPC WHERE counter_name = N'Page lookups/sec') - @PageLookups) / @NumberOfSeconds , Page_Reads_Sec = ((SELECT cntr_value FROM #OSPC WHERE counter_name = N'Page reads/sec') - @PageReads) / @NumberOfSeconds , Page_Writes_Sec = ((SELECT cntr_value FROM #OSPC WHERE counter_name = N'Page writes/sec') - @PageWrites) / @NumberOfSeconds , SQL_Compilations_Sec = ((SELECT cntr_value FROM #OSPC WHERE counter_name = N'SQL Compilations/sec') - @SQLCompilations) / @NumberOfSeconds , SQL_Recompilations_Sec= ((SELECT cntr_value FROM #OSPC WHERE counter_name = N'SQL Re-Compilations/sec') - @SQLRecompilations) / @NumberOfSeconds , ServerMemoryTarget_KB = (SELECT cntr_value FROM #OSPC WHERE counter_name = N'Target Server Memory (KB)') , ServerMemoryTotal_KB = (SELECT cntr_value FROM #OSPC WHERE counter_name = N'Total Server Memory (KB)') , Transactions_Sec = ((SELECT cntr_value FROM #OSPC WHERE counter_name = N'Transactions/sec') - @Transactions) / @NumberOfSeconds TRUNCATE TABLE OSPerfCountersLast --Note, only saving the last value for ones that are done per second. INSERT INTO OSPerfCountersLast(DateAdded, BatchRequests, LazyWrites, Deadlocks, PageLookups, PageReads , PageWrites, SQLCompilations, SQLRecompilations, Transactions) SELECT DateAdded = @SecondCollectionTime , BatchRequests = (SELECT cntr_value FROM #OSPC WHERE counter_name = N'Batch Requests/sec') , LazyWrites = (SELECT cntr_value FROM #OSPC WHERE counter_name = N'Lazy Writes/sec') , Deadlocks = (SELECT cntr_value FROM #OSPC WHERE counter_name = N'Number of Deadlocks/sec') , PageLookups = (SELECT cntr_value FROM #OSPC WHERE counter_name = N'Page lookups/sec') , PageReads = (SELECT cntr_value FROM #OSPC WHERE counter_name = N'Page reads/sec') , PageWrites = (SELECT cntr_value FROM #OSPC WHERE counter_name = N'Page writes/sec') , SQLCompilations = (SELECT cntr_value FROM #OSPC WHERE counter_name = N'SQL Compilations/sec') , SQLRecompilations = (SELECT cntr_value FROM #OSPC WHERE counter_name = N'SQL Re-Compilations/sec') , Transactions = (SELECT cntr_value FROM #OSPC WHERE counter_name = N'Transactions/sec') DROP TABLE #OSPC
Throw that code above here in a proc, schedule it to run every so often (I like 5 minutes) and it’ll….fail. It kinda relies on a couple tables you should create first. Here ya go.
CREATE TABLE OSPerfCounters( DateAdded datetime NOT NULL , Batch_Requests_Sec int NOT NULL , Cache_Hit_Ratio float NOT NULL , Free_Pages int NOT NULL , Lazy_Writes_Sec int NOT NULL , Memory_Grants_Pending int NOT NULL , Deadlocks int NOT NULL , Page_Life_Exp int NOT NULL , Page_Lookups_Sec int NOT NULL , Page_Reads_Sec int NOT NULL , Page_Writes_Sec int NOT NULL , SQL_Compilations_Sec int NOT NULL , SQL_Recompilations_Sec int NOT NULL , ServerMemoryTarget_KB int NOT NULL , ServerMemoryTotal_KB int NOT NULL , Transactions_Sec int NOT NULL ) --You'll typically only query this by one value, which is added sequentually. No page splits!!! CREATE UNIQUE CLUSTERED INDEX IX_OSPerfCounters_DateAdded_U_C ON OSPerfCounters ( DateAdded ) WITH (FillFactor = 100) --Only holds one value at a time, indexes are a waste CREATE TABLE OSPerfCountersLast( DateAdded datetime NOT NULL , BatchRequests bigint NOT NULL , LazyWrites bigint NOT NULL , Deadlocks bigint NOT NULL , PageLookups bigint NOT NULL , PageReads bigint NOT NULL , PageWrites bigint NOT NULL , SQLCompilations bigint NOT NULL , SQLRecompilations bigint NOT NULL , Transactions bigint NOT NULL )
The important part of all this is how you use it. It’s tempting to just look at the last 7 records and say that you know what’s going on, that makes me want to slap you. Every server is different, every server has different loads and baselines, and you’re either underworked or you don’t know what those baselines are for every server you manage. I do simple baselines every time I look at an incident and look at the last hour, the same time yesterday, and the same time a week ago. That gives you a chance to see what’s normal for this server and what’s different right now. This query is so simple you’ll wonder why I even posted it, but it’s effective which is why it’s here. The 7 records per day thing, that’s because 21 records show up on my screen without me scrolling, it is NOT a magic number!
SELECT 'Today', * FROM ( SELECT TOP 7 * FROM OSPerfCounters ORDER BY dateadded DESC ) X UNION SELECT 'Yesterday', * FROM (SELECT TOP 7 * FROM OSPerfCounters WHERE dateadded <= GETDATE()-1 ORDER BY dateadded DESC ) Y UNION SELECT 'Last Week', * FROM ( SELECT TOP 7 * FROM OSPerfCounters WHERE dateadded <= GETDATE()-7 ORDER BY dateadded DESC) Z ORDER BY dateadded DESC
And, well, something I’ve been skipping on my posts and telling people to handle cleanup on their own…. Here’s step 2 of my jobs that populate my monitoring tables to keep your data from being the ever-growing data you’re struggling with in every other app. I delete in batches according to the clustered index. It’s overkill for something deleting one row at a time, or, even if you put this in a separate daily job, 288 rows if the process is scheduled every 5 minutes. So, why the batches? Because I copy/paste my own code everywhere, batches is reusable, and this is how I chop off the tail end of EVERYTHING!
SELECT 'Start' --Give me a rowcount of 1 WHILE @@ROWCOUNT > 0 BEGIN DELETE TOP (100000) FROM OSPerfCounters where dateadded < (GetDate() - 400) END
In the beginning I mentioned that if you agreed with me 100% then one of us is a mindless monkey. Look, I put this out there first, so I’m obviously not the mindless monkey here, am I? There’s a box below that gives you a chance to show that you’re not a mindless monkey either! Tell me I’m wrong, how I can do better, and how everyone else reading this can benefit from it even more! I’ll promote you from mindless monkey to talking monkey!