Sys.dm_os_performance_counters Demystified

The dmv sys.dm_os_performance_counters is awesome, if you can understand it. This is how I make it easy to read and use. Here are the values I watch and why I watch them. My list isn’t going to be perfect and you’re not going to agree with it 100%, and I’m ok with that. First, there is no perfect. Second, if we agree 100% then one of us is just mindlessly following the other which isn’t good.

  • Cache Hit Ratio – I ignore this value, but I still monitor it. I will, ideally, never be the only DBA on a team again, and everyone seems to think this value is cool.
  • Page Life Exp – My favorite! When you read a page from disk into memory how many seconds will it stay there? Just don’t use the outdated “300” rule or your disks will catch on fire!!!

    Burning Drive!!!

    PLE = 300

  • Page Lookups/Sec – How many pages are read from memory.
  • Page Reads/Sec – How many pages are read from disk.
  • Page Writes/Sec – How many pages are written to disk.
  • Lazy Writes/sec – How many pages are written to disk outside of a checkpoint due to memory pressure.
  • Batch Requests/sec – How busy is the server?
  • Trans/sec – How busy is the server?
  • Total Server Memory – How much memory SQL Server is currently using. Typically ramps up to Target value and PLE is low as it ramps up since new pages are in memory dropping the average.
  • Target Server Memory – How much memory SQL Server is allowed to use. Should be the same as the max memory setting, but memory pressure can cause this to decrease.
  • Memory Grants Pending – How many processes aren’t able to get enough memory to run. Should be 0, always 0, if not then find out why.
  • Deadlocks – How many deadlocks are we getting. Most apps handle deadlocks gracefully, but they still lose time doing it. If this number starts going up, start looking into it.
  • SQL Compilations/sec – This is a hidden performance killer! Some queries can’t be cached so they’re compiled every time they’re run. I’ve seen this with a query being run once a second and a big server was running slower than my laptop. It’s normal for things to compile throughout the day, it’s not normal for this number to be 10x higher than before that last upgrade.
  • SQL Re-Compliations/sec – Same goes here. The counters aren’t that much different.

If you know a little about this DMV then you know these values are cryptic. There’s several ways this data is stored and it has to be retrieved differently for each type to be useful. Then many of these are cumulative since the server was restarted, which isn’t going to help too much. Even worse, MSDN failed us on this one and figuring out this DMV required help outside of that site. Now for the good news, the script below will take care of all of that for you and leave you with some easy reading with values you can filter to the time periods you care about. If you want to add some of your own counters then just follow my lead on one that has the same cntr_type, or you can go to Rabin’s blog post that I learned from.

IF object_id('tempdb..#OSPC') IS NOT NULL BEGIN

DECLARE @FirstCollectionTime DateTime
    , @SecondCollectionTime DateTime
    , @NumberOfSeconds Int
    , @BatchRequests Float
    , @LazyWrites Float
    , @Deadlocks BigInt
    , @PageLookups Float
    , @PageReads Float
    , @PageWrites Float
    , @SQLCompilations Float
    , @SQLRecompilations Float
    , @Transactions Float

DECLARE @CounterPrefix NVARCHAR(30)
                            THEN 'SQLServer:'
                        ELSE 'MSSQL$' + @@SERVICENAME + ':'

--Grab the current values from dm_os_performance_counters
--Doesn't do anything by instance or database because this is good enough and works unaltered in all envirornments
SELECT counter_name, cntr_value--, cntr_type --I considered dynamically doing each counter type, but decided manual was better in this case
FROM sys.dm_os_performance_counters 
WHERE object_name like @CounterPrefix + '%'
    AND instance_name IN ('', '_Total')
    AND counter_name IN ( N'Batch Requests/sec'
                        , N'Buffer cache hit ratio'
                        , N'Buffer cache hit ratio base'
                        , N'Free Pages'
                        , N'Lazy Writes/sec'
                        , N'Memory Grants Pending'
                        , N'Number of Deadlocks/sec'
                        , N'Page life expectancy'
                        , N'Page Lookups/Sec'
                        , N'Page Reads/Sec'
                        , N'Page Writes/Sec'
                        , N'SQL Compilations/sec'
                        , N'SQL Re-Compilations/sec'
                        , N'Target Server Memory (KB)'
                        , N'Total Server Memory (KB)'
                        , N'Transactions/sec')

--Just collected the second batch in the query above
SELECT @SecondCollectionTime = GetDate()

--Grab the most recent values, if they are appropriate (no reboot since grabbing them last)
SELECT @FirstCollectionTime = DateAdded
    , @BatchRequests = BatchRequests
    , @LazyWrites = LazyWrites
    , @Deadlocks = Deadlocks
    , @PageLookups = PageLookups
    , @PageReads = PageReads
    , @PageWrites = PageWrites
    , @SQLCompilations = SQLCompilations
    , @SQLRecompilations = SQLRecompilations
    , @Transactions = Transactions
FROM OSPerfCountersLast 
WHERE DateAdded > (SELECT create_date FROM sys.databases WHERE name = 'TempDB')

--If there was a reboot then all these values would have been 0 at the time the server came online (AKA: TempDB's create date)
SELECT @FirstCollectionTime = ISNULL(@FirstCollectionTime, (SELECT create_date FROM sys.databases WHERE name = 'TempDB'))
    , @BatchRequests = ISNULL(@BatchRequests, 0)
    , @LazyWrites = ISNULL(@LazyWrites, 0)
    , @Deadlocks = ISNULL(@Deadlocks, 0)
    , @PageLookups = ISNULL(@PageLookups, 0)
    , @PageReads = ISNULL(@PageReads, 0)
    , @PageWrites = ISNULL(@PageWrites, 0)
    , @SQLCompilations = ISNULL(@SQLCompilations, 0)
    , @SQLRecompilations = ISNULL(@SQLRecompilations, 0)
    , @Transactions = ISNULL(@Transactions, 0)

SELECT @NumberOfSeconds = DATEDIFF(ss, @FirstCollectionTime, @SecondCollectionTime)

--I put these in alphabetical order by counter_name, not column name.  It looks a bit odd, but makes sense to me
--Deadlocks are odd here.  I keep track of the number of deadlocks in the time period, not average number of deadlocks per second.
  --AKA, I keep track of things the way I would refer to them when I talk to someone.  "We had 2 deadlocks in the last 5 minutes", not "We averaged .00002 deadlocks per second there"
INSERT INTO OSPerfCounters (DateAdded, Batch_Requests_Sec, Cache_Hit_Ratio, Free_Pages, Lazy_Writes_Sec, Memory_Grants_Pending
    , Deadlocks, Page_Life_Exp, Page_Lookups_Sec, Page_Reads_Sec, Page_Writes_Sec, SQL_Compilations_Sec, SQL_Recompilations_Sec
    , ServerMemoryTarget_KB, ServerMemoryTotal_KB, Transactions_Sec)
SELECT @SecondCollectionTime
    , Batch_Request_Sec        = ((SELECT cntr_value FROM #OSPC WHERE counter_name = N'Batch Requests/sec') - @BatchRequests) / @NumberOfSeconds
    , Cache_Hit_Ratio        = (SELECT cntr_value FROM #OSPC WHERE counter_name = N'Buffer cache hit ratio')/(SELECT cntr_value FROM #OSPC WHERE counter_name = N'Buffer cache hit ratio base')
    , Free_Pages            = (SELECT cntr_value FROM #OSPC WHERE counter_name =N'Free pages')
    , Lazy_Writes_Sec        = ((SELECT cntr_value FROM #OSPC WHERE counter_name = N'Lazy Writes/sec') - @LazyWrites) / @NumberOfSeconds
    , Memory_Grants_Pending    = (SELECT cntr_value FROM #OSPC WHERE counter_name = N'Memory Grants Pending')
    , Deadlocks                = ((SELECT cntr_value FROM #OSPC WHERE counter_name = N'Number of Deadlocks/sec') - @Deadlocks) 
    , Page_Life_Exp         = (SELECT cntr_value FROM #OSPC WHERE counter_name = N'Page life expectancy')
    , Page_Lookups_Sec      = ((SELECT cntr_value FROM #OSPC WHERE counter_name = N'Page lookups/sec') - @PageLookups) / @NumberOfSeconds
    , Page_Reads_Sec        = ((SELECT cntr_value FROM #OSPC WHERE counter_name = N'Page reads/sec') - @PageReads) / @NumberOfSeconds
    , Page_Writes_Sec        = ((SELECT cntr_value FROM #OSPC WHERE counter_name = N'Page writes/sec') - @PageWrites) / @NumberOfSeconds
    , SQL_Compilations_Sec  = ((SELECT cntr_value FROM #OSPC WHERE counter_name = N'SQL Compilations/sec') - @SQLCompilations) / @NumberOfSeconds
    , SQL_Recompilations_Sec= ((SELECT cntr_value FROM #OSPC WHERE counter_name = N'SQL Re-Compilations/sec') - @SQLRecompilations) / @NumberOfSeconds
    , ServerMemoryTarget_KB = (SELECT cntr_value FROM #OSPC WHERE counter_name = N'Target Server Memory (KB)')
    , ServerMemoryTotal_KB  = (SELECT cntr_value FROM #OSPC WHERE counter_name = N'Total Server Memory (KB)')
    , Transactions_Sec        = ((SELECT cntr_value FROM #OSPC WHERE counter_name = N'Transactions/sec') - @Transactions) / @NumberOfSeconds


--Note, only saving the last value for ones that are done per second.
INSERT INTO OSPerfCountersLast(DateAdded, BatchRequests, LazyWrites, Deadlocks, PageLookups, PageReads
    , PageWrites, SQLCompilations, SQLRecompilations, Transactions)
SELECT DateAdded            = @SecondCollectionTime
    , BatchRequests            = (SELECT cntr_value FROM #OSPC WHERE counter_name = N'Batch Requests/sec')
    , LazyWrites            = (SELECT cntr_value FROM #OSPC WHERE counter_name = N'Lazy Writes/sec')
    , Deadlocks             = (SELECT cntr_value FROM #OSPC WHERE counter_name = N'Number of Deadlocks/sec')
    , PageLookups            = (SELECT cntr_value FROM #OSPC WHERE counter_name = N'Page lookups/sec')
    , PageReads                = (SELECT cntr_value FROM #OSPC WHERE counter_name = N'Page reads/sec')
    , PageWrites            = (SELECT cntr_value FROM #OSPC WHERE counter_name = N'Page writes/sec')
    , SQLCompilations        = (SELECT cntr_value FROM #OSPC WHERE counter_name = N'SQL Compilations/sec')
    , SQLRecompilations        = (SELECT cntr_value FROM #OSPC WHERE counter_name = N'SQL Re-Compilations/sec')
    , Transactions             = (SELECT cntr_value FROM #OSPC WHERE counter_name = N'Transactions/sec')


Throw that code above here in a proc, schedule it to run every so often (I like 5 minutes) and it’ll….fail. It kinda relies on a couple tables you should create first. Here ya go.

    DateAdded datetime NOT NULL
    , Batch_Requests_Sec int NOT NULL
    , Cache_Hit_Ratio float NOT NULL
    , Free_Pages int NOT NULL
    , Lazy_Writes_Sec int NOT NULL
    , Memory_Grants_Pending int NOT NULL
    , Deadlocks int NOT NULL
    , Page_Life_Exp int NOT NULL
    , Page_Lookups_Sec int NOT NULL
    , Page_Reads_Sec int NOT NULL
    , Page_Writes_Sec int NOT NULL
    , SQL_Compilations_Sec int NOT NULL
    , SQL_Recompilations_Sec int NOT NULL
    , ServerMemoryTarget_KB int NOT NULL
    , ServerMemoryTotal_KB int NOT NULL
    , Transactions_Sec int NOT NULL

--You'll typically only query this by one value, which is added sequentually.  No page splits!!!
) WITH (FillFactor = 100)

--Only holds one value at a time, indexes are a waste
CREATE TABLE OSPerfCountersLast(
    DateAdded datetime NOT NULL
    , BatchRequests bigint NOT NULL
    , LazyWrites bigint NOT NULL
    , Deadlocks bigint NOT NULL
    , PageLookups bigint NOT NULL
    , PageReads bigint NOT NULL
    , PageWrites bigint NOT NULL
    , SQLCompilations bigint NOT NULL
    , SQLRecompilations bigint NOT NULL
    , Transactions bigint NOT NULL

The important part of all this is how you use it. It’s tempting to just look at the last 7 records and say that you know what’s going on, that makes me want to slap you. Every server is different, every server has different loads and baselines, and you’re either underworked or you don’t know what those baselines are for every server you manage. I do simple baselines every time I look at an incident and look at the last hour, the same time yesterday, and the same time a week ago. That gives you a chance to see what’s normal for this server and what’s different right now. This query is so simple you’ll wonder why I even posted it, but it’s effective which is why it’s here. The 7 records per day thing, that’s because 21 records show up on my screen without me scrolling, it is NOT a magic number!

SELECT 'Today', * FROM (
FROM OSPerfCounters
ORDER BY dateadded DESC
) X

SELECT 'Yesterday', * FROM (SELECT TOP 7 *
FROM OSPerfCounters
WHERE dateadded <= GETDATE()-1
ORDER BY dateadded DESC
) Y

SELECT 'Last Week', * FROM (
FROM OSPerfCounters
WHERE dateadded <= GETDATE()-7
ORDER BY dateadded DESC) Z

ORDER BY dateadded DESC

And, well, something I’ve been skipping on my posts and telling people to handle cleanup on their own…. Here’s step 2 of my jobs that populate my monitoring tables to keep your data from being the ever-growing data you’re struggling with in every other app. I delete in batches according to the clustered index. It’s overkill for something deleting one row at a time, or, even if you put this in a separate daily job, 288 rows if the process is scheduled every 5 minutes. So, why the batches? Because I copy/paste my own code everywhere, batches is reusable, and this is how I chop off the tail end of EVERYTHING!

SELECT 'Start' --Give me a rowcount of 1

    DELETE TOP (100000)
    FROM OSPerfCounters
    where dateadded < (GetDate() - 400)

In the beginning I mentioned that if you agreed with me 100% then one of us is a mindless monkey. Look, I put this out there first, so I’m obviously not the mindless monkey here, am I? There’s a box below that gives you a chance to show that you’re not a mindless monkey either! Tell me I’m wrong, how I can do better, and how everyone else reading this can benefit from it even more! I’ll promote you from mindless monkey to talking monkey!


23 thoughts on “Sys.dm_os_performance_counters Demystified

  1. Pingback: What is Deadlock? | weathercode

  2. Steve,

    Great post! Have you tested this on SQL 2012? I can get it to work on 08R2, but on 2012 I can’t get the following columns to populate data.

    [Batch_Requests_Sec], [Free_Pages], [Lazy_Writes_Sec], [Page_Lookups_Sec],[Page_Reads_Sec] ,[Page_Writes_Sec] ,[SQL_Compilations_Sec] ,[SQL_Recompilations_Sec]

    The OSPerfCountersLast table populates, but the OSPerfCounters table stays NULL.

    • I have to admit I’ve never tried this on different collations, but I couldn’t imagine it not working properly as long as everything is consistent.

      I’ll test this out tomorrow and let you know what I find.

  3. From doing more research it almost seems as if something is going wrong on the calculation part, for example, Page_Reads_Sec = ((SELECT cntr_value FROM #OSPC WHERE counter_name = N’Page reads/sec’) – @PageReads) / @NumberOfSeconds

    • Somewhere the collation names are inconsistent, probably between master database and the database that you’re storing this information in. I haven’t had a chance to reproduce this yet, but that would be the first thing I would do.

      • Mixing collations is a bad idea if you can avoid it. When you have one database talking to another then you’ll need to do hints to get almost anything to work properly. Is it possible to have the database you’re storing this in and the master database to have the same collation?

  4. Upon, more research, it looks like it’s not holding the values for the variables. For example, Lazy_Writes_Sec = ((SELECT cntr_value FROM #OSPC WHERE counter_name = N’Lazy writes/sec’) – @LazyWrites) / @NumberOfSeconds. Once it gets to this point in the script, there are no values for @LazyWrites or @NumberOfSeconds so it results in a NULL. Have you ever saw this before?

  5. There should at least be a 0 in there for those with this already being run: @LazyWrites = ISNULL(@LazyWrites, 0)

    Is the error you’re getting a collation error or something related to nulls? Also, I’ll probably have to see how you modified this to know what’s going on.

  6. I haven’t made many adjustments to the code. Just inserting over a linked server. @LazyWrites = ISNULL(@LazyWrites, 0) is still in here. I haven’t removed anything.

    There are no errors. Just NULL values in these columns: [Batch_Requests_Sec], [Free_Pages], [Lazy_Writes_Sec], [Page_Lookups_Sec],[Page_Reads_Sec] ,[Page_Writes_Sec] ,[SQL_Compilations_Sec] ,[SQL_Recompilations_Sec]

    it’s really strange.

    • I’m sorry, I haven’t had time to play with this recently. My advice, if you haven’t figured this out already, would be to make sure the queries are working locally then start putting in collation hints everywhere for that server.

      The best way would probably be to have a service outside of SQL Server read the data from each server then put it on a central server, which I believe would eliminate worries about different collations, but that’s not exactly a quick and easy fix.

    • This data is so lightweight in both resources to retrieve it and store it that every minute wouldn’t hurt, but it also wouldn’t help much. The question is how often will you look at it and what will you be using it for. This isn’t one that you’ll kill your system by over-monitoring, you’d just have more results to filter through when you want to use it.

      If you’re trying to find out what’s causing PLE to drop and you want to know right when it’s dropping, every 5 minutes makes sense. If you’re looking for general baselines then every hour is fine.

      Other areas of monitoring are a lot worse, specifically getting things like what indexes are in the cache where it has to aggregate a lot of data you can cause damage to your performance by watching it too much. In all cases with all monitoring, know how much each query costs and put thought into if you’d take advantage of the data being that granular.

      A good starting point is every 30 minutes for this one. Check back on it in a week or two, look through the data and ask if you need more detail or if you’re collecting more than you’d ever use.

      Read this before you apply any monitoring:

  7. Nice post, very similar to what we are doing.

    I have set-up a CACTI poller with a custom PHP script that runs every 5 Minutes and pulls these statistics and puts them onto a CACTI graph.

    I have been running these for about 2 years and also an upgrade from SQL 2008 to 2012, it was interesting to see the difference 2012 made and how the performance tweaking over the years has affected the servers.

    a lot of your posts I have read this evening have given me new ideas of how we can further improve our monitoring. thanks for all the great posts.

    • I’m glad I was able to help. Please let me know if you run into any issues as this running in a variety of instances like this is more testing than I’m personally able to do.

      Just remember that while this counter is perfectly fine to capture every 5 minutes, not everything I blog about it. Some of them risk you being on the wrong side of the performance issues, especially for the ones that look in the cache.

  8. I had to modify the Cache_Hit_Ratio to show percentage
    (SELECT cntr_value FROM #OSPC WHERE counter_name = N’Buffer cache hit ratio’) * 1.0
    / (SELECT cntr_value FROM #OSPC WHERE counter_name = N’Buffer cache hit ratio base’) * 100

  9. Free Pages counter is no longer available in SQL Server 2012. Some counters have been standardized as “kb” rather than bytes or pages and a new Memory Node object has now been added in SQL 2012. Hence instead of Free Pages counter, you can now use Free Memory (kb) or Free Node Memory (KB) in 2012 and instead of Total Pages you can use Total Node Memory (KB) and so on

Leave a Reply to sqlbrady (@sqlbrady) Cancel reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s