This page is to document my original content of Fixing Page Life Expectancy (PLE) before a major rewrite. The problem was that this post was one of my earlier ones and it crept up the search engine results, which you may think is great news (it is). However, I thought the content really didn’t live up to where I felt it should be and needed some drastic changes.
Being that so many people have seen it this way and some may have quoted what I said, I made it available here with a link to this on the new post.
So, here we go with the original content:
Page Life Expectancy (PLE) is perhaps the best single indicator of SQL performance, and this will help you raise it the right way. I’ll talk in this post on how to make PLE higher by looking at your server one way, then I have my post on Cleaning Up the Buffer Pool that is getting to the same results another way. Keep in mind that there’s always more than one way to do things. It’s not that one is right and the other is wrong, it’s actually that both are right and should be used together.
In my last post on OS Performance Counters I showed how I capture it, but stopped there. Now that you know how to measure it, lets get to work on changing those measurements!
My descriptions are the simplified version that doesn’t account for pages left free or other details about how SQL Server handles memory. My blog is meant to help people understand the concepts, not take the deep dives you’ll find on an MVP’s site. While there’s more to it, you need to learn Newton’s laws of motion before you can call yourself a rocket scientist.
This counter shows the average age of the pages stored in the page cache, which accounts for most of the memory you allocated to SQL Server. The highest this number can be is the number of seconds since SQL Server started, and it can technically drop as low as 0 if everything was just flushed out of memory. If this counter says 300 that means that the average page in cache was only there for 5 minutes. If you’re talking about a server with 512 GB of RAM that has 6% of memory left for the OS and 480 GB left for SQL Server to play with, then a vast majority of that 480 GB is being read from your physical disks every 5 minutes. If you try explaining that to your storage team then you’ll understand why I have a burning drive with a caption of “PLE = 300” on my last post.
As far as I know, 300 is still the limit Microsoft recommends that PLE stays above. It’s not that this is bad advice, it’s that it was accurate for the 32-bit systems with up to 4 GB of RAM on a high-end system that existed when it was written. If you look at Page 36 of Troubleshooting SQL Server – A Guide for the Accidental DBA by Jonathan Kehayias and Ted Krueger you’ll see an actual scalable version of this; PLE should be 300 for every 4 GB of RAM on your server. On that 512 GB server above, you should have (512/4)*300 PLE, or 38,400. I’m not sure if I would recommend a number quite that high, but it’s a lot better than saying 300.
This type of information is the difference between a data analyst and a DBA…
Queries don’t read from or write to disk for the data they want, only to TempDB for processing. For reads, queries tell SQL Server what information they need and SQL Server reads it into cache if it wasn’t there already. If it does have to read it into cache and the cache was already full then it removes other pages from the cache to make room. On the writes side it writes to cache, marks it as a dirty page (the cached version has a different value than what’s on disk), then it eventually writes the correct values to disk during a checkpoint that shows up in OS Perf Counters as the Writes Per Second. If there is too much going on and pages need to be written to disk outside of a checkpoint then they are recorded as Lazy Writes which probably slowed down a query while they were being performed.
Good to know, why do I care?
You need to know this because it shows how important the cache is and you need to know the basics of how SQL Server works to fix it. Going to disk is slower than memory, I don’t care what kind of disks you have. If you’re abusing your disks then your users will notice. So now you just need to know if you’re abusing your disks and what to do about it.
PLE is the main indicator, along with physical (from disk) and logical (from cache) reads per second, and writes and lazy writes per second. To keep it simple, we’ll just keep this talk about PLE, and only focus on the reads. Also, this is referring to OLTP environments and OLAP brings up its own challenges and expectations. I’m doing this to get the point across without diving too deep.
So, with this simplified view, there are a couple things that will cause PLE to be lower than you want it that are OK. Database maintenance such as index reorg or rebuilds will pull a lot into cache, and updating stats will pull about 1% of the data into cache with default settings. These are expected. It’s also why these jobs should be kicked off outside of peak hours for that server. Beyond skipping beneficial maintenance, you’re going to just take these hits and schedule them when that hit won’t hurt so bad.
Then there are other reasons it would drop that aren’t as good. If you see PLE sitting at 30,000 all day then it drops to 1,000 then you just had a query request a lot of data that was not in cache. 95% of the time, that means that if you look at the execution plan of the offending query you’ll see a table scan or clustered index scan of one of your larger tables.
Search and Destroy!!!
It’s good to know what you’re looking for on that execution plan, but what query was it? Lets start searching… What queries were running at the time that did a lot of reads? If you have a trace capturing anything over 10 seconds in duration then there’s a good chance it would have been captured there. What queries do you have running in this time frame that did the most physical reads? If you capture sys.dm_exec_query_stats on a regular basis you can find that (oooh, I’ll owe you a post with that statement, won’t I?). There are several ways to find it, you just have to be a little careful because some queries do a TON of logical reads which aren’t as bad; you want to tune for physical reads before logical if you care about PLE.
Now to find out where those reads are coming from. There’s a couple ways to do this. You can look at the estimated execution plan in Prod for that query and you’ll usually find a table or clustered index scan that had a high percent of the total cost of the plan. Or, you could use my favorite method by including the actual execution plan and running this on a somewhat current NONPROD environment:
SET STATISTICS IO ON BEGIN TRAN EXEC OffendingProc ROLLBACK
Look at the messages returned by this query and focus on the logical reads. Yeah, you probably weren’t expecting me to say logical, but you’re in NonProd now (please tell me you’re not in Prod) and you’re not working with the same cache. One table usually pops out as the most offensive, and you can usually do something about it through indexing. It doesn’t always have to be a scan to be crushing your hopes and dreams of a fast server, it could be an overly large seek or tons of key lookups. No matter what it is, indexing needs to be considered as a possible solution, but things aren’t perfectly simple and it is not the only solution.
If PLE is low and stays low then there are two basic conclusions, your database needs tuned or you need more RAM. Some people will say that buying RAM is throwing money at your problem, but time is money and tuning is throwing money at the problem, too. Tuning is the preferred method as it truly fixes issues and scales better, but you need to weigh your time and the benefits you’ll get out of it against the price of continuously cheaper RAM. Keep in mind that there are recommendations that Brent Ozar infamously discussed how cheap 64 GB of RAM is, while also discussing the value of your time by stating that any time you spend on task A is time you don’t have for task B. I’m in no way saying that buying memory is the answers to all your problems, just that there are times when it’s cheaper to buy $500 worth of RAM to avoid $5,000 of time thrown at tuning when you have other problems that need you.
Lets get on to the tuning!!!
Tuning is the direction I’m hoping you go with this, it’s the right choice most of the time. Tuning scales well, RAM buys time. The trick is that you’re not going to tune your entire database, you’re going to start with low-hanging fruit and work your way up to the middle. That stuff at the top is only touched on a one-off basis when users specifically complain about it. There are two places to start with this. Before I tell you what they are, I want to tell you that I’m not just messing with you here, and many apps have both of these issues….too many indexes and not enough indexes.
What’s lowest then? Unused and duplicate indexes are just sitting on the ground waiting for you. I wrote all about finding them in my Indexes – Unused and Duplicates post, but didn’t get into how they affect PLE. If an index is truly unused, it won’t affect anything other than disk space. However, if an index is used in any way at all then it’s read into cache, even if all you’re doing is writing to it. If you have an index you write to a lot and never read it then drop it, you’re reading that thing into cache and never benefiting from it, not to mention your queries are writing to an extra place they shouldn’t need to. If you have two indexes that have the same first two key fields and both are being used, you can probably find a way to combine them into a single index; if both are being used then you have 2 indexes worth of data in cache, if you combine them and the resulting index is 10% larger than the originals then you just went from 200 to 110 in how much you could potentially have in cache. I’m sure there’s times when this isn’t true, but a wider index in cache is going to complete your query faster than a narrow index on disk, especially if that narrow index is going to do key lookups.
You’ll want to look at your query stats to get the top 10 most expensive queries as far as physical reads go and work through them. I mentioned it earlier that you should be monitoring sys.dm_exec_query_stats, and you’re going to use your monitored version more than the current version. The current version has everything since the last recompile. Some of the data here is from overnight tasks, and some of your mid-day heavy hitters were just recompiled because auto-stats just kicked off for one of the tables in the query. The monitored version will capture this data every so often (it’s your monitoring, you pick how often), and you can query your results for specific times of the day. This is AWESOME!!!! Instead of what’s in your proc cache now, you’re talking about being able to query “What did the most amount of physical reads between 10 AM and 4 PM on weekdays for the last month?”
You can also take another approach to this by looking at what’s in cache and start trying to trim it down. If you see a couple indexes that are taking up 10% of your cache each then it’s worth looking into them to see if they’re being used improperly, such as doing a full scan of a ten million rows to grab 100 rows. I really get into the details of that in my post Cleaning Up the Buffer Pool.
There’s more to PLE than what I had to say here.
- Paul Randal talks about NUMA nodes and PLE
- Paul Randal talks about wasted space in the buffer pool that would kill your PLE
- Jonathan Kehayias talks about the plan cache and PLE
- Brent Ozar does comprehensive tuning and even offers online classes
- Kendra Little explains index tuning and how you should just DO IT!
You have a start at where to go now, so you’re left with two choices…. You can either say “Thank You” or “WTF?!?! Ignorance was bliss!!! WAS!!!!”