TempDB memory leak?

I found a bug where I’m seeing TempDB use more memory than it should on multiple versions of SQL Server, especially on servers set up with common best practices. There’s a workaround that has a profound affect on server performance, adding to my belief that this is a legitimate bug and leading me to open a Connect Item on the issue.

Querying the Buffer Pool

I have a query to show me what’s in the buffer pool for the entire server, showing the expected results as well as excessive space being used by TempDB.  The newest version showing these details is on my post Querying the Buffer Pool.

It shows number of pages in the buffer pool grouped by the database, table, and index.  The query makes use of left joins so it can see space in memory that’s not currently allocated to a specific object.

The results are surprising in many ways.

The good surprises are seeing what indexes are hogging up your buffer pool so you have an idea of where to start tuning.  I’m a huge fan of this and have blogged about it in Cleaning Up the Buffer Pool to Increase PLE, although the name of my older post is misleading because it does more than just help memory management in SQL Server.

The Bug

The bad surprise was a bug which has been harassing me for quite some time now.  As I mentioned, the query will return all the space in the buffer pool, specifically the contents of sys.dm_os_buffer_descriptors, and does a left join to the tables leading up to and including sys.indexes so space not currently allocated to a table will show up.  The problem is that the space that shows up as unallocated for TempDB is much larger than expected, in this case taking up 1/3 of my buffer pool.

QueryBufferPool_TempDB

On this post I’m talking about a single server, but the problem wasn’t limited to a single server.  It showed up at the same time, caused by the same change (implementing a common best practice), partially resolved by the same partial rollback (undoing the best practice) on SQL 2008 R2, SQL 2012, and SQL 2014.

Details About the Bug

So the query I have on yesterday’s post, Querying the Buffer Pool, showed I had unallocated space in TempDB in memory, and a lot of it.  However, it doesn’t show details.

To start looking at the details, what kind of pages are these that exist in sys.dm_os_buffer_descriptors, but not in sys.allocation_units?

SELECT bd.page_type
	, MB = count(1) / 128
FROM sys.dm_os_buffer_descriptors bd
	LEFT JOIN sys.allocation_units au ON bd.allocation_unit_id = au.allocation_unit_id
WHERE bd.database_id = 2 --TempDB
	AND bd.is_modified = 0 --Let's not play dirty, only clean pages
	AND au.allocation_unit_id IS NULL --It's not even allocated
GROUP BY bd.page_type 
ORDER BY 2 DESC

TempDB_BufferPool_Unallocated

Ok, so we’re dealing with typical data in TempDB.  Well, other than it not being allocated, of course.

So I run another query to get more details.  This time I want to look inside the pages to see if they tell a different story.

SELECT TOP 100 bd.*
FROM sys.dm_os_buffer_descriptors bd
	LEFT JOIN sys.allocation_units au ON bd.allocation_unit_id = au.allocation_unit_id
WHERE bd.database_id = 2 --TempDB
	AND bd.is_modified = 0 --Let's not play dirty, only clean pages
	AND au.allocation_unit_id IS NULL --It's not even allocated

TempDB_PageLevel_Unallocated

Then I follow that up with Paul Randal’s How to use DBCC PAGE, which comes with all the disclaimers about using an undocumented and unsupported trace flag and command.  This one isn’t horrible in my mind or Paul’s comments, but remember the undocumented and unsupported parts.

DBCC TRACEON (3604);

DBCC PAGE (2, 5, 502219	, 0)
DBCC PAGE (2, 5, 374929	, 0)
DBCC PAGE (2, 5, 69868	, 0)
DBCC PAGE (2, 5, 453687	, 0)
DBCC PAGE (2, 5, 214988	, 0)
DBCC PAGE (2, 5, 440966	, 0)

DBCC TRACEOFF (3604);

The results all looked about the same to me.

DBCC_Page_Results

There are several important parts to me.  The m_objId is a negative value I can’t find in TempDB.sys.objects, so it WAS a temporary object that no longer exists.  Across the board, these are “NOT ALLOCATED”, “NOT CHANGED”, “NOT MIN_LOGGED”, “0_PCT_FULL”, so there’s nothing there.

To me it looks like temp objects made it into memory and remained in memory after the temporary objects were dropped.  I have no idea what objects these were or how they were dropped, but I’m imagining these were temp tables automatically dropped when the session was either closed or reset.

A Recent Change (A CLUE)

I found this by noticing that PLE for several servers was lower now than it has been in the past, so I was peeking in the buffer pool to see who was playing nice.  Going off of “when did PLE start to be lower” I noticed that I implemented a change around that time to use a common best practice.

That change was presizing TempDB data files to take up a vast majority of the dedicated LUN instead of letting them grow as needed.  It avoids waiting for file growth, especially if you’re using TDE (I’m not) and can’t use IFI (I can), but for several other reasons as well, including file fragmentation and the slight pause even IFI causes.  So at the start of all these festivities, I took the 4 TempDB data files from 100 MB each to 12 GB each, using up 48 GB of the 50 GB available.

A Workaround

Seeing this, I wanted to partially roll back the change the next opportunity I had.  100 MB was too small and I was aware that it invoked file growths every month (we reboot monthly for OS updates).  48 GB wasn’t right though, we just have that much space on the drive due to server build standards and paranoia (I’m a DBA).  So I went through our Idera Diagnostic Manager monitoring software and found the most space TempDB used, which is captured once an hour.  I found that 4.8 GB was the peak usage with several incidents of usage going over 4.5 GB.

With that information available and still not wanting an autogrowth for all the reasons listed above, I decided that all 4 files should be 1.5 GB, so 6 GB total.  That means peak usage was about 75% full, leaving plenty of room for error, especially with my baseline only being captured once an hour.  Autogrowth is set to 256 MB, so it’d add 1 GB total each growth.  I can live with that.

I can’t say it eliminated the issue because I still have 2 GB of unallocated TempDB space in cache, but it’s better than 8 GB.  It can be considered more acceptable than other issues I need to tackle right now, but it still bugs me.

What’s the Best Practice?

It’s a best practice to have TempDB data files on their own LUN, drive, array, however you want to word it.  Then it just make sense to have the total size of your data files add up to 90% or more of the drive size.  I see this advice everywhere, with these two standing out:

  • Solar Winds – Configuration Best Practices for SQL Server Tempdb–Initial Sizing
    • “Next, if you can give tempdb its own disk, then configure it to almost fill the drive. If nothing else will ever be on the drive, then you’re better off setting it to be larger than you’ll ever need. There’s no performance penalty, and you’ll never have to worry about autogrow again.”
  • Brent Ozar – SQL Server 2005/2008/2012/2014 Setup Checklist
    • “Notice that I don’t have filegrowth enabled.  You want to proactively create the TempDB files at their full sizes to avoid drive fragmentation.”

Jonathan Kehayias does it a little bit differently in his post SQL Server Installation Checklist saying to add space to TempDB files in 4 GB increments.  Although he doesn’t fill the drive by default, this isn’t mentioned by him, either.

Now I need to be perfectly clear on this, I trust these three sources.  I trust Jonathan and Brent more than I trust myself with setting up SQL Server.  I also feel the same about the authors I know on the Solar Winds post.  This does not change that.

Sizing TempDB like that often means it’s much larger than you need.  The workaround I’m using is to right-size these files instead.  For me, for now, I’m going to stick with seeing how large TempDB gets and make it slightly larger than that until I have a solid answer to my problem.

What Was It?

I still don’t know.  The workaround managed to knock it off of my priority list enough where I’m not actively working on it.  However, my drive to understand SQL Server better won’t leave me alone.

This post is my solution.  I have some very intelligent people reading this who I hope will at least lead me further down the rabbit hole, even if they don’t have a conclusive answer.  There’s a good chance I’ll be asking for help on Twitter with #sqlhelp or opening a connect item on this, for which I have a very well documented description of the issue that I can link to.

Updates:

2016-01-06 – Caching of Temporary Objects

Due to a comment, I started looking into the caching of temporary objects to see if this was the root cause.  The comment specifically mentioned Paul White’s (b|t) post Temporary Object Caching Explained, and I also read over Itzik Ben-Gan’s (b|t) post Caching Temporary Objects.

Both of these left me with the impression that smaller amounts of data would be left in the cache linked to temporary objects linked to the proc cache.  What I’m seeing is large amounts of data in the buffer pool that did not drop when I ran DBCC FREEPROCCACHE (on a test server that wasn’t in active use) as I expected if this was the full explanation.

While it’s very likely this is related to the issue on hand, I’m not ready to accept it as a full explanation.  If the memory associated with TempDB dropped when clearing the proc cache (on a test server) then it would have been a great explanation with a poor side effect of going too far with the memory being used.

2016-01-07 – Opened a Connect Item

I mentioned this issue on the comments of Paul White’s blog post mentioned in the last update and comments below on this post.  His response concluded with this:

So, when memory pressure is detected, I would expect memory use like this to be freed up for reuse in pretty short order, by design. If it is not, and bad things happen because memory for unallocated tempdb is not released/reused, that would be a bug.

While I was already leaning that way, it pushed me over the edge to decided it was time to open up a connect item on this issue.  I feel it’s well worth the read going to Paul’s post and the connect item.  Also, if you’re seeing this as well, an upvote on connect is very appreciated.

https://connect.microsoft.com/SQLServer/feedback/details/2215297

Advertisement

Cleaning up the Buffer Pool to Increase PLE

Chances are you have extra information in the buffer pool for a bad query and it’s dragging down your PLE, causing SQL Server to run slower because it’s reading more from disk. Although this approach is taking a 180 from my post Fixing Page Life Expectancy it has the same effect, with that post focusing on fixing your worst queries and this one focused on fixing your most misused indexes.  One approach doesn’t replace the other, it’s more like burning the candle at both ends, except that you end up with a better running database instead of getting burnt out.

With this approach we start with what’s in the cache.  You’ll see the same types of issues in almost any database, and this just happens to be a production database I’m going through today.

ScreenHunter_01 2014-01-06 14.52.28

The results were found with this query:

SELECT count(1)/128 AS cached_MB 
    , name 
    , index_id 
FROM sys.dm_os_buffer_descriptors AS bd with (NOLOCK) 
    INNER JOIN 
    (
        SELECT name = OBJECT_SCHEMA_NAME(object_id) + '.' + object_name(object_id)
            --name = 'dbo.' + cast(object_id as varchar(100))
            , index_id 
            , allocation_unit_id
        FROM sys.allocation_units AS au with (NOLOCK)
            INNER JOIN sys.partitions AS p with (NOLOCK) 
                ON au.container_id = p.hobt_id 
                    AND (au.type = 1 OR au.type = 3)
        UNION ALL
        SELECT name = OBJECT_SCHEMA_NAME(object_id) + '.' + object_name(object_id) 
            --name = 'dbo.' + cast(object_id as varchar(100))   
            , index_id
            , allocation_unit_id
        FROM sys.allocation_units AS au with (NOLOCK)
            INNER JOIN sys.partitions AS p with (NOLOCK)
                ON au.container_id = p.partition_id 
                    AND au.type = 2
    ) AS obj 
        ON bd.allocation_unit_id = obj.allocation_unit_id
WHERE database_id = db_id()
GROUP BY name, index_id 
HAVING Count(*) > 128
ORDER BY 1 DESC;

Service Broker Errors Table

First, there is an errors table here with Service_Broker_Errors, and that should never be in the top 10. What are we doing wrong? The index ID of 1 tells me right away that this is a clustered index, the fact that the table is at least 1.5 GB tells me that we’re probably not purging old data, and 1.5 GB in memory on this table is hinting that we’re probably doing a clustered index scan.

I’ll start by looking at the proc cache to see what’s going on. There’s only one execution plan that used that index and it is, in fact, doing a clustered index scan as I expected.

ScreenHunter_01 2014-01-06 15.24.05

Predicates or Seek Predicates

In an execution plan you’ll have a seek predicate and just a plain old predicate. The seek predicate is what you were able to do taking advantage of the index being in order, and the predicate is what you had to scan for.

ScreenHunter_01 2014-01-06 15.26.02

This case is easy because we’re only searching by a single column, but others could have both a seek predicate and a predicate. For instance, if I had an index on my customers table with the key columns Active, First_Name, Last_Name then searched where Active = 1 and Last_Name = ‘Hood’ then it will show up as an index seek with a seek predicate of Active = 1 and a predicate of Last_Name = ‘Hood’. Anyways, lets get back on topic with the issue of my Service_Broker_Errors table…

Now this sounds like a reasonable query looking for errors. I’m looking for the errors that occurred in the last so many days. The CONVERT_IMPLICIT(datetime,[@1],0) shows me that someone typed this the lazy way of GetDate()-1, which isn’t as efficient as DateAdd(Day, -1, GetDate()), but you’re getting me off topic again.

Fixing a useless index

Looking at the indexes on this table I realize there is only one, and it has the single key column of ID. For uniqueness you can’t do much better than an ID column, but you have to ask yourself if you’ll ever use it to query by.  In this case the index has never had a seek against it, only scans.  Although there table is rarely queried with only 4 uses in the last 2 months (I limited my historical data for this query), it’s still pulling 1.5 GB into cache for every use.  After a couple seconds of shaking my head I start to create a change request to make add TimeStamp in as the first key column in the clustered index.

ScreenHunter_01 2014-01-06 15.35.27

However, I then remembered that this table is 1.5 GB. Is that right? It’s an error table, so if it’s really 1.5 GB then I should either be cleaning up old data or there are so many problems that there is no reason I should be spending my time tuning. Seeing that it has 0 updates in the last 2 months, I already know it’s old data.  To double-check on this I run a simple query, keeping in mind the ID is still the clustered index, to find the oldest record, and discover that we haven’t removed anything from this table in years.

SELECT timestamp
FROM Service_Broker_Errors
WHERE id = (SELECT Min(ID) FROM Service_Broker_Errors)

So I have to talk to people about how old an error can be before we say we just don’t care. It was determined that we probably don’t care about anything more than a month old. I’m paranoid, it comes with the job title, so I made it three months with my change being this:

DECLARE @ID Int

SELECT @ID = MAX(ID) FROM Service_Broker_Errors WHERE TimeStamp < GETDATE()-90 

WHILE @@ROWCOUNT > 0 BEGIN
    DELETE TOP (10000)
    FROM Service_Broker_Errors
    WHERE ID <= @ID 
END 

IF EXISTS (SELECT * FROM sys.indexes WHERE object_id = OBJECT_ID(N'[dbo].[Service_Broker_Errors]') AND name = N'PK_Service_Broker_Errors') 
    ALTER TABLE [dbo].[Service_Broker_Errors] 
    DROP CONSTRAINT [PK_Service_Broker_Errors] 
GO 

ALTER TABLE [dbo].[Service_Broker_Errors] 
ADD CONSTRAINT [PK_Service_Broker_Errors] PRIMARY KEY CLUSTERED 
( 
    [TimeStamp] ASC
    , [id] ASC 
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY] 
GO 

The reason I did it like this is because I don’t want to fool around with the junk values as I drop the clustered index (rebuilding the table) then recreate the clustered index (rebuilding the table) with all of that information in there, then delete it all and wreak havoc on the new index. Sure, the fragmentation at the page level would be fine since I’m deleting in order, but that’s a lot of changes to the b-tree.

Finally, I add these lines to my nightly cleanup job to keep things trimmed up from now on:

DECLARE @BatchSize Int 

SELECT @BatchSize = 10000 

WHILE @@ROWCOUNT > 0 BEGIN
    DELETE TOP (@BatchSize)
    FROM Service_Broker_Errors
    WHERE TimeStamp < GETDATE()-90
END

I know, I just got done saying that GetDate()-90 was less efficient than DateAdd(Day, -90, GetDate()), but it’s an implicit conversion that’s done once per call. I admit this is a bad habit of mine, and any time there’s even a chance of this being called once per row I have to write it differently than I normally do. Nobody’s perfect, and the most important thing is that I’m getting rid of all those records using batches, right?

In Conclusion with Service Broker Errors

Now that all of this is done I took a 1.5 GB table with all of it in cache to a 20 MB table with only 1 MB in cache. Cleaning up the data had more of an impact that my index change, but it’s usually not that easy to clean stuff up. Even if I wasn’t able to clean up the data, the index change alone would have allowed for the cache usage to be decreased by at least 1.4 GB.

On to the next one

That’s not the only issue I’m seeing here, in fact I bet I could take half of these off the top 10 list. However, today I’m picking up low-hanging fruit and moving on to the next task on my list. In this case, External_Messages is the next one that doesn’t look right to me. Once again, it’s a clustered index (index_id = 1) of a table that isn’t part of the primary focus of the database, which is handling orders.

Starting with the same steps I look in the proc cache to see what put this here. Once again I see a single query in cache referencing the table, but this one is different. It’s the typical IF X = @X or @X IS NULL that we’ve all written, and it’s being used as the predicate.

ScreenHunter_01 2014-01-06 16.34.32

I check the indexes on the table and it has the ID as the only key column of the clustered index, so that’s not an issue. Why isn’t it able to use the clustered index with a seek predicate? After all, I’m rather certain that they almost always, if not always, call this with a parameter where it should be filtered down.

Lets look at the execution plan to make sure. The sniffed parameter value from the first time it was called, as found at the bottom of that statement in the XML version of the execution plan, is, in fact, a non-null value.

ScreenHunter_01 2014-01-06 16.39.02

However, SQL Server can’t guarantee that you’re going to pass it a non-null value, and it has to make an execution plan that can account for either possibility. I’ve seen this before, so I know the basic options for a single optional parameter (there are more options, with increasing complexity). I can either add OPTION (RECOMPILE) to the query or I can rewrite it to be two separate queries.

OPTION (RECOMPILE)

Here’s the option I didn’t choose. This will recompile the statement every run, which isn’t too much of a problem because it’s a quick compile on something that only runs a couple times a day. However, it will make it so this doesn’t show up in the proc cache anymore, which I take advantage of quite a bit (for instance, look at the blog post you’re currently reading). Also, it goes against my rule of using hints as a last resort.
That’s not saying you can’t do it this way, just that I didn’t. The reason it works is because SQL Server knows when it makes this execution plan that it will only be used once, so it doesn’t have to account for the possibility of it being a NULL value next run. In fact, it just makes the execution plan with your parameter as a constant value.

ScreenHunter_01 2014-01-06 16.51.09

Two queries

Here’s the method I decided to go with. Assuming third-normal form doesn’t apply to query text, you should be good to go here. If it’s simple enough then it will be very obvious to anyone editing this at a later point that they need to make any changes in two places.

CREATE PROC NotTheRealProcName
    @id INT = NULL
AS

IF @id IS NULL BEGIN 
    SELECT ID
        , Subject
        , Message
        , DateAdded 
    FROM ExternalMessage EM
END ELSE BEGIN
    SELECT ID
        , Subject
        , Message
        , DateAdded 
    FROM ExternalMessage EM
    WHERE ID = @ID
END

This solution isn’t too complex with just a single parameter as it only creates two queries, but if you get just three parameters and try to do this then you’re up to 8 queries. The OPTION (RECOMPILE) method starts to look pretty good as a quick and easy fix before too long. I would still prefer one of the long and drawn out methods, such as getting interim results to a temp table, making it look a little more complex, but getting it to run efficiently with a cached plan.

It worked as I expected, with no one actually getting all of the IDs. I left the code in place to make it possible to get all of them to avoid digging through all of the application code to find where it could return everything then make sure it gets changed there. The result I was looking for was cleaning up the cache, which I got when this dropped from 1.4 GB down to 25 MB in cache.

The End

Overall I was able to drop about 3 GB out of cache, making room for other data while raising the PLE, in two simple changes that were rather easy to track down just by saying “that doesn’t look like it belongs here”. Even the ones that do look like they belong there probably don’t need to have that large of a presence in the cache.

Although I didn’t actually measure the impact that it would have on the end users in these cases because I didn’t start with a performance complaint, I would imagine that SQL Server tends to run faster returning processing 1 row it found quickly instead of reading through a couple million. So, measured or not, I’ll count that as a success as well.

PS. Jonathan Kehayias helped me

My queries looking into the proc cache are slightly modified versions of Jonathan’s work on his blog. Honestly, it didn’t need tweaked or touched for any reason other than the fact that I learn by tinkering. I did help myself by adding in filtering by database because I have a couple servers that have multiple copies of the same database, but, again, it worked great before I touched it, which is something you should expect from Jonathan’s work.

Here’s my altered version:

SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
DECLARE @IndexName SYSNAME = '[PK_ExternalMessage]'; 
DECLARE @DatabaseName SYSNAME;

SELECT @DatabaseName = '[' + DB_NAME() + ']';

WITH XMLNAMESPACES
   (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
SELECT
    n.value('(@StatementText)[1]', 'VARCHAR(4000)') AS sql_text,
    n.query('.'),
    cp.plan_handle,
    i.value('(@PhysicalOp)[1]', 'VARCHAR(128)') AS PhysicalOp,
    i.value('(./IndexScan/@Lookup)[1]', 'VARCHAR(128)') AS IsLookup,
    i.value('(./IndexScan/Object/@Database)[1]', 'VARCHAR(128)') AS DatabaseName,
    i.value('(./IndexScan/Object/@Schema)[1]', 'VARCHAR(128)') AS SchemaName,
    i.value('(./IndexScan/Object/@Table)[1]', 'VARCHAR(128)') AS TableName,
    i.value('(./IndexScan/Object/@Index)[1]', 'VARCHAR(128)') as IndexName,
    i.query('.'),
    STUFF((SELECT DISTINCT ', ' + cg.value('(@Column)[1]', 'VARCHAR(128)')
       FROM i.nodes('./OutputList/ColumnReference') AS t(cg)
       FOR  XML PATH('')),1,2,'') AS output_columns,
    STUFF((SELECT DISTINCT ', ' + cg.value('(@Column)[1]', 'VARCHAR(128)')
       FROM i.nodes('./IndexScan/SeekPredicates/SeekPredicateNew//ColumnReference') AS t(cg)
       FOR  XML PATH('')),1,2,'') AS seek_columns,
    RIGHT(i.value('(./IndexScan/Predicate/ScalarOperator/@ScalarString)[1]', 'VARCHAR(4000)'), len(i.value('(./IndexScan/Predicate/ScalarOperator/@ScalarString)[1]', 'VARCHAR(4000)')) - charindex('.', i.value('(./IndexScan/Predicate/ScalarOperator/@ScalarString)[1]', 'VARCHAR(4000)'))) as Predicate,
    cp.usecounts,
    query_plan
FROM (  SELECT plan_handle, query_plan
        FROM (  SELECT DISTINCT plan_handle
                FROM sys.dm_exec_query_stats WITH(NOLOCK)) AS qs
        OUTER APPLY sys.dm_exec_query_plan(qs.plan_handle) tp
      ) as tab (plan_handle, query_plan)
INNER JOIN sys.dm_exec_cached_plans AS cp 
    ON tab.plan_handle = cp.plan_handle
CROSS APPLY query_plan.nodes('/ShowPlanXML/BatchSequence/Batch/Statements/*') AS q(n)
CROSS APPLY n.nodes('.//RelOp[IndexScan/Object[@Index=sql:variable("@IndexName") and @Database=sql:variable("@DatabaseName")]]' ) as s(i)
--WHERE i.value('(./IndexScan/@Lookup)[1]', 'VARCHAR(128)') = 1
OPTION(RECOMPILE, MAXDOP 1);