Database Growth Trends – Idera Diagnostic Manager

How fast are your databases growing?  Should I panic about a 1 TB database that has 100 GB free on disk?  Am I safe with a 100 GB database that also has 100 GB free?  Based on those sizes…I have no idea.

It’s possible that the 1 TB database is purging data as fast as it takes it in, so it’s been at 1 TB for a long time and may never use the 100 GB it has free.  Looking at the 100 GB database, it may have been a new 1 MB database created two months ago and grew 1 MB at a time to get to where it’s at now. (Bonus points if you panicked reading that last sentence.)

The thing to look for is how big that database was before.  What did you write down for how much space it used last week, last month, and several months ago?  Do you remember all those stock broker commercials saying “Past performance doesn’t predict future results” which is supposed to lead you on to think “Yeah, but it’s the best indicator I have.”  The same goes for database growths.  So, you did write down what it was before, right?

My Default Monitoring Message

I’ll copy/paste this section on multiple posts, so feel free to skip it if you’ve seen it before.

The biggest issue is the word “before”.  If you weren’t watching your servers before then you have nothing to compare the current state of your server to.  Also, if you’re watching it consistently then you’ll be able to make an apples to apples comparison.

Your production servers should have some kind of monitoring software in place, and the monitoring software is already capturing this type of information to help make all the pretty charts and graphs you see through the GUI.  You need this software to run so you can get alerted when things go wrong, and pulling this data will just make it more valuable.

Monitoring software doesn’t perform magic, so you can pull this data on your own if you don’t have any.  However, that’s taking on a lot of responsibility, time, and effort.  If something is collecting the data already, take advantage of that.

For me, I use Idera Diagnostic Manager, so my queries are going to look at the data collected using this tool.  Other monitoring tools are going to collect the same basic data in a database you can read.

If anyone from one of these companies wants to provide me with scripts that fill the same need, I’ll make a write up on them and give you credit for being awesome supporting your product like that.  I can only write against what I can test against, it’s nothing personal.

What I’m Watching

I want to know when my files will hit an autogrowth and when they’ll run out of space.

Running out of space is pretty obvious.  If you’re out of space, every command that needs more space will fail, and users will run to the supply closet for torches and pitchforks.  It’s a bad day for the DBA.

Autogrowth is less of an issue and less obvious.  That’s not saying it’s not an issues, especially for larger databases that can’t take advantage of IFI.  The command that needs more space, and any that pile up behind it, will wait patiently (barring any timeouts) for the growth to take place.  However, SQL Server can be more patient than your users or applications.

That being said, if I knew I’d need more space than is allocated in the database using a simple script then I could do it when no one was waiting for the space.  No complaints, just make the files larger and move on with my life.  Oh, happy days.

To get all of this information I need to know a couple things:

  • How much space is used in each filegroup
  • How much space is allocated to each filegroup
  • How much can each filegroup grow according to disk free space and growth rates

Idera DM Captures That

Well, Idera does a pretty good job.  It captures the data size (used), data expansion (growth by growth rate and disk free space), and data file size (allocated).  The problem is that it captures this stuff on the database level, not the filegroup level.  This means that it’s possible for a filegroup to run out of space while the database as a whole has tons of free space.

I’m currently working in an environment where all data files for a single database are on the same drive, so it’s less concerning to me.  It is something to keep in mind, and may be very important to your environment.  If this is critical to you, I touch on how to capture this data yourself below.

If separating filegroups isn’t an issue for you (it often isn’t) then all you have to do to trend growths with Idera is to write a very long script that hits SQLdmRepository..DatabaseSize a couple times.  My script goes a little like this:

/*
MaxUsed is always from a week long range.  The reason is that this captures any spikes in usage such as off-hours index maintenance.
*/

;WITH MostRecent AS
(
	SELECT X.DatabaseID
		, X.UTCCollectionDateTime
		, S.InstanceName
		, D.DatabaseName
		, DataFileSizeMB = DataFileSizeInKilobytes/1024
		, DataFilePotentialSizeMB = (X.DataFileSizeInKilobytes + X.DataExpansionInKilobytes)/1024
	FROM (
			SELECT DS.DatabaseID
				, RowNum = DENSE_RANK() OVER (PARTITION BY DS.DatabaseID ORDER BY UTCCollectionDateTime DESC)
				, UTCCollectionDateTime
				, DataFileSizeInKilobytes
				, DataExpansionInKilobytes
			FROM SQLdmRepository..DatabaseSize DS
			WHERE UTCCollectionDateTime > GETUTCDATE()-7 
		) X
		INNER JOIN SQLdmrepository..SQLServerDatabaseNames D ON X.DatabaseID = D.DatabaseID
		INNER JOIN SQLdmrepository..MonitoredSQLServers S ON S.SQLServerID = D.SQLServerID 
	WHERE RowNum = 1
)
, ThisWeek AS
(
	SELECT DS.DatabaseID
		, MaxUsedMB = Max(DataSizeInKilobytes + TextSizeInKilobytes + IndexSizeInKilobytes)/1024
	FROM SQLdmRepository..DatabaseSize DS
	WHERE UTCCollectionDateTime > GETUTCDATE()-7 
	GROUP BY DS.DatabaseID 
)
, OneMonth AS
(
	SELECT DS.DatabaseID
		, MaxUsedMB = Max(DataSizeInKilobytes + TextSizeInKilobytes + IndexSizeInKilobytes)/1024
	FROM SQLdmRepository..DatabaseSize DS
	WHERE UTCCollectionDateTime BETWEEN GETUTCDATE()-37 AND GETUTCDATE()-30 
	GROUP BY DS.DatabaseID 
)
, ThreeMonth AS
(
	SELECT DS.DatabaseID
		, MaxUsedMB = Max(DataSizeInKilobytes + TextSizeInKilobytes + IndexSizeInKilobytes)/1024
	FROM SQLdmRepository..DatabaseSize DS
	WHERE UTCCollectionDateTime BETWEEN GETUTCDATE()-97 AND GETUTCDATE()-90 
	GROUP BY DS.DatabaseID 
)
, SixMonth AS
(
	SELECT DS.DatabaseID
		, MaxUsedMB = Max(DataSizeInKilobytes + TextSizeInKilobytes + IndexSizeInKilobytes)/1024
	FROM SQLdmRepository..DatabaseSize DS
	WHERE UTCCollectionDateTime BETWEEN GETUTCDATE()-187 AND GETUTCDATE()-180 
	GROUP BY DS.DatabaseID 
)
, OneYear AS
(
	SELECT DS.DatabaseID
		, MaxUsedMB = Max(DataSizeInKilobytes + TextSizeInKilobytes + IndexSizeInKilobytes)/1024
	FROM SQLdmRepository..DatabaseSize DS
	WHERE UTCCollectionDateTime BETWEEN GETUTCDATE()-365 AND GETUTCDATE()-358 
	GROUP BY DS.DatabaseID 
)
, PreResults AS 
(
	SELECT MR.InstanceName
		, MR.DatabaseName 
		, MR.DataFileSizeMB
		, ThisWeek_MaxUsedMB = CAST(W.MaxUsedMB AS INT)
		, OneMonth_MaxUsedMB = CAST(M1.MaxUsedMB AS INT)
		, ThreeMonth_MaxUsedMB = CAST(M3.MaxUsedMB AS INT)
		, SixMonth_MaxUsedMB = CAST(M6.MaxUsedMB AS INT)
		, OneYear_MaxUsedMB = CAST(M12.MaxUsedMB AS INT)
		, NextGrowthInDays_OneMonth = CAST((MR.DataFileSizeMB-W.MaxUsedMB)/(W.MaxUsedMB - M1.MaxUsedMB + .0001)*30 AS BIGINT)
		, NextGrowthInDays_ThreeMonth = CAST((MR.DataFileSizeMB-W.MaxUsedMB)/(W.MaxUsedMB - M3.MaxUsedMB + .0001)*90 AS BIGINT)  
		, NextGrowthInDays_SixMonth = CAST((MR.DataFileSizeMB-W.MaxUsedMB)/(W.MaxUsedMB - M6.MaxUsedMB + .0001)*180 AS BIGINT)  
		, NextGrowthInDays_OneYear = CAST((MR.DataFileSizeMB-W.MaxUsedMB)/(W.MaxUsedMB - M12.MaxUsedMB + .0001)*360 AS BIGINT)  
		, OutOfSpaceInDays_OneMonth = CAST((MR.DataFilePotentialSizeMB-W.MaxUsedMB)/(W.MaxUsedMB - M1.MaxUsedMB + .0001)*30 AS BIGINT)  
		, OutOfSpaceInDays_ThreeMonth = CAST((MR.DataFilePotentialSizeMB-W.MaxUsedMB)/(W.MaxUsedMB - M3.MaxUsedMB + .0001)*90 AS BIGINT)  
		, OutOfSpaceInDays_SixMonth = CAST((MR.DataFilePotentialSizeMB-W.MaxUsedMB)/(W.MaxUsedMB - M6.MaxUsedMB + .0001)*180 AS BIGINT)  
		, OutOfSpaceInDays_OneYear = CAST((MR.DataFilePotentialSizeMB-W.MaxUsedMB)/(W.MaxUsedMB - M12.MaxUsedMB + .0001)*360 AS BIGINT) 
	FROM MostRecent MR
		INNER JOIN ThisWeek W ON MR.DatabaseID = W.DatabaseID
		LEFT JOIN OneMonth M1 ON MR.DatabaseID = M1.DatabaseID AND W.MaxUsedMB > M1.MaxUsedMB
		LEFT JOIN ThreeMonth M3 ON MR.DatabaseID = M3.DatabaseID AND W.MaxUsedMB > M3.MaxUsedMB
		LEFT JOIN SixMonth M6 ON MR.DatabaseID = M6.DatabaseID AND W.MaxUsedMB > M6.MaxUsedMB
		LEFT JOIN OneYear M12 ON MR.DatabaseID = M12.DatabaseID AND W.MaxUsedMB > M12.MaxUsedMB
)
SELECT InstanceName
	, DatabaseName 
	, NextGrowthInDays_Min = CASE WHEN NextGrowthInDays_OneMonth <= ISNULL(NextGrowthInDays_ThreeMonth , 1000000)
										AND NextGrowthInDays_OneMonth <= ISNULL(NextGrowthInDays_SixMonth, 1000000) 
										AND NextGrowthInDays_OneMonth <= ISNULL(NextGrowthInDays_OneYear, 1000000)
									THEN NextGrowthInDays_OneMonth 
								WHEN NextGrowthInDays_ThreeMonth <= ISNULL(NextGrowthInDays_SixMonth, 1000000)
										AND NextGrowthInDays_ThreeMonth <= ISNULL(NextGrowthInDays_OneYear, 1000000)
									THEN NextGrowthInDays_ThreeMonth
								WHEN NextGrowthInDays_SixMonth <= ISNULL(NextGrowthInDays_OneYear, 1000000)
									THEN NextGrowthInDays_SixMonth
								ELSE NextGrowthInDays_OneYear
							END
	, OutOfSpaceInDays_Min = CASE WHEN OutOfSpaceInDays_OneMonth <= ISNULL(OutOfSpaceInDays_ThreeMonth, 1000000)
										AND OutOfSpaceInDays_OneMonth <= ISNULL(OutOfSpaceInDays_SixMonth, 1000000)
										AND OutOfSpaceInDays_OneMonth <= ISNULL(OutOfSpaceInDays_OneYear, 1000000)
									THEN OutOfSpaceInDays_OneMonth 
								WHEN OutOfSpaceInDays_ThreeMonth <= ISNULL(OutOfSpaceInDays_SixMonth, 1000000)
										AND OutOfSpaceInDays_ThreeMonth <= ISNULL(OutOfSpaceInDays_OneYear, 1000000)
									THEN OutOfSpaceInDays_ThreeMonth
								WHEN OutOfSpaceInDays_SixMonth <= ISNULL(OutOfSpaceInDays_OneYear, 1000000)
									THEN OutOfSpaceInDays_SixMonth
								ELSE OutOfSpaceInDays_OneYear
							END
	, DataFileSizeMB
	, ThisWeek_MaxUsedMB
	, OneMonth_MaxUsedMB 
	, ThreeMonth_MaxUsedMB 
	, SixMonth_MaxUsedMB
	, OneYear_MaxUsedMB
	, NextGrowthInDays_OneMonth 
	, NextGrowthInDays_ThreeMonth 
	, NextGrowthInDays_SixMonth 
	, NextGrowthInDays_OneYear 
	, OutOfSpaceInDays_OneMonth
	, OutOfSpaceInDays_ThreeMonth
	, OutOfSpaceInDays_SixMonth
	, OutOfSpaceInDays_OneYear 
FROM PreResults
WHERE OutOfSpaceInDays_OneMonth < 100
	OR OutOfSpaceInDays_ThreeMonth < 100
	OR OutOfSpaceInDays_SixMonth < 100
	OR OutOfSpaceInDays_OneYear < 100
	OR NextGrowthInDays_OneMonth < 15
	OR NextGrowthInDays_ThreeMonth < 15
	OR NextGrowthInDays_SixMonth < 15
	OR NextGrowthInDays_OneYear < 15
ORDER BY DataFileSizeMB DESC

What this does is look at the trends based on what the database used 30, 90, 180, and 360 days ago to try to predict the future.  By “predict the future” I mean it’s not perfect, and can’t be perfect.  However, it gives you very good data with a few false positives and very few false negatives.

Speaking of this not being perfect, it double-dips.  If there are two databases growing on the same drive with 10 GB free, it says that each one has 10 GB of growth available.  I was not able to rewrite this to take that into account while also accounting for databases with multiple files in a single filegroup spread across multiple drives.

Now I’m two weeks ahead of projected file growths and over three months ahead of projected out-of-space errors.

This is very basic information, and I’d have trouble imagining any SQL monitoring software not capturing this in a table you can query.

If I Didn’t Have Idera…

If I didn’t have Idera, or any other monitoring software I could query, I’d be going out at least once a day and pulling the allocated and used space from every file on every server.  I may want to do it more often to find peak usage, such as when online index rebuilds are using twice their typical space over the weekends; I take advantage of Idera capturing this hourly.  I’ll capture it using something like this:

SELECT @SQL = 
'USE [?]

INSERT INTO DBA..FileSizes (DateAdded, Database_ID, File_ID, Data_Space_ID, SizeMB, UsedMB)
	select GETDATE() 
		, Database_ID = DB_ID()
		, f.File_ID
		, f.data_space_id
		, SizeMB = ((f.size*8/1024)) 
		, UsedMB = (FileProperty(f.name, ''SpaceUsed'')*8/1024) 
	from sys.database_files f
	'

exec sp_MSforeachdb @SQL

Then I’d need to see how much free space I have on each drive, so I’d go to Ed Wagner’s article Time and Space: How to Monitor Drive Space in SQL Server.  Knowing that we use MountPoints in places, I’d have to change his script to use “volume” instead of “logicaldisk” (also mentioned in his comments).  This minor change is about the best case scenario you’ll run across.

Remember, the scripts you find on the internet are typically designed to work perfect on the AUTHOR’s system, not yours.  It’s not Ed’s fault they don’t use MountPoints, in fact he should probably be congratulated for not making his environment more complex than it needs to be.  He should also be congratulated on not making a change to his article that he can’t properly test.  At least you know you have to be extra careful testing this change instead of being led to believe the author did something they couldn’t do properly.

After getting the drive’s free space, I’d have to look at the file growth rates to see how large my files could grow.  Keep in mind that growth in sys.database_files could be either a percentage or the number of 8kb pages the file will grow by, depending on the value in is_percent_growth.  Static growths are easy, just divide by 128 and that’s how many MBs it will grow by.  It’s easier (and better for your server) to change percentage growths to MB than it is to look up the logarithmic functions used to calculate compound interest that also apply here.

Now that you have the free space on the drive and the growth rates on the files, you can calculate the maximum size the file can grow to.  Use this with trends on the used space to calculate when that size won’t be enough for you.

Make sure you test your results, you’ll have to support it.

So Your Database Is Growing…

Databases get larger, it’s what they do.  A normal DBA then goes through and makes sure that there’s room for that growth.  A better DBA will do that, too, but then follow it up by figuring out why the database is growing and try to justify it.

I have seen way too many databases out there that don’t purge old data that’s useless to the company.  Sometimes the purge was never written, sometimes the code to do that is never being called for several somewhat humorous reasons.  Any way you look at it, the data is growing and it’s easy to fall into the trap of just adding space every month, forever.

The way I handle this is by watching my tables and seeing how they’re growing.  These are treated just like databases where the largest one isn’t necessarily the growth issue, and looking at historical data to see the growth trend is a better solution.

In the past I wrote a post called Monitoring Database and Table Sizes to get this info yourself.  In the future, I’ll write Table Growth Trends – Idera Diagnostic Manager.  At the time of this writing, I’m planning on the Table Growth post coming out three weeks after this one.  Until then, I put this script up on my Idera Diagnostic Manager Scripts page, which is where I keep these posts organized and where I put my scripts to wait for me to blog about them.

 

Advertisements

I/O Baseline Comparison – Idera Diagnostic Manager

What servers need the most attention when it comes to I/O?  When I increase the memory on a server what effect does it have on I/O?  What was it before the change and what was it afterwards?

These are tough questions, and I have a script that I hope will help you out.

Most monitoring tools, including Idera, are great at looking at what a single server is doing right now, but not so much with comparing performance and loads between multiple larger windows of time and seeing which servers are doing the most work.  So I wrote a script to see how much I/O load I’m doing on each server and compare that to how much I did in that same time window in previous weeks.

Your servers will vary, a lot.  They’ll vary from my servers, and they’ll vary from each other as well.  Depending on the workload, specifically the difference between OLTP and OLAP, higher I/O may be expected and additional memory will have less of an impact.  For OLAP, if you have 32 GB of memory and read a 64 GB table followed by a different 92 GB table then it’s all physical reads.  If you up your memory to 48 GB, they’re still all physical reads.  The cost to up this server (or VM) is the same as the cost to upgrade an OLTP server from 32 GB to 48 GB, but the story will end quite a bit differently there.

The problem is that this change, like any other change, should be measured before, measured afterwards, then have the costs of the change analyzed and justified.  The results you’ll see below coupled with my Wait Stats Baseline Comparison using Idera post will help you figure out where to make a change like that and how to measure it.

There are other reasons you’d want to look at this data as well.  Even a new index that dramatically dropped the physical reads for one query could have the opposite effect on your server as a whole.  The point is, even something very straightforward needs to be verified that it’s doing what you expect it to.  Looking at the details you see in the canned reports is great, but so is the ability to stand back and look at the big picture that I’ll get into here.

The solutions I’ve put into place using this type of query to varied from query tuning, timing of larger processes, adding memory like I mentioned above, buying faster disks, and many other options including acknowledging we were doing a ton of reads and leaving it be.

My Default Monitoring Message

I’ll copy/paste this section on multiple posts, so feel free to skip it:

The biggest issue is the word “before”.  If you weren’t watching your servers before then you have nothing to compare the current state of your server to.  Also, if you’re watching it consistently then you’ll be able to make an apples to apples comparison.

Your production servers should have some kind of monitoring software in place, and the monitoring software is capturing this type of information to help make all the pretty charts and graphs you see through the GUI.  You need this software to run so you can get alerted when things go wrong, but pulling this data will just make in more valuable.

Monitoring software doesn’t perform magic, so you can pull this data on your own if you don’t have any.  However, that’s taking on a lot of responsibility, time, and effort.  If something is collecting the data already, take advantage of that.

For me, I use Idera Diagnostic Manager, so my queries are going to look at the data collected using this tool.  Other monitoring tools are going to collect the same basic data in a database you can read.

If anyone from one of these companies wants to provide me with scripts that fill the same need, I’ll tack them on here and give you credit for writing them.  I can only write against what I can test against, it’s nothing personal.

I/O Differences Script

What I look for are the true apples-to-apples comparisons, so I’m typically looking at a longer time range (I like 24 hours) during peak hours (12:00 to 21:00 UTC on weekdays for me, yours will differ, even server-to-server) comparing this week to the last couple weeks.

DECLARE @StartTime DateTime
	, @EndTime DateTime
	, @InstanceName sysname
	, @Weekdays bit
	, @BusinessHours bit
		
SET @EndTime = GetUTCDate()
SET @StartTime = DateAdd(Hour, -24, @EndTime)
SET @InstanceName = NULL --Do 'Server\Instance' for individual server
SET @Weekdays = 1
SET @BusinessHours = 1

SELECT S.InstanceName
	, StartTime = @StartTime 
	, EndTime = @EndTime
	, Reads_GB = CAST(SUM(St.PageReads)/128/1024.0 AS DEC(20,1)) 
	, ReadAhead_GB = CAST(SUM(St.ReadAheadPages)/128/1024.0 AS DEC(20,1))
	, Writes_GB = CAST(SUM(St.PageWrites)/128/1024.0 AS DEC(20,1)) 
	, Lookups_GB = CAST(SUM(St.PageLookups)/128/1024.0 AS DEC(20,1)) 
	, PctPhysical = CAST(CAST(SUM(St.PageReads)/128/1024.0 AS DEC(20,1)) / CAST(SUM(St.PageLookups)/128/1024.0 AS DEC(20,1)) * 100 as DEC(20,1))
	, AvgPLE = Avg(St.PageLifeExpectancy)
	, AvgCache_MB = AVG(St.BufferCacheSizeInKilobytes)/1024
FROM SQLdmRepository..MonitoredSQLServers S
	INNER JOIN SQLdmRepository..ServerStatistics St ON S.SQLServerID = St.SQLServerID
WHERE UTCCollectionDateTime BETWEEN @StartTime AND @EndTime 
	AND (DATEPART(WEEKDAY, UTCCollectionDateTime) BETWEEN 2 and 6 or @Weekdays = 0)
	AND (DATEPART(HOUR, UTCCollectionDateTime) BETWEEN 12 and 21 OR @BusinessHours = 0)
	AND (UPPER(S.InstanceName) = UPPER(@InstanceName) OR @InstanceName IS NULL)
GROUP BY S.InstanceName
ORDER BY 4 DESC

SELECT @StartTime = @StartTime - 7 
	, @EndTime = @EndTime - 7

SELECT S.InstanceName
	, StartTime = @StartTime 
	, EndTime = @EndTime
	, Reads_GB = CAST(SUM(St.PageReads)/128/1024.0 AS DEC(20,1)) 
	, ReadAhead_GB = CAST(SUM(St.ReadAheadPages)/128/1024.0 AS DEC(20,1))
	, Writes_GB = CAST(SUM(St.PageWrites)/128/1024.0 AS DEC(20,1)) 
	, Lookups_GB = CAST(SUM(St.PageLookups)/128/1024.0 AS DEC(20,1)) 
	, PctPhysical = CAST(CAST(SUM(St.PageReads)/128/1024.0 AS DEC(20,1)) / CAST(SUM(St.PageLookups)/128/1024.0 AS DEC(20,1)) * 100 as DEC(20,1))
	, AvgPLE = Avg(St.PageLifeExpectancy)
	, AvgCache_MB = AVG(St.BufferCacheSizeInKilobytes)/1024
FROM SQLdmRepository..MonitoredSQLServers S
	INNER JOIN SQLdmRepository..ServerStatistics St ON S.SQLServerID = St.SQLServerID
WHERE UTCCollectionDateTime BETWEEN @StartTime AND @EndTime 
	AND (DATEPART(WEEKDAY, UTCCollectionDateTime) BETWEEN 2 and 6 or @Weekdays = 0)
	AND (DATEPART(HOUR, UTCCollectionDateTime) BETWEEN 12 and 21 OR @BusinessHours = 0)
	AND (UPPER(S.InstanceName) = UPPER(@InstanceName) OR @InstanceName IS NULL)
GROUP BY S.InstanceName
ORDER BY 4 DESC

SELECT @StartTime = @StartTime - 7 
	, @EndTime = @EndTime - 7

SELECT S.InstanceName
	, StartTime = @StartTime 
	, EndTime = @EndTime
	, Reads_GB = CAST(SUM(St.PageReads)/128/1024.0 AS DEC(20,1)) 
	, ReadAhead_GB = CAST(SUM(St.ReadAheadPages)/128/1024.0 AS DEC(20,1))
	, Writes_GB = CAST(SUM(St.PageWrites)/128/1024.0 AS DEC(20,1)) 
	, Lookups_GB = CAST(SUM(St.PageLookups)/128/1024.0 AS DEC(20,1)) 
	, PctPhysical = CAST(CAST(SUM(St.PageReads)/128/1024.0 AS DEC(20,1)) / CAST(SUM(St.PageLookups)/128/1024.0 AS DEC(20,1)) * 100 as DEC(20,1))
	, AvgPLE = Avg(St.PageLifeExpectancy)
	, AvgCache_MB = AVG(St.BufferCacheSizeInKilobytes)/1024
FROM SQLdmRepository..MonitoredSQLServers S
	INNER JOIN SQLdmRepository..ServerStatistics St ON S.SQLServerID = St.SQLServerID
WHERE UTCCollectionDateTime BETWEEN @StartTime AND @EndTime 
	AND (DATEPART(WEEKDAY, UTCCollectionDateTime) BETWEEN 2 and 6 or @Weekdays = 0)
	AND (DATEPART(HOUR, UTCCollectionDateTime) BETWEEN 12 and 21 OR @BusinessHours = 0)
	AND (UPPER(S.InstanceName) = UPPER(@InstanceName) OR @InstanceName IS NULL)
GROUP BY S.InstanceName
ORDER BY 4 DESC


SELECT @StartTime = @StartTime - 7 
	, @EndTime = @EndTime - 7

SELECT S.InstanceName
	, StartTime = @StartTime 
	, EndTime = @EndTime
	, Reads_GB = CAST(SUM(St.PageReads)/128/1024.0 AS DEC(20,1)) 
	, ReadAhead_GB = CAST(SUM(St.ReadAheadPages)/128/1024.0 AS DEC(20,1))
	, Writes_GB = CAST(SUM(St.PageWrites)/128/1024.0 AS DEC(20,1)) 
	, Lookups_GB = CAST(SUM(St.PageLookups)/128/1024.0 AS DEC(20,1)) 
	, PctPhysical = CAST(CAST(SUM(St.PageReads)/128/1024.0 AS DEC(20,1)) / CAST(SUM(St.PageLookups)/128/1024.0 AS DEC(20,1)) * 100 as DEC(20,1))
	, AvgPLE = Avg(St.PageLifeExpectancy)
	, AvgCache_MB = AVG(St.BufferCacheSizeInKilobytes)/1024
FROM SQLdmRepository..MonitoredSQLServers S
	INNER JOIN SQLdmRepository..ServerStatistics St ON S.SQLServerID = St.SQLServerID
WHERE UTCCollectionDateTime BETWEEN @StartTime AND @EndTime 
	AND (DATEPART(WEEKDAY, UTCCollectionDateTime) BETWEEN 2 and 6 or @Weekdays = 0)
	AND (DATEPART(HOUR, UTCCollectionDateTime) BETWEEN 12 and 21 OR @BusinessHours = 0)
	AND (UPPER(S.InstanceName) = UPPER(@InstanceName) OR @InstanceName IS NULL)
GROUP BY S.InstanceName
ORDER BY 4 DESC

The Results

This is, by default, going to show you your biggest I/O users that are being monitored.  It’s great to know, but it’s not the whole picture.  Some servers are going to be almost pure OLTP, others are going to be almost pure OLAP, and most will be some kind of mix.  The more OLAP you do, the more physical reads (Reads_GB and ReadAhead_GB) should be expected.  It’s really not a bad thing, it’s how your databases are used.

Then there’s the realistic version of the users’ expectation (not always the version they tell you) to consider.  Are the queries coming back as fast as they should?  Large annual reports can take a bit, but a website that takes more than a couple seconds isn’t going to be used.

This report shows the load you’re throwing at the disks, but user expectations are measured in seconds, not gigabytes.  Yeah, more gigabytes means more seconds, but it’s not like there’s a solid ratio.

That being said, the true value of this information is when you combine it with the type of server it is, the reasonable expectations of the users, the amount of time the server is waiting on I/O, and so much more.  To start you out, look at the Wait Stats Baseline Comparison using Idera, too.

If there are OLTP servers that stand out as doing a ton of physical reads and the wait stats show that it’s an issue as well, look into more memory and more time for tuning.  Take it a step further and use all of the trending info (not just this report) to split out I/O load and plan for faster disks.  There’s a lot of potential to use this info to do a lot of good.

My Idera Scripts

I keep all the scripts that query Idera Diagnostic Manager in one place, on my Idera Diagnostic Manager Scripts page.  Before this script had it’s own post, it was posted there.  As I make posts for each script, that page will live on as a directory of what scripts I have to get more use out of this product.

This is the first time I’m doing something like this, but I’ll eventually make other pages that pull together topics like this as well.  It feels good to find a great way to take blogging to the next level like this.

Speaking of taking it to the next level, let me know your thoughts on this script, others you’d like to see, and anything else in the comments below.

Wait Stats Baseline Comparison – Idera Diagnostic Manager

Database servers have to wait on different resources, and these waits are huge to the performance of SQL Server.  Sometimes something changes without our knowledge and is running differently than normal, this is the first place you look to see what’s going on.  On the other side, you may have made a change that could have been anything from increasing memory, tuning a query, moving to different hardware, changing indexes, etc..  All of these things should reduce waits, but how effective were they?

The trick is to know what things looked like before so you have something to compare it to.  If you know what it looked like before then you’ll know how much of a difference you made.

My Default Monitoring Message

I’ll copy/paste this section on multiple posts, so feel free to skip it if you’ve seen it before.

The biggest issue is the word “before”.  If you weren’t watching your servers before then you have nothing to compare the current state of your server to.  Also, if you’re watching it consistently then you’ll be able to make an apples to apples comparison.

Your production servers should have some kind of monitoring software in place, and the monitoring software is already capturing this type of information to help make all the pretty charts and graphs you see through the GUI.  You need this software to run so you can get alerted when things go wrong, and pulling this data will just make it more valuable.

Monitoring software doesn’t perform magic, so you can pull this data on your own if you don’t have any.  However, that’s taking on a lot of responsibility, time, and effort.  If something is collecting the data already, take advantage of that.

For me, I use Idera Diagnostic Manager, so my queries are going to look at the data collected using this tool.  Other monitoring tools are going to collect the same basic data in a database you can read.

If anyone from one of these companies wants to provide me with scripts that fill the same need, I’ll make a write up on them and give you credit for being awesome supporting your product like that.  I can only write against what I can test against, it’s nothing personal.

The Script

There are 3 parameters at the start, and that’s typically all I ever change.  A couple things stick out with these parameters.  The times are in UTC, so I default to GetUTCDate().  I want the time range to be at least an hour so I’m not letting a single query that ran skew the results too much.  Then I’m also converting my instance name to upper case because Idera DM may very well be your only database that’s case sensitive, so I cheat by putting everything in upper case.

My upper case trick isn’t a recommended trick.  It causes scans and possible bad joins.  In this case I know it’s against small tables and that I don’t have two servers with the same name using different cases.  It’s safe here, but not everywhere.

DECLARE @StartTime DateTime
	, @EndTime DateTime
	, @InstanceName sysname
		
SET @EndTime = GetUTCDate()
SET @StartTime = DateAdd(Hour, -24, @EndTime)
SET @InstanceName = UPPER('Server\Instance')

IF OBJECT_ID('tempdb..#IgnoredWaits') IS NOT NULL BEGIN
    DROP TABLE #IgnoredWaits
END

IF OBJECT_ID('tempdb..#Now') IS NOT NULL BEGIN
    DROP TABLE #Now
END

IF OBJECT_ID('tempdb..#Yesterday') IS NOT NULL BEGIN
    DROP TABLE #Yesterday
END

IF OBJECT_ID('tempdb..#LastWeek') IS NOT NULL BEGIN
    DROP TABLE #LastWeek
END

IF OBJECT_ID('tempdb..#TwoWeeks') IS NOT NULL BEGIN
    DROP TABLE #TwoWeeks
END

CREATE TABLE #IgnoredWaits (Wait_Type NVarChar(60) PRIMARY KEY) 

--Values taken from Paul Randal of SQLskills
--http://www.sqlskills.com/blogs/paul/wait-statistics-or-please-tell-me-where-it-hurts/
INSERT INTO #IgnoredWaits 
SELECT N'BROKER_EVENTHANDLER'            
UNION SELECT N'BROKER_RECEIVE_WAITFOR'
UNION SELECT N'BROKER_TASK_STOP'               
UNION SELECT N'BROKER_TO_FLUSH'
UNION SELECT N'BROKER_TRANSMITTER'             
UNION SELECT N'CHECKPOINT_QUEUE'
UNION SELECT N'CHKPT'                          
UNION SELECT N'CLR_AUTO_EVENT'
UNION SELECT N'CLR_MANUAL_EVENT'               
UNION SELECT N'CLR_SEMAPHORE'
UNION SELECT N'DBMIRROR_DBM_EVENT'             
UNION SELECT N'DBMIRROR_EVENTS_QUEUE'
UNION SELECT N'DBMIRROR_WORKER_QUEUE'          
UNION SELECT N'DBMIRRORING_CMD'
UNION SELECT N'DIRTY_PAGE_POLL'                
UNION SELECT N'DISPATCHER_QUEUE_SEMAPHORE'
UNION SELECT N'EXECSYNC'                       
UNION SELECT N'FSAGENT'
UNION SELECT N'FT_IFTS_SCHEDULER_IDLE_WAIT'    
UNION SELECT N'FT_IFTSHC_MUTEX'
UNION SELECT N'HADR_CLUSAPI_CALL'              
UNION SELECT N'HADR_FILESTREAM_IOMGR_IOCOMPLETION'
UNION SELECT N'HADR_LOGCAPTURE_WAIT'           
UNION SELECT N'HADR_NOTIFICATION_DEQUEUE'
UNION SELECT N'HADR_TIMER_TASK'                
UNION SELECT N'HADR_WORK_QUEUE'
UNION SELECT N'KSOURCE_WAKEUP'                 
UNION SELECT N'LAZYWRITER_SLEEP'
UNION SELECT N'LOGMGR_QUEUE'                   
UNION SELECT N'ONDEMAND_TASK_QUEUE'
UNION SELECT N'PWAIT_ALL_COMPONENTS_INITIALIZED'
UNION SELECT N'QDS_PERSIST_TASK_MAIN_LOOP_SLEEP'
UNION SELECT N'QDS_SHUTDOWN_QUEUE'
UNION SELECT N'QDS_CLEANUP_STALE_QUERIES_TASK_MAIN_LOOP_SLEEP'
UNION SELECT N'REQUEST_FOR_DEADLOCK_SEARCH'    
UNION SELECT N'RESOURCE_QUEUE'
UNION SELECT N'SERVER_IDLE_CHECK'              
UNION SELECT N'SLEEP_BPOOL_FLUSH'
UNION SELECT N'SLEEP_DBSTARTUP'                
UNION SELECT N'SLEEP_DCOMSTARTUP'
UNION SELECT N'SLEEP_MASTERDBREADY'            
UNION SELECT N'SLEEP_MASTERMDREADY'
UNION SELECT N'SLEEP_MASTERUPGRADED'           
UNION SELECT N'SLEEP_MSDBSTARTUP'
UNION SELECT N'SLEEP_SYSTEMTASK'               
UNION SELECT N'SLEEP_TASK'
UNION SELECT N'SLEEP_TEMPDBSTARTUP'            
UNION SELECT N'SNI_HTTP_ACCEPT'
UNION SELECT N'SP_SERVER_DIAGNOSTICS_SLEEP'    
UNION SELECT N'SQLTRACE_BUFFER_FLUSH'
UNION SELECT N'SQLTRACE_INCREMENTAL_FLUSH_SLEEP'
UNION SELECT N'SQLTRACE_WAIT_ENTRIES'          
UNION SELECT N'WAIT_FOR_RESULTS'
UNION SELECT N'WAITFOR'                        
UNION SELECT N'WAITFOR_TASKSHUTDOWN'
UNION SELECT N'WAIT_XTP_HOST_WAIT'             
UNION SELECT N'WAIT_XTP_OFFLINE_CKPT_NEW_LOG'
UNION SELECT N'WAIT_XTP_CKPT_CLOSE'            
UNION SELECT N'XE_DISPATCHER_JOIN'
UNION SELECT N'XE_DISPATCHER_WAIT'             
UNION SELECT N'XE_TIMER_EVENT'

SELECT s.InstanceName
	, t.WaitType
	, WaitTime_Minutes = SUM(WaitTimeInMilliseconds)/1000/60
	, dStart = Min(UTCCollectionDateTime)
	, dEnd = Max(UTCCollectionDateTime)
INTO #Now
FROM SQLdmrepository..WaitStatisticsDetails wsd
	INNER JOIN SQLdmrepository..WaitTypes t ON wsd.WaitTypeID = t.WaitTypeID
	INNER JOIN SQLdmrepository..WaitStatistics ws ON ws.WaitStatisticsID = wsd.WaitStatisticsID
	INNER JOIN SQLdmrepository..MonitoredSQLServers s ON ws.SQLServerID = s.SQLServerID
WHERE ws.UTCCollectionDateTime BETWEEN @StartTime and @EndTime
	AND	UPPER(s.InstanceName) = @InstanceName
	AND t.WaitType NOT IN (SELECT Wait_Type COLLATE SQL_Latin1_General_CP1_CI_AS FROM #IgnoredWaits)
	--AND t.WaitType LIKE 'PAGEIO%'
GROUP BY s.InstanceName, t.WaitType

SELECT @StartTime = @StartTime - 1
	, @EndTime = @EndTime - 1

SELECT s.InstanceName
	, t.WaitType
	, WaitTime_Minutes = SUM(WaitTimeInMilliseconds)/1000/60
	, dStart = Min(UTCCollectionDateTime)
	, dEnd = Max(UTCCollectionDateTime)
INTO #Yesterday
FROM SQLdmrepository..WaitStatisticsDetails wsd
	INNER JOIN SQLdmrepository..WaitTypes t ON wsd.WaitTypeID = t.WaitTypeID
	INNER JOIN SQLdmrepository..WaitStatistics ws ON ws.WaitStatisticsID = wsd.WaitStatisticsID
	INNER JOIN SQLdmrepository..MonitoredSQLServers s ON ws.SQLServerID = s.SQLServerID
WHERE ws.UTCCollectionDateTime BETWEEN @StartTime and @EndTime
	AND	UPPER(s.InstanceName) = @InstanceName	
	AND t.WaitType NOT IN (SELECT Wait_Type COLLATE SQL_Latin1_General_CP1_CI_AS FROM #IgnoredWaits)
	--AND t.WaitType LIKE 'PAGEIO%'
GROUP BY s.InstanceName, t.WaitType

SELECT @StartTime = @StartTime - 6
	, @EndTime = @EndTime - 6

SELECT s.InstanceName
	, t.WaitType
	, WaitTime_Minutes = SUM(WaitTimeInMilliseconds)/1000/60
	, dStart = Min(UTCCollectionDateTime)
	, dEnd = Max(UTCCollectionDateTime)
INTO #LastWeek
FROM SQLdmrepository..WaitStatisticsDetails wsd
	INNER JOIN SQLdmrepository..WaitTypes t ON wsd.WaitTypeID = t.WaitTypeID
	INNER JOIN SQLdmrepository..WaitStatistics ws ON ws.WaitStatisticsID = wsd.WaitStatisticsID
	INNER JOIN SQLdmrepository..MonitoredSQLServers s ON ws.SQLServerID = s.SQLServerID
WHERE ws.UTCCollectionDateTime BETWEEN @StartTime and @EndTime
	AND	UPPER(s.InstanceName) = @InstanceName	
	AND t.WaitType NOT IN (SELECT Wait_Type COLLATE SQL_Latin1_General_CP1_CI_AS FROM #IgnoredWaits)
	--AND t.WaitType LIKE 'PAGEIO%'
GROUP BY s.InstanceName, t.WaitType

SELECT @StartTime = @StartTime - 7
	, @EndTime = @EndTime - 7

SELECT s.InstanceName
	, t.WaitType
	, WaitTime_Minutes = SUM(WaitTimeInMilliseconds)/1000/60
	, dStart = Min(UTCCollectionDateTime)
	, dEnd = Max(UTCCollectionDateTime)
INTO #TwoWeeks
FROM SQLdmrepository..WaitStatisticsDetails wsd
	INNER JOIN SQLdmrepository..WaitTypes t ON wsd.WaitTypeID = t.WaitTypeID
	INNER JOIN SQLdmrepository..WaitStatistics ws ON ws.WaitStatisticsID = wsd.WaitStatisticsID
	INNER JOIN SQLdmrepository..MonitoredSQLServers s ON ws.SQLServerID = s.SQLServerID
WHERE ws.UTCCollectionDateTime BETWEEN @StartTime and @EndTime
	AND	UPPER(s.InstanceName) = @InstanceName	
	AND t.WaitType NOT IN (SELECT Wait_Type COLLATE SQL_Latin1_General_CP1_CI_AS FROM #IgnoredWaits)
	--AND t.WaitType LIKE 'PAGEIO%'
GROUP BY s.InstanceName, t.WaitType

SELECT TOP 10 n.InstanceName
	, n.WaitType
	, Now_Minutes = n.WaitTime_Minutes
	, Yesterday_Minutes = d1.WaitTime_Minutes
	, OneWeek_Minutes = d7.WaitTime_Minutes
	, TwoWeeks_Minutes = d14.WaitTime_Minutes  
FROM #Now n
	LEFT JOIN #Yesterday d1 ON n.WaitType = d1.WaitType
	LEFT JOIN #LastWeek d7 ON n.WaitType = d7.WaitType
	LEFT JOIN #TwoWeeks d14 ON n.WaitType = d14.WaitType 
ORDER BY n.WaitTime_Minutes DESC

The Results

The results pretty much speak for themselves.  What were the top 10 things you waited on recently and how much did you wait on them in the past.  It’s great to have a view of it where you’re comparing totals side-by-side.  Seeing the same thing in charts just doesn’t have the same effect for me.

These numbers are going to have natural variances to them.  Did a user run a couple large queries during one interval but not another?  Was the shared storage under different loads?  Really anything can come up.  The longer the period the more you smooth out these numbers.

So, the real question is what do these waits mean?  I’m not going to pretend like I can cover all the wait stats, their meanings, and how to affect them in this post.  What I can say is that there are good resources out there for you.

Paul Randal (b|t) has the best source I know of on his post Wait Statistics, or please tell me where it hurts.

CXPACKET

One wait type I like to talk about, and discredit a bit, is CXPACKET.  This wait type is just saying you used parallelism, nothing more.  I first covered this topic in my post Why worry about CXPACKET.

Think of it in terms of a manager who delegates a large project to multiple employees.  As the employees are hard at work accomplishing the task, the manager goes about their day and records their time as CXPACKET.  The task is typically completed in less overall duration because it was a group effort, but the total amount of work done goes up some because there’s overhead and small duplicated tasks when you delegate to a group.

That less overall duration could make or break your application, so the idea of having CXPACKET isn’t necessarily bad (setting MAXDOP = 1 is rarely a good idea).  However, a task being so large that it needed to be delegated to multiple employees may mean it’s not efficient, making query and index tuning a good approach.  Also, your manager may be too eager to delegate their tasks, so possibly adjust your Cost Threshold for Parallelism as suggested in Jeremiah Peschka’s (b|t) post Five SQL Server Settings to Change.

As I said before, I can’t cover all the wait types here, so go to Paul Randal’s post I suggested above for a better list.

What I Skipped

This is looking specifically at WaitTimeInMilliseconds, which is one of two technically correct things to do.  It does not look at ResourceWaitTimeInMilliseconds, which is the other technically correct thing to do.  I wrote it one way and was happy with the results so I didn’t change it or clutter it with returning both.

Here’s the difference so you can decide if you want to change it or talk me into changing mine.  It’s a conversation I’d enjoy…I’m strange like that.

ResourceWaitTimeInMilliseconds is how long you’re waiting on your actual resource.  If you’re waiting for a lock, this is how long you waited for that lock to be made so you could get back in line for CPU time.

WaitTimeInMilliseconds also includes the time from when you get in line for CPU time until you’re using the CPU.  This additional time is called your signal wait time.

The reason I like using WaitTimeInMilliseconds is that the reason you gave up the CPU and had to get back in line is due to the fact you weren’t ready due to what you were waiting on.  In that way, it’s the whole picture.

The argument going the other way is that you were only waiting for the resource.  It’s not the resource’s fault there was a line all queued up at the CPU when it got there.  Why should I tack on extra time on a non-CPU wait for that?

I’m undecided on this part, but having the info both ways allows you to figure out the signal wait time on your entire instance and see if you have too much CPU contention.  There’s no arguing about that being a good thing to know.

Steal my query, make it your own, and write that part the way that works best for you.

When Idera Finds Out What I’m Doing….

Someone asked me for queries pulling from Diagnostic Manager, and I posted on Twitter when I made my Idera Diagnostic Manager Scripts page.  The outcry and rage from this was…very positive.

Vicky Harp (t), who is now Idera’s Corporate Strategist wrote pieces of Diagnostic Manager and its schema, took time out of her day to meet with me and talk about the details of querying the data they collect.

Scott Stone (t) who is Diagnostic Manager’s Product Manager also jumped in very quickly to see if I’d like to be a beta tester for them, bringing real power users into the loop to make their products better.

This is the level of support you want from your monitoring software.

While I don’t officially endorse or recommend any specific software at this time, I do recommend you look for responses like this.