Monitoring Database and Table Sizes


Trending database and table sizes helps give you an idea of what to expect, and, sometimes, points out problems and their root causes. I even go so far as to monitor the disk space, file growth rates, and have a report going out telling me that “according to your current growth trends, you’ll run out of space on this day”. That last part will have to be another post since getting disk space can be kinda tricky depending on how your disks are set up; you need to go outside of SQL Server to grab the size of mountpoints if you use them. Don’t worry about mountpoints right now, if you don’t know the term that usually means that you can swap out the word “drive” and have everything I say be accurate for your environment.

First, this is lightweight stuff, there’s really no reason to capture it more than once a day, and it will only add up to several MB for a year’s worth of data. You’re not going to kill yourself doing this, you’re not going to stare at the data every day taking up your time, but you’re going to look like a heavyweight when you put this into action and reference this data to others.

Second, I do have other similar posts doing snapshots of this same info for Table and File sizes. It’s basically the same thing just different in how you’re using it and if you’ll have a historical view of it.

Database sizes

Lets start with the databases because, well, they’re bigger. I capture it all, the database name, logical file name, file type (row, logs, etc), filegroup, allocated space, used space, max size, growth rate, if it’s percent growth, and the drive/mountpoint it’s on. Other than dividing the appropriate numbers by 1024 twice to store all my sizes in MB, everything gets stored in native format. The reason for that is you have raw data in a format easy for you to verify and easy for any outsider (consultant, new employee, etc) to understand everything you have. I may make some of you cringe because I don’t normalize this; it’s small data and not worth the effort, especially with page compression (see Brad McGehee’s post on compression) turned on.

What I like doing is having automated reports with logic behind it saying “if drive size is X, file growth is Y, then you can grow to a potential size of Z”, follow that up with “the used space in this database has grown X in the last 90 days and Y in the last 30 days, so according to those trends I’ll hit the potential size in Z days”. One of my favorite reports takes that last Z as a parameter and tells me anything that will run out of space before a specific number of days. I get that report daily for anything that will run out of space in the next two weeks and a separate instance of that report monthly telling me what will run out of space in the next 100 days. For me to run out of space without having a couple automated emails sitting in my inbox yelling at me saying “See, I told you that would happen!!!”, a database has to do something crazy and out of character. The only part of that last sentence that doesn’t really happen is the automated email yelling, those voices don’t exist outside of my head (do they?).

Even without monitoring disk space, you can predict autogrowths. This is useful for trying preventing autogrowth all together if you’re into that, but with instant file initialization this has never been a spot I’ve chosen to spend my time on. However, I do care about fragmentation, and I have a report that will tell me if we continue growing according to our 30 or 90 day trends then the data files will have to grow X times. If X is higher than 4 and filegrowth is below 2 GB chunks, it shows up on my report and I change the filegrowth to make it more reasonable. If you allow percent filegrowths in your environment (just say no), or even if one sneaks through, things get complex and you have to stroll through financial websites to find the calculations for compounding interest to figure out how much the files will grow.

Cast(CEILING(LOG((Cast((ProjectedGrowthInAMonthMB - FileGroupFreeMB) as Dec(20,4)) + Cast(FileSizeMB as Dec(20,4)))/(Cast(FileSizeMB as Dec(20,4)))) /LOG(1+(Cast(Growth as Dec(20,4))/100))) AS Int)

Something inside that database is growing, but what is it?

Now if a database is growing, the first thing someone will ask you is “why is it growing”. That’s an answer I can’t give you, but “where is it growing” I can do. This is why I’m also tracking the table sizes in a database. Be fair to yourself here and admit it up front that you don’t care if a table changes size radically if the largest size is still insignificant to that database. So make up rules on “I want to watch tables that are bigger than…” and only store the stats for those tables to keep your little collections from being the source of your size issues. For me, that means that a table has to be both over 100MB in total size including indexes and over 0.5% of the size of the database. I’ve used these sizes for a while now and have never had an issue on either side; no one needed info I didn’t have for space trending issues and the collection time and space I used never caused any complaints. This isn’t saying that someone won’t ask “did that table have 10 or 11 records this time last month”, but I’m not willing to collect enough to answer every little question.

If a database is growing at an alarming rate it’s easy to assume the largest table is causing it. That’s not always the case, and it’s not too uncommon for a database change to either create a new table that grows quick intentionally or accidentally cause an old table to stop cleaning itself up. No matter what the reason is, if you want to see how a database is growing, look at your table stats documenting the growth. Developers always listen to you better when you tell them this is what happened and here’s how I know while the words “I think” are viewed as instant discredidation.

How am I supposed to do this stuff?

You’ll need somewhere to store all the info. I like to store two weeks of info on the local server and pull everything to a central server nightly where data is held for a full 13 months. However, to keep me under control, lets just focus on getting everything to the local server in the following tables.

CREATE TABLE [Drives](
    [DateAdded] [smalldatetime] NOT NULL,
    [DriveLetter] [char](1) NOT NULL,
    --[MountPoint] [smallint] NOT NULL,
    [CapacityMB] [int] NULL,
    [FreeSpaceMB] [int] NULL,
 CONSTRAINT [DBInven_Drives_DriveLetter_MountPoint_DateAdded] PRIMARY KEY CLUSTERED 
(
    [DateAdded] ASC,
    [DriveLetter] ASC,
    --[MountPoint] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]
GO

CREATE TABLE [Files](
    [DateAdded] [smalldatetime] NOT NULL,
    [DatabaseName] [sysname] NOT NULL,
    [FileName] [nvarchar](128) NOT NULL,
    [TypeDesc] [nvarchar](60) NULL,
    [FileGroup] [sysname] NULL,
    [SizeMB] [int] NULL,
    [UsedMB] [int] NULL,
    [FreeMB] [int] NULL,
    [MaxSizeMB] [int] NULL,
    [Growth] [int] NULL,
    [IsPercentGrowth] [bit] NOT NULL,
    [DriveLetter] [char](1) NOT NULL,
    --[MountPoint] [smallint] NOT NULL,
 CONSTRAINT [DBInven_Files_DatabaseID_FileID_DateAdded_U_C_IX] PRIMARY KEY CLUSTERED 
(
    [DateAdded] ASC,
    [DatabaseName] ASC,
    [FileName] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]
GO

CREATE TABLE [Tables](
    [DateAdded] [datetime] NOT NULL,
    [DatabaseName] [nvarchar](128) NOT NULL,
    [SchemaName] [nvarchar](128) NOT NULL,
    [TableName] [nvarchar](128) NOT NULL,
    [RowCounts] [bigint] NULL,
    [AllocatedMB] [int] NULL,
    [DataSizeMB] [int] NULL,
    [IndexSizeMB] [int] NULL,
    [PercentOfDB] [decimal](5, 2) NULL,
 CONSTRAINT [PK_Tables] PRIMARY KEY CLUSTERED 
(
    [DateAdded] ASC,
    [DatabaseName] ASC,
    [SchemaName] ASC,
    [TableName] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]
GO

If you use mountpoints, and if you call them “mountpoint” followed by an incrementing digit, and you never have more than can fit into a tinyint, uncommenting those pieces of the tables will do wonders for you. In case you didn’t guess, they’re commented out because the people who do use them probably don’t use them that way.

DECLARE @RunTime SmallDateTime
DECLARE @SQL NVarChar(max)

SET @RunTime = (SELECT Max(DateAdded) FROM Perf.DBInven.Drives)

SET @SQL = N'use [?]; 
    INSERT INTO Perf..Files 
    select ''' + cast(@RunTime as nvarchar(50)) + ''' 
        , DatabaseName = db_name()
        , FileName = f.Name
        , TypeDesc = f.Type_Desc
        , FileGroup = fg.Name
        , SizeMB = ((f.size*8/1024)) 
        , UsedMB = (FileProperty(f.name, ''SpaceUsed'')*8/1024) 
        , FreeMB = ((f.size-FileProperty(f.name, ''SpaceUsed''))*8/1024) 
        , MaxSizeMB = Case f.Max_Size
            WHEN -1 THEN -1
            ELSE f.Max_Size / 128
            END
        , f.Growth
        , f.Is_Percent_Growth
        , DriveLetter = left(f.physical_name,1) 
        /*, MountPoint = CASE substring(f.physical_name,4,10)
            WHEN ''MountPoint'' THEN 
                Case isnumeric(substring(f.physical_name,14,4)) --MountPoint >= 1000
                WHEN 1 THEN substring(f.physical_name,14,4)
                ELSE Case isnumeric(substring(f.physical_name,14,3)) --MountPoint >= 100
                    WHEN 1 THEN substring(f.physical_name,14,3)
                    ELSE Case isnumeric(substring(f.physical_name,14,2)) --MountPoint >= 10
                        WHEN 1 THEN substring(f.physical_name,14,2)
                        ELSE substring(f.physical_name,14,1) -- MountPoint single digit
                        END
                    END
                END
            ELSE Cast(0 as varchar(10)) --Root drive is MountPoint 0 
            END*/
    from sys.database_files f
        LEFT JOIN sys.filegroups fg ON f.data_space_id = fg.data_space_id
    '

exec sp_MSforeachdb @SQL

And now to grab the tables. Keep in mind that I specifically only want to grab stuff over 0.5% of the database size AND over 100 MB. You’ll see that towards the end of the code.

DECLARE @SQL NVarChar(max)
DECLARE @DateAdded SmallDateTime

SELECT @DateAdded = MAX(DateAdded) FROM Perf.DBInven.Drives 

SET @SQL = ' Use [?];
    DECLARE @DatabaseSize Float

    IF OBJECT_ID(''tempdb..#TableSizes'') IS NOT NULL BEGIN
          DROP TABLE #TableSizes
    END

    SELECT DatabaseName = DB_NAME()
    , TableName = a2.name
    , SchemaName = a3.name
    , RowCounts = a1.rows
    , TableSizeMB = (a1.reserved + ISNULL(a4.reserved,0)) / 128
    , DataSizeMB = a1.data / 128
    , IndexSizeMB = (CASE WHEN (a1.used + ISNULL(a4.used,0)) > a1.data 
                        THEN (a1.used + ISNULL(a4.used,0)) - a1.data 
                        ELSE 0 
                    END) /128
INTO #TableSizes
FROM (SELECT ps.object_id
            , [rows] = SUM(CASE
                                WHEN (ps.index_id < 2) THEN row_count
                                ELSE 0
                            END)
            , reserved = SUM(ps.reserved_page_count)
            , data = SUM(CASE
                            WHEN (ps.index_id < 2) 
                                THEN (ps.in_row_data_page_count + ps.lob_used_page_count + ps.row_overflow_used_page_count)
                            ELSE (ps.lob_used_page_count + ps.row_overflow_used_page_count)
                        END)
            , used = SUM (ps.used_page_count) 
        FROM sys.dm_db_partition_stats ps
        GROUP BY ps.object_id) AS a1
    INNER JOIN sys.all_objects a2  ON a1.object_id = a2.object_id
    INNER JOIN sys.schemas a3 ON a2.schema_id = a3.schema_id
    LEFT JOIN (SELECT it.parent_id
            , reserved = SUM(ps.reserved_page_count)
            , used = SUM(ps.used_page_count)
        FROM sys.dm_db_partition_stats ps
            INNER JOIN sys.internal_tables it ON it.object_id = ps.object_id
        WHERE it.internal_type IN (202,204)
        GROUP BY it.parent_id) AS a4 ON a4.parent_id = a1.object_id
WHERE a2.type <> ''S'' and a2.type <> ''IT''

    SELECT @DatabaseSize = Sum(TableSizeMB) FROM #TableSizes

    INSERT INTO Perf..[Tables]
    SELECT ''' + Cast(@DateAdded as VarChar(50)) + '''
        , DatabaseName
        , SchemaName
        , TableName
        , RowCounts
        , TableSizeMB
        , DataSizeMB
        , IndexSizeMB
        , (TableSizeMB/@DatabaseSize) * 100 
    FROM #TableSizes
    WHERE TableSizeMB > .005 * @DatabaseSize 
        AND TableSizeMB > 100

    DROP TABLE #TableSizes'

exec sp_MSforeachdb @SQL

So this leaves us with the size of the drives. This isn’t an easy subject. You can get everything you need through SQL Server if you don’t use mountpoints, but you’re left with either Powershell or VBScript to capture it with mountpoints so you can query WMI. Another post, another day. I’d actually prefer to rewrite my VBScript as PowerShell to both learn PowerShell and put a better solution out here for you.

This was my last post I needed to prep for my Monitoring and Baselines Presentation for SQL Saturday where I talk about what you want to watch and why. Since the presentation didn’t get into how to watch everything, I made sure that everything was on my blog ahead of time.

Query Stats


DBAs rarely use the full potential of sys.dm_exec_query_stats.  It’s common to see the queries for looking at the most expensive queries according to any of the stats contained within the current cache, which is great to see.  However, if you grab snapshots of this information the proper way then you can get more detail that will really make a difference in your job.  Capturing this every 4 hours is probably enough, and here’s what it’s going to do for you.

First, using it the way most DBAs do and look at the most expensive queries according to a specific metric.  It works the same for them all, so we’ll just say we’re interested in physical reads in this case because our PLE isn’t what we’re hoping it could be.  You’ve probably seen the queries that directly query sys.dm_exec_query_stats to give you this similar to the one on MSDN for this DMV.  That will give you everything that’s in the cache right now, which could contain queries that recompile every hour in combination with other queries that have remained in the cache since the server was rebooted two weeks ago.  This means that when you’re querying to see what you can do to improve your PLE during business hours you’re getting anything that happens to be there now, not what ran in the time window you’re looking to improve.

Now look at it from the point of view that you capture this information every 4 hours.  You can get the total number of physical reads grouped by sql_handle where the interval_end time is in the last month, on weekdays, between 9:00 AM and 5:00 PM.  This is both a more complete and more accurate picture of what you care about, and you’re more likely to be focusing on the queries you should be tuning from the start.

Once you do start tuning, you want to know what effect you really had, and not just rely on the couple sets of test parameters you used in your NonProd environments.  A couple days after your changes made it to production, it’s a good idea to make sure you had the desired effect.  Query the data you have saved off for the last three days and for the same days last week.

What I went over so far is just realizing how this takes the traditional use of this to a new level, but there’s more functionality here now that it’s being saved off.  Now this is extremely useful in incident response.  Before you could see if anything was hogging resources as seen in my Recently Recompiled Resource Hogs post, but now you take it a step further by being able to verify if what’s running long now has always run that way or if something is different.  Also, with having the plan_handles saved off you can even see if it’s getting a different execution plan, which will help you determine if this was a change in data or if it’s a bad execution plan.

Not to shoot myself in the foot, but this isn’t perfect.  If you capture this data every 4 hours starting at 12:00 and it recompiles at 3:59 then you’ll only capture one minute for that interval.  Also, if you’re dealing with a query that can’t be cached then it won’t be found here at all; this is why you’re also running traces to find the big queries and watching sys.dm_os_performance_counters to know when there are a lot of compiles and recompiles.

To start capturing this, here are the tables you’ll need to capture everything.

SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

SET ANSI_PADDING ON
GO

/*
IF OBJECT_ID('QueryStats') IS NOT NULL BEGIN
    DROP TABLE QueryStats
END

IF OBJECT_ID('QueryStatsLast') IS NOT NULL BEGIN
    DROP TABLE QueryStatsLast
END

IF OBJECT_ID('QueryText') IS NOT NULL BEGIN
    DROP TABLE QueryText
END
--*/

CREATE TABLE [dbo].[QueryStats](
    [interval_start] [smalldatetime] NOT NULL,
    [interval_end] [smalldatetime] NOT NULL,
    [sql_handle] [varbinary](64) NOT NULL,
    [plan_handle] [varbinary](64) NOT NULL,
    [statement_start_offset] [int] NOT NULL,
    [statement_end_offset] [int] NOT NULL,
    [execution_count] [int] NOT NULL,
    [total_elapsed_time_ms] [int] NOT NULL,
    [min_elapsed_time_ms] [int] NOT NULL,
    [max_elapsed_time_ms] [int] NOT NULL,
    [total_worker_time_ms] [int] NOT NULL,
    [min_worker_time_ms] [int] NOT NULL,
    [max_worker_time_ms] [int] NOT NULL,
    [total_logical_reads] [int] NOT NULL,
    [min_logical_reads] [int] NOT NULL,
    [max_logical_reads] [int] NOT NULL,
    [total_physical_reads] [int] NOT NULL,
    [min_physical_reads] [int] NOT NULL,
    [max_physical_reads] [int] NOT NULL,
    [total_logical_writes] [int] NOT NULL,
    [min_logical_writes] [int] NOT NULL,
    [max_logical_writes] [int] NOT NULL,
    [creation_time] [smalldatetime] NOT NULL,
    [last_execution_time] [smalldatetime] NOT NULL,
) ON [PRIMARY]
GO

--Wide is bad, if you have a lot of nonclustered indexes.  
--This is a high-writes, low-reads table, so a single clustered index is probably all I want.
--With Interval_End being the first key field, they're inserted in order and eliminate mass fragmentation
  -- as well as bad write performance.
CREATE UNIQUE CLUSTERED INDEX IX_QueryStats_intervalend_sqlhandle_statementstartoffset_planhandle_U_C ON QueryStats
(
    interval_end
    , sql_handle
    , statement_start_offset
    , plan_handle
)

CREATE TABLE [dbo].[QueryStatsLast](
    [sql_handle] [varbinary](64) NOT NULL,
    [plan_handle] [varbinary](64) NOT NULL,
    [statement_start_offset] [int] NOT NULL,
    [statement_end_offset] [int] NOT NULL,
    [objtype] [nvarchar](20) NOT NULL,
    [execution_count] [bigint] NOT NULL,
    [total_elapsed_time_ms] [bigint] NOT NULL,
    [min_elapsed_time_ms] [bigint] NOT NULL,
    [max_elapsed_time_ms] [bigint] NOT NULL,
    [total_worker_time_ms] [bigint] NOT NULL,
    [min_worker_time_ms] [bigint] NOT NULL,
    [max_worker_time_ms] [bigint] NOT NULL,
    [total_logical_reads] [bigint] NOT NULL,
    [min_logical_reads] [bigint] NOT NULL,
    [max_logical_reads] [bigint] NOT NULL,
    [total_physical_reads] [bigint] NOT NULL,
    [min_physical_reads] [bigint] NOT NULL,
    [max_physical_reads] [bigint] NOT NULL,
    [total_logical_writes] [bigint] NOT NULL,
    [min_logical_writes] [bigint] NOT NULL,
    [max_logical_writes] [bigint] NOT NULL,
    [creation_time] [datetime] NOT NULL,
    [last_execution_time] [datetime] NOT NULL,
    [DateAdded] [datetime] NOT NULL
) ON [PRIMARY]

CREATE UNIQUE CLUSTERED INDEX [IX_QueryStatsLast_sqlhandle_planhandle_statementstartoffset_U_C] ON [dbo].[QueryStatsLast] 
(
    [sql_handle] ASC,
    [plan_handle] ASC,
    [statement_start_offset] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
GO

CREATE TABLE [dbo].[QueryText](
    [sql_handle] [varbinary](64) NOT NULL,
    [QueryText] [nvarchar](max) NOT NULL,
    [DatabaseName] [nvarchar](128) NULL,
    [objtype] [nvarchar](20) NULL
) ON [PRIMARY]

CREATE UNIQUE CLUSTERED INDEX [IX_QueryText_sqlhandle_U_C] ON [dbo].[QueryText] 
(
    [sql_handle] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
GO

Now that the tables are created, here’s how I populate them.  Set this up to run once every 4 hours to keep the data small during typical running, and kick it up to once an hour or more if needed to pinpoint an issue if you need it.

DECLARE @interval_start smalldatetime
    , @interval_end smalldatetime

IF OBJECT_ID('tempdb..#QS') IS NOT NULL BEGIN
    DROP TABLE #QS 
END 

CREATE TABLE [dbo].[#QS](
    [sql_handle] [varbinary](64) NOT NULL,
    [plan_handle] [varbinary](64) NOT NULL,
    [statement_start_offset] [int] NOT NULL,
    [statement_end_offset] [int] NOT NULL,
    [objtype] [nvarchar](20) NOT NULL,
    [execution_count] [bigint] NOT NULL,
    [total_elapsed_time_ms] [bigint] NOT NULL,
    [min_elapsed_time_ms] [bigint] NOT NULL,
    [max_elapsed_time_ms] [bigint] NOT NULL,
    [total_worker_time_ms] [bigint] NOT NULL,
    [min_worker_time_ms] [bigint] NOT NULL,
    [max_worker_time_ms] [bigint] NOT NULL,
    [total_logical_reads] [bigint] NOT NULL,
    [min_logical_reads] [bigint] NOT NULL,
    [max_logical_reads] [bigint] NOT NULL,
    [total_physical_reads] [bigint] NOT NULL,
    [min_physical_reads] [bigint] NOT NULL,
    [max_physical_reads] [bigint] NOT NULL,
    [total_logical_writes] [bigint] NOT NULL,
    [min_logical_writes] [bigint] NOT NULL,
    [max_logical_writes] [bigint] NOT NULL,
    [creation_time] [datetime] NOT NULL,
    [last_execution_time] [datetime] NOT NULL,
    [DateAdded] [datetime] NOT NULL
) 
/*
--This sounded like a great idea, but it just slowed it down several seconds.
CREATE UNIQUE CLUSTERED INDEX TempQS_Cluster ON #QS  
(
    sql_handle
    , plan_handle
    , statement_start_offset
)
*/

--The Cached Plans Object Type is in here in case you want to treat ad-hoc or prepared statements differently
INSERT INTO #QS 
SELECT qs.sql_handle
    , qs.plan_handle
    , qs.statement_start_offset
    , qs.statement_end_offset
    , cp.objtype
    , qs.execution_count 
    , total_elapsed_time_ms = qs.total_elapsed_time/1000
    , min_elapsed_time_ms = qs.min_elapsed_time/1000
    , max_elapsed_time_ms = qs.max_elapsed_time/1000 
    , total_worker_time_ms = qs.total_worker_time/1000 
    , min_worker_time_ms = qs.min_worker_time/1000
    , max_worker_time_ms = qs.max_worker_time/1000
    , qs.total_logical_reads
    , qs.min_logical_reads
    , qs.max_logical_reads
    , qs.total_physical_reads
    , qs.min_physical_reads 
    , qs.max_physical_reads
    , qs.total_logical_writes 
    , qs.min_logical_writes 
    , qs.max_logical_writes
    , qs.creation_time
    , qs.last_execution_time
    , DateAdded = getDate()
FROM sys.dm_exec_query_stats AS qs
    INNER JOIN sys.dm_exec_cached_plans cp on qs.plan_handle = cp.plan_handle
--WHERE cp.objtype NOT IN ('Adhoc')

INSERT INTO QueryText (sql_handle, QueryText, DatabaseName, objtype)
SELECT QS.sql_handle
    , QueryText = qt.text
    , DatabaseName = DB_NAME(max(qt.dbid))
    , max(QS.objtype)
FROM (SELECT #QS.sql_handle
        , #QS.objtype  
        FROM #QS
            LEFT JOIN QueryText QST ON #QS.sql_handle = QST.sql_handle
        WHERE QST.sql_handle IS NULL) QS
    CROSS APPLY sys.dm_exec_sql_text(QS.sql_handle) qt
GROUP BY QS.sql_handle
    , qt.text

--All the values are the same in each table
SELECT TOP 1 @interval_start = dateadded FROM QueryStatsLast
SELECT TOP 1 @interval_end = dateadded FROM #QS

IF @interval_start IS NULL BEGIN
    SELECT @interval_start = create_date
    FROM sys.databases
    WHERE name = 'tempdb'
END

INSERT INTO QueryStats (interval_start
    , interval_end
    , sql_handle
    , plan_handle
    , statement_start_offset
    , statement_end_offset
    , execution_count 
    , total_elapsed_time_ms
    , min_elapsed_time_ms 
    , max_elapsed_time_ms
    , total_worker_time_ms  
    , min_worker_time_ms 
    , max_worker_time_ms 
    , total_logical_reads
    , min_logical_reads
    , max_logical_reads
    , total_physical_reads
    , min_physical_reads 
    , max_physical_reads
    , total_logical_writes 
    , min_logical_writes 
    , max_logical_writes
    , creation_time
    , last_execution_time)
SELECT @interval_start
    , @interval_end
    , qs.sql_handle
    , qs.plan_handle
    , qs.statement_start_offset
    , qs.statement_end_offset
    , qs.execution_count - ISNULL(qsl.execution_count, 0)
    , qs.total_elapsed_time_ms - ISNULL(qsl.total_elapsed_time_ms, 0)
    , qs.min_elapsed_time_ms 
    , qs.max_elapsed_time_ms 
    , qs.total_worker_time_ms - ISNULL(qsl.total_worker_time_ms, 0)
    , qs.min_worker_time_ms
    , qs.max_worker_time_ms 
    , qs.total_logical_reads - ISNULL(qsl.total_logical_reads, 0)
    , qs.min_logical_reads 
    , qs.max_logical_reads 
    , qs.total_physical_reads - ISNULL(qsl.total_physical_reads, 0)
    , qs.min_physical_reads 
    , qs.max_physical_reads 
    , qs.total_logical_writes - ISNULL(qsl.total_logical_writes, 0)
    , qs.min_logical_writes 
    , qs.max_logical_writes 
    , qs.creation_time
    , qs.last_execution_time
FROM #QS qs
    LEFT OUTER JOIN QueryStatsLast qsl ON qs.sql_handle = qsl.sql_handle
        AND qs.plan_handle = qsl.plan_handle
        AND qs.statement_start_offset = qsl.statement_start_offset
        AND qs.creation_time = qsl.creation_time 
WHERE qs.execution_count - ISNULL(qsl.execution_count, 0) > 0 --Only keep records for when it was executed in that interval

TRUNCATE TABLE QueryStatsLast

INSERT INTO QueryStatsLast
SELECT sql_handle
    , plan_handle
    , statement_start_offset
    , statement_end_offset
    , objtype
    , execution_count 
    , total_elapsed_time_ms
    , min_elapsed_time_ms 
    , max_elapsed_time_ms
    , total_worker_time_ms  
    , min_worker_time_ms 
    , max_worker_time_ms 
    , total_logical_reads
    , min_logical_reads
    , max_logical_reads
    , total_physical_reads
    , min_physical_reads 
    , max_physical_reads
    , total_logical_writes 
    , min_logical_writes 
    , max_logical_writes
    , creation_time
    , last_execution_time
    , DateAdded
FROM #QS

Sys.dm_os_performance_counters Demystified


The dmv sys.dm_os_performance_counters is awesome, if you can understand it. This is how I make it easy to read and use. Here are the values I watch and why I watch them. My list isn’t going to be perfect and you’re not going to agree with it 100%, and I’m ok with that. First, there is no perfect. Second, if we agree 100% then one of us is just mindlessly following the other which isn’t good.

  • Cache Hit Ratio – I ignore this value, but I still monitor it. I will, ideally, never be the only DBA on a team again, and everyone seems to think this value is cool.
  • Page Life Exp – My favorite! When you read a page from disk into memory how many seconds will it stay there? Just don’t use the outdated “300” rule or your disks will catch on fire!!!

    Burning Drive!!!

    PLE = 300

  • Page Lookups/Sec – How many pages are read from memory.
  • Page Reads/Sec – How many pages are read from disk.
  • Page Writes/Sec – How many pages are written to disk.
  • Lazy Writes/sec – How many pages are written to disk outside of a checkpoint due to memory pressure.
  • Batch Requests/sec – How busy is the server?
  • Trans/sec – How busy is the server?
  • Total Server Memory – How much memory SQL Server is currently using. Typically ramps up to Target value and PLE is low as it ramps up since new pages are in memory dropping the average.
  • Target Server Memory – How much memory SQL Server is allowed to use. Should be the same as the max memory setting, but memory pressure can cause this to decrease.
  • Memory Grants Pending – How many processes aren’t able to get enough memory to run. Should be 0, always 0, if not then find out why.
  • Deadlocks – How many deadlocks are we getting. Most apps handle deadlocks gracefully, but they still lose time doing it. If this number starts going up, start looking into it.
  • SQL Compilations/sec – This is a hidden performance killer! Some queries can’t be cached so they’re compiled every time they’re run. I’ve seen this with a query being run once a second and a big server was running slower than my laptop. It’s normal for things to compile throughout the day, it’s not normal for this number to be 10x higher than before that last upgrade.
  • SQL Re-Compliations/sec – Same goes here. The counters aren’t that much different.

If you know a little about this DMV then you know these values are cryptic. There’s several ways this data is stored and it has to be retrieved differently for each type to be useful. Then many of these are cumulative since the server was restarted, which isn’t going to help too much. Even worse, MSDN failed us on this one and figuring out this DMV required help outside of that site. Now for the good news, the script below will take care of all of that for you and leave you with some easy reading with values you can filter to the time periods you care about. If you want to add some of your own counters then just follow my lead on one that has the same cntr_type, or you can go to Rabin’s blog post that I learned from.

IF object_id('tempdb..#OSPC') IS NOT NULL BEGIN
    DROP TABLE #OSPC
END

DECLARE @FirstCollectionTime DateTime
    , @SecondCollectionTime DateTime
    , @NumberOfSeconds Int
    , @BatchRequests Float
    , @LazyWrites Float
    , @Deadlocks BigInt
    , @PageLookups Float
    , @PageReads Float
    , @PageWrites Float
    , @SQLCompilations Float
    , @SQLRecompilations Float
    , @Transactions Float

DECLARE @CounterPrefix NVARCHAR(30)
SET @CounterPrefix = CASE WHEN @@SERVICENAME = 'MSSQLSERVER'
                            THEN 'SQLServer:'
                        ELSE 'MSSQL$' + @@SERVICENAME + ':'
                        END

--Grab the current values from dm_os_performance_counters
--Doesn't do anything by instance or database because this is good enough and works unaltered in all envirornments
SELECT counter_name, cntr_value--, cntr_type --I considered dynamically doing each counter type, but decided manual was better in this case
INTO #OSPC 
FROM sys.dm_os_performance_counters 
WHERE object_name like @CounterPrefix + '%'
    AND instance_name IN ('', '_Total')
    AND counter_name IN ( N'Batch Requests/sec'
                        , N'Buffer cache hit ratio'
                        , N'Buffer cache hit ratio base'
                        , N'Free Pages'
                        , N'Lazy Writes/sec'
                        , N'Memory Grants Pending'
                        , N'Number of Deadlocks/sec'
                        , N'Page life expectancy'
                        , N'Page Lookups/Sec'
                        , N'Page Reads/Sec'
                        , N'Page Writes/Sec'
                        , N'SQL Compilations/sec'
                        , N'SQL Re-Compilations/sec'
                        , N'Target Server Memory (KB)'
                        , N'Total Server Memory (KB)'
                        , N'Transactions/sec')

--Just collected the second batch in the query above
SELECT @SecondCollectionTime = GetDate()

--Grab the most recent values, if they are appropriate (no reboot since grabbing them last)
SELECT @FirstCollectionTime = DateAdded
    , @BatchRequests = BatchRequests
    , @LazyWrites = LazyWrites
    , @Deadlocks = Deadlocks
    , @PageLookups = PageLookups
    , @PageReads = PageReads
    , @PageWrites = PageWrites
    , @SQLCompilations = SQLCompilations
    , @SQLRecompilations = SQLRecompilations
    , @Transactions = Transactions
FROM OSPerfCountersLast 
WHERE DateAdded > (SELECT create_date FROM sys.databases WHERE name = 'TempDB')

--If there was a reboot then all these values would have been 0 at the time the server came online (AKA: TempDB's create date)
SELECT @FirstCollectionTime = ISNULL(@FirstCollectionTime, (SELECT create_date FROM sys.databases WHERE name = 'TempDB'))
    , @BatchRequests = ISNULL(@BatchRequests, 0)
    , @LazyWrites = ISNULL(@LazyWrites, 0)
    , @Deadlocks = ISNULL(@Deadlocks, 0)
    , @PageLookups = ISNULL(@PageLookups, 0)
    , @PageReads = ISNULL(@PageReads, 0)
    , @PageWrites = ISNULL(@PageWrites, 0)
    , @SQLCompilations = ISNULL(@SQLCompilations, 0)
    , @SQLRecompilations = ISNULL(@SQLRecompilations, 0)
    , @Transactions = ISNULL(@Transactions, 0)

SELECT @NumberOfSeconds = DATEDIFF(ss, @FirstCollectionTime, @SecondCollectionTime)

--I put these in alphabetical order by counter_name, not column name.  It looks a bit odd, but makes sense to me
--Deadlocks are odd here.  I keep track of the number of deadlocks in the time period, not average number of deadlocks per second.
  --AKA, I keep track of things the way I would refer to them when I talk to someone.  "We had 2 deadlocks in the last 5 minutes", not "We averaged .00002 deadlocks per second there"
INSERT INTO OSPerfCounters (DateAdded, Batch_Requests_Sec, Cache_Hit_Ratio, Free_Pages, Lazy_Writes_Sec, Memory_Grants_Pending
    , Deadlocks, Page_Life_Exp, Page_Lookups_Sec, Page_Reads_Sec, Page_Writes_Sec, SQL_Compilations_Sec, SQL_Recompilations_Sec
    , ServerMemoryTarget_KB, ServerMemoryTotal_KB, Transactions_Sec)
SELECT @SecondCollectionTime
    , Batch_Request_Sec        = ((SELECT cntr_value FROM #OSPC WHERE counter_name = N'Batch Requests/sec') - @BatchRequests) / @NumberOfSeconds
    , Cache_Hit_Ratio        = (SELECT cntr_value FROM #OSPC WHERE counter_name = N'Buffer cache hit ratio')/(SELECT cntr_value FROM #OSPC WHERE counter_name = N'Buffer cache hit ratio base')
    , Free_Pages            = (SELECT cntr_value FROM #OSPC WHERE counter_name =N'Free pages')
    , Lazy_Writes_Sec        = ((SELECT cntr_value FROM #OSPC WHERE counter_name = N'Lazy Writes/sec') - @LazyWrites) / @NumberOfSeconds
    , Memory_Grants_Pending    = (SELECT cntr_value FROM #OSPC WHERE counter_name = N'Memory Grants Pending')
    , Deadlocks                = ((SELECT cntr_value FROM #OSPC WHERE counter_name = N'Number of Deadlocks/sec') - @Deadlocks) 
    , Page_Life_Exp         = (SELECT cntr_value FROM #OSPC WHERE counter_name = N'Page life expectancy')
    , Page_Lookups_Sec      = ((SELECT cntr_value FROM #OSPC WHERE counter_name = N'Page lookups/sec') - @PageLookups) / @NumberOfSeconds
    , Page_Reads_Sec        = ((SELECT cntr_value FROM #OSPC WHERE counter_name = N'Page reads/sec') - @PageReads) / @NumberOfSeconds
    , Page_Writes_Sec        = ((SELECT cntr_value FROM #OSPC WHERE counter_name = N'Page writes/sec') - @PageWrites) / @NumberOfSeconds
    , SQL_Compilations_Sec  = ((SELECT cntr_value FROM #OSPC WHERE counter_name = N'SQL Compilations/sec') - @SQLCompilations) / @NumberOfSeconds
    , SQL_Recompilations_Sec= ((SELECT cntr_value FROM #OSPC WHERE counter_name = N'SQL Re-Compilations/sec') - @SQLRecompilations) / @NumberOfSeconds
    , ServerMemoryTarget_KB = (SELECT cntr_value FROM #OSPC WHERE counter_name = N'Target Server Memory (KB)')
    , ServerMemoryTotal_KB  = (SELECT cntr_value FROM #OSPC WHERE counter_name = N'Total Server Memory (KB)')
    , Transactions_Sec        = ((SELECT cntr_value FROM #OSPC WHERE counter_name = N'Transactions/sec') - @Transactions) / @NumberOfSeconds

TRUNCATE TABLE OSPerfCountersLast

--Note, only saving the last value for ones that are done per second.
INSERT INTO OSPerfCountersLast(DateAdded, BatchRequests, LazyWrites, Deadlocks, PageLookups, PageReads
    , PageWrites, SQLCompilations, SQLRecompilations, Transactions)
SELECT DateAdded            = @SecondCollectionTime
    , BatchRequests            = (SELECT cntr_value FROM #OSPC WHERE counter_name = N'Batch Requests/sec')
    , LazyWrites            = (SELECT cntr_value FROM #OSPC WHERE counter_name = N'Lazy Writes/sec')
    , Deadlocks             = (SELECT cntr_value FROM #OSPC WHERE counter_name = N'Number of Deadlocks/sec')
    , PageLookups            = (SELECT cntr_value FROM #OSPC WHERE counter_name = N'Page lookups/sec')
    , PageReads                = (SELECT cntr_value FROM #OSPC WHERE counter_name = N'Page reads/sec')
    , PageWrites            = (SELECT cntr_value FROM #OSPC WHERE counter_name = N'Page writes/sec')
    , SQLCompilations        = (SELECT cntr_value FROM #OSPC WHERE counter_name = N'SQL Compilations/sec')
    , SQLRecompilations        = (SELECT cntr_value FROM #OSPC WHERE counter_name = N'SQL Re-Compilations/sec')
    , Transactions             = (SELECT cntr_value FROM #OSPC WHERE counter_name = N'Transactions/sec')

DROP TABLE #OSPC

Throw that code above here in a proc, schedule it to run every so often (I like 5 minutes) and it’ll….fail. It kinda relies on a couple tables you should create first. Here ya go.

CREATE TABLE OSPerfCounters(
    DateAdded datetime NOT NULL
    , Batch_Requests_Sec int NOT NULL
    , Cache_Hit_Ratio float NOT NULL
    , Free_Pages int NOT NULL
    , Lazy_Writes_Sec int NOT NULL
    , Memory_Grants_Pending int NOT NULL
    , Deadlocks int NOT NULL
    , Page_Life_Exp int NOT NULL
    , Page_Lookups_Sec int NOT NULL
    , Page_Reads_Sec int NOT NULL
    , Page_Writes_Sec int NOT NULL
    , SQL_Compilations_Sec int NOT NULL
    , SQL_Recompilations_Sec int NOT NULL
    , ServerMemoryTarget_KB int NOT NULL
    , ServerMemoryTotal_KB int NOT NULL
    , Transactions_Sec int NOT NULL
)

--You'll typically only query this by one value, which is added sequentually.  No page splits!!!
CREATE UNIQUE CLUSTERED INDEX IX_OSPerfCounters_DateAdded_U_C ON OSPerfCounters
(
    DateAdded
) WITH (FillFactor = 100)

--Only holds one value at a time, indexes are a waste
CREATE TABLE OSPerfCountersLast(
    DateAdded datetime NOT NULL
    , BatchRequests bigint NOT NULL
    , LazyWrites bigint NOT NULL
    , Deadlocks bigint NOT NULL
    , PageLookups bigint NOT NULL
    , PageReads bigint NOT NULL
    , PageWrites bigint NOT NULL
    , SQLCompilations bigint NOT NULL
    , SQLRecompilations bigint NOT NULL
    , Transactions bigint NOT NULL
)

The important part of all this is how you use it. It’s tempting to just look at the last 7 records and say that you know what’s going on, that makes me want to slap you. Every server is different, every server has different loads and baselines, and you’re either underworked or you don’t know what those baselines are for every server you manage. I do simple baselines every time I look at an incident and look at the last hour, the same time yesterday, and the same time a week ago. That gives you a chance to see what’s normal for this server and what’s different right now. This query is so simple you’ll wonder why I even posted it, but it’s effective which is why it’s here. The 7 records per day thing, that’s because 21 records show up on my screen without me scrolling, it is NOT a magic number!

SELECT 'Today', * FROM (
SELECT TOP 7 *
FROM OSPerfCounters
ORDER BY dateadded DESC
) X

UNION
SELECT 'Yesterday', * FROM (SELECT TOP 7 *
FROM OSPerfCounters
WHERE dateadded <= GETDATE()-1
ORDER BY dateadded DESC
) Y

UNION
SELECT 'Last Week', * FROM (
SELECT TOP 7 *
FROM OSPerfCounters
WHERE dateadded <= GETDATE()-7
ORDER BY dateadded DESC) Z

ORDER BY dateadded DESC

And, well, something I’ve been skipping on my posts and telling people to handle cleanup on their own…. Here’s step 2 of my jobs that populate my monitoring tables to keep your data from being the ever-growing data you’re struggling with in every other app. I delete in batches according to the clustered index. It’s overkill for something deleting one row at a time, or, even if you put this in a separate daily job, 288 rows if the process is scheduled every 5 minutes. So, why the batches? Because I copy/paste my own code everywhere, batches is reusable, and this is how I chop off the tail end of EVERYTHING!

SELECT 'Start' --Give me a rowcount of 1

WHILE @@ROWCOUNT > 0 BEGIN
    DELETE TOP (100000)
    FROM OSPerfCounters
    where dateadded < (GetDate() - 400)
END 

In the beginning I mentioned that if you agreed with me 100% then one of us is a mindless monkey. Look, I put this out there first, so I’m obviously not the mindless monkey here, am I? There’s a box below that gives you a chance to show that you’re not a mindless monkey either! Tell me I’m wrong, how I can do better, and how everyone else reading this can benefit from it even more! I’ll promote you from mindless monkey to talking monkey!

Deadlock – A quick, easy view


Deadlocks are rough to work with. Here are the scripts I use to capture deadlocks, find which ones are reoccurring, and view them along with a couple free eBooks to resolve them.

You could have thousands of deadlocks and it would take you forever to find out which ones are reoccurring or which ones are some freak accident with an annual process. I used to read through deadlock graphs one-by-one to see what was reoccurring, and I used to be less satisfied with my job as well. If you’re going to bang your head on your desk making sure a deadlock never happens again, you’ll want to make sure it wasn’t a one-time event that wouldn’t have happened again anyways.

I’m not going to get into how to figure out deadlocks, that’s a chapter in a book more than it’s a blog post. Chapter 7 to be exact in the FREE eBook or $25 physical book for Accidental DBAs by Jonathan Kehayias and Ted Krueger. Also, a whole book on blocking by Kalen Delaney in a FREE eBook or $22 physical book. I have to admit that Ted Krueger isn’t a name I recognize. However, Jonathan and Kalen are easily on my list of top 5 best MVPs out there.

Anyways, back to me. Here’s what I use to look at deadlock traces. The base code was found on the internet years ago, and you can find it in several places now. Because of that I have no idea who the original author is, but I’d be glad to throw up credit here for anyone who can show me a site dated older than when I started using it.

There are two versions here, almost identical code in each. The first one summarizes the deadlocks and gives you a count of the occurrences. The second one shows each SPID involved in the deadlock separately along with the XML for the deadlock.

DECLARE @Path VarChar(500);
SET @Path = (SELECT TOP 1 [path] FROM sys.traces WHERE [path] LIKE '%deadlock%')
--SET @Path = 'D:\Perflogs\PerfTraces\DeadlockTrace.trc'

;with CTE as
(
select 
 [TraceID] = 3, 
 [RowID] = row_number() OVER (ORDER BY StartTime), -- assign a row number to each deadlock
 [DeadlockGraph]=case when TextData like '<deadlock-list%' then convert(xml, TextData) else null end, 
 * 
 from ::fn_trace_gettable(@path, default)
where TextData like '<deadlock-list%'
    --AND StartTime BETWEEN '2013-06-17 00:00' AND '2013-06-30 00:00'
)
SELECT [Procedure], Inputbuffer, dMonth = DatePart(Month, DeadLockTime), dDay = DatePart(Day, DeadLockTime), DeadlockCount = Count(*), dMax = max(DeadlockTime), dMin = min(DeadlockTime)
FROM
(
select 
 [TimeoutID] = CTE.RowID,
 [DeadlockTime] = [StartTime],
 [DeadlockGraph],
 [PagelockObject] = CTE.[DeadlockGraph].value('/deadlock-list[1]/deadlock[1]/resource-list[1]/pagelock[1]/@objectname', 'varchar(200)'),
 [DeadlockObject] = CTE.[DeadlockGraph].value('/deadlock-list[1]/deadlock[1]/resource-list[1]/objectlock[1]/@objectname', 'varchar(200)'),
 [Victim] = case when Deadlock.Process.value('@id', 'varchar(50)') = CTE.[DeadlockGraph].value('/deadlock-list[1]/deadlock[1]/@victim', 'varchar(50)') then 1 else 0 end,
 [Procedure] = Deadlock.Process.value('executionStack[1]/frame[1]/@procname[1]', 'varchar(200)'),
 [LockMode] = Deadlock.Process.value('@lockMode', 'char(1)'),
 [Code] = Deadlock.Process.value('executionStack[1]/frame[1]', 'varchar(1000)'),
 [ClientApp] = Deadlock.Process.value('@clientapp', 'varchar(100)'),
 [HostName] = Deadlock.Process.value('@hostname', 'varchar(20)'),
 [HostPID] = Deadlock.Process.value('@hostpid', 'varchar(20)'),
 [LoginName] = Deadlock.Process.value('@loginname', 'varchar(20)'),
 [TransactionTime] = Deadlock.Process.value('@lasttranstarted', 'datetime'),
 [InputBuffer] = Deadlock.Process.value('inputbuf[1]', 'varchar(1000)')
 from CTE
 CROSS APPLY CTE.[DeadlockGraph].nodes('/deadlock-list/deadlock/process-list/process') as Deadlock(Process)
) X
--WHERE Victim = 0 
--WHERE TimeoutID IN (SELECT RowID FROM CTE WHERE [PROCEDURE] = 'master.dbo.FakeProcName')
--WHERE inputbuffer like '%select top 10%'
GROUP BY [procedure], Inputbuffer, DatePart(Month, DeadLockTime), DatePart(Day, DeadLockTime)
--ORDER BY TimeoutID 

Then there’s this to actually let me look at the XML. Although you can easily see in the script below which deadlocks reoccur back-to-back, it’s not so easy to see reoccurrence of deadlocks that happen once every morning between 2 and 3 AM. That’s where the first script shines, and you should use and abuse it.

DECLARE @Path VarChar(500);
SET @Path = (SELECT TOP 1 [path] FROM sys.traces WHERE [path] LIKE '%deadlock%');

with CTE as
(
select 
 [TraceID] = 3, 
 [RowID] = row_number() OVER (ORDER BY StartTime), -- assign a row number to each deadlock
 [DeadlockGraph]=case when TextData like '<deadlock-list%' then convert(xml, TextData) else null end, 
 * 
 from ::fn_trace_gettable(@Path, default)
where TextData like '<deadlock-list%'
    --AND StartTime BETWEEN '2013-03-08 02:00' AND '2013-03-08 04:00'
    AND StartTime > DATEADD(Hour, -2, GetDate())
)
SELECT *
FROM
(
select 
 [TimeoutID] = CTE.RowID,
 [DeadlockTime] = [StartTime],
 [DeadlockGraph],
 [PagelockObject] = CTE.[DeadlockGraph].value('/deadlock-list[1]/deadlock[1]/resource-list[1]/pagelock[1]/@objectname', 'varchar(200)'),
 [DeadlockObject] = CTE.[DeadlockGraph].value('/deadlock-list[1]/deadlock[1]/resource-list[1]/objectlock[1]/@objectname', 'varchar(200)'),
 [Victim] = case when Deadlock.Process.value('@id', 'varchar(50)') = CTE.[DeadlockGraph].value('/deadlock-list[1]/deadlock[1]/@victim', 'varchar(50)') then 1 else 0 end,
 [Procedure] = Deadlock.Process.value('executionStack[1]/frame[1]/@procname[1]', 'varchar(200)'),
 [LockMode] = Deadlock.Process.value('@lockMode', 'char(1)'),
 [Code] = Deadlock.Process.value('executionStack[1]/frame[1]', 'varchar(1000)'),
 [ClientApp] = CASE LEFT(Deadlock.Process.value('@clientapp', 'varchar(100)'), 29)
                    WHEN 'SQLAgent - TSQL JobStep (Job '
                        THEN 'SQLAgent Job: ' + (SELECT name FROM msdb..sysjobs sj WHERE substring(Deadlock.Process.value('@clientapp', 'varchar(100)'),32,32)=(substring(sys.fn_varbintohexstr(sj.job_id),3,100))) + ' - ' + SUBSTRING(Deadlock.Process.value('@clientapp', 'varchar(100)'), 67, len(Deadlock.Process.value('@clientapp', 'varchar(100)'))-67)
                    ELSE Deadlock.Process.value('@clientapp', 'varchar(100)')
                    END ,
 [HostName] = Deadlock.Process.value('@hostname', 'varchar(20)'),
 [LoginName] = Deadlock.Process.value('@loginname', 'varchar(20)'),
 [TransactionTime] = Deadlock.Process.value('@lasttranstarted', 'datetime'),
 [InputBuffer] = Deadlock.Process.value('inputbuf[1]', 'varchar(1000)')
 from CTE
 CROSS APPLY CTE.[DeadlockGraph].nodes('/deadlock-list/deadlock/process-list/process') as Deadlock(Process)
-- ORDER BY [DeadlockObject], [ClientApp], [Procedure], [Code]
) X
--WHERE Victim = 0 
--WHERE [Procedure] = 'master.dbo.FakeProcedureName'
ORDER BY TimeoutID DESC

Finally, to make all of this happen you’ll need a deadlock trace. Don’t go thinking that I open profiler, make a deadlock trace, script it out, and run it every time I need one. Reusable code is awesome, and this is reused every time I see an alert come through that we are having too many deadlocks. I have this proc in our Perf database, which is on every SQL Server we manage. So you don’t have to dig through the numbers, it grabs every event for the deadlock graph and nothing else.

IF  EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[usp_ServerSideTrace_Deadlock]') AND type in (N'P', N'PC'))
DROP PROCEDURE [dbo].[usp_ServerSideTrace_Deadlock]
GO

SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

CREATE procedure [dbo].[usp_ServerSideTrace_Deadlock]
    @FilePath nvarchar(1000) = N'D:\Perflogs\PerfTraces\DeadlockTrace',                                            
    @maxfilesize bigint = 25,
    @maxfiles int = 10,
    @TraceDBID bit=0,
    @DBID int=0
AS

declare @rc int
declare @TraceID int

--Create Trace
exec @rc = sp_trace_create @TraceID output, 2, @FilePath, @maxfilesize, NULL, @maxfiles
if (@rc != 0) 
Begin
    select ErrorCode=@rc    
    return
End

-- Set the RPC Completed and SQL Batch Completed events
declare @on bit
set @on = 1
exec sp_trace_setevent @TraceID, 148, 11, @on
exec sp_trace_setevent @TraceID, 148, 12, @on
exec sp_trace_setevent @TraceID, 148, 14, @on
exec sp_trace_setevent @TraceID, 148, 1, @on
exec sp_trace_setevent @TraceID, 25, 15, @on
exec sp_trace_setevent @TraceID, 25, 8, @on
exec sp_trace_setevent @TraceID, 25, 32, @on
exec sp_trace_setevent @TraceID, 25, 56, @on
exec sp_trace_setevent @TraceID, 25, 1, @on
exec sp_trace_setevent @TraceID, 25, 9, @on
exec sp_trace_setevent @TraceID, 25, 57, @on
exec sp_trace_setevent @TraceID, 25, 2, @on
exec sp_trace_setevent @TraceID, 25, 10, @on
exec sp_trace_setevent @TraceID, 25, 3, @on
exec sp_trace_setevent @TraceID, 25, 11, @on
exec sp_trace_setevent @TraceID, 25, 12, @on
exec sp_trace_setevent @TraceID, 25, 13, @on
exec sp_trace_setevent @TraceID, 25, 6, @on
exec sp_trace_setevent @TraceID, 25, 14, @on
exec sp_trace_setevent @TraceID, 25, 22, @on
exec sp_trace_setevent @TraceID, 59, 32, @on
exec sp_trace_setevent @TraceID, 59, 56, @on
exec sp_trace_setevent @TraceID, 59, 1, @on
exec sp_trace_setevent @TraceID, 59, 57, @on
exec sp_trace_setevent @TraceID, 59, 2, @on
exec sp_trace_setevent @TraceID, 59, 14, @on
exec sp_trace_setevent @TraceID, 59, 22, @on
exec sp_trace_setevent @TraceID, 59, 3, @on
exec sp_trace_setevent @TraceID, 59, 12, @on

-- Set the Filters
--Default to exclude the trace from Tracing Itself
exec sp_trace_setfilter @TraceID, 10, 0, 7, N'SQL Server Profiler'

--Add other filters as specified by parameters
If @TraceDBID=1
    exec sp_trace_setfilter @TraceID, 3, 0, 0, @DBID

-- Set the trace status to start
exec sp_trace_setstatus @TraceID, 1

-- display trace id for future references
return @TraceID

GO

The basics of a deadlock is that process 1 gets a lock on data A while process 2 gets a lock on data B. Then process 1 says it needs a lock on data B to continue while process 2 needs a lock on data A to continue. It’s a simple basic concept, but it gets complex…too complex for me to cover it here and say I did it justice. Sometimes a process deadlocks on itself, doing a parallelism deadlock as demonstrated by Jason Strate. Solutions also vary from order of operations (lets make processes 1 and 2 both get data A first, then they can’t deadlock), tuning (if the locks are released quicker then there’s less time for deadlocks), reducing the number of transactions (the lock on data A was grabbed earlier in this transaction, and there’s no reason these need to be in a explicit transaction), scheduling (these are both processes that have to run between 10 PM and 6 AM, why do they both start at midnight?), and many other approaches. There isn’t one right answer, and anyone promising a single right answer in one short blog post is selling you short. That’s either a very long blog post or it’s an incomplete answer. I go with option 3, and here’s my post on the best way to view them along with a couple book chapters to get you started on fixing what you found.

Related Content:

SQL Server Concurrency: Locking, Blocking and Row Versioning By Kalen Delaney
FREE eBook or $22 physical book

Troubleshooting SQL Server: A Guide for the Accidental DBA By Jonathan Kehayias and Ted Krueger
FREE eBook or $25 physical book

Recently Recompiled Resource Hogs


It’s not too uncommon for a query to get a new execution plan that performs a lot worse than it could, and sometimes it’s bad enough to drag the whole server down to a halt.  When it’s something obvious such as a query going from 2 seconds duration to 30 seconds you’ll pick it up on a standard monitoring trace.  That’s not always the case, and the query that hits your server twice a second going from 100ms to 900ms can fly under the radar except for users saying the server isn’t as responsive as it was yesterday.

Before you go all out and throw up an intrusive trace that can slow down your server capturing an unfiltered sample of RPC:Completed and SQL:BatchCompleted to run aggregates on, see what the DMVs can tell you.  If you look in sys.dm_exec_query_stats you can get the total amount of resources used since the last recompilation, which is going to give you pretty much everything you need as long as you’re looking for a query that can stay in the cache.  The only trick is to look at everything in the same context.  Something that did 10,000,000 reads since it was compiled yesterday is nothing compared to something that did 1,000,000 reads since it was compiled 10 minutes ago.  Here’s my answer to this:

SELECT top 10 --WorkerTime_Sec = cast(round(total_worker_time / 1000 / 1000.0, 0) as Int)
    WorkerTimeSec_PerSec =  cast(cast(round(total_worker_time / 1000 / 1000.0, 0) as Int) / (Cast(DateDiff(second, Creation_Time, GetDate()) as DEC(20,2)) + .00000001) as DEC(20,2))
    --, LogicalReads_K = total_logical_reads / 1000
    , LogicalReads_K_PerSec = cast(total_logical_reads / 1000 / (Cast(DateDiff(second, Creation_Time, GetDate())as DEC(20,2)) + .00000001) as DEC(20,2))
    --, ElapsedTime_Sec = cast(round(total_elapsed_time / 1000 / 1000.0, 0) as Int)
    , ElapsedTimeSec_PerSec = cast(cast(round(total_elapsed_time / 1000 / 1000.0, 0) as Int)/ (Cast(DateDiff(second, Creation_Time, GetDate())as DEC(20,2)) + .00000001) as DEC(20,2))
    , Seconds_Since_Recompile = DateDiff(second, Creation_Time, GetDate())
    , Creation_Time
    , qs.execution_count
    , avg_reads = total_logical_reads / qs.execution_count
    , avg_elapsed_ms = total_elapsed_time / 1000 / qs.execution_count
    , avg_worker_ms = total_worker_time / 1000 / qs.execution_count
    , Offset = statement_start_offset
    , ObjectName = OBJECT_NAME(qt.objectid, qt.dbid)
    , qt.text
    , StatementText = SUBSTRING (qt.text, qs.statement_start_offset/2 + 1,
        abs(CASE WHEN qs.statement_end_offset = -1
               THEN LEN(CONVERT(NVARCHAR(MAX), qt.text)) * 2 
            ELSE qs.statement_end_offset END - qs.statement_start_offset)/2 + 1)
    --, qt.*
    --, qs.*
FROM sys.dm_exec_query_stats AS qs
    CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS qt
WHERE Creation_Time > DateAdd(Hour, -24, GetDate())
    AND (creation_time < DATEADD(SECOND, -60, GETDATE()) or execution_count > 5) --Has to be a minute old or executed 6 times before I consider the data stable
    --AND (total_worker_time > 20000 or execution_count > 15000)
ORDER BY 1 DESC 

I comment out what I don’t typically use, same as always with me.  Those pieces don’t usually change.  The only thing that does typically change is the date filter on the creation time; if someone says there weren’t any problems this time yesterday then it stays at the default value, if they say it worked fine before lunch then it drops down to 4 hours.

There are four things to keep in mind with this DMV.  Everything is in microseconds which I don’t use, so I “fixed” the problem in my query.  It will only hold data while the query is in cache, which means it’s never tracked here if it’s never in cache.  Each row is for a single plan for a single statement, although  you could easily aggregate it if you wanted to.  Also, it’s possible for the worker time or duration per second to be over 1 due to multiple simultaneous runs or, with worker time, parallelism can cause this.  I typically try not to restate what you should be reading on MSDN or BOL, but these aren’t things everyone expects.

For me to run this, I typically have to already have a complaint that a server is running slower than expected. Check out my Server Running Slow post to see everything else that runs along with this.

Reading Traces


Capturing information is useless unless you know how to use it. Here’s what I use to get the most out of my basic trace of anything taking over so many seconds. This is also my template for more invasive traces, and it gets tweaked on a case-by-case basis.

It’s my typical style with half of it commented out where less run portions can be highlighted and run and portions of my WHERE clause don’t have to be rewritten to be added in. The first line gets highlighted as soon as I open this script every time so I can run it, copy the path of the trace I’m looking for, and paste it into my main script. As for the WHERE clause, call me lazy or whatever it is you call people who can’t remember the right word, but I don’t always remember the exact wording of each column name that I want to filter by. I simply uncomment them as I need them.

The only thing I can say is really impressive here is finding the job names, which is a piece of this script I will never forget figuring out. Other than that, just little things to help readability when both running this script and when playing with the WHERE clause. I can’t tell you why duration is microseconds and CPU is milliseconds, but I can say the SELECT and WHERE clauses are set up to use the same units I would use when I talk to someone.

--SELECT * FROM sys.traces

/*
--sp_trace_setstatus id, status
exec sp_trace_setstatus 3,0 --stop
exec sp_trace_setstatus 3,2 --close
--0 stop
--1 start
--2 close
*/

--query running trace
SELECT TOP 10000 Database_Name = DB_Name(DatabaseID)
    , DatabaseName
    , starttime
    , endtime
    , textdata = cast(textdata as varchar(4000)) 
    , Duration_Sec = cast(duration/1000/1000.0 as Dec(10,1)) 
    , CPU_Sec = cast(cpu/1000.0 as Dec(10,1)) 
    , Reads_K = cast(reads/1000.0 as Dec(10,0)) 
    , Writes_K = Cast(writes/1000.0 as Dec(10,1)) 
    , hostname  
    , LoginName 
    , NTUserName 
    , ApplicationName = CASE LEFT(ApplicationName, 29)
                    WHEN 'SQLAgent - TSQL JobStep (Job '
                        THEN 'SQLAgent Job: ' + (SELECT name FROM msdb..sysjobs sj WHERE substring(ApplicationName,32,32)=(substring(sys.fn_varbintohexstr(sj.job_id),3,100))) + ' - ' + SUBSTRING(ApplicationName, 67, len(ApplicationName)-67)
                    ELSE ApplicationName
                    END 
    , SPID
    , TE.*
FROM fn_trace_gettable('D:\Traces\DurationOver10sec_110.trc',default) T
    LEFT JOIN sys.trace_events TE ON  T.EventClass = TE.trace_event_id
WHERE endtime > DATEADD(Hour, -3, GetDate()) 
    --AND Duration > 1000000 * 15 --in seconds
    --AND CPU > 1000 * 10 --in seconds
    --AND Writes > 10000
    --AND Reads > 1000000
    --AND te.category_id = 2 --Auto Grow/Shrink found in default trace
    --AND t.databaseid = DB_ID('master')
    --AND t.DatabaseName = 'tempdb'
    --AND t.LoginName <> 'shood'
    --AND t.textdata like '%%'
ORDER BY endTime DESC, starttime DESC

So, for those of you not too familiar with tracing, what does all of this mean and what am I looking for? It depends on what’s wrong. The day after an app’s upgrade I’m looking for procs that snuck through testing and are killing the server. When a user complains about performance you might see a new proc showing up because it’s getting a bad execution plan. If tempdb or the logs fill up I can tell you what had 100,000 writes and just happened to end at that exact moment. Sometimes I’ll glance to see if anything is hitting or approaching the app’s 30 second timeout.

A general trace such as the RPC:Completed and SQL:BatchCompleted that take over X seconds won’t answer all of your questions, but it will be a good start. The most important things are to always have something running (could be extended events instead of a trace) to tell you the big things that are running and to have your monitoring limited enough where you aren’t the one slowing down the server. Make sure it’s customized to each server since a 1 second duration in a web app is a big deal, but other servers you would get flooded with results unless you bump it up to 10 seconds. You can run more invasive traces when you need them.

There are two hits against extended events where I’m even using traces for this. First, I still have too many SQL 2005 servers around and like to do things as consistent as reasonably possible across all servers to make it simple. Second, I’m still not too happy about extended events skipping SQL:BatchCompleted in SQL 2008 as that misses out on finding large loops of small statements. Extended events are much more efficient, can do a lot that traces can’t do, but aren’t a 100% replacement for traces.

Comment on my posts, let me know what you think. Every post I have at Simple SQL Server is written to myself 5 years ago, and is exactly what I wish I had back then in my own mind. The key words being “in my own mind”, and I’d like to hear what’s on your mind. Be it constructive criticism on this post or a completely separate SQL Server subject you wish you knew more about either now or 5 years ago that you’d like to see thrown up for everyone to use, I’ll never know until you tell me.

Tracing Introduction


SQL Server tracing is essential for troubleshooting performance issues, yet it can put loads on your server that would cause noticeable slowness, capture so much information that it fills the drive, and I’ve even seen it cause a cluster failover more than once.  With a tool this powerful you just need to know how to use it before you use it.  It’s almost like the idea where companies want their entry-level people to have a year of experience.

There is one trace I’d like to see on every server, preferably as a startup proc or a job that starts automatically when SQL Agent starts.  It doesn’t matter how you do it, if you want me to be happy you’ll tell me this was running for at least 24 hours before you had a performance incident.  All I’m looking for is RPC:Completed and SQL:BatchCompleted with a reasonable filter of duration over 10 seconds.

If you’ve never used Profiler before, here’s a quick overview to make the server-side trace I’m using.  Leave the first screen at the defaults, with the possible exception of changing the template to blank.  Select RPC:Completed and SQL:BatchCompleted and set the Duration filter to greater than or equal to 10,000 ms which you can get to using the “Column Filters…” button.  You can collect all columns if you want to, it won’t hurt with these events although there are some columns that won’t help.

Create trace with duration filter

Run the trace, then you can export it to a script.  I’m not happy that you have to run it first, but in the end this is either something that doesn’t affect the server at all or something I run on a dev server that’s at the same patch level.

Export trace to file

There, you have the basic script to start the server-side trace.  You still need to tweak it a bit before you can use it, but the hard part is done.  Everything highlighted in yellow was changed by me.  The default max file size was changed from 5 MB to 25 MB because I view 25 to 50 MB to be the range that’s not annoyingly small or too big to move around or send to someone easily.  The second parameter of sp_trace_create is set to 2 to enable rollover files, the third parameter is set to my output file sans an extension, and the last parameter is added to say how many output files there are.

/****************************************************/
/* Created by: SQL Server 2008 R2 Profiler          */
/* Date: 06/16/2013  06:10:02 AM         */
/****************************************************/

-- Create a Queue
declare @rc int
declare @TraceID int
declare @maxfilesize bigint
set @maxfilesize = 25

-- Please replace the text InsertFileNameHere, with an appropriate
-- filename prefixed by a path, e.g., c:\MyFolder\MyTrace. The .trc extension
-- will be appended to the filename automatically. If you are writing from
-- remote server to local drive, please use UNC path and make sure server has
-- write access to your network share

exec @rc = sp_trace_create @TraceID output, 2, N'D:\Traces\DurationOver10Sec', @maxfilesize, NULL, 5
if (@rc != 0) goto error

-- Client side File and Table cannot be scripted

-- Set the events
declare @on bit
set @on = 1
exec sp_trace_setevent @TraceID, 10, 15, @on
exec sp_trace_setevent @TraceID, 10, 31, @on
exec sp_trace_setevent @TraceID, 10, 8, @on
exec sp_trace_setevent @TraceID, 10, 16, @on
exec sp_trace_setevent @TraceID, 10, 48, @on
exec sp_trace_setevent @TraceID, 10, 1, @on
exec sp_trace_setevent @TraceID, 10, 9, @on
exec sp_trace_setevent @TraceID, 10, 17, @on
exec sp_trace_setevent @TraceID, 10, 10, @on
exec sp_trace_setevent @TraceID, 10, 18, @on
exec sp_trace_setevent @TraceID, 10, 34, @on
exec sp_trace_setevent @TraceID, 10, 3, @on
exec sp_trace_setevent @TraceID, 10, 11, @on
exec sp_trace_setevent @TraceID, 10, 51, @on
exec sp_trace_setevent @TraceID, 10, 12, @on
exec sp_trace_setevent @TraceID, 10, 13, @on
exec sp_trace_setevent @TraceID, 10, 14, @on
exec sp_trace_setevent @TraceID, 12, 15, @on
exec sp_trace_setevent @TraceID, 12, 31, @on
exec sp_trace_setevent @TraceID, 12, 8, @on
exec sp_trace_setevent @TraceID, 12, 16, @on
exec sp_trace_setevent @TraceID, 12, 48, @on
exec sp_trace_setevent @TraceID, 12, 1, @on
exec sp_trace_setevent @TraceID, 12, 9, @on
exec sp_trace_setevent @TraceID, 12, 17, @on
exec sp_trace_setevent @TraceID, 12, 10, @on
exec sp_trace_setevent @TraceID, 12, 14, @on
exec sp_trace_setevent @TraceID, 12, 18, @on
exec sp_trace_setevent @TraceID, 12, 3, @on
exec sp_trace_setevent @TraceID, 12, 11, @on
exec sp_trace_setevent @TraceID, 12, 51, @on
exec sp_trace_setevent @TraceID, 12, 12, @on
exec sp_trace_setevent @TraceID, 12, 13, @on

-- Set the Filters
declare @intfilter int
declare @bigintfilter bigint

set @bigintfilter = 10000000
exec sp_trace_setfilter @TraceID, 13, 0, 4, @bigintfilter

-- Set the trace status to start
exec sp_trace_setstatus @TraceID, 1

-- display trace id for future references
select TraceID=@TraceID
goto finish

error: 
select ErrorCode=@rc

finish: 
go

This is done as a server-side trace for a couple reasons. Profiler needs to stay open to keep collecting data, SQL Server has to send the data to Profiler which is usually across the network, then the trace data is held in memory. None of these are a problem for a properly filtered short-term trace. This is properly filtered, but I’m really hoping everyone will run this long-term.

I owe you another blog post on querying traces, then we can move in to putting it all together with where to start when a user makes their favorite statement to you…”The server is running slow”.

Wait Stats – Monitoring and Using


sys.dm_os_wait_stats is one of the most important DMVs out there, and one that you should know the historical values of for every server you care about. This DMV will tell you the cumulative amount of time you waited for each wait type since SQL Server services started, which is nice to know, but limited in its usefulness as-is.

If, however, you knew what the values were 30 minutes ago and what they are now then you could know why recent queries were waiting. Take it a step further and know what you were waiting for in the same 30 minute period yesterday and you have a bit of a baseline to go off of. Now you can compare what you’re waiting for now to your baseline so you can say if what you’re seeing now is just normal for the server or if you’re seeing a true incident. Remember, the only way to know if numbers on a server are a problem are not is to know what’s normal for that specific server.

A half-hour increment is a bit much for me, and I would prefer to know it every 5 minutes to give a little more detail. Also, just looking at yesterday’s stats for comparison isn’t good enough for me, especially on when looking into an incident on a Monday where Sunday isn’t a very good baseline. Because of that, I want to look at the baseline data from the same time period yesterday and exactly a week ago.

To capture the data you need to capture a snapshot of sys.dm_os_wait_stats along with the time you grabbed it. Then grab it again, do a diff between the two values, and save that off with the starting and ending times. Keep this running every so many minutes in a SQL job and you have a running baseline to compare what you’re seeing now to what the server is typically doing.

After that, keep the data trimmed down to what you’d actually use. Personally, I’m a huge fan of 13 months. This running every 5 minutes for 13 months will accumulate about 750 MB of data, and you can adjust these numbers as you see fit.

Here are the tables; I comment out stuff if I would cry if I accidently ran it at the wrong time.

/*
IF OBJECT_ID('WaitStats') IS NOT NULL BEGIN
    DROP TABLE WaitStats
END

IF OBJECT_ID('WaitStatsLast') IS NOT NULL BEGIN
    DROP TABLE WaitStatsLast
END
*/
GO

CREATE TABLE WaitStats (
    DateStart datetime
    , DateEnd datetime
    , wait_type nvarchar(60)
    , waiting_tasks_count bigint
    , wait_time_ms bigint
    , max_wait_time_ms bigint
    , signal_wait_time_ms bigint
)
GO

CREATE CLUSTERED INDEX IX_WaitStats_DateStart_waittype_U_C ON WaitStats
(
    DateStart
    , wait_type
) WITH (Fillfactor = 95)
GO

CREATE TABLE WaitStatsLast (
    DateAdded datetime
    , wait_type nvarchar(60)
    , waiting_tasks_count bigint
    , wait_time_ms bigint
    , max_wait_time_ms bigint
    , signal_wait_time_ms bigint
)
GO

CREATE CLUSTERED INDEX IX_WaitStatsLast_waittype_U_C ON WaitStatsLast
(
    wait_type
) WITH (Fillfactor = 95)
GO

Notice the lack of creativity. It is very intentional. The code is more stable as I’m copying the field sizes directly from the DMV, and the data is more legible to an outsider (new employee, Microsoft engineer, consultant, vendor) because the fields are exactly the same as the DMV they’re used to seeing. I’m not a fan of non-creative solutions, so there’s a reason to go with it when I decide to be boring.

Now to populate the tables. This script will need to be ran on a regular schedule. Preferably, throw it in a proc and run that proc every 5 minutes, but that’s just my opinion in a discussion about how to watch your servers.

IF OBJECT_ID('tempdb..#WaitStatsNew') IS NOT NULL BEGIN
    DROP TABLE #WaitStatsNew 
END

SELECT DateAdded = GETDATE()
    , wait_type
    , waiting_tasks_count 
    , wait_time_ms
    , max_wait_time_ms
    , signal_wait_time_ms 
INTO #WaitStatsNew
FROM sys.dm_os_wait_stats

INSERT INTO WaitStats (DateStart, DateEnd, wait_type, waiting_tasks_count, wait_time_ms, max_wait_time_ms, signal_wait_time_ms)
SELECT DateStart = ISNULL(l.DateAdded, (SELECT create_date FROM sys.databases WHERE name = 'tempdb'))
    , DateEnd = n.DateAdded
    , wait_type = n.wait_type
    , waiting_tasks_count = n.waiting_tasks_count - ISNULL(l.waiting_tasks_count, 0)
    , wait_time_ms = n.wait_time_ms - ISNULL(l.wait_time_ms, 0)
    , max_wait_time_ms = n.max_wait_time_ms --It's a max, not cumulative
    , signal_wait_time_ms = n.signal_wait_time_ms - ISNULL(l.signal_wait_time_ms, 0)
FROM #WaitStatsNew n
    LEFT OUTER JOIN WaitStatsLast l ON n.wait_type = l.wait_type AND l.DateAdded > (SELECT create_date FROM sys.databases WHERE name = 'tempdb')

TRUNCATE TABLE WaitStatsLast 

INSERT INTO WaitStatsLast (DateAdded, wait_type, waiting_tasks_count, wait_time_ms, max_wait_time_ms, signal_wait_time_ms)
SELECT DateAdded
    , wait_type
    , waiting_tasks_count
    , wait_time_ms
    , max_wait_time_ms
    , signal_wait_time_ms
FROM #WaitStatsNew 

DROP TABLE #WaitStatsNew

Finally, querying it. Give it a descent time frame, I typically start at 2 hours to get an overall picture without having a single event skew the numbers too much. Then look at your most relevant baseline data – yesterday and 7 days ago during the same timeframe.

DECLARE @dStart datetime
    , @dEnd datetime

SELECT @dStart = DATEADD(Hour, -2, GETDATE())
    , @dEnd = GETDATE()

IF OBJECT_ID('tempdb..#IgnoredWaits') IS NOT NULL BEGIN
    DROP TABLE #IgnoredWaits
END

CREATE TABLE #IgnoredWaits (Wait_Type NVarChar(60) PRIMARY KEY) 
--/*
--Values taken from p25 of "Troubleshooting SQL Server - A Guide for the Accidental DBA" by Jonathan Kehayias and Ted Krueger  
INSERT INTO #IgnoredWaits 
SELECT 'BAD_PAGE_PROCESS'
UNION SELECT 'BROKER_EVENTHANDLER'
UNION SELECT 'BROKER_RECEIVE_WAITFOR'
UNION SELECT 'BROKER_TASK_STOP'
UNION SELECT 'BROKER_TO_FLUSH'
UNION SELECT 'BROKER_TRANSMITTER'
UNION SELECT 'CHECKPOINT_QUEUE'
UNION SELECT 'CLR_AUTO_EVENT'
UNION SELECT 'CLR_MANUAL_EVENT'
UNION SELECT 'DBMIRROR_EVENTS_QUEUE'
UNION SELECT 'DISPATCHER_QUEUE_SEMAPHORE'
UNION SELECT 'FT_IFTS_SCHEDULER_IDLE_WAIT'
UNION SELECT 'FT_IFTSHC_MUTEX'
UNION SELECT 'KSOURCE_WAKEUP'
UNION SELECT 'LAZYWRITER_SLEEP'
UNION SELECT 'LOGMGR_QUEUE'
UNION SELECT 'ONDEMAND_TASK_QUEUE'
UNION SELECT 'PREEMPTIVE_OS_AUTHENTICATIONOPS'
UNION SELECT 'PREEMPTIVE_OS_GETPROCADDRESS'
UNION SELECT 'REQUEST_FOR_DEADLOCK_SEARCH'
UNION SELECT 'RESOURCE_QUEUE'
UNION SELECT 'SLEEP_BPOOL_FLUSH'
UNION SELECT 'SLEEP_SYSTEMTASK'
UNION SELECT 'SLEEP_TASK' 
UNION SELECT 'SQLTRACE_BUFFER_FLUSH'
UNION SELECT 'WAITFOR'
UNION SELECT 'XE_DISPATCHER_JOIN'
UNION SELECT 'XE_DISPATCHER_WAIT'
UNION SELECT 'XE_TIMER_EVENT'

SELECT TOP 7 TimeFrame = 'Right Now'
    , DateStart = MIN(DateStart)
    , DateEnd = MAX(DateEnd)
    , wait_type
    , wait_minutes = SUM(wait_time_ms)/1000/60
FROM WaitStats
WHERE DateStart >= @dStart 
    AND DateEnd <= @dEnd 
    AND Wait_Type NOT IN (SELECT Wait_Type FROM #IgnoredWaits)
GROUP BY wait_type
ORDER BY 5 DESC

SELECT TOP 7 TimeFrame = 'Yesterday'
    , DateStart = MIN(DateStart)
    , DateEnd = MAX(DateEnd)
    , wait_type
    , wait_minutes = SUM(wait_time_ms)/1000/60
FROM WaitStats
WHERE DateStart >= @dStart - 1
    AND DateEnd <= @dEnd - 1
    AND Wait_Type NOT IN (SELECT Wait_Type FROM #IgnoredWaits)
GROUP BY wait_type
ORDER BY 5 DESC

SELECT TOP 7 TimeFrame = 'Last Week'
    , DateStart = MIN(DateStart)
    , DateEnd = MAX(DateEnd)
    , wait_type
    , wait_minutes = SUM(wait_time_ms)/1000/60
FROM WaitStats
WHERE DateStart >= @dStart - 7
    AND DateEnd <= @dEnd - 7
    AND Wait_Type NOT IN (SELECT Wait_Type FROM #IgnoredWaits)
GROUP BY wait_type
ORDER BY 5 DESC

The only thing left to do is wait and interpret the data. All the wait types mean something, with some easily pointing to an issue and others just being normal operations. Doing a dive into what wait types are out there, what they mean to you, and what they mean when they’re not what they were yesterday isn’t something I’m going to get into with this post.  However, there is an Accidental DBA book out there that explains this amazingly well.  Here are links to the FREE PDF Version and the Physical Book.

Indexes – Unused and Duplicates


Indexes aren’t free, and many databases end up with unused indexes. Every time you make any update to a table you will be updating the clustered index (I assume no heaps, because I hate heaps), and every index that has uses one of the columns that were updated. Inserts and Deletes affect every column and will affect every index, with exceptions for non-typical indexes. These updates cost a lot, to the point that I have several processes that disable certain nonclustered indexes, do all of the work, then rebuild the indexes afterwards.

The problem is that you can’t tell very easily if an index is being used or not. You can look at sys.dm_db_index_usage_stats, but that information is very limited in that gets wiped out on index rebuilds, SQL service restarts (server reboot included), and anytime you ask it to. That’s not mentioning that it also ignores obscure usage of the index. However, the obscure usage is more of something to keep in mind than it is something to base all of your decisions on.

From my point of view, there are two answers to this problem.  There isn’t a right and wrong answer, it’s more of a step 1 and step 2.

The long and hard one is to capture the values in sys.dm_db_index_usage_stats on a normal basis (daily, before index maintenance) and give yourself long term statistics that you can determine if an index’s usage justifies its cost. Some indexes are updated 1,000,000 times for two uses. You obviously don’t need to keep this index around all the time, but you’re still paranoid about what those uses were. Were they ad-hoc queries run by a developer, or were they year-end reports run by the CEO? You can get some idea if you look at your history and check what day it was run. I would want to make a rollback script, drop the index, and make a searchable document of indexes I dropped in case anyone complained about year-end reports not working.

That was the long answer, which is right and is not replaced by the short answer. The script below is the short answer. If all of the key fields in an index match, or at least the first three in a larger index, then there’s a good chance that one of the indexes isn’t required or they could be consolidated.

For instance, if you have two indexes that have the same two key fields in the same order but the second index also includes two columns, then you’ll probably see usage on both indexes (see the long-hard solution above). If one index isn’t in use in this situation, your job is easy. If both are used, then look into dropping the index that doesn’t include any columns. Why? Because a slightly larger index rarely increases the overhead too much, and the queries that used the first index aren’t usually impacted too much by using a slightly larger index that has the same foundation.

This area gets shakey on how you should handle it, and is really one of the few areas in SQL Server I consider more of an art. For instance, the script below will show you indexes that have 5 key columns where the first three match. You look and find that both indexes are being used, but users are complaining that updates are a bit slow, too. I’d lean towards condensing by taking the the last two key columns of one index and including them on the other, but which index gets dropped and how do you test it?

Do my uses of “look into”, “rarely”, “usually”, and “too much” seem uncertain to you? Welcome to indexing.

Here are the scripts I use for both the short and long answers mentioned.  The first one is for the short answer, giving you indexes that are for the most part duplicated.

DECLARE @KeyFieldMatches INT

SET @KeyFieldMatches = 3 --Number of key fields to match in order

IF object_id('tempdb..#IndexList') IS NOT NULL BEGIN
    DROP TABLE #IndexList
END

IF object_id('tempdb..#IndexListShort') IS NOT NULL BEGIN
    DROP TABLE #IndexListShort
END

CREATE TABLE #IndexList (
      object_id    bigint not null
    , table_name varchar(150) not null
    , index_id int not null
    , index_name varchar(150) not null
    , index_column int not null
    , included bit not null
    , column_name varchar(150) not null
    , index_type int not null
)

CREATE CLUSTERED INDEX IndexList_Clu ON #IndexList (object_id, index_id, index_column, included)

CREATE TABLE #IndexListShort (
      object_id bigint not null
    , table_name varchar(150) not null
    , index_id int not null
    , index_name varchar(150) not null
    , column_names_first_n varchar(450) not null
    , column_names_key varchar(4000) not null
    , column_names_included varchar(4000) not null
)

DECLARE @object_id bigint
DECLARE @index_id int
DECLARE @List varchar(4000)
DECLARE @ListIncl varchar(4000)
DECLARE @ListShort varchar(450)

INSERT INTO #IndexList
SELECT o.object_id
    , table_name = o.name
    , i.index_id
    , index_name = i.name
    , index_column = ic.index_column_id
    , included = ic.is_included_column
    , column_name = c.name
    , index_type = i.type
FROM sys.objects o
    INNER JOIN sys.indexes i ON o.object_id = i.object_id
    INNER JOIN sys.index_columns ic ON i.object_id = ic.object_id AND i.index_id = ic.index_id
    INNER JOIN sys.columns c ON ic.object_id = c.object_id AND ic.column_id = c.column_id

SET @object_id = (SELECT TOP 1 object_id FROM #IndexList)
SET @index_id = (SELECT TOP 1 index_id FROM #IndexList WHERE object_id = @object_id)

WHILE @object_id IS NOT NULL BEGIN
    SET @List = ''
    SET @ListIncl = ''
    SET @ListShort = ''

    SELECT @List = @List + COALESCE(i.column_name + ', ', '') FROM #IndexList i WHERE i.object_id = @object_id and i.index_id = @index_id and i.included = 0 ORDER BY i.index_column
    SELECT @List = substring(@List, 0, Len(@List))

    SELECT @ListIncl = @ListIncl + COALESCE(i.column_name + ', ', '') FROM #IndexList i WHERE i.object_id = @object_id and i.index_id = @index_id and i.included = 1 ORDER BY i.index_column
    SELECT @ListIncl = substring(@ListIncl, 0, Len(@ListIncl))

    SELECT @ListShort = @ListShort + COALESCE(i.column_name + ', ', '') FROM #IndexList i WHERE i.object_id = @object_id and i.index_id = @index_id and i.included = 0 and index_column <= @KeyFieldMatches ORDER BY i.index_column
    SELECT @ListShort = substring(@ListShort, 0, Len(@ListShort))

    INSERT INTO #IndexListShort 
    SELECT TOP 1 i.object_id
        , i.table_name
        , i.index_id
        , i.index_name
        , @ListShort
        , @List
        , CASE i.index_type --Clustered indexes include everything
            WHEN 1 THEN '*'
            ELSE @ListIncl
            END
    FROM #IndexList i
    WHERE i.object_id = @object_id and i.index_id = @index_id

    DELETE #IndexList WHERE object_id = @object_id and index_id = @index_id

    SET @object_id = (SELECT TOP 1 object_id FROM #IndexList)
    SET @index_id = (SELECT TOP 1 index_id FROM #IndexList WHERE object_id = @object_id)
END

SELECT table_name, index_name, column_names_key, column_names_included
FROM #IndexListShort i
WHERE EXISTS (SELECT * FROM #IndexListShort i2 WHERE i.object_id = i2.object_id AND i.column_names_first_n = i2.column_names_first_n GROUP BY object_id, column_names_first_n HAVING Count(*) > 1)
ORDER BY table_name, column_names_key

DROP TABLE #IndexList
DROP TABLE #IndexListShort

The long answer takes a bit more work to figure out. You need to store the information long-term, keep a snapshot of what it looked like most recently, and do all the calculations to make sure you have the right numbers. It’s really no different than what I do for other DMVs, so you’ll see this process duplicated a lot if you follow my work. The tables for storing this information:

SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

CREATE TABLE [dbo].[DMVIndex_Usage_Stats](
    [StartTime] [smalldatetime] NOT NULL,
    [EndTime] [smalldatetime] NOT NULL,
    [database_id] [smallint] NOT NULL,
    [object_id] [int] NOT NULL,
    [index_id] [int] NOT NULL,
    [user_seeks] [bigint] NOT NULL,
    [user_scans] [bigint] NOT NULL,
    [user_lookups] [bigint] NOT NULL,
    [user_updates] [bigint] NOT NULL,
    [system_seeks] [bigint] NOT NULL,
    [system_scans] [bigint] NOT NULL,
    [system_lookups] [bigint] NOT NULL,
    [system_updates] [bigint] NOT NULL,
 CONSTRAINT [PK_DMVIndex_Usage_Stats] PRIMARY KEY CLUSTERED 
(
    [StartTime] ASC,
    [database_id] ASC,
    [object_id] ASC,
    [index_id] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON, FILLFACTOR = 90) ON [PRIMARY]
) ON [PRIMARY]
GO

CREATE TABLE [dbo].[DMVIndex_Usage_Stats_Temp](
    [TS] [smalldatetime] NOT NULL,
    [database_id] [smallint] NOT NULL,
    [object_id] [int] NOT NULL,
    [index_id] [int] NOT NULL,
    [user_seeks] [bigint] NOT NULL,
    [user_scans] [bigint] NOT NULL,
    [user_lookups] [bigint] NOT NULL,
    [user_updates] [bigint] NOT NULL,
    [system_seeks] [bigint] NOT NULL,
    [system_scans] [bigint] NOT NULL,
    [system_lookups] [bigint] NOT NULL,
    [system_updates] [bigint] NOT NULL,
 CONSTRAINT [PK_DMVIndex_Usage_Stats_Temp] PRIMARY KEY CLUSTERED 
(
    [TS] ASC,
    [database_id] ASC,
    [object_id] ASC,
    [index_id] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON, FILLFACTOR = 90) ON [PRIMARY]
) ON [PRIMARY]

GO

And now the script that captures the information and does all of your calculations. You may have noticed that the tables themselves do NOT store the index or table names which was intentional. Name changes happen, tables and indexes get dropped and created, all of that is extra work for little benefit. Is the benefit worth it? I thought it wasn’t. I’m not expecting everyone to agree with me here because there are arguments going the other way. My advice is STEAL MY WORK!!! Don’t copy it, you need to steal it. Make it your own, learn it well enough that you can improve it and build upon it. At that point you’ll truly understand every bit of it and hopefully start up your own blog. I hear EvenSimplerSQLServer.com is available, but that’s subject to change. If you don’t want to go that far, then write your ideas in the comment box below. I change these posts from time to time, and this script was actually added months after the original post was made. Just don’t ask me why it wasn’t here in the first place, my answer is just….DOH!!!! Even saying I was only 6 days into blogging at the time I wrote the original doesn’t make that acceptable.

DECLARE @TS as SmallDateTime
DECLARE @Old_TS as SmallDateTime
DECLARE @Reboot_TS as SmallDateTime

SET @TS = GetDate()
SET @Reboot_TS = (SELECT create_date FROM sys.databases WHERE [name] = 'tempdb')

--If update wasn't already done this minute
  -- Delete all but the newest two updates - Only the newest will be used, so one spare is kept
  -- Add the current stats
  -- Calculate the usage between the last two and save in the permanent table
    -- If any of the values go down then the DMV reset for that index
      --It's a horrible join clause, but it makes everything null and uses the isnull(x,0)
      --AND N.user_seeks >= O.user_seeks will be hit if database is refreshed, detached, etc.
IF 0 = (SELECT Count(*) FROM Perf..DMVIndex_Usage_Stats_Temp WHERE TS = @TS) BEGIN
    DELETE Perf..DMVIndex_Usage_Stats_Temp 
    WHERE TS NOT IN 
        (SELECT DISTINCT TOP 2 TS 
         FROM Perf..DMVIndex_Usage_Stats_Temp 
         ORDER BY TS DESC)

    SET @Old_TS = --Newest TS in the temp table
        (SELECT TOP 1 TS
        FROM Perf..DMVIndex_Usage_Stats_Temp
        ORDER BY TS DESC)
        
    INSERT INTO Perf.dbo.DMVIndex_Usage_Stats_Temp
    SELECT @TS
        , database_id
        , object_id
        , index_id
        , user_seeks
        , user_scans
        , user_lookups
        , user_updates
        , system_seeks
        , system_scans
        , system_lookups
        , system_updates
    FROM sys.dm_db_index_usage_stats
    
    INSERT INTO Perf..DMVIndex_Usage_Stats
    SELECT isnull(@Old_TS, @Reboot_TS)
        , @TS
        , N.database_id
        , N.object_id
        , N.index_id
        , N.user_seeks - isnull(O.user_seeks, 0)
        , N.user_scans - isnull(O.user_scans, 0)
        , N.user_lookups - isnull(O.user_lookups, 0)
        , N.user_updates - isnull(O.user_updates, 0)
        , N.system_seeks - isnull(O.system_seeks, 0)
        , N.system_scans - isnull(O.system_scans, 0)
        , N.system_lookups - isnull(O.system_lookups, 0)
        , N.system_updates - isnull(O.system_updates, 0)
    FROM Perf..DMVIndex_Usage_Stats_Temp N --N = New, O = Old
        LEFT JOIN Perf..DMVIndex_Usage_Stats_Temp O ON O.TS = @Old_TS
            AND O.database_id = N.database_id
            AND O.object_id = N.object_id
            AND O.index_id = N.index_id
            AND N.user_seeks >= O.user_seeks
            AND N.user_scans >= O.user_scans
            AND N.user_lookups >= O.user_lookups
            AND N.user_updates >= O.user_updates
            AND N.system_seeks >= O.system_seeks
            AND N.system_scans >= O.system_scans
            AND N.system_lookups >= O.system_lookups
            AND N.system_updates >= O.system_updates
            AND @Reboot_TS < O.TS --They're reset on reboot, so don't compare if it was rebooted since last reading
    WHERE N.TS = @TS
END

Job Schedules – SQL Agent


This info should be easier to get than it is. Keep in mind that for something to run the subscription and schedule both have to be enabled.

To be fair, the majority of this code was grabbed from the internet, and this is not something I modified to the extent that I could call it my own. Michael Abair is the original author, and the original code can be found at the link below. My contribution is limited to very minor changes and a lot of hitting the tab key. I know it works just as well, maybe even 2ms faster, without the whitespace, but I had to do it.

http://www.sqlservercentral.com/scripts/Jobs/69088/

SELECT /*S.job_id,*/ S.job_name, S.is_job_enabled, S.is_schedule_enabled, S.schedule_name, S.Description
    ,avg_duration_in_seconds = avg(datediff(s, '1/1/2000', ('1/1/2000 ' + cast(stuff(stuff(right('000000' + cast(h.run_duration as varchar(6)), 6),5,0,':'),3,0,':') as datetime))))
    ,number_of_runs = count(1)
FROM (SELECT SJ.job_id 
        , SJ.name as job_name
        , SJ.enabled as is_job_enabled
        , SS.enabled as is_schedule_enabled
        , SS.name as schedule_name
        , CASE freq_type
            WHEN 1 THEN 'Occurs on ' + STUFF(RIGHT(active_start_date, 4), 3,0, '/') + '/' + LEFT(active_start_date, 4) + ' at '
                + REPLACE( RIGHT(CONVERT(varchar(30), CAST(convert(varchar(8), STUFF(STUFF(RIGHT('000000' + CAST(active_start_time as varchar(10)), 6), 3, 0, ':' ), 6, 0, ':' ), 8) as datetime) /* hh:mm:ss 24H */, 9), 14), ':000', ' ') /* HH:mm:ss:000AM/PM then replace the :000 with space.*/
            WHEN 4 THEN 'Occurs every ' + CAST(freq_interval as varchar(10)) + ' day(s) '
                + CASE freq_subday_type
                    WHEN 1 THEN 'at '+ LTRIM(REPLACE( RIGHT(CONVERT(varchar(30), CAST(convert(varchar(8), STUFF(STUFF(RIGHT('000000' + CAST(active_start_time as varchar(10)), 6), 3, 0, ':' ), 6, 0, ':' ), 8) as datetime), 9), 14), ':000', ' '))
                    WHEN 2 THEN 'every ' + CAST(freq_subday_interval as varchar(10)) + ' second(s)'
                    WHEN 4 THEN 'every ' + CAST(freq_subday_interval as varchar(10)) + ' minute(s)'
                    WHEN 8 THEN 'every ' + CAST(freq_subday_interval as varchar(10)) + ' hour(s)'
                    ELSE '' 
                    END
                + CASE WHEN freq_subday_type in (2,4,8) /* repeat seconds/mins/hours */
                        THEN ' between '+ LTRIM(REPLACE( RIGHT(CONVERT(varchar(30), CAST(convert(varchar(8), STUFF(STUFF(RIGHT('000000' + CAST(active_start_time as varchar(10)), 6), 3, 0, ':' ), 6, 0, ':' ), 8) as datetime), 9), 14), ':000', ' '))
                        + ' and '
                        + LTRIM(REPLACE( RIGHT(CONVERT(varchar(30), CAST(convert(varchar(8), STUFF(STUFF(RIGHT('000000' + CAST(active_end_time as varchar(10)), 6), 3, 0, ':' ), 6, 0, ':' ), 8) as datetime), 9), 14), ':000', ' '))
                    ELSE ''
                    END
            WHEN 8 THEN 'Occurs every ' + CAST(freq_recurrence_factor as varchar(10))
                + ' week(s) on '
                +
                REPLACE( CASE WHEN freq_interval&1 = 1 THEN 'Sunday, ' ELSE '' END
                + CASE WHEN freq_interval&2 = 2 THEN 'Monday, ' ELSE '' END
                + CASE WHEN freq_interval&4 = 4 THEN 'Tuesday, ' ELSE '' END
                + CASE WHEN freq_interval&8 = 8 THEN 'Wednesday, ' ELSE '' END
                + CASE WHEN freq_interval&16 = 16 THEN 'Thursday, ' ELSE '' END
                + CASE WHEN freq_interval&32 = 32 THEN 'Friday, ' ELSE '' END
                + CASE WHEN freq_interval&64 = 64 THEN 'Saturday, ' ELSE '' END
                + '|', ', |', ' ') /* get rid of trailing comma */

                + CASE freq_subday_type
                    WHEN 1 THEN 'at '+ LTRIM(REPLACE( RIGHT(CONVERT(varchar(30), CAST(convert(varchar(8), STUFF(STUFF(RIGHT('000000' + CAST(active_start_time as varchar(10)), 6), 3, 0, ':' ), 6, 0, ':' ), 8) as datetime), 9), 14), ':000', ' '))
                    WHEN 2 THEN 'every ' + CAST(freq_subday_interval as varchar(10)) + ' second(s)'
                    WHEN 4 THEN 'every ' + CAST(freq_subday_interval as varchar(10)) + ' minute(s)'
                    WHEN 8 THEN 'every ' + CAST(freq_subday_interval as varchar(10)) + ' hour(s)'
                    ELSE '' 
                    END
                + CASE WHEN freq_subday_type in (2,4,8) /* repeat seconds/mins/hours */
                        THEN ' between '+ LTRIM(REPLACE( RIGHT(CONVERT(varchar(30), CAST(convert(varchar(8), STUFF(STUFF(RIGHT('000000' + CAST(active_start_time as varchar(10)), 6), 3, 0, ':' ), 6, 0, ':' ), 8) as datetime), 9), 14), ':000', ' '))
                        + ' and '
                        + LTRIM(REPLACE( RIGHT(CONVERT(varchar(30), CAST(convert(varchar(8), STUFF(STUFF(RIGHT('000000' + CAST(active_end_time as varchar(10)), 6), 3, 0, ':' ), 6, 0, ':' ), 8) as datetime), 9), 14), ':000', ' '))
                    ELSE ''
                    END
            WHEN 16 THEN 'Occurs every ' + CAST(freq_recurrence_factor as varchar(10))
                + ' month(s) on '
                + 'day ' + CAST(freq_interval as varchar(10)) + ' of that month ' 
                + CASE freq_subday_type
                    WHEN 1 THEN 'at '+ LTRIM(REPLACE( RIGHT(CONVERT(varchar(30), CAST(convert(varchar(8), STUFF(STUFF(RIGHT('000000' + CAST(active_start_time as varchar(10)), 6), 3, 0, ':' ), 6, 0, ':' ), 8) as datetime), 9), 14), ':000', ' '))
                    WHEN 2 THEN 'every ' + CAST(freq_subday_interval as varchar(10)) + ' second(s)'
                    WHEN 4 THEN 'every ' + CAST(freq_subday_interval as varchar(10)) + ' minute(s)'
                    WHEN 8 THEN 'every ' + CAST(freq_subday_interval as varchar(10)) + ' hour(s)'
                    ELSE '' 
                    END
                + CASE WHEN freq_subday_type in (2,4,8) /* repeat seconds/mins/hours */
                        THEN ' between '+ LTRIM(REPLACE( RIGHT(CONVERT(varchar(30), CAST(convert(varchar(8), STUFF(STUFF(RIGHT('000000' + CAST(active_start_time as varchar(10)), 6), 3, 0, ':' ), 6, 0, ':' ), 8) as datetime), 9), 14), ':000', ' '))
                        + ' and '
                        + LTRIM(REPLACE( RIGHT(CONVERT(varchar(30), CAST(convert(varchar(8), STUFF(STUFF(RIGHT('000000' + CAST(active_end_time as varchar(10)), 6), 3, 0, ':' ), 6, 0, ':' ), 8) as datetime), 9), 14), ':000', ' '))
                    ELSE ''
                    END
            WHEN 32 THEN 'Occurs ' 
                + CASE freq_relative_interval
                    WHEN 1 THEN 'every first '
                    WHEN 2 THEN 'every second '
                    WHEN 4 THEN 'every third '
                    WHEN 8 THEN 'every fourth '
                    WHEN 16 THEN 'on the last '
                    END
                + CASE freq_interval 
                    WHEN 1 THEN 'Sunday'
                    WHEN 2 THEN 'Monday'
                    WHEN 3 THEN 'Tuesday'
                    WHEN 4 THEN 'Wednesday'
                    WHEN 5 THEN 'Thursday'
                    WHEN 6 THEN 'Friday'
                    WHEN 7 THEN 'Saturday'
                    WHEN 8 THEN 'day'
                    WHEN 9 THEN 'weekday'
                    WHEN 10 THEN 'weekend'
                    END
                + ' of every ' + CAST(freq_recurrence_factor as varchar(10)) + ' month(s) '
                + CASE freq_subday_type
                    WHEN 1 THEN 'at '+ LTRIM(REPLACE( RIGHT(CONVERT(varchar(30), CAST(convert(varchar(8), STUFF(STUFF(RIGHT('000000' + CAST(active_start_time as varchar(10)), 6), 3, 0, ':' ), 6, 0, ':' ), 8) as datetime), 9), 14), ':000', ' '))
                    WHEN 2 THEN 'every ' + CAST(freq_subday_interval as varchar(10)) + ' second(s)'
                    WHEN 4 THEN 'every ' + CAST(freq_subday_interval as varchar(10)) + ' minute(s)'
                    WHEN 8 THEN 'every ' + CAST(freq_subday_interval as varchar(10)) + ' hour(s)'
                    ELSE '' 
                    END
                + CASE 
                    WHEN freq_subday_type in (2,4,8) /* repeat seconds/mins/hours */
                        THEN ' between '+ LTRIM(REPLACE( RIGHT(CONVERT(varchar(30), CAST(convert(varchar(8), STUFF(STUFF(RIGHT('000000' + CAST(active_start_time as varchar(10)), 6), 3, 0, ':' ), 6, 0, ':' ), 8) as datetime), 9), 14), ':000', ' '))
                        + ' and '
                        + LTRIM(REPLACE( RIGHT(CONVERT(varchar(30), CAST(convert(varchar(8), STUFF(STUFF(RIGHT('000000' + CAST(active_end_time as varchar(10)), 6), 3, 0, ':' ), 6, 0, ':' ), 8) as datetime), 9), 14), ':000', ' '))
                    ELSE ''
                    END
            WHEN 64 THEN 'Runs when the SQL Server Agent service starts'
            WHEN 128 THEN 'Runs when the computer is idle'
            END 
        AS [Description]
        FROM msdb.dbo.sysjobs SJ
            INNER JOIN msdb.dbo.sysjobschedules SJS ON SJ.job_id = SJS.job_id
            INNER JOIN msdb.dbo.sysschedules SS ON SJS.schedule_id = SS.schedule_id
            INNER JOIN msdb.dbo.syscategories SC ON SJ.category_id = SC.category_id
        --WHERE SC.name = 'Name from query below'
    ) S
    INNER JOIN msdb.dbo.sysjobhistory H ON s.job_id = H.job_id AND H.step_id = 0
WHERE H.run_date >= /* 7 days ago */cast(datepart(yyyy, dateadd(d, -7, getDate())) as VarChar(10)) + cast(datepart(mm, dateadd(d, -7, getDate())) as VarChar(10)) + cast(datepart(dd, dateadd(d, -7, getDate())) as VarChar(10)) --format getDate once to compare against multiple run_dates
GROUP BY /*S.job_id,*/ S.job_name, S.is_job_enabled, S.is_schedule_enabled, S.schedule_name, S.Description
ORDER BY S.job_name

--SELECT * FROM msdb..syscategories --If you want to uncomment the line "WHERE SC.name = ...", copy/paste from this

Roll Logs – Backup Recovery


Rolling log files for a day, especially with 15 or even 5 minute log backups is a pain at best. Here’s a slightly better way to do it. Set the two variables at the top to the directory where the log backups are and the database you’re looking to restore. It will, assuming you named your log backups as DatabaseName_Log*, display all log backups in that folder in chronological order into an output script which is best read when doing results to text (Query/Results To/Results To Text).

Here’s what this won’t do. It won’t require SQL Server to remember taking the backups, so you can run it from any server that has access to this folder. It won’t restore anything for you, it will just give you the text to copy/paste and run yourself. It won’t leave xp_cmdshell on, although it does require it to be turned on for a bit if it was off.

Since this does nothing more than display the code you’ll probably run next, feel free to run it even if you’re just playing around or only want to roll half the logs.

DECLARE @LogBackupDirectory VarChar(1000)
DECLARE @DatabaseName VarChar(256)

SELECT @LogBackupDirectory = '\\BUServer\BUShare\Folder\logs\'
    , @DatabaseName = 'model'

---Don't change anything below this point unless upgrading the script---
---Also, slight change if backup and restore database names are different---

SET NOCOUNT ON

DECLARE @XP_CmdShell_Enabled INT
    , @XP_CmdShell_Command VarChar(4000)

IF Object_ID('TempDB..#temp') IS NOT NULL BEGIN
    DROP TABLE #temp
END

create table #temp
(
      Dir VARCHAR(MAX)
)

SELECT @XP_CmdShell_Enabled = CONVERT(INT, ISNULL(value, value_in_use))
FROM master.sys.configurations
WHERE name = 'xp_cmdshell'

IF @XP_CmdShell_Enabled = 0 BEGIN
    EXEC sp_configure 'xp_cmdshell', 1
    RECONFIGURE WITH OVERRIDE 
END 

SELECT @XP_CmdShell_Command = 'dir  /od /b ' + @LogBackupDirectory + @DatabaseName + '_log*'

INSERT INTO #temp
EXECUTE xp_cmdshell @XP_CmdShell_Command

IF @XP_CmdShell_Enabled = 0 BEGIN
    EXEC sp_configure 'xp_cmdshell', 0
    RECONFIGURE WITH OVERRIDE 
END 

DELETE #temp
WHERE Dir IS NULL

SELECT commands = 'Restore Log [' + @DatabaseName + '] from DISK=N''' + @LogBackupDirectory + Dir + ''' with NoRecovery' 
FROM #temp

SELECT 'RESTORE DATABASE [' + @DatabaseName + '] WITH RECOVERY'

Table Sizes


There’s no simple way in SQL Server to see the sizes of all the tables and their indexes.  Even seeing a single table’s size through SSMS can take a while.  The solution below is grabbed from statistics, so there’s no guarantee you’re getting exact numbers.  However, exact numbers don’t tend to matter, as you care more that the table is about 1 GB, not that it’s exactly 998 MB.

There are variations of this code floating all over the place, and I tweaked the results to be in the easiest to read format possible.

The only issue I have with this script is that you have to scroll to the next-to-the-last line to filter which tables it’s pulling. Yes, I’m picky, and I don’t like to scroll to see things that change.

SELECT Database_ID = DB_ID()
    , Database_Name = DB_NAME()
    , Schema_Name = a3.name
    , TableName = a2.name
    , TableSize_MB = (a1.reserved + ISNULL(a4.reserved,0)) / 128
    , RowCounts = a1.rows
    , DataSize_MB = a1.data / 128
    , IndexSize_MB = (CASE WHEN (a1.used + ISNULL(a4.used,0)) > a1.data 
                        THEN (a1.used + ISNULL(a4.used,0)) - a1.data 
                        ELSE 0 
                    END) /128
    , Free_MB = (CASE WHEN (a1.reserved + ISNULL(a4.reserved,0)) > a1.used 
                        THEN (a1.reserved + ISNULL(a4.reserved,0)) - a1.used 
                        ELSE 0 
                    END) / 128
FROM (SELECT ps.object_id
            , [rows] = SUM(CASE
                                WHEN (ps.index_id < 2) THEN row_count
                                ELSE 0
                            END)
            , reserved = SUM(ps.reserved_page_count)
            , data = SUM(CASE
                            WHEN (ps.index_id < 2) 
                                THEN (ps.in_row_data_page_count + ps.lob_used_page_count + ps.row_overflow_used_page_count)
                            ELSE (ps.lob_used_page_count + ps.row_overflow_used_page_count)
                        END)
            , used = SUM (ps.used_page_count) 
        FROM sys.dm_db_partition_stats ps
        GROUP BY ps.object_id) AS a1
    INNER JOIN sys.all_objects a2  ON a1.object_id = a2.object_id
    INNER JOIN sys.schemas a3 ON a2.schema_id = a3.schema_id
    LEFT JOIN (SELECT it.parent_id
            , reserved = SUM(ps.reserved_page_count)
            , used = SUM(ps.used_page_count)
        FROM sys.dm_db_partition_stats ps
            INNER JOIN sys.internal_tables it ON it.object_id = ps.object_id
        WHERE it.internal_type IN (202,204)
        GROUP BY it.parent_id) AS a4 ON a4.parent_id = a1.object_id
WHERE a2.type <> 'S' and a2.type <> 'IT'
    --AND a2.name IN ('spt_values')
ORDER BY a1.reserved desc

Index Columns and Names


Don’t trust the name of an index to tell you what it is. The name is wrong, or at least that’s what you have to assume.

I’m a huge fan of index naming conventions, with my favorite being starting out by saying it’s an index (IDX) followed by the table name, then the key columns, the letters INCL if there are included columns, each included column listed (if reasonable, just do all or nothing for each index), a U or N denoting if it’s unique or not, then a C or N denoting if it’s clustered or not. However, there are too many times I’ve seen an index naming convention get me in trouble where the index IDX_TableName_Key1_Key2_Key3_INCL_Incl1_Incl2_U_N actually not have the column Key1 in it, and it wasn’t unique either.

You can’t always trust the naming convention, even if it does exist. However, the system tables are always right. Here we have all of the key and included fields in an index, how much each index has been used (since restart of services or rebuilds in 2012+), and the size of each index. This typically takes less than a second to run in my experience, and you’ll get a feel for how it runs in your nonprod environment before it runs in prod, right?

DECLARE @TableName VarChar(100) = '%' --Accepts wildcards, % will give you all indexes in the database

SELECT TableName = i.SchemaName + '.' + i.TableName
	, IndexName = ISNULL(i.IndexName, '[' + Lower(i.IndexType) + ']')
	, i.User_Updates
	, i.User_Seeks
	, i.User_Scans
	, i.User_Lookups
	, KeyColumnList = substring((SELECT (', ' + c.name)
							FROM sys.index_columns ic
								INNER JOIN sys.columns c ON ic.object_id = c.object_id AND ic.column_id = c.column_id
							WHERE i.object_id = ic.object_id AND i.index_id = ic.index_id
								AND ic.is_included_column = 0
							ORDER BY ic.key_ordinal
							FOR XML PATH ('')
							), 3, 2000)
	, IncludedColumnList = CASE WHEN i.IndexType IN ('Clustered', 'Heap') THEN '*' ELSE
							substring((SELECT (', ' + c.name)
							FROM sys.index_columns ic
								INNER JOIN sys.columns c ON ic.object_id = c.object_id AND ic.column_id = c.column_id
							WHERE i.object_id = ic.object_id AND i.index_id = ic.index_id
								AND ic.is_included_column = 1
							ORDER BY c.Name
							FOR XML PATH ('')
							), 3, 2000)
						END
	, i.filter_definition
	, i.is_unique
	, i.Used_MB
	, i.Reserved_MB
	, Free_MB = i.Reserved_MB - i.Used_MB
	, i.Used_InRow_MB
	, i.Used_LOB_MB 
	, i.Row_Count
	, i.index_id
--	, DropStatement = 'DROP INDEX [' + i.IndexName + '] ON [' + i.SchemaName + '].[' + i.TableName + ']'
FROM 
	(
	SELECT SchemaName = s.name
		, TableName = t.name
		, IndexName = i.name
		, IndexType = i.type_desc 
		, i.object_id  
		, i.index_id
		, i.filter_definition
		, i.is_unique
		, User_Updates = SUM(ius.User_Updates)
		, User_Seeks = SUM(ius.user_seeks)
		, User_Scans = SUM(ius.user_scans)
		, User_Lookups = SUM(ius.user_lookups)
		, i.Used_MB 
		, i.Reserved_MB 
		, i.Used_InRow_MB
		, i.Used_LOB_MB 
		, i.row_count
	FROM (SELECT i.name
				, i.type_desc
				, i.object_id
				, i.index_id
				, i.is_unique
				, i.filter_definition
				, Used_MB = SUM(PS.used_page_count) / 128
				, Reserved_MB = SUM(PS.reserved_page_count) / 128 
				, Used_InRow_MB = SUM(PS.in_row_used_page_count) / 128
				, Used_LOB_MB = SUM(PS.lob_used_page_count) / 128
				, row_count = SUM(PS.row_count)
			FROM sys.indexes i 
				LEFT JOIN sys.dm_db_partition_stats PS ON i.object_id = PS.object_id AND i.index_id = PS.index_id
			GROUP BY i.name, i.type_desc, i.object_id, i.index_id, i.is_unique, i.filter_definition) i
		INNER JOIN sys.all_objects t ON i.object_id = t.object_id
		INNER JOIN sys.schemas s ON t.schema_id = s.schema_id
		LEFT JOIN sys.dm_db_Index_Usage_Stats ius ON ius.object_id = i.object_id 
											AND ius.index_id = i.index_id 
											AND ius.database_id = DB_ID()
	WHERE t.name LIKE @TableName  	
	GROUP BY s.name, t.name, i.name, i.object_id, i.index_id, i.is_unique, i.type_desc, i.filter_definition, i.Used_MB, i.Reserved_MB, i.row_count, Used_InRow_MB, Used_LOB_MB
	) i
ORDER BY 1, KeyColumnList 

You may say that’s a temporary one-off inqury the doesn’t fix anything, and it is. However, the permanent fix is very invasive, will void your support contracts, can cause damage, would cause pieces of future upgrades to fail, may not work if it generates a name that’s too long, and other minor details. Assuming you have a home-grown database and absolutely no query hints specifying an index anywhere in your code, have a dev environment recently refreshed from prod, and have looked into every other issue that I never even considered, do I have some code for you!!!

SELECT Command = '--DON''T RUN THIS WITHOUT FIRST LOOKING INTO THE CONSEQUENCES AND UPDATING YOUR RESUME'

UNION 
SELECT Command = 'EXEC sp_rename ''' + TableName + '.' + IndexName + ''', ''' + 'IX_' + TableName + ColumnList + case when len(IncludeList) > 3 then '_INCL' else '' end + ISNULL(IncludeList, '') + '_' + case is_unique when 1 then 'U' else 'N' end + '_' + left(IndexType COLLATE SQL_Latin1_General_CP1_CS_AS, 1) + ''' , ''INDEX'''
FROM (
        SELECT TableName = t.name
            , IndexName = i.name
            , i.is_unique
            , IndexType = i.type_desc
            , FileGroupName = d.name
            , ColumnList = substring((SELECT ('_' + c.name)
                                    FROM sys.index_columns ic
                                        INNER JOIN sys.columns c ON ic.object_id = c.object_id AND ic.column_id = c.column_id
                                    WHERE i.object_id = ic.object_id AND i.index_id = ic.index_id
                                        AND ic.is_included_column = 0
                                    ORDER BY ic.key_ordinal
                                    FOR XML PATH ('')
                                    ), 1, 2000)
            , IncludeList = substring((SELECT ('_' + c.name)
                                    FROM sys.index_columns ic
                                        INNER JOIN sys.columns c ON ic.object_id = c.object_id AND ic.column_id = c.column_id
                                    WHERE i.object_id = ic.object_id AND i.index_id = ic.index_id
                                        AND ic.is_included_column = 1
                                    ORDER BY ic.key_ordinal
                                    FOR XML PATH ('')
                                    ), 1, 2000)
        FROM sys.tables t
            INNER JOIN sys.indexes i ON t.object_id = i.object_id
            INNER JOIN sys.data_spaces d ON i.data_space_id = d.data_space_id
        WHERE t.name IN (
                  'Table_1'
                , 'Table_2')
    )x
ORDER BY 1

Running Processes


Sp_who2 is not used by this DBA.  That info is amazingly good to have, but I want more.  Don’t get me wrong, sp_who2 is kinda hidden in this one as it’s really just a proc that does little more than “SELECT * FROM SysProcesses“, and I use SysProcesses where I couldn’t find the same info in the DMVs that were supposed to replace it.  Don’t go thinking that SysProcesses is just a depreciated dinosaur, even MVPs still use it: http://sqlblog.com/blogs/kalen_delaney/archive/2008/06/29/why-i-still-need-sysprocesses.aspx

Threads and RunningThreads are new for me here.  It’s good to see when a query is split into 100 threads and only 1 is doing anything.  It’s also easier on the eyes to see the number 100 instead of seeing 100 separate rows for one SPID.

The percent complete and estimated completion time on this script are typically blank, with the major exceptions being backups and restores where these columns are the best thing ever! 

BlockedBy is something overlooked by many people.  If your query seems to be running forever when it normally runs quick, there’s about a 75% chance you’re just waiting for another process to complete. It’s also why you’ll see the notorious (nolock) in almost everything I write for myself, but that’s another conversation.  The HeadBlocker attempts find the start of a blocking chain, although I can say that I know my logic here isn’t bulletproof.

My wait types used to be perfect where if there wait time was 0 then it was lowercase, otherwise it was all caps.  It seems I broke it slightly, but it’s almost perfect.  My OCD will eventually get the best of me here, but it has to contend with time with the rest of my life.

The rest is pretty self-explanitory with the exception of query plans.  They are the last line of the SELECT and FROM clauses that are commented out.  It’s somewhat rare to need the plan, and it’s not too uncommon to have this script run noticably slower when looking for plans.  Because of that, I uncomment that column and the the cross apply that makes it possible only when needed.

SELECT SPID = er.session_id
    , ot.Threads
    , RunningThreads = coalesce(rsp.RunningThreads,0)
    , Pct_Comp = er.percent_complete
    , Est_Comp_Time = CASE er.estimated_completion_time WHEN 0 THEN NULL ELSE dateadd(ms, er.estimated_completion_time, getdate()) END 
    , er.status
    , er.command
    , database_name = sd.name
    , BlockedBy = wt.blocking_session_id
    , HeadBlocker = coalesce(hb5.session_id, hb4.session_id, hb3.session_id, hb2.session_id, hb1.session_id)
    , wait_type = coalesce(CASE er.wait_type WHEN 'CXPACKET' THEN 'CXPACKET - ' + sp.lastwaittype1 ELSE sp.lastwaittype1 END, lower(er.last_wait_type)) --Lowercase denotes it's not currently waiting, also noted by a wait time of 0.
    , Wait_Time_Sec = Cast(er.wait_time/1000.0 as DEC(20,3))
    , er.wait_resource
    , Duration_Sec = Cast(DATEDIFF(s, er.start_time, GETDATE()) as DEC(20,0))
    , CPU_Sec = Cast(er.cpu_time/1000.0 as DEC(20,3))
    , Reads_K = Cast(er.reads/1000.0 as DEC(20,3))
    , Writes_K = Cast(er.writes/1000.0 as DEC(20,3))
    , [Statement] = SUBSTRING (st.text, er.statement_start_offset/2,
        abs(CASE WHEN er.statement_end_offset = -1
               THEN LEN(CONVERT(NVARCHAR(MAX), st.text)) * 2 
            ELSE er.statement_end_offset END - er.statement_start_offset)/2)
    , st.text as Query
    , es.login_time
    , es.host_name
    , program_name = CASE LEFT(es.program_name, 29)
                    WHEN 'SQLAgent - TSQL JobStep (Job '
                        THEN 'SQLAgent Job: ' + (SELECT name FROM msdb..sysjobs sj WHERE substring(es.program_name,32,32)=(substring(sys.fn_varbintohexstr(sj.job_id),3,100))) + ' - ' + SUBSTRING(es.program_name, 67, len(es.program_name)-67)
                    ELSE es.program_name
                    END  
    , es.client_interface_name
    , es.login_name
    , es.status
    , es.total_scheduled_time
    , es.total_elapsed_time
    , er.start_time
    , es.last_request_start_time
    , es.last_request_end_time
    , er.database_id  
    --, qp.query_plan 
FROM sys.dm_exec_requests er
    INNER JOIN sys.dm_exec_Sessions es on er.session_id=es.session_id
    LEFT JOIN sys.databases sd on er.database_id=sd.database_id
    INNER JOIN (SELECT session_id, count(1) Threads FROM sys.dm_os_tasks GROUP BY session_id) ot on er.session_id=ot.session_id
    LEFT JOIN (SELECT spid, LastWaitType1 = MIN(lastwaittype), LastWaitType2 = MAX(lastwaittype) FROM sysprocesses sp WHERE waittime > 0 AND lastwaittype <> 'cxpacket' GROUP BY spid) sp ON er.session_id = sp.spid
    LEFT JOIN (SELECT spid, RunningThreads = COUNT(1) FROM sysprocesses sp WHERE waittime = 0 GROUP BY spid) rsp ON er.session_id = rsp.spid
    LEFT JOIN (SELECT session_id, max(blocking_session_id) blocking_session_id FROM sys.dm_os_waiting_tasks wt WHERE wt.blocking_session_id <> wt.session_id GROUP BY session_id) wt ON er.session_id = wt.session_id 
    LEFT JOIN (SELECT session_id, max(blocking_session_id) blocking_session_id FROM sys.dm_os_waiting_tasks wt GROUP BY session_id) hb1 ON wt.blocking_session_id = hb1.session_id 
    LEFT JOIN (SELECT session_id, max(blocking_session_id) blocking_session_id FROM sys.dm_os_waiting_tasks wt GROUP BY session_id) hb2 ON hb1.blocking_session_id = hb2.session_id 
    LEFT JOIN (SELECT session_id, max(blocking_session_id) blocking_session_id FROM sys.dm_os_waiting_tasks wt GROUP BY session_id) hb3 ON hb2.blocking_session_id = hb3.session_id 
    LEFT JOIN (SELECT session_id, max(blocking_session_id) blocking_session_id FROM sys.dm_os_waiting_tasks wt GROUP BY session_id) hb4 ON hb3.blocking_session_id = hb4.session_id 
    LEFT JOIN (SELECT session_id, max(blocking_session_id) blocking_session_id FROM sys.dm_os_waiting_tasks wt GROUP BY session_id) hb5 ON hb4.blocking_session_id = hb5.session_id 
    CROSS APPLY sys.dm_exec_sql_text(sql_handle) AS st  
    --CROSS APPLY sys.dm_exec_query_plan(er.plan_handle) qp
WHERE er.session_id <> @@SPID
    --AND es.host_name like '%%'
    --AND er.session_id = 2702
ORDER BY er.percent_complete DESC, er.cpu_time DESC, er.session_id

--Use the below command to get the last input of an open session id
--dbcc inputbuffer(61)

Backup History


Can you tell me the drive we were backing up server #58 to exactly 8 months ago, and, if possible, it would really help if you could give me the exact file name for me to pull the backup from tape for you.  Or, last night’s backup didn’t finish until after the tapes started, did those backups always take that long and is it taking longer because the databases are actually growing that much?

The answer to those questions tends to be “Ok, no problem.”

SELECT TOP 100 bs.database_name
    , bs.backup_start_date
    , bs.backup_finish_date
    , backup_duration_minutes = (DateDiff(second, backup_start_date, backup_finish_date)+30)/60
    , backup_duration_seconds = DateDiff(second, backup_start_date, backup_finish_date)
    , backup_size_mb = Cast(bs.backup_size / 1024 / 1024 as Int)
    --, compressed_backup_size_mb = Cast(bs.compressed_backup_size / 1024 / 1024 as Int) --SQL 2008+ only
    , backup_type = CASE bs.type 
                        WHEN 'D' THEN 'FULL' 
                        WHEN 'I' THEN 'Diff' 
                        WHEN 'L' THEN 'Log' 
                        ELSE 'UNKNOWN' 
                    END
    , bmf.physical_device_name
FROM msdb..backupset bs
    INNER JOIN msdb..backupmediafamily bmf ON bs.media_set_id = bmf.media_set_id
--WHERE bs.database_name like 'msdb'
    --AND bs.type <> 'L'
    --AND backup_start_date BETWEEN '2013-05-13 00:00' AND '2013-05-13 08:00'
ORDER BY Backup_Start_Date DESC

It’s common in my scripts for me to put 10 things in the where clause and comment them out. My problem is that I don’t want to look around for field names to filter my scripts, but I don’t always want to filter my scripts the same way. It works for me, and you’ll either get used to it or rewrite it for your own use.

Cluster Info


This script is so simple you’ll start off by asking why you’d ever use it.  Then you’ll use it and wonder why no one gave this to you as part of your new-hire orientation at your first DBA job.  The truth is that in larger environments there will always be times when you’re asked what the active node is, are two instances running on the same node, what nodes are in that cluster, what drives belong to what instance, etc…  Here’s your answer.

SELECT VirtualServerName = SERVERPROPERTY('ServerName')
    , ActiveNode = SERVERPROPERTY('ComputerNamePhysicalNetBIOS') 
    , PassiveNode = CASE (SELECT COUNT(*) FROM sys.dm_os_cluster_nodes WHERE NodeName <> SERVERPROPERTY('ComputerNamePhysicalNetBIOS'))
        WHEN 0 THEN 'Stand-Alone'
        WHEN 1 THEN (SELECT NodeName FROM sys.dm_os_cluster_nodes WHERE NodeName <> SERVERPROPERTY('ComputerNamePhysicalNetBIOS'))
        ELSE Cast((SELECT COUNT(*) FROM sys.dm_os_cluster_nodes WHERE NodeName <> SERVERPROPERTY('ComputerNamePhysicalNetBIOS')) as VarChar(2)) + ' Passive Nodes'
        END

SELECT PassiveNodes = NodeName 
FROM sys.dm_os_cluster_nodes 
WHERE NodeName <> SERVERPROPERTY('ComputerNamePhysicalNetBIOS') 

SELECT ClusterDrives = DriveName 
FROM sys.dm_io_cluster_shared_drives
ORDER BY DriveName

File Sizes


There are two scripts I like to use to look at the file sizes on servers.  The first one is everything I could hope for on servers where there is only a single file per filegroup.  All of the vital information is all in one place.  I know, DBAs tend to be good at math and there was no need for me to make all of these columns.  However, any room I leave to demonstrate my mathematical powers in a professional environment…. Ok, look, CPU time is cheap anymore, I’ll go with that for my reasoning.

IF Object_ID('TempDB..##FileSize') IS NOT NULL BEGIN
    DROP TABLE ##FileSize
END

CREATE TABLE ##FileSize
(      DB VarChar(128)
    , FileLogicalName VarChar(128)
    , FilePhysicalName VarChar(512)
    , FileGroup VarChar(128)
    , Used_MB VarChar(128)
    , Free_MB VarChar(128)
    , Size_MB VarChar(128)
    , GrowthRate VarChar(128)
    , MaxSize VarChar(128)
)

exec sp_MSforeachdb N'use [?]; 
INSERT INTO ##FileSize
SELECT DB = db_name()
    , FileLogicalName = f.name
    , FilePhysicalName = f.physical_name
    , FileGroup = ISNULL(g.name, f.Type_Desc) 
    , Used_MB = REPLACE(CONVERT(varchar(100), (CAST((FileProperty(f.name, ''SpaceUsed'')) / 128 AS money)), 1), ''.00'', '''')
    , Free_MB = REPLACE(CONVERT(varchar(100), (CAST((f.size - FileProperty(f.name, ''SpaceUsed'')) / 128 AS money)), 1), ''.00'', '''')
    , Size_MB = REPLACE(CONVERT(varchar(100), (CAST((f.size) / 128 AS money)), 1), ''.00'', '''')
    , GrowthRate = Case Is_Percent_Growth
                    WHEN 1 THEN '''' + Cast(Growth as VarChar(100)) + ''%''
                    ELSE REPLACE(CONVERT(varchar(100), (CAST(Growth/128 AS money)), 1), ''.00'', '''') + '' MB''
                    END 
    , MaxSize = Case Max_Size
                    WHEN -1 THEN ''---''
                    WHEN 268435456 THEN ''---''
                    ELSE REPLACE(CONVERT(varchar(100), (CAST(Max_Size/128 AS money)), 1), ''.00'', '''') + '' MB''
                    END
FROM sys.database_files f
    LEFT JOIN sys.filegroups g on f.data_space_id = g.data_space_id
'

SELECT *
    , PctUsed = Cast(100 * replace(Used_MB, ',', '') / (Cast(replace(Size_MB,',', '') as Dec(20,2)) + .01) as Dec(20,2))
    --, PctFree = 100 - Cast(100 * replace(Used_MB, ',', '') / (Cast(replace(Size_MB,',', '') as Dec(20,2)) + .01) as Dec(20,2))
FROM ##FileSize
--WHERE DB = 'msdb'
ORDER BY FilePhysicalName

DROP TABLE ##FileSize

The problem with that script is that there are times it makes sense to have multiple files in a filegroup.  Be it disk limitations, best practices on a SAN saying not to extend a LUN, or your predecessor tossed logic to the wind, there are many reasons you could have multiple files where the previous script leaves you doing math to figure out how big the filegroup is or how much room for growth you have.  You know what I said about doing math…CPU time is cheap.  Here’s what I use when I need to see the filegroup sizes.

IF Object_ID('TempDB..##FileGroupSize') IS NOT NULL BEGIN
    DROP TABLE ##FileGroupSize
END

CREATE TABLE ##FileGroupSize
(      DB VarChar(128)
    , FileGroup VarChar(128)
    , Used_MB Int
    , Free_MB Int
    , Size_MB Int
)

exec sp_MSforeachdb N'use [?]; 
INSERT INTO ##FileGroupSize
SELECT DB = db_name()
    , FileGroup = ISNULL(g.name, f.Type_Desc) 
    , Used_MB = SUM(FileProperty(f.name, ''SpaceUsed'')) / 128
    , Free_MB = SUM(f.size - FileProperty(f.name, ''SpaceUsed'')) / 128
    , Size_MB = SUM(f.size) / 128 
FROM sys.database_files f
    LEFT JOIN sys.filegroups g on f.data_space_id = g.data_space_id
GROUP BY f.Type_Desc, g.name
'

SELECT *
    , PctUsed = Cast(100 * Used_MB / (Cast(Size_MB as Dec(20,2)) + .01) as Dec(20,2))
    , PctFree = 100 - Cast(100 * Used_MB / (Cast(Size_MB as Dec(20,2)) + .01) as Dec(20,2))
FROM ##FileGroupSize
ORDER BY DB, FileGroup

DROP TABLE ##FileGroupSize

It should be noted that I have different rules for different situations.  If a script is part of a stored proc or view then * isn’t even a consideration.  If an external program, even an SSRS report, will consume the code then I would never try to format it using SQL.  However, these are saved in my scripts folder as FileSize.sql and FilegroupSizes.sql, and my personal .sql files get to break a rule or two.

Database Assessment


This script looks much more intimidating than the results it produces, so I recommend running it before reading through it.  I’ll run this when I first get on a unfamiliar server as a basic assessment of what’s there and find some hard to find issues with no effort.

The issues you’ll find include:

  • Backups – Where are they, when did they last occur, how big, etc.
    • I always look for databases (excluding model) that are in full or bulk logged recovery with no log backups.  This is a common issue, especially with vendor databases, that will fill your drives with data you didn’t care to have.
      • To know how critical it is when you find this, and you probably will, I include the log size and percent full.
    • There are variables in this script for a cutoff date which will ignore old backups. If you set @use_cutoffdate to 1 then any database not backed up since the value of @cutoffdate will show up as never being backed up. I love this feature because there’s no chance that you’ll misread a year-old backup as being taken last night.
    • Warning, the size of the backup is the size of the data, not the backup file.  To keep this compatible with SQL 2005, I’m not grabbing the compressed_backup_size field.
    • Differential backup results are commented out for a reason…most people don’t use them.  There is absolutely no reason to comment this out other than limiting the number of columns in my results.
  • Compatibility Level – If you upgrade a server it’s easy to leave a database behind.  I’ve seen SQL 2000 compatibility on SQL 2008 R2 for no other reason than “oops”.
  • Collation name – If this isn’t consistent you could run into issues that are next to impossible to debug.  The exceptions, ironically, are Microsoft databases.  Report Server and SharePoint databases specifically will be a different collation than anything else, but they were written with that in mind and handle it gracefully and shouldn’t be changed.
  • Auto Close – Ok, I admit it, I actually have this one commented out in my personal copy of this code.  If this is turned on it will typically blow up your error logs and make them half-way unreadable, and I’ll see this when I attempt to read them.  However, it’s good to know if this is ever turned on as it will kill performance.  I should uncomment this in my version with the mentality that too much information will slightly annoy you, too little will kill you.
Especially with most of us running more vendor databases than home-grown, don’t trust that this stuff was done right.
DECLARE @cutoffdate datetime
DECLARE @use_cutoffdate bit
SET @cutoffdate = DateAdd(Month, -3, GetDate())
SET @use_cutoffdate = 0

IF Object_ID('TempDB..##FileGroupSize') IS NOT NULL BEGIN
    DROP TABLE ##FileGroupSize
END

CREATE TABLE ##FileGroupSize
(      DB VarChar(128)
    , FileGroup VarChar(128)
    , Used_MB Int
    , Free_MB Int
    , Size_MB Int
)

exec sp_MSforeachdb N'use [?]; 
INSERT INTO ##FileGroupSize
SELECT DB = db_name()
    , FileGroup = ISNULL(g.name, f.Type_Desc) 
    , Used_MB = SUM(FileProperty(f.name, ''SpaceUsed'')) / 128
    , Free_MB = SUM(f.size - FileProperty(f.name, ''SpaceUsed'')) / 128
    , Size_MB = SUM(f.size) / 128 
FROM sys.database_files f
    LEFT JOIN sys.filegroups g on f.data_space_id = g.data_space_id
WHERE f.Type_Desc = ''LOG''
GROUP BY f.Type_Desc, g.name
'
SELECT database_name = d.name 
    , d.create_date
    , d.compatibility_level
    , LastFullBU = BUFull.last_backup  
    , LastDiffBU = BUDiff.last_backup  
    , LastLogBU = BULog.last_backup  
    , Recovery_Model = d.recovery_model_desc
    , Log_Size_MB = LogSize.Size_MB
    , Log_Pct_Used = Cast(100 * Used_MB / (Cast(Size_MB as Dec(20,2)) + .01) as Dec(20,2))
    , d.collation_name
    , LastFullLoc = BUFullLoc.physical_device_name
    , LastFullSize_MB = Cast(BUFullLoc.backup_size / 1024 / 1024 as bigint)
    --, LastDiffLoc = BUDiffLoc.physical_device_name
    --, LastDiffSize_MB = Cast(BUDiffLog.backup_size / 1024 / 1024 as bigint)
    , LastLogLoc = BULogLoc.physical_device_name
    , LastLogSize_MB = Cast(BULogLoc.backup_size / 1024 / 1024 as bigint)
    , d.is_auto_close_on
FROM master.sys.databases d with (nolock)
    LEFT JOIN (SELECT b.database_name, max(b.backup_finish_date) last_backup
        FROM msdb..backupset b with (nolock)
        WHERE b.type = 'D' 
            AND (b.backup_finish_date > @cutoffdate or @use_cutoffdate = 0)
        GROUP BY b.database_name
        ) BUFull ON d.name = BUFull.database_name
LEFT JOIN (SELECT b.database_name, max(b.backup_finish_date) last_backup
        FROM msdb..backupset b with (nolock) 
        WHERE b.type = 'I' 
            AND (b.backup_finish_date > @cutoffdate or @use_cutoffdate = 0)
        GROUP BY b.database_name
        ) BUDiff ON d.name = BUDiff.database_name 
    LEFT JOIN (SELECT b.database_name, max(b.backup_finish_date) last_backup
        FROM msdb..backupset b with (nolock) 
        WHERE b.type = 'L' 
            AND (b.backup_finish_date > @cutoffdate or @use_cutoffdate = 0)
        GROUP BY b.database_name
        ) BULog ON d.name = BULog.database_name 
    LEFT JOIN (SELECT f.physical_device_name, b.database_name, b.backup_finish_date, b.backup_size
        FROM msdb..backupset b with (nolock)
            INNER JOIN msdb..backupmediafamily f ON b.media_set_id = f.media_set_id
        WHERE b.type = 'D') BUFullLoc ON BUFull.database_name = BUFullLoc.database_name AND BUFull.last_backup = BUFullLoc.backup_finish_date
    LEFT JOIN (SELECT f.physical_device_name, b.database_name, b.backup_finish_date, b.backup_size
        FROM msdb..backupset b with (nolock)
            INNER JOIN msdb..backupmediafamily f ON b.media_set_id = f.media_set_id
        WHERE b.type = 'I') BUDiffLoc ON BUDiff.database_name = BUDiffLoc.database_name AND BUDiff.last_backup = BUDiffLoc.backup_finish_date
    LEFT JOIN (SELECT f.physical_device_name, b.database_name, b.backup_finish_date, b.backup_size
        FROM msdb..backupset b with (nolock)
            INNER JOIN msdb..backupmediafamily f ON b.media_set_id = f.media_set_id 
        WHERE b.type = 'L') BULogLoc ON BULog.database_name = BULogLoc.database_name AND BULog.last_backup = BULogLoc.backup_finish_date
    LEFT JOIN ##FileGroupSize LogSize ON d.name = LogSize.DB
WHERE d.name <> 'tempdb'
    AND d.state = 0 --Online

DROP TABLE ##FileGroupSize