Indexes – Unused and Duplicates

Indexes aren’t free, and many databases end up with unused indexes. Every time you make any update to a table you will be updating the clustered index (I assume no heaps, because I hate heaps), and every index that has uses one of the columns that were updated. Inserts and Deletes affect every column and will affect every index, with exceptions for non-typical indexes. These updates cost a lot, to the point that I have several processes that disable certain nonclustered indexes, do all of the work, then rebuild the indexes afterwards.

The problem is that you can’t tell very easily if an index is being used or not. You can look at sys.dm_db_index_usage_stats, but that information is very limited in that gets wiped out on index rebuilds, SQL service restarts (server reboot included), and anytime you ask it to. That’s not mentioning that it also ignores obscure usage of the index. However, the obscure usage is more of something to keep in mind than it is something to base all of your decisions on.

From my point of view, there are two answers to this problem.  There isn’t a right and wrong answer, it’s more of a step 1 and step 2.

The long and hard one is to capture the values in sys.dm_db_index_usage_stats on a normal basis (daily, before index maintenance) and give yourself long term statistics that you can determine if an index’s usage justifies its cost. Some indexes are updated 1,000,000 times for two uses. You obviously don’t need to keep this index around all the time, but you’re still paranoid about what those uses were. Were they ad-hoc queries run by a developer, or were they year-end reports run by the CEO? You can get some idea if you look at your history and check what day it was run. I would want to make a rollback script, drop the index, and make a searchable document of indexes I dropped in case anyone complained about year-end reports not working.

That was the long answer, which is right and is not replaced by the short answer. The script below is the short answer. If all of the key fields in an index match, or at least the first three in a larger index, then there’s a good chance that one of the indexes isn’t required or they could be consolidated.

For instance, if you have two indexes that have the same two key fields in the same order but the second index also includes two columns, then you’ll probably see usage on both indexes (see the long-hard solution above). If one index isn’t in use in this situation, your job is easy. If both are used, then look into dropping the index that doesn’t include any columns. Why? Because a slightly larger index rarely increases the overhead too much, and the queries that used the first index aren’t usually impacted too much by using a slightly larger index that has the same foundation.

This area gets shakey on how you should handle it, and is really one of the few areas in SQL Server I consider more of an art. For instance, the script below will show you indexes that have 5 key columns where the first three match. You look and find that both indexes are being used, but users are complaining that updates are a bit slow, too. I’d lean towards condensing by taking the the last two key columns of one index and including them on the other, but which index gets dropped and how do you test it?

Do my uses of “look into”, “rarely”, “usually”, and “too much” seem uncertain to you? Welcome to indexing.

Here are the scripts I use for both the short and long answers mentioned.  The first one is for the short answer, giving you indexes that are for the most part duplicated.

DECLARE @KeyFieldMatches INT

SET @KeyFieldMatches = 3 --Number of key fields to match in order

IF object_id('tempdb..#IndexList') IS NOT NULL BEGIN
    DROP TABLE #IndexList
END

IF object_id('tempdb..#IndexListShort') IS NOT NULL BEGIN
    DROP TABLE #IndexListShort
END

CREATE TABLE #IndexList (
      object_id    bigint not null
    , table_name varchar(150) not null
    , index_id int not null
    , index_name varchar(150) not null
    , index_column int not null
    , included bit not null
    , column_name varchar(150) not null
    , index_type int not null
)

CREATE CLUSTERED INDEX IndexList_Clu ON #IndexList (object_id, index_id, index_column, included)

CREATE TABLE #IndexListShort (
      object_id bigint not null
    , table_name varchar(150) not null
    , index_id int not null
    , index_name varchar(150) not null
    , column_names_first_n varchar(450) not null
    , column_names_key varchar(4000) not null
    , column_names_included varchar(4000) not null
)

DECLARE @object_id bigint
DECLARE @index_id int
DECLARE @List varchar(4000)
DECLARE @ListIncl varchar(4000)
DECLARE @ListShort varchar(450)

INSERT INTO #IndexList
SELECT o.object_id
    , table_name = o.name
    , i.index_id
    , index_name = i.name
    , index_column = ic.index_column_id
    , included = ic.is_included_column
    , column_name = c.name
    , index_type = i.type
FROM sys.objects o
    INNER JOIN sys.indexes i ON o.object_id = i.object_id
    INNER JOIN sys.index_columns ic ON i.object_id = ic.object_id AND i.index_id = ic.index_id
    INNER JOIN sys.columns c ON ic.object_id = c.object_id AND ic.column_id = c.column_id

SET @object_id = (SELECT TOP 1 object_id FROM #IndexList)
SET @index_id = (SELECT TOP 1 index_id FROM #IndexList WHERE object_id = @object_id)

WHILE @object_id IS NOT NULL BEGIN
    SET @List = ''
    SET @ListIncl = ''
    SET @ListShort = ''

    SELECT @List = @List + COALESCE(i.column_name + ', ', '') FROM #IndexList i WHERE i.object_id = @object_id and i.index_id = @index_id and i.included = 0 ORDER BY i.index_column
    SELECT @List = substring(@List, 0, Len(@List))

    SELECT @ListIncl = @ListIncl + COALESCE(i.column_name + ', ', '') FROM #IndexList i WHERE i.object_id = @object_id and i.index_id = @index_id and i.included = 1 ORDER BY i.index_column
    SELECT @ListIncl = substring(@ListIncl, 0, Len(@ListIncl))

    SELECT @ListShort = @ListShort + COALESCE(i.column_name + ', ', '') FROM #IndexList i WHERE i.object_id = @object_id and i.index_id = @index_id and i.included = 0 and index_column <= @KeyFieldMatches ORDER BY i.index_column
    SELECT @ListShort = substring(@ListShort, 0, Len(@ListShort))

    INSERT INTO #IndexListShort 
    SELECT TOP 1 i.object_id
        , i.table_name
        , i.index_id
        , i.index_name
        , @ListShort
        , @List
        , CASE i.index_type --Clustered indexes include everything
            WHEN 1 THEN '*'
            ELSE @ListIncl
            END
    FROM #IndexList i
    WHERE i.object_id = @object_id and i.index_id = @index_id

    DELETE #IndexList WHERE object_id = @object_id and index_id = @index_id

    SET @object_id = (SELECT TOP 1 object_id FROM #IndexList)
    SET @index_id = (SELECT TOP 1 index_id FROM #IndexList WHERE object_id = @object_id)
END

SELECT table_name, index_name, column_names_key, column_names_included
FROM #IndexListShort i
WHERE EXISTS (SELECT * FROM #IndexListShort i2 WHERE i.object_id = i2.object_id AND i.column_names_first_n = i2.column_names_first_n GROUP BY object_id, column_names_first_n HAVING Count(*) > 1)
ORDER BY table_name, column_names_key

DROP TABLE #IndexList
DROP TABLE #IndexListShort

The long answer takes a bit more work to figure out. You need to store the information long-term, keep a snapshot of what it looked like most recently, and do all the calculations to make sure you have the right numbers. It’s really no different than what I do for other DMVs, so you’ll see this process duplicated a lot if you follow my work. The tables for storing this information:

SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

CREATE TABLE [dbo].[DMVIndex_Usage_Stats](
    [StartTime] [smalldatetime] NOT NULL,
    [EndTime] [smalldatetime] NOT NULL,
    [database_id] [smallint] NOT NULL,
    [object_id] [int] NOT NULL,
    [index_id] [int] NOT NULL,
    [user_seeks] [bigint] NOT NULL,
    [user_scans] [bigint] NOT NULL,
    [user_lookups] [bigint] NOT NULL,
    [user_updates] [bigint] NOT NULL,
    [system_seeks] [bigint] NOT NULL,
    [system_scans] [bigint] NOT NULL,
    [system_lookups] [bigint] NOT NULL,
    [system_updates] [bigint] NOT NULL,
 CONSTRAINT [PK_DMVIndex_Usage_Stats] PRIMARY KEY CLUSTERED 
(
    [StartTime] ASC,
    [database_id] ASC,
    [object_id] ASC,
    [index_id] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON, FILLFACTOR = 90) ON [PRIMARY]
) ON [PRIMARY]
GO

CREATE TABLE [dbo].[DMVIndex_Usage_Stats_Temp](
    [TS] [smalldatetime] NOT NULL,
    [database_id] [smallint] NOT NULL,
    [object_id] [int] NOT NULL,
    [index_id] [int] NOT NULL,
    [user_seeks] [bigint] NOT NULL,
    [user_scans] [bigint] NOT NULL,
    [user_lookups] [bigint] NOT NULL,
    [user_updates] [bigint] NOT NULL,
    [system_seeks] [bigint] NOT NULL,
    [system_scans] [bigint] NOT NULL,
    [system_lookups] [bigint] NOT NULL,
    [system_updates] [bigint] NOT NULL,
 CONSTRAINT [PK_DMVIndex_Usage_Stats_Temp] PRIMARY KEY CLUSTERED 
(
    [TS] ASC,
    [database_id] ASC,
    [object_id] ASC,
    [index_id] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON, FILLFACTOR = 90) ON [PRIMARY]
) ON [PRIMARY]

GO

And now the script that captures the information and does all of your calculations. You may have noticed that the tables themselves do NOT store the index or table names which was intentional. Name changes happen, tables and indexes get dropped and created, all of that is extra work for little benefit. Is the benefit worth it? I thought it wasn’t. I’m not expecting everyone to agree with me here because there are arguments going the other way. My advice is STEAL MY WORK!!! Don’t copy it, you need to steal it. Make it your own, learn it well enough that you can improve it and build upon it. At that point you’ll truly understand every bit of it and hopefully start up your own blog. I hear EvenSimplerSQLServer.com is available, but that’s subject to change. If you don’t want to go that far, then write your ideas in the comment box below. I change these posts from time to time, and this script was actually added months after the original post was made. Just don’t ask me why it wasn’t here in the first place, my answer is just….DOH!!!! Even saying I was only 6 days into blogging at the time I wrote the original doesn’t make that acceptable.

DECLARE @TS as SmallDateTime
DECLARE @Old_TS as SmallDateTime
DECLARE @Reboot_TS as SmallDateTime

SET @TS = GetDate()
SET @Reboot_TS = (SELECT create_date FROM sys.databases WHERE [name] = 'tempdb')

--If update wasn't already done this minute
  -- Delete all but the newest two updates - Only the newest will be used, so one spare is kept
  -- Add the current stats
  -- Calculate the usage between the last two and save in the permanent table
    -- If any of the values go down then the DMV reset for that index
      --It's a horrible join clause, but it makes everything null and uses the isnull(x,0)
      --AND N.user_seeks >= O.user_seeks will be hit if database is refreshed, detached, etc.
IF 0 = (SELECT Count(*) FROM Perf..DMVIndex_Usage_Stats_Temp WHERE TS = @TS) BEGIN
    DELETE Perf..DMVIndex_Usage_Stats_Temp 
    WHERE TS NOT IN 
        (SELECT DISTINCT TOP 2 TS 
         FROM Perf..DMVIndex_Usage_Stats_Temp 
         ORDER BY TS DESC)

    SET @Old_TS = --Newest TS in the temp table
        (SELECT TOP 1 TS
        FROM Perf..DMVIndex_Usage_Stats_Temp
        ORDER BY TS DESC)
        
    INSERT INTO Perf.dbo.DMVIndex_Usage_Stats_Temp
    SELECT @TS
        , database_id
        , object_id
        , index_id
        , user_seeks
        , user_scans
        , user_lookups
        , user_updates
        , system_seeks
        , system_scans
        , system_lookups
        , system_updates
    FROM sys.dm_db_index_usage_stats
    
    INSERT INTO Perf..DMVIndex_Usage_Stats
    SELECT isnull(@Old_TS, @Reboot_TS)
        , @TS
        , N.database_id
        , N.object_id
        , N.index_id
        , N.user_seeks - isnull(O.user_seeks, 0)
        , N.user_scans - isnull(O.user_scans, 0)
        , N.user_lookups - isnull(O.user_lookups, 0)
        , N.user_updates - isnull(O.user_updates, 0)
        , N.system_seeks - isnull(O.system_seeks, 0)
        , N.system_scans - isnull(O.system_scans, 0)
        , N.system_lookups - isnull(O.system_lookups, 0)
        , N.system_updates - isnull(O.system_updates, 0)
    FROM Perf..DMVIndex_Usage_Stats_Temp N --N = New, O = Old
        LEFT JOIN Perf..DMVIndex_Usage_Stats_Temp O ON O.TS = @Old_TS
            AND O.database_id = N.database_id
            AND O.object_id = N.object_id
            AND O.index_id = N.index_id
            AND N.user_seeks >= O.user_seeks
            AND N.user_scans >= O.user_scans
            AND N.user_lookups >= O.user_lookups
            AND N.user_updates >= O.user_updates
            AND N.system_seeks >= O.system_seeks
            AND N.system_scans >= O.system_scans
            AND N.system_lookups >= O.system_lookups
            AND N.system_updates >= O.system_updates
            AND @Reboot_TS < O.TS --They're reset on reboot, so don't compare if it was rebooted since last reading
    WHERE N.TS = @TS
END

6 thoughts on “Indexes – Unused and Duplicates

  1. I have to apologize here. I wrote this post a long time ago (3 months) and completely missed that I only had half of my scripts on here. This now has the scripts to keep track of sys.dm_db_index_usage_stage long term. It’s not that these scripts are mind boggling difficult to write that I’m putting them up here, it’s that people don’t (I didn’t) realize how easy these are to learn and track. Also, I’m making these posts because it’s difficult to come to the realization of how useful this can be on your own.

    Please, if you see anything that seems off to you in any of my work, I enjoy criticism a lot more than I enjoy finding errors months after people were trying to use my work to help their career.

    Thanks,
    Steve

  2. Hi Steve, I take it this is only for 2012, sys.dm_db_index_usage_stats,don’t get wiped out on index rebuilds on SQL pre 2012.

    • I know there are ways that this info can get wiped out in all versions, especially with a restart of services. I’m almost certain that they get wiped out in all versions on a rebuild but not a reorg, but I could be wrong.

      This has been running great in 2008 and 2008 R2 for a while now. I’d have to check to see if it has been run in 2005. Unfortunately, it has not been tested in 2012 yet, although I’d like to get that done for most of my scripts soon.

  3. Pingback: Indexing Strategy | Simple SQL Server

  4. Pingback: Query the Buffer Pool | Simple SQL Server

Questions are some of the sincerest compliments