Indexing Fundamentals

November 20, 2013 by Steve Hood

First, lets understand what the types of indexes are. If you take a reference book, you have the two types of indexes. The book itself is a clustered index, it contains all the data and has the key field of page number. The index in the back is your nonclustered index, it’s in order by the key field of subject, then has the implied second key field in there for page number, which nonclustered indexes in SQL Server have as well. On top of that, the table of contents at the beginning of the book is just like a b-tree built on every index that helps you get to the right page faster.

The key fields I mentioned are the fields everything is ordered by, just as you would expect a book to be ordered by page number. Nonclustered indexes also have implied key fields, which are all of the clustered index’s key fields that weren’t explicitly made key fields already, which is a great reason to keep your clustered indexes as narrow as possible in most cases, which means they have fewer columns.

If you have a phone book then you have a clustered index with the key fields last_name, first_name, address, phone_number. That means it’s ordered by each one, so any duplicate last names are still in order by first name, and so on. It’s wide for a clustered index, but doesn’t have any nonclustered indexes to worry about, so wide is fine here.

However, the important thing to notice is that if I asked you for people with the last name “Hood”, you’d find them rather quickly. If I asked you whose phone number is 412-555-6789, you’d cringe. The same thing happens in SQL Server where it will do the work you ask of it, but it won’t be cheap, quick and easy. This is why the internet has reverse phone number lookups where they have a nonclustered index on phone number which lets you find the name quickly by just knowing the number.

Indexes can also have included columns which are stored in the index, but no sorting is done on them. These fields don’t even exist in the b-tree for finding the information, only the page level where everything is stored. To simplify the process, this is where you put fields that you have in your SELECT clause but not your WHERE clause since don’t need it sorted to search quicker, you just need it to fulfill your query’s needs.

How are they used, and what do they cost?

A nonclustered index is used exactly the same way you read a reference book. You look up the key field in the back of the book and see what information is there. If you covered all the information that you needed just by looking at that index then you’re done. However, if you want to know more that wasn’t listed in the index then you take the key field of the clustered index (the page number) and do a key lookup by flipping to those pages and reading more about the subject at hand.

If you noticed, reading all that information in the back of the book seems pretty simple and efficient, especially when all of the information is there so you don’t have to touch the book itself. If you don’t have to touch the book then that’s called a covering index, which can be accomplished by adding the fields you want to retrieve to an index as either included or key columns. Depending on how many of these key lookups you have to do and how much extra information it has to get from the clustered index, it may make sense to make your index a covering index.

However, none of this is free. Every index you make has to be maintained. If you ignore filtered indexes, which I’ll get to soon, every time you insert or delete rows on a table then you have to write to each index. So if you have 6 nonclustered indexes, it doesn’t matter if you have a clustered index or a heap (merely a lack of a clustered index), you have to do one write to the table itself and one write to each nonclustered indexes, so 7 writes for that one row.

Add on top of that maintaining those indexes for fragmentation, extra storage needed on disks and backups, and more data that can possibly be held in the cache, and this stuff really starts to get expensive.

How many indexes should I have?

There’s no magic number of indexes you should have on any single table. Too few makes it hard to find the data, and too many makes a maintenance nightmare with data modifications taking longer. What indexes and how many will vary wildly, but there are methods to figure out what’s appropriate for each table. Chances are, many of your tables will benefit from both dropping indexes and creating them.

Consolidate what you have

First, look at the indexes themselves. If you have indexes that have the same first key field as the clustered index then you can consider removing those indexes with little to no detrimental effects. This is regardless as to if these indexes are being used or not.

Now, look at what indexes are never used. This can be done by looking at sys.dm_db_index_usage_stats to get you up to a month of data (assuming a monthly reboot for patching), or, my preferred method, by querying your tables you have set up that monitor this DMV. If an index is never being used it means it’s never helping you, but it is extra weight that’s adding up and slowing your database down. Just make sure you have enough data to say that the index really isn’t needed before relying on this alone.

Also, look for patterns in the indexes themselves by comparing the first couple key fields. If the couple key fields are the same, these are candidates for having the indexes combined into a single index. Depending on how quickly those key fields make the data semi-unique, you may be talking one to three key fields matching before I’m itching to reduce the number of indexes. You can do this by taking the columns that are in the second index that aren’t in the first and adding them to the first index as included columns, then drop the second index. Queries that used to hit either one of these indexes should run slower by an unnoticeable amount, yet you’re only storing data, doing updates, and maintaining a single index instead of two.

When you’re going through this process you just have to be careful that you understand the implementations of dropping a unique index. It is rarely a good idea to get rid of a unique index unless that uniqueness is enforced by other means as well.

Take this example. I have a heap with a unique index on Col1 that gets scanned, but no seeks. Also, there’s a NC index on Col2 that has a lot of seeks then is obviously doing a lot of lookups to get more information from the heap. In this case I happen to know that the “unique” part of the index on Col1 is because that’s an identity column, so I’m not too concerned about maintaining the unique constraint here. After seeing the usage stats over a month I’m able to restructure this table. Both nonclustered indexes were dropped, a new unique clustered index was created on Col4 and Col1. User Lookups were eliminated since it’s all just a clustered index now, we saved about 700 MB on disk, and the compressed equivalent on backups as well.

Take another example where you can see that an index is duplicated. No, none of these are 100% the same, but the second index is the unique (is_unique field), clustered (* for included columns, it’s just my way of doing it) has a single key column on Col1, which is the same as the first nonclustered index. Seeing the usage like this, I’m going to drop the first index, saving 435 MB of space in my database. This will have a negative impact in that it will move 626 index scans done throughout the month to an index that’s twice the size, and that’s a problem I’ll revisit later when I’m tuning queries that have high physical reads.

Notice you can also see two issues on this table that I’ll need to look into. First, I have a nonclustered index that’s larger than my clustered index, something is wrong here. Perhaps I should have put the fill factor on here to show that wasn’t the issue. The real answer is that I’m working on a table that somehow got missed on the index maintenance, and I now know that this is possible and needs to be fixed. Another issue is that there is a filtered index on here that has about 0.1% fewer rows than the unfiltered indexes, which means that you’re getting no benefits while alienating queries that may have used this otherwise.

And one last example. The 2nd and 3rd indexes both have a single key field on Col2 and just vary on the included columns. This is a simple task of combining the two indexes by making one of them include Col1, Col2, Col6, Col7, Col8, and Col9, then dropping the other. That saves us about 200 MB on disk, and avoids us having duplicated data in the cache. Ask a SAN or backup admin and they’ll tell you the disk savings is great, but ask an unrealistically informed user and you’ll realize that cache savings is what makes a difference dropping your physical reads, upping your PLE, and making your app runs faster.

Then, there’s another simple task on this table of dropping the final index, which is nothing more than 112 MB of dead weight that was being updated an average of 30 times a day. Not much to say here, they obviously didn’t use this as much as they thought they would.

Finally, you’re left with the info that User_Lookups is almost as high as the total number of seeks and scans on the nonclustered indexes. This means that almost every query that hits the nonclustered indexes need to do a key lookup on the clustered index. Knowing that, if I was going to get to that level of tuning, I would be looking into the queries that hit this table and determine if I want to add more included columns to actually cover the query or admit that I’ll never cover it efficiently and remove some included columns. My answer here will vary partially depending on how many records are involved – key lookups are ok for 10 rows, but not so much for 1,000,000 rows.

None of these are hypothetical situations I forced AdventureWorks to do. This is a real, production database, and I wrote two queries against it. One to help me find these problems so I can put a change request in to fix them, and a second query that does the exact same thing while obfuscating the names to protect my employment. So when you see the duplicated indexes or the sizes I’m working with, you can’t say that would never happen in production….it just did.

You reduced the load, now add to it

Now that you have fewer indexes in your database, it’s time to add to them. There are several ways to figure out what indexes you want to add, and I prefer to do everything manually. That means there’s no DTA (Database Tuning Advisor) here, it’s all scripts with proof that what I’m doing is making a difference.

Where are you finding the queries to tune

There are a couple approaches to finding the queries in most need of attention. Query Stats will give you the queries that take the most resources during your peak times, which will reduce the load on your server the quickest. Traces on duration will give you the queries that take the longest along with sample calls with all the parameters you need. Blocking monitors will give you the queries other queries are waiting on the most. Then, one of my favorites, job shadowing gives you what the pain points are for the users.

The first thing I look at is blocking, since that silently sits in the background making queries that could run extremely fast sit there and wait for a single slow query. If there is a lot of consistent blocking then take the head blockers and start tuning them. This won’t eliminate the fact that they’re grabbing locks, but it will cause those locks to be released quicker. Locks by themselves are not an issue and shouldn’t be eliminated, but the impact the locks have on other queries are an issue that is best to be handled this way. If you don’t have many locks that go too long, you can skip this and revisit it later. I say “too long” instead of giving you a number because it varies a lot; 5 seconds is too long for a web app while a minute isn’t much for a data warehouse.

Once that is out of the way, I’ll start diving into the most expensive queries on the server. They may not be the ones with the longest average run-time, but they add up to the most on the server. If you can cut the reads from 100,000 to 1,000 on a proc called every second then you’re doing more for the server than you would by tuning an hourly query that does 10,000,000 reads. Also, focusing on physical reads will help raise your PLE if that’s a concern on your server, which it usually is. Don’t stick on this for too long, you’re just trying to get the easy tuning out of the way and hopefully change the look of your Top-10 most expensive queries list a bit.

Now I move on to looking at the traces to see if any huge queries are running that are easy tuning opportunities. Again, skim through it, pick up the low hanging fruit and move on. Right now you’re still in the phase of reducing the load on the server, but you may not have hit what the users see yet.

Another place to look is in the cache to find out what indexes are taking up the most space there. If you know your data well enough then you should know what to expect from your cache. What you see there will probably surprise you, and you’ll probably see an Index_id of 0 or 1 (Heap or Clustered Index) sitting there taking 10 times what you would expect it to do. These are cases where there are clustered index scans taking place that you need to find. I was recently able to clear up 12 GB of cache on a server doing a full index scan on a 12 GB table where a 10 MB filtered index was able to do the job, which made PLE almost double.

So, now that you have the server running a little (or a lot) better in general, I get into the work that never ends. Get a hold of the people using the app, either by yourself or asking customer service to arrange it, and find out what they hate the most about the app’s performance. What you need here is for someone to write a list of when anything took a while, what screen they were on, and approximately how long it took. Don’t go requesting anyone to use stopwatches, you’re looking for “a couple seconds”, “about a minute”, or “I click OK then go get a cup of coffee.” Your goal here is to be able to correlate their pain points to something captured in your traces, giving you the procs they’re using along with real-life example calls to dive into.

Something you may find odd is that I’m diving into the part that affects the end users the most last, but I do it on purpose. The other pieces are things I skim through and do what I can in a week or so, but they’ll make the bigger impact on the server in general along with resolving some of the customer’s complaints in the process. In the process, you’re learning more about the server and how everything goes together so that when you start talking to your customers you can make a more immediate impact on their complaints.

After getting through list once I let my ADD take over, and man do I have ADD. The customers have an idea of what a DBA can do for them now to the point that they may even be contacting you directly when something annoys them, developers realize a SQL Server specialist can be a good person to lean on, and your server’s biggest pain point hurts a lot less. Once you get to this point, jump around on this list on your servers and take up a query or two when you feel like it.

SQL Server Running Slow (simplesqlserver.com)
Query Stats (simplesqlserver.com)
Tracing Introduction (simplesqlserver.com)
Fixing Page Life Expectancy (PLE) (simplesqlserver.com)
Blocking – Capturing and Monitoring (simplesqlserver.com)

Indexes – Unused and Duplicates

May 16, 2013 by Steve Hood

Indexes aren’t free, and many databases end up with unused indexes. Every time you make any update to a table you will be updating the clustered index (I assume no heaps, because I hate heaps), and every index that has uses one of the columns that were updated. Inserts and Deletes affect every column and will affect every index, with exceptions for non-typical indexes. These updates cost a lot, to the point that I have several processes that disable certain nonclustered indexes, do all of the work, then rebuild the indexes afterwards.

The problem is that you can’t tell very easily if an index is being used or not. You can look at sys.dm_db_index_usage_stats, but that information is very limited in that gets wiped out on index rebuilds, SQL service restarts (server reboot included), and anytime you ask it to. That’s not mentioning that it also ignores obscure usage of the index. However, the obscure usage is more of something to keep in mind than it is something to base all of your decisions on.

From my point of view, there are two answers to this problem. There isn’t a right and wrong answer, it’s more of a step 1 and step 2.

The long and hard one is to capture the values in sys.dm_db_index_usage_stats on a normal basis (daily, before index maintenance) and give yourself long term statistics that you can determine if an index’s usage justifies its cost. Some indexes are updated 1,000,000 times for two uses. You obviously don’t need to keep this index around all the time, but you’re still paranoid about what those uses were. Were they ad-hoc queries run by a developer, or were they year-end reports run by the CEO? You can get some idea if you look at your history and check what day it was run. I would want to make a rollback script, drop the index, and make a searchable document of indexes I dropped in case anyone complained about year-end reports not working.

That was the long answer, which is right and is not replaced by the short answer. The script below is the short answer. If all of the key fields in an index match, or at least the first three in a larger index, then there’s a good chance that one of the indexes isn’t required or they could be consolidated.

For instance, if you have two indexes that have the same two key fields in the same order but the second index also includes two columns, then you’ll probably see usage on both indexes (see the long-hard solution above). If one index isn’t in use in this situation, your job is easy. If both are used, then look into dropping the index that doesn’t include any columns. Why? Because a slightly larger index rarely increases the overhead too much, and the queries that used the first index aren’t usually impacted too much by using a slightly larger index that has the same foundation.

This area gets shakey on how you should handle it, and is really one of the few areas in SQL Server I consider more of an art. For instance, the script below will show you indexes that have 5 key columns where the first three match. You look and find that both indexes are being used, but users are complaining that updates are a bit slow, too. I’d lean towards condensing by taking the the last two key columns of one index and including them on the other, but which index gets dropped and how do you test it?

Do my uses of “look into”, “rarely”, “usually”, and “too much” seem uncertain to you? Welcome to indexing.

Here are the scripts I use for both the short and long answers mentioned. The first one is for the short answer, giving you indexes that are for the most part duplicated.

DECLARE @KeyFieldMatches INT

SET @KeyFieldMatches = 3 --Number of key fields to match in order

IF object_id('tempdb..#IndexList') IS NOT NULL BEGIN
    DROP TABLE #IndexList
END

IF object_id('tempdb..#IndexListShort') IS NOT NULL BEGIN
    DROP TABLE #IndexListShort
END

CREATE TABLE #IndexList (
      object_id    bigint not null
    , table_name varchar(150) not null
    , index_id int not null
    , index_name varchar(150) not null
    , index_column int not null
    , included bit not null
    , column_name varchar(150) not null
    , index_type int not null
)

CREATE CLUSTERED INDEX IndexList_Clu ON #IndexList (object_id, index_id, index_column, included)

CREATE TABLE #IndexListShort (
      object_id bigint not null
    , table_name varchar(150) not null
    , index_id int not null
    , index_name varchar(150) not null
    , column_names_first_n varchar(450) not null
    , column_names_key varchar(4000) not null
    , column_names_included varchar(4000) not null
)

DECLARE @object_id bigint
DECLARE @index_id int
DECLARE @List varchar(4000)
DECLARE @ListIncl varchar(4000)
DECLARE @ListShort varchar(450)

INSERT INTO #IndexList
SELECT o.object_id
    , table_name = o.name
    , i.index_id
    , index_name = i.name
    , index_column = ic.index_column_id
    , included = ic.is_included_column
    , column_name = c.name
    , index_type = i.type
FROM sys.objects o
    INNER JOIN sys.indexes i ON o.object_id = i.object_id
    INNER JOIN sys.index_columns ic ON i.object_id = ic.object_id AND i.index_id = ic.index_id
    INNER JOIN sys.columns c ON ic.object_id = c.object_id AND ic.column_id = c.column_id

SET @object_id = (SELECT TOP 1 object_id FROM #IndexList)
SET @index_id = (SELECT TOP 1 index_id FROM #IndexList WHERE object_id = @object_id)

WHILE @object_id IS NOT NULL BEGIN
    SET @List = ''
    SET @ListIncl = ''
    SET @ListShort = ''

    SELECT @List = @List + COALESCE(i.column_name + ', ', '') FROM #IndexList i WHERE i.object_id = @object_id and i.index_id = @index_id and i.included = 0 ORDER BY i.index_column
    SELECT @List = substring(@List, 0, Len(@List))

    SELECT @ListIncl = @ListIncl + COALESCE(i.column_name + ', ', '') FROM #IndexList i WHERE i.object_id = @object_id and i.index_id = @index_id and i.included = 1 ORDER BY i.index_column
    SELECT @ListIncl = substring(@ListIncl, 0, Len(@ListIncl))

    SELECT @ListShort = @ListShort + COALESCE(i.column_name + ', ', '') FROM #IndexList i WHERE i.object_id = @object_id and i.index_id = @index_id and i.included = 0 and index_column <= @KeyFieldMatches ORDER BY i.index_column
    SELECT @ListShort = substring(@ListShort, 0, Len(@ListShort))

    INSERT INTO #IndexListShort 
    SELECT TOP 1 i.object_id
        , i.table_name
        , i.index_id
        , i.index_name
        , @ListShort
        , @List
        , CASE i.index_type --Clustered indexes include everything
            WHEN 1 THEN '*'
            ELSE @ListIncl
            END
    FROM #IndexList i
    WHERE i.object_id = @object_id and i.index_id = @index_id

    DELETE #IndexList WHERE object_id = @object_id and index_id = @index_id

    SET @object_id = (SELECT TOP 1 object_id FROM #IndexList)
    SET @index_id = (SELECT TOP 1 index_id FROM #IndexList WHERE object_id = @object_id)
END

SELECT table_name, index_name, column_names_key, column_names_included
FROM #IndexListShort i
WHERE EXISTS (SELECT * FROM #IndexListShort i2 WHERE i.object_id = i2.object_id AND i.column_names_first_n = i2.column_names_first_n GROUP BY object_id, column_names_first_n HAVING Count(*) > 1)
ORDER BY table_name, column_names_key

DROP TABLE #IndexList
DROP TABLE #IndexListShort

The long answer takes a bit more work to figure out. You need to store the information long-term, keep a snapshot of what it looked like most recently, and do all the calculations to make sure you have the right numbers. It’s really no different than what I do for other DMVs, so you’ll see this process duplicated a lot if you follow my work. The tables for storing this information:

SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

CREATE TABLE [dbo].[DMVIndex_Usage_Stats](
    [StartTime] [smalldatetime] NOT NULL,
    [EndTime] [smalldatetime] NOT NULL,
    [database_id] [smallint] NOT NULL,
    [object_id] [int] NOT NULL,
    [index_id] [int] NOT NULL,
    [user_seeks] [bigint] NOT NULL,
    [user_scans] [bigint] NOT NULL,
    [user_lookups] [bigint] NOT NULL,
    [user_updates] [bigint] NOT NULL,
    [system_seeks] [bigint] NOT NULL,
    [system_scans] [bigint] NOT NULL,
    [system_lookups] [bigint] NOT NULL,
    [system_updates] [bigint] NOT NULL,
 CONSTRAINT [PK_DMVIndex_Usage_Stats] PRIMARY KEY CLUSTERED 
(
    [StartTime] ASC,
    [database_id] ASC,
    [object_id] ASC,
    [index_id] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON, FILLFACTOR = 90) ON [PRIMARY]
) ON [PRIMARY]
GO

CREATE TABLE [dbo].[DMVIndex_Usage_Stats_Temp](
    [TS] [smalldatetime] NOT NULL,
    [database_id] [smallint] NOT NULL,
    [object_id] [int] NOT NULL,
    [index_id] [int] NOT NULL,
    [user_seeks] [bigint] NOT NULL,
    [user_scans] [bigint] NOT NULL,
    [user_lookups] [bigint] NOT NULL,
    [user_updates] [bigint] NOT NULL,
    [system_seeks] [bigint] NOT NULL,
    [system_scans] [bigint] NOT NULL,
    [system_lookups] [bigint] NOT NULL,
    [system_updates] [bigint] NOT NULL,
 CONSTRAINT [PK_DMVIndex_Usage_Stats_Temp] PRIMARY KEY CLUSTERED 
(
    [TS] ASC,
    [database_id] ASC,
    [object_id] ASC,
    [index_id] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON, FILLFACTOR = 90) ON [PRIMARY]
) ON [PRIMARY]

GO

And now the script that captures the information and does all of your calculations. You may have noticed that the tables themselves do NOT store the index or table names which was intentional. Name changes happen, tables and indexes get dropped and created, all of that is extra work for little benefit. Is the benefit worth it? I thought it wasn’t. I’m not expecting everyone to agree with me here because there are arguments going the other way. My advice is STEAL MY WORK!!! Don’t copy it, you need to steal it. Make it your own, learn it well enough that you can improve it and build upon it. At that point you’ll truly understand every bit of it and hopefully start up your own blog. I hear EvenSimplerSQLServer.com is available, but that’s subject to change. If you don’t want to go that far, then write your ideas in the comment box below. I change these posts from time to time, and this script was actually added months after the original post was made. Just don’t ask me why it wasn’t here in the first place, my answer is just….DOH!!!! Even saying I was only 6 days into blogging at the time I wrote the original doesn’t make that acceptable.

DECLARE @TS as SmallDateTime
DECLARE @Old_TS as SmallDateTime
DECLARE @Reboot_TS as SmallDateTime

SET @TS = GetDate()
SET @Reboot_TS = (SELECT create_date FROM sys.databases WHERE [name] = 'tempdb')

--If update wasn't already done this minute
  -- Delete all but the newest two updates - Only the newest will be used, so one spare is kept
  -- Add the current stats
  -- Calculate the usage between the last two and save in the permanent table
    -- If any of the values go down then the DMV reset for that index
      --It's a horrible join clause, but it makes everything null and uses the isnull(x,0)
      --AND N.user_seeks >= O.user_seeks will be hit if database is refreshed, detached, etc.
IF 0 = (SELECT Count(*) FROM Perf..DMVIndex_Usage_Stats_Temp WHERE TS = @TS) BEGIN
    DELETE Perf..DMVIndex_Usage_Stats_Temp 
    WHERE TS NOT IN 
        (SELECT DISTINCT TOP 2 TS 
         FROM Perf..DMVIndex_Usage_Stats_Temp 
         ORDER BY TS DESC)

    SET @Old_TS = --Newest TS in the temp table
        (SELECT TOP 1 TS
        FROM Perf..DMVIndex_Usage_Stats_Temp
        ORDER BY TS DESC)
        
    INSERT INTO Perf.dbo.DMVIndex_Usage_Stats_Temp
    SELECT @TS
        , database_id
        , object_id
        , index_id
        , user_seeks
        , user_scans
        , user_lookups
        , user_updates
        , system_seeks
        , system_scans
        , system_lookups
        , system_updates
    FROM sys.dm_db_index_usage_stats
    
    INSERT INTO Perf..DMVIndex_Usage_Stats
    SELECT isnull(@Old_TS, @Reboot_TS)
        , @TS
        , N.database_id
        , N.object_id
        , N.index_id
        , N.user_seeks - isnull(O.user_seeks, 0)
        , N.user_scans - isnull(O.user_scans, 0)
        , N.user_lookups - isnull(O.user_lookups, 0)
        , N.user_updates - isnull(O.user_updates, 0)
        , N.system_seeks - isnull(O.system_seeks, 0)
        , N.system_scans - isnull(O.system_scans, 0)
        , N.system_lookups - isnull(O.system_lookups, 0)
        , N.system_updates - isnull(O.system_updates, 0)
    FROM Perf..DMVIndex_Usage_Stats_Temp N --N = New, O = Old
        LEFT JOIN Perf..DMVIndex_Usage_Stats_Temp O ON O.TS = @Old_TS
            AND O.database_id = N.database_id
            AND O.object_id = N.object_id
            AND O.index_id = N.index_id
            AND N.user_seeks >= O.user_seeks
            AND N.user_scans >= O.user_scans
            AND N.user_lookups >= O.user_lookups
            AND N.user_updates >= O.user_updates
            AND N.system_seeks >= O.system_seeks
            AND N.system_scans >= O.system_scans
            AND N.system_lookups >= O.system_lookups
            AND N.system_updates >= O.system_updates
            AND @Reboot_TS < O.TS --They're reset on reboot, so don't compare if it was rebooted since last reading
    WHERE N.TS = @TS
END

Indexes – Understanding basic types and their components (simplesqlserver.com)
Fixing Page Life Expectancy (PLE) (simplesqlserver.com)

Index Columns and Names

May 14, 2013 by Steve Hood

Don’t trust the name of an index to tell you what it is. The name is wrong, or at least that’s what you have to assume.

I’m a huge fan of index naming conventions, with my favorite being starting out by saying it’s an index (IDX) followed by the table name, then the key columns, the letters INCL if there are included columns, each included column listed (if reasonable, just do all or nothing for each index), a U or N denoting if it’s unique or not, then a C or N denoting if it’s clustered or not. However, there are too many times I’ve seen an index naming convention get me in trouble where the index IDX_TableName_Key1_Key2_Key3_INCL_Incl1_Incl2_U_N actually not have the column Key1 in it, and it wasn’t unique either.

You can’t always trust the naming convention, even if it does exist. However, the system tables are always right. Here we have all of the key and included fields in an index, how much each index has been used (since restart of services or rebuilds in 2012+), and the size of each index. This typically takes less than a second to run in my experience, and you’ll get a feel for how it runs in your nonprod environment before it runs in prod, right?

DECLARE @TableName VarChar(100) = '%' --Accepts wildcards, % will give you all indexes in the database

SELECT TableName = i.SchemaName + '.' + i.TableName
	, IndexName = ISNULL(i.IndexName, '[' + Lower(i.IndexType) + ']')
	, i.User_Updates
	, i.User_Seeks
	, i.User_Scans
	, i.User_Lookups
	, KeyColumnList = substring((SELECT (', ' + c.name)
							FROM sys.index_columns ic
								INNER JOIN sys.columns c ON ic.object_id = c.object_id AND ic.column_id = c.column_id
							WHERE i.object_id = ic.object_id AND i.index_id = ic.index_id
								AND ic.is_included_column = 0
							ORDER BY ic.key_ordinal
							FOR XML PATH ('')
							), 3, 2000)
	, IncludedColumnList = CASE WHEN i.IndexType IN ('Clustered', 'Heap') THEN '*' ELSE
							substring((SELECT (', ' + c.name)
							FROM sys.index_columns ic
								INNER JOIN sys.columns c ON ic.object_id = c.object_id AND ic.column_id = c.column_id
							WHERE i.object_id = ic.object_id AND i.index_id = ic.index_id
								AND ic.is_included_column = 1
							ORDER BY c.Name
							FOR XML PATH ('')
							), 3, 2000)
						END
	, i.filter_definition
	, i.is_unique
	, i.Used_MB
	, i.Reserved_MB
	, Free_MB = i.Reserved_MB - i.Used_MB
	, i.Used_InRow_MB
	, i.Used_LOB_MB 
	, i.Row_Count
	, i.index_id
--	, DropStatement = 'DROP INDEX [' + i.IndexName + '] ON [' + i.SchemaName + '].[' + i.TableName + ']'
FROM 
	(
	SELECT SchemaName = s.name
		, TableName = t.name
		, IndexName = i.name
		, IndexType = i.type_desc 
		, i.object_id  
		, i.index_id
		, i.filter_definition
		, i.is_unique
		, User_Updates = SUM(ius.User_Updates)
		, User_Seeks = SUM(ius.user_seeks)
		, User_Scans = SUM(ius.user_scans)
		, User_Lookups = SUM(ius.user_lookups)
		, i.Used_MB 
		, i.Reserved_MB 
		, i.Used_InRow_MB
		, i.Used_LOB_MB 
		, i.row_count
	FROM (SELECT i.name
				, i.type_desc
				, i.object_id
				, i.index_id
				, i.is_unique
				, i.filter_definition
				, Used_MB = SUM(PS.used_page_count) / 128
				, Reserved_MB = SUM(PS.reserved_page_count) / 128 
				, Used_InRow_MB = SUM(PS.in_row_used_page_count) / 128
				, Used_LOB_MB = SUM(PS.lob_used_page_count) / 128
				, row_count = SUM(PS.row_count)
			FROM sys.indexes i 
				LEFT JOIN sys.dm_db_partition_stats PS ON i.object_id = PS.object_id AND i.index_id = PS.index_id
			GROUP BY i.name, i.type_desc, i.object_id, i.index_id, i.is_unique, i.filter_definition) i
		INNER JOIN sys.all_objects t ON i.object_id = t.object_id
		INNER JOIN sys.schemas s ON t.schema_id = s.schema_id
		LEFT JOIN sys.dm_db_Index_Usage_Stats ius ON ius.object_id = i.object_id 
											AND ius.index_id = i.index_id 
											AND ius.database_id = DB_ID()
	WHERE t.name LIKE @TableName  	
	GROUP BY s.name, t.name, i.name, i.object_id, i.index_id, i.is_unique, i.type_desc, i.filter_definition, i.Used_MB, i.Reserved_MB, i.row_count, Used_InRow_MB, Used_LOB_MB
	) i
ORDER BY 1, KeyColumnList

You may say that’s a temporary one-off inqury the doesn’t fix anything, and it is. However, the permanent fix is very invasive, will void your support contracts, can cause damage, would cause pieces of future upgrades to fail, may not work if it generates a name that’s too long, and other minor details. Assuming you have a home-grown database and absolutely no query hints specifying an index anywhere in your code, have a dev environment recently refreshed from prod, and have looked into every other issue that I never even considered, do I have some code for you!!!

SELECT Command = '--DON''T RUN THIS WITHOUT FIRST LOOKING INTO THE CONSEQUENCES AND UPDATING YOUR RESUME'

UNION 
SELECT Command = 'EXEC sp_rename ''' + TableName + '.' + IndexName + ''', ''' + 'IX_' + TableName + ColumnList + case when len(IncludeList) > 3 then '_INCL' else '' end + ISNULL(IncludeList, '') + '_' + case is_unique when 1 then 'U' else 'N' end + '_' + left(IndexType COLLATE SQL_Latin1_General_CP1_CS_AS, 1) + ''' , ''INDEX'''
FROM (
        SELECT TableName = t.name
            , IndexName = i.name
            , i.is_unique
            , IndexType = i.type_desc
            , FileGroupName = d.name
            , ColumnList = substring((SELECT ('_' + c.name)
                                    FROM sys.index_columns ic
                                        INNER JOIN sys.columns c ON ic.object_id = c.object_id AND ic.column_id = c.column_id
                                    WHERE i.object_id = ic.object_id AND i.index_id = ic.index_id
                                        AND ic.is_included_column = 0
                                    ORDER BY ic.key_ordinal
                                    FOR XML PATH ('')
                                    ), 1, 2000)
            , IncludeList = substring((SELECT ('_' + c.name)
                                    FROM sys.index_columns ic
                                        INNER JOIN sys.columns c ON ic.object_id = c.object_id AND ic.column_id = c.column_id
                                    WHERE i.object_id = ic.object_id AND i.index_id = ic.index_id
                                        AND ic.is_included_column = 1
                                    ORDER BY ic.key_ordinal
                                    FOR XML PATH ('')
                                    ), 1, 2000)
        FROM sys.tables t
            INNER JOIN sys.indexes i ON t.object_id = i.object_id
            INNER JOIN sys.data_spaces d ON i.data_space_id = d.data_space_id
        WHERE t.name IN (
                  'Table_1'
                , 'Table_2')
    )x
ORDER BY 1

Simple SQL Server

A basic approach to SQL Server administration

Tag Archives: sys.index_columns

Indexing Fundamentals

How are they used, and what do they cost?

How many indexes should I have?

Consolidate what you have

You reduced the load, now add to it

Where are you finding the queries to tune

Indexes – Unused and Duplicates

Index Columns and Names

How are they used, and what do they cost?

How many indexes should I have?

Consolidate what you have

You reduced the load, now add to it

Where are you finding the queries to tune

Related articles

Rate this:

Related articles

Rate this:

Rate this: