SQL Saturday Cleveland #SQLSat473

On February 6, 2016, Cleveland is hosting a free training event for SQL Server.  This has a lot of the great stuff from the big, paid events, and skips some of the negatives.

There’s a great team from the North Ohio SQL Server Users Group that took on a very difficult task to put all of this together.  They hand selected every presentation, being forced to turn away over half of the abstracts and many speakers.  What they came up with was a variety of topics that span the breadth of SQL Server and the experience levels of the professionals working with databases.

What is this SQL Saturday?

For those of you who have never been to a SQL Saturday, here’s how this one works.  Other SQL Saturdays will vary from this slightly, but not too much.

There are 6 training rooms with hour-long presentations running simultaneously in each.  You move from room to room between each session choosing the ones that apply to you the most.  The schedule for the sessions is posted in advance, and it’s impressive.

An optional $10 lunch, which is typically very obvious that no one is making a profit off of, is the closest thing the event has to an entrance fee. The stuff you’d expect to cost you is provided by volunteers and sponsors who pull together the venue, presenters, and community for you.

There’s time to network with vendors, attendees, and speakers before the event, during breaks and lunch, and often end up in conversations that last throughout the next session.  This differs from the big events in that it’s much smaller and personable with most attendees being relatively local.  The people you meet at a regional event like this are going to be more relevant to your life because they’re part of your local community.

The event at Hyland Software concludes with prizes ranging from $100 gift cards to new laptops being handed out to random attendees.  Yes, you can basically get paid to go to free training, which I still don’t quite understand.

Then there’s the after-party, also provided by the SQL Saturday volunteers and sponsors.  There’s food, fun, and more time to get to know some great people.

The venue can’t hold everyone who should come, and often can’t hold everyone who wants to come.  Register now to make sure there’s room for you, and make sure to cancel if your plans fall through.

What is SQL Saturday?

Not just this event, what is SQL Saturday as a whole?  It’s a program specifically set up to help everyone in the community to develop.

The obvious people benefiting are the attendees who get a chance to see all the sessions and meet the vendors.  You can see how they would benefit from here, but not quite how much until you’re at the event.

The less obvious are the speakers, some speaking publically for the first time.  As one of the speakers, I can personally say that I’ve grown more as a person and as a professional than I thought possible.  It’s a step I highly recommend, and one I’m very grateful to have with SQL Saturday.

The even less obvious are the vendors.  They’re speaking to the public, not just their established customers, getting candid feedback on what works, what doesn’t work, and how they can make their offerings better.  Even if they don’t make a single sale from attending the event, they can always come out ahead.

SQL Saturday didn’t just pop out of nowhere, and someone told me we should give thanks to the people who started it all.

Thank you:

Steve Jones (b|t)

Andy Warren (b|t)

Brian Knight (b|t)

These event are around the world with one or more occurring almost every weekend of the year as an annual event in most major cities.  See SQL Saturday’s website for a full list of what’s coming up.  If your city isn’t coming up soon, look in past events because most cities hold it about the same time every year.

Who’s speaking?

Where do I start?  Well, by listing EVERYONE on the current schedule, along with the blogs and twitter handles I’m aware of.

  • Deji Akomolafe (b|t)
  • Jim Arko (b|t)
  • Adam Belebczuk (b|t) <– Organizer
  • Chris Bell (b|t)
  • Delora Bradish (b|t)
  • Mindy Curnutt (b|t)
  • Aaron Cutshall (t)
  • Joey D’Antoni (b|t)
  • David Eldersveld (b|t)
  • Cassandra Faris (t)
  • Kevin Feasel (b|t)
  • Frank Gill (b|t)
  • Amy Herold (b|t)
  • Paul Hiles (t) <– Organizer
  • Steve Hood (b|t) <– What am I doing on a list this awesome?
  • Michael John
  • Jonathan Kehayias (b|t)
  • Dave Mattingly (t)
  • David Maxwell (b|t)
  • Evelyn Maxwell <– bright future ahead for her
  • Eugene Meidinger (b|t)
  • Kon Melamud (t)
  • Ben Miller (b|t)
  • Jeff Moden (b)
  • Colleen Morrow (b|t) <– Organizer
  • Matt Nelson (b|t)
  • Wendy Pastrick (b|t)
  • Pat Phelan (t)
  • Justin Randall (b|t)
  • Wayne Sheffield (b|t)
  • Peter Shore (b|t)
  • Warren Sifre (b|t)
  • Julie Smith (b|t)
  • Erin Stellato (b|t) <– Organizer
  • Michael J. Swart (b|t)
  • Allen White (b|t) <– Organizer
  • Andy Yun (b|t)

Note that the schedule may change some before the event, but it never changes too much.

What do we ask of you?

Show up.

Despite the amazing list of speakers, we’re all volunteers.  That means we do it because we want to do it, and truly enjoy it.  If you want to help us enjoy it even more, be involved.  Talk to us and the other attendees, ask questions, and give feedback.

If you want to absolutely blow us away, get in touch afterwards and let us know the difference it made for you.  Yes, we want this, it will absolutely make our day!  Let the organizers know what you think of the event.  Tell the presenters how you were able to use their advice at work, and even ask questions down the road.  Get in touch with the vendors to discuss their products, even if you aren’t planning a purchase, to give them real feedback and see what’s out there.  I used to think this was imposing, but now I realize this is the one of the most awesome compliments you can get.

The most important part….

Show up!

Hope to see you there!

UPDATE – 2016-01-13 – On the waiting list

If you registered before today, you’re in.  If you’re registering now, you’re on the waiting list.

If you can’t make it, please cancel your registration so someone on the waiting list can get in.  We’ll miss you, but at least there will be someone else we won’t be missing.

Indexing Strategy

What do I care about when I’m playing with indexes? That’s easy. I want as few indexes as possible efficiently referenced by as many pertinent, well-tuned, consistently written queries as is reasonable. It’s explaining that last sentence that’s the hard part.

The thing that will jump out to most people is that my goal doesn’t mention a specific, single query that you want to run great.  Although that’s something I hope to achieve, it only becomes a priority as a last resort.  I’m more concerned with already having that data in memory because the index is being used by many queries, and also having fewer indexes to make data modifications more efficient. There’s more to it than that, but the detail belongs in the body of this post, not the intro.

If I was writing how to make a perfect index for a single reference to a table in a single query, this post could be done rather well in a couple paragraphs. Even though I’m focusing only on OLTP (ruling out columnstore indexes), in 99.999% of environments (ruling out in-memory hash indexes), and not getting into details of filtered indexes or indexed views, there’s still a lot to consider to the point that the first few paragraphs will just be what we’re going to keep in mind. I hope you didn’t have other plans today…

Does this advice apply to you?

It depends! Just kidding, I hate that (non) answer.

There are two targets audiences for this.  While it’s useful to everyone, you’d have to hit both of them for this to be perfect for you. First target is the person; this is written for someone who is comfortable working with indexes for single queries but wants a better view of the big picture. Second target is the database, which is a rather typical in-house OLTP database.

  • More data than you have memory
  • Writes throughout the day, especially in your larger tables
  • Read and write performance matter more than disk space
  • No extreme write loads, such as truncating and repopulating your largest table (easy fix, disable and rebuild your indexes around this action)
  • You have the ability to make indexing changes (this goes beyond what you can typically do with vendor databases)
  • Hopefully, you have the ability to make some code changes

If you or your database aren’t the perfect audience, don’t worry about it.  The most important things to know are what to keep in mind and how those things are interconnected.  The real goal is making more intelligent decisions for your databases, not fitting into a specific mold.

PreReqs!!!

Because this is an advanced look at the fundamentals of indexing strategy, you shouldn’t take offense if you have to do some prerequisite work for all of this to come together. If I give you a severe headache consider reading this stuff first, and the book on the list is well worth a second read cover-to-cover.

PreReqs:

Even with that I’ll probably still give you a headache (a common theme for me), and you’ll probably have questions. Keep in mind that some of the greatest compliments you can give someone are asking them a question and offering them large sums of cash. Although I don’t hand out my LLC’s address to send checks to unless I’ve actually looked over your indexes, I do have a free comments section below that I’d like to see used more often.

Something to consider

Here are all the things we’ll be considering. This is a great list, but nothing is ever going to be 100% all-inclusive or perfect.

Tune your queries: The ideal set of indexes for a poorly written query may be 100% different than the same query after it’s refactored.  We don’t want to constantly tinker with indexes, so this process is best if you tune your most expensive queries first.  An old, unpublished concept for this post had creating an index as a 13 step process with steps 1-11 avoiding indexes through tuning, step 12 making an index, and step 13 seeing if you could delete any other indexes.  That really wasn’t too different from Brent Ozar’s (b|t) Be Creepy method.  Indexing is not the only answer, and not the first answer either.

Query Importance: Some queries just need to complete, some need to run decently well, and some need to run as close to instantly as possible. Knowing how critical your query is will weigh in on how you index, but none of them, not even the most critical queries, will have their performance be the only deciding factor on what indexes you should have. Some outliers will require you to go back and create a specific index for them as a last resort, but there’s no reason to concern ourselves with last resorts before we get started.  If an index would work, even somewhat inefficiently, and it’ll already be in cache, do we want to create another index that will fight it for space in cache?  Don’t create another index to get your most critical query to 0.1 seconds when it’s running at 0.2 seconds and the business is happy with anything under 1.0 second.

Query Consistency: Are you querying the table the same way in all of your queries? If not, do you know you are stressing me out? Do you not care?  Using the same columns to join where possible matters, even if you could live without one of them in some cases because consistent queries mean index reusability.

Query Frequency: Some queries run five times a second, some run once a year. They aren’t even close to the same, and shouldn’t be treated the same.

Query Timing: It is different from frequency. Once-a-day is not just once-a-day. An “8-5 is critical” shop where all your users are sleeping at 3 AM means that we care much less about collateral damage from a 3 AM query. If we do a table scan on a huge clustered index at 3 AM that kicks everything out of cache it might not matter much, but do that same thing at 3 PM and we may want to consider an index even if it’s only used for a single query.

Query Justification: That query timing example threw up a red flag to me. Do we need to run that query at all? Does it need to run in prod, or is there a reporting database I can run it against? Should I consider making a reporting database? Does it need to run at 3 PM? Question the outliers that would change your indexing strategy for both if they need to run and if they could use a little T-SQL help.

Insert / Update / Delete performance: The more indexes you have, the slower your data modifications will be because they have to be written more than once. Wider indexes will be more overhead due to updates hitting it more often, larger index to maintain, and fewer rows per page of data.

Reusability: How many queries can use this index, and how will each of them use it? Is a query using it differently because it’s not referencing the table consistently or because it’s legitimately doing something different? This part is HUGE, and is really going to be a major focus. I didn’t give you a hard time on that query consistency point asking if you cared for no reason!

Memory usage: How much memory is being used, where is it being used, and why? Is that memory being used to fulfill multiple queries (see Reusability, which references Query Consistency, which goes back to Query Tuning)? Could we use less memory if we had a descent index? Is the query that requires all that memory justified and timed properly? These points are starting to mix together and reference themselves, aren’t they? Is indexing strategy an infinite loop?!?!? Yes, yes it is.

Key Lookups: For the queries that use this index, is this a covering index? If not, what would it need to be a covering index? We’ll look at these things: how critical is each query, how often is that query run, how many key lookups does it do, how wide are the total columns we would need to add to be covering, how often are each of those columns updated, what other queries would take advantage of having those columns in there, and is there any filtering being done on those columns?

Maintenance: It’s easy to see having fewer, more narrow indexes would make index rebuilds, index reorgs, and database backups quicker and easier. How about key column order and compression?

TDE: What’s this feature doing in an indexing article?

SQL Edition: Index compression is going to be the biggest one for us today. Online rebuilds can make a big difference, too, but it rarely makes a big difference in what indexes you want.

Pick a table, any table

We’re not going to change the entire database at once.  Partially because it’s overwhelming to you, but mostly because it’s lower risk that’s easier to troubleshoot and roll back if needed. So we’ll pick a single table that we want to have run more efficiently, make a change or two to it, then do it again with either the same table or a different one.

I’m not consistent on how I pick tables.  Although I usually pick one of the ones that’s the largest in the buffer pool that I haven’t made as efficient as I could already, which you can find using the query in  my Cleaning Up the Buffer Pool post.  However, that’s not always how I pick a table.  I’ll also start this off with a query that I wish was running faster, run it with SET STATISTICS IO, TIME ON to see what tables are getting hit in the slowest part, and work on a table that’s getting hit inefficiently here.  There’s no magic to it, just pick a table and reassure the other tables that they’ll get their turn later.

It looks like an infinite loop at first glance because I’ll keep picking tables and loop back to revisit table later, but it’s better to think of it as an upward spiral. That first trip around will give you all kinds of awesome, the second trip will add on to that, and each trip around yields less of an improvement. You could stop wherever you wanted if it wasn’t addictive.

Also as the size of your data changes, the queries hitting your database change, and more, it’s unreasonable to expect your indexing strategy to stay 100% the same.  This isn’t a job where you can ever say you’re really done, just in a better place than you were last week.

I have a table, now what?

At the times of day you want your database to perform great, what’s happening with your table? This may be anything that ever runs on the database for some places, and it may be anything that runs between 8 AM and 5 PM for others.

I’m being as all inclusive as possible by looking at everything that touches the table, so this won’t be as quick and easy as you’d think. Yes, my target audience for this post can create an index for a single query in minutes while I typically spend well over an hour on a single table; how fast you make it through this project isn’t my primary concern.

Once you picked a table to work on look in the proc cache to see what references the indexes on the table.  My query to do that in the same Cleaning Up the Buffer Pool post is good for this, but the one in Querying the Plan Cache is better for viewing an entire table at once. This has every cacheable plan that ran since the last restart of services and hasn’t been forced out of memory. Remember this is a really CPU intensive query that will take several minutes to run and needs to run against prod at a time of day you care about to provide what you need. If you have an extremely busy hour or two, run this as soon as things start to calm down.

Note, there were three different ways something could have avoided you seeing it in the proc cache, and that wasn’t counting if you turned on the typically recommended setting “Optimize for ad-hoc workload” that Kimberly Tripp (b|t) wrote about where you can miss the first run of ad-hoc queries in favor of keeping your memory cleaner. It’s also possible that a query is getting a different plan each time it gets compiled due to stats changing or parameter sniffing, but that affects us a little less since we’re going to make index changes that will change those plans anyways.

The proc cache query is also only capturing index usage. It does capture RID lookups, but not table scans against a heap. You’ll have to look at the modified scripts I put in Querying the Plan Cache to see the table scans because they’re stored differently in the XML.

For now, let’s focus on how things can sneak past our cache and how we can find them.

1 & 2: Was not run since the last restart of services or was forced out of memory. It can be in cache, it’s just not there right now. For that, we’re going to check back on the cache multiple times, and we’re also going to make one or two index changes at a time which will again have us checking back multiple times.

3: Uncacheable plans can happen for several reasons, with the most popular in my experience being temp tables where data was loaded into it then an index was created on the temp table. Whatever the reason, start up extended events or a trace and watch for sql_statement_recompile to help hunt them down. Take this list and search for references of your table to know which ones are relevant. To get bonus points (I’m not keeping score), find out why it’s not getting along with your cache and see if it’s something that should be fixed.

To make things a little more difficult in that step, you’ll also have to look for references to views and functions that reference the table. The views and functions will show up in my Proc Contains Text query, and you’ll have to iterate through that process again.

Keep in mind, this will never be perfect and 100% all-inclusive. I know I said that before, but I need some of the important details repeated to me before they sink in and I have to assume there are others like me. It will be very rare for this to pick up an ad-hoc query that runs for year-end processing. You can use your imagination to find 10 other ways you’ll miss something and still be shocked when a new way pops up.

However, we have enough to move forward, and we’re going to accept the rest as acceptable risk. If you don’t accept some risk you’ll never accept any rewards, it’s just a matter of reducing the risk and knowing enough to make an intelligent decision.

Now that you know what’s running, how is each one referencing the table? Looking at the proc cache, you’ll see predicates and seek predicates, which you’ll combine on a list. You’re going to have to run the stuff that didn’t make it into the proc cache manually on a test server and add them to the list as well.

This is completely overwhelming to do all of it.  The more you do, the more accurate your results will be, but it’s not actually reasonable.  Do what’s reasonable, understand that you’re trading off some level of accuracy for time, and also understand that if you don’t make that tradeoff then you’ll never have time for anything else…not even going home at night.

Here’s what the list could use:

  • Proc or name of SQL Batch
  • How important is it
  • How often does it run
  • When does it run
  • Predicates and Seek Predicates (let’s just call them predicates)
  • Equality columns
  • Range columns
  • Inequality columns
  • Column’s returned
  • Rows returned

If there was a RID or Key Lookup on a reference to a nonclustered index, add the output columns and predicate (not the seek predicate for this case only) from the lookup on here as well.  The seek predicate on a lookup is just the clustered index key or RID bookmark added as hidden key columns on your nonclustered index; they will not help you here.

Now look at this list and look for consistencies. What equality predicates are used a lot? You should be able to find different groups of equality predicates that can accommodate most of your queries, and those are going to be the key columns you’ll consider for your indexes. The first key column is going to be the column all of the queries you want to use this index have in common as an equality column, then iterate through them as the columns are used less and less.

This is not the traditional “order of cardinality” advice I’m sure you’ve heard when creating an index for a specific query, but we’re designing an index for your database, not your query. I’ll go one step further and say if it’s a toss-up between what’s the first key column, consider making it one that’s added sequentially such as DateAdded or ID on tables that see more updates because that will reduce page splits and fragmentation.

A query can take advantage of the chain of key columns starting with the first one. The chain can continue after each equality use. An inequality or range can take advantage of a key column as well, but the first one of these is the end of your chain. Once the chain is broken, everything else can be useful, but only as unordered values that don’t matter if they’re key columns or included columns.

You can stop putting in key columns when either queries stop being able to take advantage of them being ordered or the values you’re getting are either unique or close enough. These key columns aren’t free as Paul Randal (b|t) points out in his post On index key size, index depth, and performance.  If a key column is not very useful, then it’s very useful not to have it there.

I should note that if you’re using an index to enforce uniqueness then it will use all the key columns and none of the included columns to do so.  Based on the last paragraph you don’t want any key columns after it’s unique anyways, so don’t even consider that.  However, included columns aren’t used to calculate uniqueness, so you can make this a covering index if it helps you without hurting the unique constraint.

This process, like any other indexing process, isn’t going to be perfect. You’ll have to weigh your decisions with queries that are more critical or are called more often carry more weight in your decision.

Now that you have your key columns figured out, look at the queries that use more than just those columns. When they reference this index how many rows are they going to return where they have to get more information from the table itself through a lookup? How wide are those columns, and how many other queries are going to do the same? This is the balancing act between adding included columns and accepting key lookups. A key lookup is going to be a nested loop operation making separate calls to get the missing columns from the clustered index (or heap, for those who wish to anger me), so 10,000 key lookups is 10,000 separate calls in a loop. If you have to add a large number of columns to eliminate 10 key lookups then it’s almost never worth it. If you have to add one small column to eliminate 1,000,000 key lookups then it’s almost always worth it. Notice I didn’t use determinate language on those…you’ll have to find your balance, test it, and balance it again.

Some things like adding a column to avoid key lookups may make more of a difference to you than the user, but that doesn’t make it less important. For instance, I just said adding a small column to eliminate 1,000,000 key lookups is almost always worth it. If those 1,000,000 key lookups were from a single execution of a query then the user would probably notice, they might even buy you lunch if you’re lucky. If it was a single key lookup on a query run 1,000,000 times that day it’s still a drop in CPU utilization and a potential 1,000,000 pages from the clustered index that didn’t have to be read into cache. You’re doing this because it adds up to a better running server, not always because it’s noticed.

Your goal here is making an index as reusable as is reasonable. Here are the reasons you’re doing that:

  • Every index will fight to be in cache (assuming you don’t have vastly more memory than databases to fill it), an index that is reusable by many queries will be more likely to already be in cache and that space in cache will be more versatile.
  • Every index is another write process in an Insert, Update, and Delete, you’re trying to cut down on those.
  • Every index needs to be maintained, you’re cutting down on that, too.
  • Every index adds disk space, backup size, backup duration, restore durations, etc..
  • If you use TDE, every time a page is read from disk into memory it is decrypted. A reusable index tends to stay in memory more, reducing the number of times the CPU has to decrypt it. See, I TOLD you this belonged in an indexing strategy post!

Nothing’s free, so here’s what you’re giving up:

  • The index isn’t going to be the prefect index for most queries. Test the performance of your critical queries, but also keep in mind that these indexes are more likely to be in cache which could eliminate physical reads from the execution of those queries.
  • These indexes will tend to be wider than the query needs, which is basically restating that this isn’t going to be the perfect, most efficient index for a query. It also means that physical reads on these indexes will tend to be more expensive as there are fewer rows per page.  Again, keep in mind they’re more likely to be in memory because you’re going with fewer indexes that are shared by more queries.

Once you decide on an index or two to add, there are a couple things to consider.

  • What indexes don’t you want anymore? If a query could use another index slightly more efficiently, it will.  However, if it’s close enough then you want to get rid of that other index because of all those benefits of reusability I just mentioned (weren’t you paying attention?). It’s not a question of if a query would use the other index, it’s a question of if you want it to use it.
  • Some queries “should” use this index based on the key columns, but instead of it showing up as a seek predicate it shows up as a predicate. In these cases either your chain of key columns was broken (if column 2 wasn’t an equality column, column 3 will not be a seek predicate) or this column is not being referenced in a SARGable way.
  • Test in non-prod, not prod. Then test it again.
  • Know you’re accepting risk and understand everything involved the best you can. Have a healthy fear of that risk while also understanding that you took a risk just driving to work this morning.

Once these changes go through keep an eye on how they’re being used over the next couple weeks. If you’re in a rush to make a big impact, start a second table as the first change or two are in progress on the first table. Just don’t get too many changes in motion for a single table at once as that’s typically adding more risk and hiding which changes had positive and negative impacts. This is a process, and the longer it takes you do go through it the better the chance is that you’re doing it right.

If I’m doing this process for someone else who wants consistent improvement without taking on too much time or risk at once, then I like to get on their servers once or twice a month, find one or two changes to suggest, then have those go through testing and implementation.  Afterwards review the results and come up with the next suggestion.  It’s hard to be that patient as a full-time employee, but try.

Cluster It

All of that was talking about nonclustered indexes, but you get to pick a clustered index for your table as well.  Keep in mind this is a bigger change and involves more risk, but it’s also a bigger reward.

So, what do I care about that’s special for a clustered index?

  • Uniqueness
  • Key width
  • Width of columns being queried
  • Column types being returned (some can’t be in nonclustered indexes)
  • Number of rows being returned

The size of your key columns on your clustered index is the MINIMUM size of the key columns on a nonunique nonclustered index, and it’s also the MINIMUM width of the page level of any nonclustered index.  You need to keep that in mind.

However, just because your table has an identity column in it doesn’t mean that’s the best clustered index.  What is the best clustered index is going to vary wildly from table to table; there’s not always going to be a clear answer, and the answer will partially depend on how the table is queried.  I get into that a lot more in my last post, Picking a Clustered Index.  Yes, I wrote that post specifically to keep this one shorter…with mixed results.

If a table is often queried by a relatively small column that’s not unique, but the queries tend to pull back most of the columns in the table and a large number of rows then it’s worth considering using this as part of the clustered index key.

If you don’t then you’re faced with two solutions; you can make a really wide nonclustered index to cover these queries, or you can let the queries decide if they want to do a ton of key lookups or just scan the clustered index.  These don’t sound like fun to me.

You still have to worry about the integrity of your data, so if you’re dropping the unique clustered index with a single column to do this then you almost definitely want to add a unique nonclustered index with that single key column to maintain your data integrity.

Compress It

Index compression is an Enterprise-ONLY feature.

Compression is a very big point to hit on here, even if I’m only giving you the compressed version.  It makes your data smaller on disk (less I/O), smaller in memory (less need for I/O), and actually tends to lower your CPU usage instead of raising it.  I get into a lot more detail in my Data Compression post because I didn’t want to have too much space dedicated to a feature not everyone can use here.

Don’t Forget the Outliers

Go back to that list you made of all the queries hitting a specific table. Were some of the queries different than the rest? There are usually a couple, and that doesn’t necessarily mean there’s an issue. However, I look at these to determine if they are using the table properly.

Are they are joining on all the fields they should be. Sometimes you can get the correct results by joining on 3 of the 4 fields you technically should, so why join on the 4th? Well, index reusability is one of those reasons, because it may not be able to use the proper index because someone skipped a column that happens to be the first key field of the perfect index for this query.

Is the query SARGable? Sometimes you’re joining or filtering on the right fields, but something is written in a way that SQL couldn’t do a direct comparison.

Are you returning too many columns? I’ve seen queries returning 20 columns (or using *, which is a move obvious version of the same thing) to populate a screen that uses 3 of them, and on the SQL side you have a DBA trying to figure out if they should add included columns to an index to make that run more efficiently. The most efficient for this and many other examples is refactoring, not reindexing.

Remember, your goal is to make your server run more efficiently, and tweaking indexes is simply one of your tools. While you’re going through this process keep your eyes open towards how other tools can be used.  SSMS is never going to come up with a warning telling you that you should read a book or two by Itzik Ben-Gan (b|t) or Kalen Delaney (b|t), but I would welcome that change.

Does this negate my previous advice?

If you follow my blog at all, which is suggested in my very biased opinion, you may have seen me talk about Unused and Duplicate Indexes, but I make no mention of them here. Did I forget about them?

No, I did not. This is designing every index you want to have on your table in a reusable way. If that index was not on the list then you’ll want to consider getting rid of it. It’s two ways of looking at the same thing. A complete understanding of both of these methods will help you make intelligent indexing decisions and go as far as you need to for the situation you’re in.

Talk to me

This isn’t a short or easy process, and perhaps I could have worded some of it better.  I enjoy what I do, both writing this post and playing with indexes, and having someone think enough of me to ask me questions on this makes it all the more enjoyable.

I may be rewriting parts of this post as I find ways to reword concepts better, especially as I finalize and tweak my presentation with the same name for which this post is my guide. That presentation will make its debut at SQL Saturday Cleveland on February 6, 2016.

If you feel you can help me improve, please don’t hold back.  I’d rather feel that I’m improving than falsely believe I’m infallible.

Monitoring and Baselines Presentation

Here’s my full presentation for SQL Saturday #250 in Pittsburgh this past Saturday along with some notes on what I got out of it.

My Lessons

I learned a lot doing this, and I hope people learned a lot from it. For the people on the other side of the podium from me, the main lessons were in what I put in the abstract; they learned mostly about SQL Server. On my side of the podium though, the lessons were far from the same. Public speaking, creating presentation, prepping for unknown audiences, and seeing the gratitude of the SQL Server community in person. Anything that was on the abstract I learned slightly better, but that didn’t compare to what wasn’t on it.

Public speaking was always a fear of mine, and I tend to be someone who avoids my fears. However, throwing myself out there at the public as a whole was never my thing either, and I’ve been doing half decent, at least in my own mind. That being said, I decided to go all out and push myself a little further. After all, what are the chances of them picking me to present with all those professional teachers, consultants, and MVPs out there throwing up their abstracts. Best case I could say I tried, worst case I was going to throw myself out of my comfort zone and hope for the best.

They picked me, which I didn’t let myself expect. Everyone that knew I put in my abstract was also told by me that I only had a 50/50 chance at best, which was more optimistic than I really was, but I don’t like people knowing when I’m being dismissive of myself. It turns I was wrong a lot, and in every case I was wrong I was glad I was wrong.

Being Forced to Get Better

I’m not one to learn things for the sake of learning them, which is why I sucked in school. To really learn things I need a real-life use case, somewhere I’m going to apply it. I didn’t practice speaking ever before because I didn’t speak in front of large groups. I didn’t know how to put together a presentation properly because that’s not my thing…well, wasn’t is probably a better word now. Just like a couple months ago I wasn’t quite sure how to blog, but I’ve made it past that point.

Like everything in my life, blogging went from unknown to addiction quick. I’m not sure that I can say the same thing about speaking, but I can see the possibility. SQL Saturday only comes to Pittsburgh once a year and the local user group, which I’ll be joining soon, only meets once a month and has a single speaker. However, I can’t say that I didn’t look up when the SQL Saturdays in Cleveland and Washington DC were. Also, I took my own thoughts on my presentation and the feedback I received in and immediately starting thinking about how I could make that presentation better.

What’s below is the original, not touched up at all in the last 4 days. In part to show where I went wrong and how I’m going to fix it, and in part because Monday was my first day at a new job. By the way, starting a new job when between jobs is the best and worst thing you could ever do to yourself. You don’t have time to be too nervous about anything because you’re too overworked, but you’re also too overworked. Eh, you can’t win them all.

Well, on to the presentation. What you have here is my script that I talked through and taught myself before the presentation. However, if you were there, you’ll notice this isn’t exactly what I said. I didn’t read it, I presented it. There were no demos (the biggest complaint from the crowd) to avoid me from stumping myself, and the PowerPoint was just a whimsical picture for the start of each paragraph to keep me on track because I knew I’d be nervous and lose my way without a paragraph-by-paragraph guide helping me along.

I must have done at least descent because my reviews came back with two 3’s and the rest 4’s and 5’s out of 5. You’re just asking me to get off topic and start talking about why I hate ratings that are odd numbers, but no, I’m not getting off topic, I’m getting on to my presentation!!!

The Presentation

Download slide deck here.

I’ve been working on databases for over a decade, and most of that without having good standards or monitoring in place. My job was rough and I was much less effective. Now I know what I’m doing and I want to share that knowledge. In this presentation I am going to tell you what I watch and why I watch it. A presentation is a poor format to go through how to do this; having it in writing is much more beneficial to you. That is why I’m pretty much skipping that portion here and diverting you to my blog at SimpleSQLServer.com. Not so much to promote a blog that I lose money on, but instead to give you the resources in the best format.

As a DBA you will hit problems with the performance of your databases. It doesn’t matter if it’s one process or across the board, or if it’s just today or it has always been that way. No matter what it is, the more you know about your servers the easier it’s going to be to fix them.

It’s not easy, there’s no single spot to watch, no single solution to all of your problems, and there’s no “normal” values for these counters you can apply across every server you manage (Note: Thank you, Mike John, you stressed this point a lot). I watch several things on all the important servers, and most of these on every server. The important part is that you watch them continuously, even when you aren’t expecting to use the data. Some parts are cumulative and you can’t tell what was during an incident or what was from overnight maintenance. Other parts are snapshots and there’s no looking back.

(Note: Now I’m on to the stuff Brian Castle taught me. He’s the best you could hope for in a manager, and last I checked he was still hiring at EDMC in Robinson Township near Pittsburgh, PA)

To me, monitoring and baselining is the same thing. I know others will do a specific baselining process on a specific day and keep that. I feel you lose too much doing that and I watch enough to say that I have a continuous baseline going back for 13 months in most instances. I’m not all that worked up about lucky numbers or anything, it’s just nice to be able to say “that annual process we ran last week ran like this last year”. If you’re superstitious or have OCD, 400 days makes me just as happy.

Traces – Snapshot
Wait Stats – Cumulative – Resets on restart
Blocking – Snapshot
Query Stats – Cumulative – Resets on recompile
Index Stats – Cumulative – Resets on restart
OS Perf Counters – Varies
Database Sizes – Snapshot
Table Sizes – Snapshot

Traces – What long running processes have completed on the server?

You have to be careful, this one can kill your server. I have, on more than one occasion (sorry, Brian), caused a large server to reboot in the middle of the day because I filled the drives writing 40 GB of traces in 5 minutes. That being said, there is a safer way to trace, just don’t do an unfiltered trace catching everything.

I personally feel that absolutely every server should have a trace capturing RPC:Completed and SQL Batch:Completed that ran over X seconds duration. What is X? Well, that depends on the server. I’ve seen the best number for X be as low as 100 ms and as high as 10 seconds. Start high, and work your way down. You can add or do a separate trace for the Statement Completed for each of these if you want more detail.

You can get all of this except for SQL Batch:Completed with the text data from Extended Events, and that is a very good alternative. I haven’t made that jump for several reasons, although I would recommend it. First, this level of tracing hasn’t hurt me. Second, I still support several important servers that are still on SQL 2005 and I want to be as consistent as possible to provide as consistent as possible support.

What you’re trying to solve here is answering what ran long, have some hints as to why it ran long, and be able to say how often it has been running that way in recent history. A long duration doing very little work was waiting on something else, and other monitoring will help solve that problem. A lot of CPU, reads, or writes shows that you may need to look into tuning or statistics. Just keep in mind that reads can be reads of work tables, and writes can be writes to tempdb.

If you do it right you should have 4 or more days worth of history, and by doing it right I mean having 5 to 10 rollover files sized a reasonable size you can send off if needed (100 MB at most, they zip well) and capturing over the best duration threshold for your environment.

Wait Stats – What’s slowing you down?

The traces tell you what work was done, and the waist statistics tell you what went on when the query couldn’t actively do its work. This is cumulative over the server, so you can’t get details down to an individual process, however, you can see how much time is wasted and where it’s wasted at.

There are almost 500 distinct wait types in 2008 R2, but you don’t need to worry about that. The important thing is to know what are your biggest waits are, and you can look them up if you don’t recognize them. In the last slide I have links to a free eBook written by Jonathan Kehayias and Tom Kruger for the Accidental DBA that does an amazing job documenting what you can ignore and the meaning of the big ones you’re most likely to see.

If you’re looking for overall server health then you would want to look at waits over a long period of time, specifically the times of day you’d like to see better performance. For incidents, you want to look at what your wait stats are now and compare them to your running baseline. For me, the typical baseline I use is the same timeframe yesterday and 7 days ago. These are actual relevant days that have all but the changes you put in over the last couple of days, and it’s a true baseline for this specific server. If this server never waits on PageIOLatch, but it’s in your top 3 today then you instantly have a direction your heading in troubleshooting.

Blocking – Who’s in your way?

Locking occurs when one query is using data, and blocking occurs when another query needs to do something that is incompatible with that lock. If you don’t keep an eye on it then you’re looking back at a trace and seeing that a query took 1 second of CPU, did 1,000 reads, and no writes, yet it figured out a way to take 5 minutes in duration. The users complain the app is slow or unusable, and you’re giving them the answer that you’ll try to figure it out next time it happens and THEN try to find out the root cause.

That’s a bad idea. There are three types of users – those who don’t know you because things are running smooth, those that like you because you can say “I see exactly what happened and I can start working on avoiding reoccurrences”, and those users that you keep telling that you’re not sure what happened and you’ll try to get a better idea the next time it happens. I wish I could tell you that I could have more users not know your name, but watching blocking is a very easy way to get the users that like you to outnumber the users that hate you.

So, what can you do to tell them you know what just happened? My answer is to capture everything that’s blocking on the server every minute. Sure, there’s going to be a lot you capture that is actually only blocking for 10ms, and there are going to be things that were blocking for 59 seconds that were timed perfectly so you never saw it happen. The trick is to know that there is no perfect solution, and capture what you can. The DMVs are there to provide all the information, and being able to capture that on the fly along with in a proactive monitoring setup will make you look amazing to the users. If something blocked for 5 minutes, you have no excuse not to say “This query blocked this query, and this was the head blocker”. Sure you may have an excuse to say “It’s a vendor database and I passed it on to their support”, but at least you can tell the users something, which is always better than nothing.

Query Stats

The DMV dm_exec_query_stats is used by many DBAs, even if all we realized was that we were running one of the “Top 10 Most Expensive Query” scripts off the internet. The problem is that this is using it wrong. This DMV holds everything from when the query first went into cache and loses that information when the query goes out of the cache, even just for being recompiled. So the big process you ran overnight could still be in cache, leading you down a path that’s actually low priority for you. However, that relatively large process that runs every 5 minutes that just recompiled a couple minutes ago due to stats auto updating, it’s not even in there. This is too volatile to say that you’re getting good numbers out of it.

The answer isn’t to find somewhere else to get this information, it’s to capture this information more often and more intelligently. Capture it once and you have a snapshot of what it looked like, capture it again an hour later and you know what work has been done in that hour. Repeat the process over and over again, constantly adding another hour’s worth of data and saving off the latest snapshot in the process and you have some real information to go off of. This isn’t so much of what happened to be in cache at the time, it’s a pretty good summary of what ran and when it ran. Now you can query in detail, specifically saying “I want to know what queries are doing the most physical reads between 9:00 AM and 5:00 PM on weekdays so I can have the biggest impact when I tune this database.”

Index Stats

The DMV dm_exec_index_usage_stats isn’t too different from dm_exec_query_stats except that it’s cumulative since the time SQL services were started. Looking at a snapshot still leaves you half blind, unable to see what happened before the last time you rebooted the server and unable to tell when that index was used. Personally, seeing when it was used is less important to me than how much it was used long-term. Tracking this can do amazing things for you if you’re really fine-tuning a database.

This DMV is really under-used in my experience. As time goes on with a database the data changes and you add more indexes to make it run faster, but you don’t know if or when the index it used to use was being used by anything else so it stays there. Over time that means you have more and more unused indexes, and there’s no solid proof that they aren’t being used without you doing the work to collect that proof. This, in addition to duplicated indexes, adds to the workload for inserts and updates, adds to the workload of index maintenance, adds to the database size, and put additional strain on your cache and thus the PLE.

OS Performance Counters

This is one of the most misunderstood DMVs while also being one of the most useful. It’s giving you critical counters that give you an idea of how SQL Server is interacting with the hardware, but it’s doing it with several different types of counters that have to be measured differently. There are a couple great posts out on the internet to understanding how each one needs to be measured. As for right now, we’re focusing on why you want to watch it and what it will do for you. And wouldn’t you know that I saved the hardest one to describe for last, because each value is different and there will be controversy on which ones you should be watching and which ones are just a waste of your time monitoring.

PLE is one that this is no controversy about if you should watch it. This is the average age of the pages in cache, and a good measurement of when you’re doing too many physical reads. Peeking in on this from time to time is good, you can make sure it’s high enough. However, watching this will let you know when it’s dropping and help you dive into why it’s dropping. If you’re also running a trace you can see what did a lot of reads in that timeframe, and if you’re capturing Query Stats then you can find out which one of your large read queries is doing all the physical reads that destroy your PLE.

Server Target and Total memory are also great to watch even though they rarely change after the server is back up to speed following a restart of SQL services. The target memory is how much SQL Server would like to have, and Total memory is how much it actually does have. If target memory drops then there’s OS pressure you need to worry about. Also, as Total memory is increasing then it means the server is still filling the cache and PLE can pretty much be ignored. Of course it’s low, everything in there is new.

Page Lookups, Page Reads, Page Writes and Lazy Writes give you a better idea of how SQL Server is interacting with your cache and disks. If you want a real picture of what SQL Server is doing behind the scenes, this is much more useful than glancing at cache hit ratio.

Deadlocks and Memory Grants Pending are two things you’d like to always see at zero. That may be out of the question with deadlocks, but if you’re getting above zero on memory grants than you need to find out when that happened and everything that was running at that time.

SQL Compilations and Recompilations are hidden CPU hogs. They’re never going to be at zero because very few servers have stuff stay in cache forever, ad-hoc code runs, and several other reasons. However, when one of these counters jump up, you may have a hidden cause. For instance, if a piece of code that runs every minute or more was written in a way that it can’t be stored in cache, these numbers will be noticeably higher. When this happens, your CPU will be noticeably more stressed with no other indicators as to why, and this query won’t even show up in your Query Stats as it relies on showing the stats of everything that’s currently in cache.

Database Sizes

This seems simple, and it is. Most people know how to find the sizes of their databases, but do you track it? If not, you should. Some questions you should be able to answer readily are how fast your databases are growing, when will you run out of space, and is the steady decline in PLE justified by the growth of the database. Getting into more details, you can touch on when does the database grow such as an accounting app jumping in size every April, or answering if the data is growing steadily or exponentially. These are important things to know when it comes time to budget for new servers and disk space.

On the other side of things, you also have information to push back on application teams on how much space they’re using and if that’s really necessary. A lot of times you’ll find that they’re surprised by their growth to the point that you have to show them the numbers for them to believe it. It’s not uncommon for the response from this to be a cleanup project that helps keep the databases smaller and running faster, which is a big goal being accomplished.

Table Sizes

This is just building off of watching your database sizes in more detail. When a database starts filling up, and it’s filling up quick, it’s good to tell an application team what table is doing it. In my current environment it makes sense to watch every table that is over .5% of the database size AND over 100 MB. This means that some databases we don’t watch anything and other databases we watch about the 20 biggest tables. The biggest point is that we’re not trying to watch everything while watching anything big enough to make a difference to us.

If a database is filling a lot faster than normal then a lot of the time there’s a process that isn’t running like it should. The biggest table in the database may be a rather static value, but the third biggest table in the database wasn’t even on your radar two weeks ago. The app teams love it when you can tell them that not only is the database growing out of control, but we also see the growth in table X which has been growing at 200 MB per day starting on the first Saturday of last month. By the way, wasn’t last Saturday the day you changed a couple procs around or implemented an upgrade?

Now you just graduated from “Hey, there’s a problem” to also include “and here’s a huge lead to finding the root cause.” It’s rare to use this information, but it’s lightweight to capture for something that gives you a heavyweight appearance when you reference this knowledge to other teams.

My Critiques

I’d love to hear what you think about this presentation. Before we get to that, here are the complaints from the most harsh person in the room when I was presenting…

I didn’t do enough to say what each thing was. Although I marked the presentation as intermediate, you don’t know who’s going to be there. In this case, I knew a couple people in the crowd, and they ranged from data analyst to database manager. It should have been presented in more of “If you’re not ready for intermediate then you have everything you need, but it will take some effort to keep up”.

My slides were lacking. I stick with the idea that there should be few words on the screen, but I took it too far. I admitted above that the slides were there more to keep me on track then it was to help the audience, which I probably needed for my first public speaking venue. However, next time I’ll be less nervous and more prepared to make it geared for the audience better. Instead of random pictures, I should have more of graphs and data that I can talk through to give the audience visuals.

Nothing was given to the audience, and there should have been something. To be fair, none of the presentations that I know of handed anything to the audience. However, I set my own bar, and I feel that people would like a sheet of paper with the outline on it with links to online sources. My blog, of course, because it shows how to grab everything. Also, an outside link for each item. This gives a physical reminder to turn this knowledge into action.

There was no demo, and that was the audience’s biggest complaint. While I’m still not sure that I would do a live demo on my second go around, I’m going to hold myself on the remark above about my slides having graphs and data. That would also drag me away from the podium to talk through the slides, making it a more dynamic and interesting presentation; that would really help the audience take things in.

Your Turn

Now I need you to do two things. Put this information to work in your environment is what will help you the most. What will help me the most is you commenting on this post both before and after you implement any of this so you can help me become a better presenter.

Thank you for taking the time to read this!

SQL Saturday 250 – Pittsburgh – Advice for My Presentation on Monitoring and Baselines

I’ll be presenting at SQL Saturday 250 in Pittsburgh on Baselining and Monitoring. This is my first public presentation, and I could use some advice to make sure my presentation is the best it can possibly be for both myself and the audience.

First, my presentation. Anyone who’s been following my blog won’t be surprised by anything they see. I’m going to hit all of the the statistics I watch and why I watch them. The whole thing is a pep rally for what, why, and how it will change your job. How to do it will be completely skipped for the reasons below.

  • Brent Ozar told me to in point 13.
  • Trying to take notes on the how would be too intense for any audience member.
  • What and why are more exicting, life’s too short for boring!!!
  • I already did the how, it’s on my blog, and I’ll tell the audience all of the hows are already listed here in both my opening and closing statements.

The points I’m hitting are:

  • Traces (with mention of Extended Events)
  • Wait Stats (sys.dm_os_wait_stats)
  • Blocking (sys.dm_os_waiting_tasks with LCK% type)
  • Query Stats (sys.dm_exec_query_stats)
  • Index Stats (sys.dm_db_index_usage_stats)
  • OS Perf Counters (sys.dm_os_performance_counters)

In all of these areas I’ll go through why I watch these and how monitoring them over looking at the values you can get at any point-in-time has changed my life as a DBA. After going through each section I would like to ask the audience for questions in the middle of the presentation so the topic of the questions is fresh in everyone’s minds and more likely to be taken to heart. How you capture these will be completely documented on my blog to the point that you can practically copy/paste the code to develop a customized monitoring setup on your servers from one source.

Baselining is built directly into my monitoring and how I use my results. Instead of having a static baseline that I took on a certain day, I’m a huge fan of comparing today’s incidents to yesterday and last week with a rolling baseline. I find it to be an impressive way to handle things.

The slide deck still needs a bit more work, but it’s literally just going to be a couple words (Steve Jobs style!!!) with the information coming from what I say instead of the audience’s attention being divided between me and the screen behind me. They’ll see the topic on the screen with a picture of what it means to me while hearing my views.

This presentation is almost complete on my end, but I’m flexible enough to say I’m wrong on anything and make changes. Some questions I have for your are:

  • Is this a presentation you would enjoy seeing? Why or why not?
  • Do you feel this is a complete list of items to monitor? What would you change?
  • Are questions in the middle of a presentation a good idea?
  • How do I make this public, both assisting the audience as much as possible while keeping in mind my rights and possible future use of this presentation? I would consider posting the full thing on the internet, even a recording of me doing it, but I don’t want to think about it a year from now and think that I just shot myself in the foot doing it. I don’t know what to expect, thus the call for help.
  • Anything else you have to add will be greatly appreciated. Adding that information before September 14th would raise the level of my appreciation! 🙂

I’m extremely motivated to do this, and I want to absolutely kill this presentation! The only way to hit my expectations is to prep for it, and the best way to prep for it is to take in as much positive and negative criticism before it goes live as I possibly can.

Thank you for taking the time to read this!
Steve