Many people see CXPACKET at the top of their waits and start trying to fix it. I know this topic’s been done before, but people ask me about it enough that it should be done again, and again. In fact, Brent Ozar wrote an excellent article about it, and Jonathan Kehayias wrote an article about tweaking it using the cost threshold for parallelism. Even with those, it took Grant Fritchey warning us to be careful about giving guidance to lead me to Bob Ward’s take on it. After all of these people and more saying CXPACKET isn’t what you think it is, enough people still have it wrong that I felt this was the biggest piece of public opinion to speak out against for Michael Swart’s T-SQL Tuesday #052.
BOL still doesn’t state it clearly when it defines it as “Occurs with parallel query plans when trying to synchronize the query processor exchange iterator. If waiting is excessive and cannot be reduced by tuning the query (such as adding indexes), consider adjusting the cost threshold for parallelism or lowering the degree of parallelism.” The true meaning of this wait is much simpler, as Bob Ward explains: “This wait type only means a parallel query is executing.”
Yes, that’s it, it’s not really all that complex. If you have a MAXDOP of 4 then you could end up with 5 threads. Think of it as having one manager who can delegate work out to 4 subordinates. The subordinates do their pieces of the job and the manager marks down CXPACKET on his time sheet while he waits for them. When they finish, the manager compiles their work (repartitions the streams if you want). There was a little extra work done here because the manager wouldn’t have to combine all the work if he did it all on his own (MAXDOP = 1), and it’s possible the subordinates did more work than they had to. However, in most cases, it does get done faster, and duration is what all your users talk about.
The advice in BOL above saying “consider adjusting the cost threshold for parallelism or lowering the degree of parallelism” is incomplete at best. Upping the cost threshold is exactly what it sounds like, making it so only more expensive tasks can be assigned to those subordinates. Then lowering the degree of parallelism is just dropping how many subordinates you can use.
Cost Threshold for Parallelism is something Brent Ozar makes fun of a bit, because its default is 5, measured in units that no longer have a defined meaning. Jeremiah Peschka recommends setting this to 50 and tweaking it from there. I’d recommend 25 for most OLTP because I like parallelism and my procs aren’t stressed. However, the point is that you should only tweak what you understand, and now I hope you have a decent understanding of this. If you have a lot of moderately large queries you wish could break out the work to get it done faster, drop it. If all of your CPUs are running hot and you wish less work was being done repartitioning streams or compiling more complex plans, raise it.
Max Degree of Parallelism is what people really like to tweak, and it’s usually a bad idea. My personal recommendation is to set it to 4 in OLTP (higher in OLAP, very dependent on your load) and forget it unless you have a really, really good reason to change it. The default of 0 is absolutely horrible for OLTP on larger servers, where this could mean that you’re breaking out a task and assigning it to 80 or more subordinates, then trying to put all of their work back together, and killing yourself doing it. This shouldn’t be above 8 unless you really like to be overworked putting your subordinates’ work back together and dealing with those performance problems. However, it’s great to have subordinates who can help you out when needed, so please don’t drop this below 4 unless you have an amazing reason to do so, and only on the query level if you only have one amazing reason.
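Here’s a minimal sketch of checking and changing those two server-level settings with sp_configure. The values 25 and 4 are just my OLTP starting points from above, not universal answers, so test against your own workload before touching production.

```sql
-- Both settings are advanced options, so expose them first.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;

-- See what is configured and what is actually in use right now.
SELECT name, value, value_in_use
FROM sys.configurations
WHERE name IN (N'cost threshold for parallelism',
               N'max degree of parallelism');

-- Example starting points for OLTP (assumptions; adjust for your load).
EXEC sp_configure 'cost threshold for parallelism', 25;
EXEC sp_configure 'max degree of parallelism', 4;
RECONFIGURE;
```

Both changes take effect without restarting the instance, so it’s worth capturing a baseline before and after.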
So, what do you do when CXPACKET is taking over? The trick is to think of it in terms of how a business would work. You have a ton of work to do, enough that your managers are going crazy trying to find subordinates to do it for them. Do you tell your managers to do it themselves with OPTION (MAXDOP 1), do you tell them to go out and find as many subordinates as they can with OPTION (MAXDOP 0), or do you tell them you’ll see if you can help them find a more efficient way to do their jobs?
If you have the Cost Threshold for Parallelism set to 25 or 50 you have a decent limit before you consider spreading the load around. So, what takes that long for your server to handle? A good portion of the time you’ll find this when you’re scanning a large index or table, or even doing a very expensive seek. If you look at your execution plan, is it doing a scan? If it’s a seek that’s starting your parallelism, does it have a small “Seek Predicate” and a lot to do in the “Predicate” in the properties for that operator? It could be that you could use better indexes, it could be that you could find a more SARGable approach, or there could be a variety of other performance tuning opportunities.
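As one illustration of a more SARGable approach, here’s a sketch against a hypothetical dbo.Orders table that I’m assuming has an index leading on OrderDate. The table and column names are made up for the example.

```sql
-- Not SARGable: wrapping the column in a function hides it from the
-- index, so the optimizer typically scans and checks every row.
SELECT OrderID
FROM dbo.Orders
WHERE YEAR(OrderDate) = 2014;

-- SARGable: the bare column is compared to a range, so an index on
-- OrderDate can be used for a seek on just the rows that qualify.
SELECT OrderID
FROM dbo.Orders
WHERE OrderDate >= '20140101'
  AND OrderDate <  '20150101';
```

Both queries return the same rows. If the second one touches far fewer pages, its cost may drop below your Cost Threshold for Parallelism and it won’t go parallel at all.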
The thing is that parallelism isn’t bad and CXPACKET is saying you’re using a good thing. A high CXPACKET could mean you’re trying to use too much of a good thing because you’re “overworking your managers”, but the workload is typically more of an issue than the fact that you let your manager delegate their work when there’s enough to go around. A CXPACKET a lot higher than your baseline could mean that you have poor statistics or parameter sniffing that’s causing a bad execution plan, or it could mean you have a new query that could use some attention. When you’re talking about parallelism it’s also possible that the statistics were off and one thread (subordinate) ends up doing 99% of the work, making CXPACKET jump up, and that can be found in the actual execution plans; read more about that in Grant Fritchey’s book.
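To compare CXPACKET against your baseline you need the numbers first. This is a rough sketch using sys.dm_os_wait_stats. The percentage is only relative to the other waits in the DMV, and I’m not filtering out the benign background waits here, which you’d normally want to do.

```sql
-- Top waits by total wait time, with each one's share of the total.
-- These counters are cumulative since the last restart (or since the
-- stats were cleared), so compare snapshots over time rather than
-- reading a single run in isolation.
SELECT TOP (10)
       wait_type,
       waiting_tasks_count,
       wait_time_ms,
       CAST(100.0 * wait_time_ms
            / NULLIF(SUM(wait_time_ms) OVER (), 0) AS decimal(5, 2))
           AS pct_of_total_wait
FROM sys.dm_os_wait_stats
WHERE wait_time_ms > 0
ORDER BY wait_time_ms DESC;
```

Saving this output on a schedule gives you the baseline, and a jump in CXPACKET relative to that baseline is your cue to go looking at statistics, parameter sniffing, or new queries.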
If you have a MAXDOP around 4 (between 2 and 8, reading these recommendations) and a Cost Threshold of 25 or higher then I’d be looking into the workload more than the fact that you’re using parallelism. Sure, some tasks just don’t take well to parallelism and there are reasons to use a MAXDOP 1 hint, but it’s rare to need to take it down to 2 on the server level and even more rare to turn it off by taking it down to 1.
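For the rare query that really doesn’t take well to parallelism, the hint goes on that one statement and leaves the server setting alone. Note that the syntax is OPTION (MAXDOP 1), with no equals sign. The dbo.Orders table here is hypothetical.

```sql
-- Force a serial plan for this statement only; every other query
-- still uses the server-level max degree of parallelism.
SELECT CustomerID,
       COUNT(*) AS OrderCount
FROM dbo.Orders
GROUP BY CustomerID
OPTION (MAXDOP 1);
```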
Whatever you do, when you have “a big query using parallelism”, try to focus on “a big query” first; it’s usually the problematic part of that statement.