SQL Server Running Slow

A user calls to say the app or server is slow today.  Here’s a quick summary of how I get started.  It depends on monitoring where you can compare baselines to recent activity, and I include links to all of that code I use.

The initial question is too vague.  You can’t blame this individual user because this is how 95% of issues make it to you.   The initial follow-up will be the following questions I ask the user along with the corresponding question your boss is asking you:

  • Can anyone else duplicate your issue?
    • Is this the user’s machine or a server issue?  Not a 100% conclusive answer, but it’s a start.
  • Is everything running slow or just one process or screen?
    • Is the whole server running slow or just one stored procedure?
  • Were any changes made to the system recently?  Depending on the user, this may be a question for someone else.
    • Did we cause it by making a change or did this just come up?

However, this is enough for me to do a very quick overall health check while I wait for those answers.  It seems like a lot at first glance, but it’s practically automated with the scripts I have on hand.  This is only opening 5 pre-written scripts, running them, and peeking at the results.

Quick Health:

  • Look at historical wait stats for the last two hours and the same time window both 1 and 7 days ago.  Documented in the Wait Stats post.
  • Look at memory and CPU contention for the same times as wait stats.  You’re specifically looking for high CPU usage, PLE drops, and changes to target memory. Documented in the OS Performance Counters post.
  • Look for blocking in the last hour, even though it would typically show up in wait stats.  You can compare this to previous days if needed, Documented in the Blocking post
  • Look at the large queries that completed in the last two hours along with the queries currently running to see if anything pops out.  This is just a quick glance looking for glaring issues, and you can look for details after getting answers to the follow-up questions.  Unfortunately, this information isn’t as cut-and-dry as the rest, and it’s going to rely on you being a little familiar with what normal is on this server.  Completed queries are documented in the Tracing Creation and Reading posts, and currently running queries are documented in the Running Processes post.
  • Look at the queries recompiled recently to see if a new execution plan is taking excessive resources.  Documented in the Recently Recompiled Resource Hogs post.

So, where do you go from here?  Yeah, you get the typical “it depends” answer from me.  The information you just looked at will normally comes up with something that doesn’t look right and needs to be looked into further.  However, there are so many directions this can go from here that a single document can’t cover it.  The purpose of this post is simply to answer the question “Where do I start looking with the most common question a user throws at me”, nothing more.  Also, I wanted a way to tie all of the random posts together so you could see them working together as the full solution.

Running Processes

Sp_who2 is not used by this DBA.  That info is amazingly good to have, but I want more.  Don’t get me wrong, sp_who2 is kinda hidden in this one as it’s really just a proc that does little more than “SELECT * FROM SysProcesses“, and I use SysProcesses where I couldn’t find the same info in the DMVs that were supposed to replace it.  Don’t go thinking that SysProcesses is just a depreciated dinosaur, even MVPs still use it: http://sqlblog.com/blogs/kalen_delaney/archive/2008/06/29/why-i-still-need-sysprocesses.aspx

Threads and RunningThreads are new for me here.  It’s good to see when a query is split into 100 threads and only 1 is doing anything.  It’s also easier on the eyes to see the number 100 instead of seeing 100 separate rows for one SPID.

The percent complete and estimated completion time on this script are typically blank, with the major exceptions being backups and restores where these columns are the best thing ever! 

BlockedBy is something overlooked by many people.  If your query seems to be running forever when it normally runs quick, there’s about a 75% chance you’re just waiting for another process to complete. It’s also why you’ll see the notorious (nolock) in almost everything I write for myself, but that’s another conversation.  The HeadBlocker attempts find the start of a blocking chain, although I can say that I know my logic here isn’t bulletproof.

My wait types used to be perfect where if there wait time was 0 then it was lowercase, otherwise it was all caps.  It seems I broke it slightly, but it’s almost perfect.  My OCD will eventually get the best of me here, but it has to contend with time with the rest of my life.

The rest is pretty self-explanitory with the exception of query plans.  They are the last line of the SELECT and FROM clauses that are commented out.  It’s somewhat rare to need the plan, and it’s not too uncommon to have this script run noticably slower when looking for plans.  Because of that, I uncomment that column and the the cross apply that makes it possible only when needed.

SELECT SPID = er.session_id
    , ot.Threads
    , RunningThreads = coalesce(rsp.RunningThreads,0)
    , Pct_Comp = er.percent_complete
    , Est_Comp_Time = CASE er.estimated_completion_time WHEN 0 THEN NULL ELSE dateadd(ms, er.estimated_completion_time, getdate()) END 
    , er.status
    , er.command
    , database_name = sd.name
    , BlockedBy = wt.blocking_session_id
    , HeadBlocker = coalesce(hb5.session_id, hb4.session_id, hb3.session_id, hb2.session_id, hb1.session_id)
    , wait_type = coalesce(CASE er.wait_type WHEN 'CXPACKET' THEN 'CXPACKET - ' + sp.lastwaittype1 ELSE sp.lastwaittype1 END, lower(er.last_wait_type)) --Lowercase denotes it's not currently waiting, also noted by a wait time of 0.
    , Wait_Time_Sec = Cast(er.wait_time/1000.0 as DEC(20,3))
    , er.wait_resource
    , Duration_Sec = Cast(DATEDIFF(s, er.start_time, GETDATE()) as DEC(20,0))
    , CPU_Sec = Cast(er.cpu_time/1000.0 as DEC(20,3))
    , Reads_K = Cast(er.reads/1000.0 as DEC(20,3))
    , Writes_K = Cast(er.writes/1000.0 as DEC(20,3))
    , [Statement] = SUBSTRING (st.text, er.statement_start_offset/2,
        abs(CASE WHEN er.statement_end_offset = -1
               THEN LEN(CONVERT(NVARCHAR(MAX), st.text)) * 2 
            ELSE er.statement_end_offset END - er.statement_start_offset)/2)
    , st.text as Query
    , es.login_time
    , es.host_name
    , program_name = CASE LEFT(es.program_name, 29)
                    WHEN 'SQLAgent - TSQL JobStep (Job '
                        THEN 'SQLAgent Job: ' + (SELECT name FROM msdb..sysjobs sj WHERE substring(es.program_name,32,32)=(substring(sys.fn_varbintohexstr(sj.job_id),3,100))) + ' - ' + SUBSTRING(es.program_name, 67, len(es.program_name)-67)
                    ELSE es.program_name
                    END  
    , es.client_interface_name
    , es.login_name
    , es.status
    , es.total_scheduled_time
    , es.total_elapsed_time
    , er.start_time
    , es.last_request_start_time
    , es.last_request_end_time
    , er.database_id  
    --, qp.query_plan 
FROM sys.dm_exec_requests er
    INNER JOIN sys.dm_exec_Sessions es on er.session_id=es.session_id
    LEFT JOIN sys.databases sd on er.database_id=sd.database_id
    INNER JOIN (SELECT session_id, count(1) Threads FROM sys.dm_os_tasks GROUP BY session_id) ot on er.session_id=ot.session_id
    LEFT JOIN (SELECT spid, LastWaitType1 = MIN(lastwaittype), LastWaitType2 = MAX(lastwaittype) FROM sysprocesses sp WHERE waittime > 0 AND lastwaittype <> 'cxpacket' GROUP BY spid) sp ON er.session_id = sp.spid
    LEFT JOIN (SELECT spid, RunningThreads = COUNT(1) FROM sysprocesses sp WHERE waittime = 0 GROUP BY spid) rsp ON er.session_id = rsp.spid
    LEFT JOIN (SELECT session_id, max(blocking_session_id) blocking_session_id FROM sys.dm_os_waiting_tasks wt WHERE wt.blocking_session_id <> wt.session_id GROUP BY session_id) wt ON er.session_id = wt.session_id 
    LEFT JOIN (SELECT session_id, max(blocking_session_id) blocking_session_id FROM sys.dm_os_waiting_tasks wt GROUP BY session_id) hb1 ON wt.blocking_session_id = hb1.session_id 
    LEFT JOIN (SELECT session_id, max(blocking_session_id) blocking_session_id FROM sys.dm_os_waiting_tasks wt GROUP BY session_id) hb2 ON hb1.blocking_session_id = hb2.session_id 
    LEFT JOIN (SELECT session_id, max(blocking_session_id) blocking_session_id FROM sys.dm_os_waiting_tasks wt GROUP BY session_id) hb3 ON hb2.blocking_session_id = hb3.session_id 
    LEFT JOIN (SELECT session_id, max(blocking_session_id) blocking_session_id FROM sys.dm_os_waiting_tasks wt GROUP BY session_id) hb4 ON hb3.blocking_session_id = hb4.session_id 
    LEFT JOIN (SELECT session_id, max(blocking_session_id) blocking_session_id FROM sys.dm_os_waiting_tasks wt GROUP BY session_id) hb5 ON hb4.blocking_session_id = hb5.session_id 
    CROSS APPLY sys.dm_exec_sql_text(sql_handle) AS st  
    --CROSS APPLY sys.dm_exec_query_plan(er.plan_handle) qp
WHERE er.session_id <> @@SPID
    --AND es.host_name like '%%'
    --AND er.session_id = 2702
ORDER BY er.percent_complete DESC, er.cpu_time DESC, er.session_id

--Use the below command to get the last input of an open session id
--dbcc inputbuffer(61)