In SQL Server, Blocking Begets Deadlocking

Gateway Drug


Most servers I look at have some level of problems with queries deadlocking with one another. In many cases, they’re read queries deadlocking with write queries, which is easy to resolve using an optimistic isolation level.

My approach to resolving deadlocks is nearly identical to my approach for resolving blocking problems: make the queries go faster to reduce the potential for overlap.

Deadlocks are a result of queries blocking each other, where they’d drag on forever in an unwinnable grudge match. Sometimes this happens just because of bad timing, weird locking hints, using implicit transactions, or application bugs that leave sessions in a sleeping state while holding locks. The fastest queries in the world generally can’t fix those kinds of problems, because they’re going out of their way to do bad things.

But it still comes back to locks being taken and held. Not all blocking leads to deadlocks of course, but the longer you leave locks out there, the higher your chances of running into deadlocks is.

A lot of the time, just figuring out what deadlocks is only half the battle. You’ll also need to diagnose what’s blocking to fully resolve things.

How do you do that?

Blocked Process Report


Turning on the blocked process report is a good starting place. You can do that like so.

EXEC sys.sp_configure
    N'show advanced options',
    1;
RECONFIGURE;
GO
EXEC sys.sp_configure
    N'blocked process threshold',
    5; --Seconds
RECONFIGURE;
GO

The only real downside of the blocked process report is that you can’t go below five seconds for the block duration that you have to hit before things are logged.

We’ll talk about other options next, but first! How do you log to the blocked process report now that it’s enabled?

Extended events, my dear friends.

CREATE EVENT SESSION 
    blocked_process_report
ON SERVER
    ADD EVENT 
        sqlserver.blocked_process_report
    ADD TARGET 
        package0.event_file
    (
        SET filename = N'bpr'
    )
WITH
(
    MAX_MEMORY = 4096KB,
    EVENT_RETENTION_MODE = ALLOW_SINGLE_EVENT_LOSS,
    MAX_DISPATCH_LATENCY = 5 SECONDS,
    MAX_EVENT_SIZE = 0KB,
    MEMORY_PARTITION_MODE = NONE,
    TRACK_CAUSALITY = OFF,
    STARTUP_STATE = ON
);

ALTER EVENT SESSION
    blocked_process_report
ON SERVER 
    STATE = START;
GO

To read data from it, you can use my stored procedure sp_HumanEventsBlockViewer.

EXEC dbo.sp_HumanEventsBlockViewer
    @session_name = N'blocked_process_report';

That should get you most of the way to figuring out where your blocking problems are.

Logging sp_WhoIsActive


If you want try to catch blocking problems shorter than 5 seconds, one popular way to do that is to log sp_WhoIsActive to a table.

I have a whole set of code to help you do that, too. In that repo, you’ll find:

It works pretty well for most use cases, but feel free to tweak it to meet your needs.

Getting To The Deadlocks


The best way known to god, dog, and man to look at deadlocks is to use sp_BlitzLock.

I put a lot of work into a big rewrite of it recently to speed things up and fix a lot of bugs that I noticed over the years.

You can use it to look at the system health extended event session, or to look at a custom extended event session.

CREATE EVENT SESSION 
    deadlock
ON SERVER
    ADD EVENT 
        sqlserver.xml_deadlock_report
    ADD TARGET 
        package0.event_file
    (
        SET filename = N'deadlock'
    )
WITH
(
    MAX_MEMORY = 4096KB,
    EVENT_RETENTION_MODE = ALLOW_SINGLE_EVENT_LOSS,
    MAX_DISPATCH_LATENCY = 5 SECONDS,
    MAX_EVENT_SIZE = 0KB,
    MEMORY_PARTITION_MODE = NONE,
    TRACK_CAUSALITY = OFF,
    STARTUP_STATE = ON
);
GO

ALTER EVENT SESSION
    deadlock
ON SERVER 
    STATE = START;
GO

And then to analyze it:

EXEC dbo.sp_BlitzLock
    @EventSessionName = N'deadlock';

Problem Solving


Once you have queries that are blocking and deadlocking, you get to choose your own adventure when it comes to resolving things.

If you need help with that, click the link below to set up a sales call with me. If you’re gonna go it on your own, here are some basic things to check:

  • Isolation levels: are you using repeatable read or serializable without knowing it?
  • Do you have the right indexes for your queries to find data quickly?
  • Are your queries written in a way to take full advantage of your indexes?
  • Do you have any foreign keys or indexed views that are slowing modifications down?
  • Are you starting transactions and doing a lot of work before committing them?

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that, and need to solve database performance problems quickly. You can also get a quick, low cost health check with no phone time required.

SQL Server Community Tools: Changing The Output Of sp_WhoIsActive In Different Ways

But Why?


As a procedure, sp_WhoIsActive is pretty perfect. It’s not all things to all people, but it does what it’s supposed to do really well.

One thing I really like about it is that you can change a couple things about the way it returns data to you (or to a logging table) in different ways:

  • The set of columns
  • The sorting of columns

There are definitely good uses for these options, especially when you’re beyond the “what the hell is going on here?” phase and on to the “troubleshooting a specific problem” phase.

Just as a couple examples, if you were specifically troubleshooting:

  • tempdb issues, you’d wanna sort and focus on tempdb related columns
  • blocking issues, you’d wanna sort and focus on blocking related columns

Let’s take a look at how to do that.

Focus On tempdb


If you want to focus on just what’s using a lot of tempdb without any other noise, and you want to find out what’s using the most tempdb right at the top, you can do this:

EXEC sp_WhoIsActive
    @sort_order = '[tempdb_current] DESC',
    @get_plans = 1,
    @output_column_list = '[start_time][session_id][sql_text][query_plan][wait_info][temp%]';

It will:

  • Sort the results by what currently has the most stuff in tempdb
  • Only output columns related to tempdb use with some other identifying details

What’s nice about this is that not only does it help you totally focus right in on what you care about.

And if you’re taking screenshots to document what hellish things are happening, you don’t have to spend a bunch of time editing them to cut distractions out.

Focus On Blocking


This is one I use a whole bunch when clients are hitting blocking issues.

EXEC sp_WhoIsActive
    @sort_order = '[blocked_session_count] DESC',
    @find_block_leaders = 1,  
    @get_plans = 1,
    @get_task_info = 2,
    @get_additional_info = 1,
    @output_column_list = '[start_time][session_id][sql_text][query_plan][wait_info][block%][additional_info]';

Here’s what it does:

  • Finds which queries have the most blocked sessions under them
  • Sort the results by which sessions are doing the most blocking
  • Gets additional information about the session, like isolation level, locks, etc.
  • Only outputs columns you need to see relevant blocking details

If you don’t have an extended event session set up to capture blocking and stuff like that, this is a great way to get a lot of the same information on the fly.

What You Do


There are probably other variations on this that would be useful to you at various times. These are just to get you started.

You can sort by any one of these columns:

session_id, physical_io, reads, physical_reads, writes, tempdb_allocations,

tempdb_current, CPU, context_switches, used_memory, physical_io_delta, reads_delta,

physical_reads_delta, writes_delta, tempdb_allocations_delta, tempdb_current_delta,

CPU_delta, context_switches_delta, used_memory_delta, tasks, tran_start_time,

open_tran_count, blocking_session_id, blocked_session_count, percent_complete,

host_name, login_name, database_name, start_time, login_time, program_name

Meaning, if you were troubleshooting high CPU, it might make sense to sort by that descending or something.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that, and need to solve database performance problems quickly. You can also get a quick, low cost health check with no phone time required.

This OVERPOWERED Hint Will Break NOLOCK Hints In SQL Server

G4M3R


CREATE TABLE
    dbo.view_me
(
    id int NOT NULL
);
GO 

CREATE VIEW 
    dbo.viewed
AS 
SELECT
    vm.*
FROM dbo.view_me AS vm WITH(READCOMMITTED);
GO 

SELECT
    v.*
FROM dbo.viewed AS v WITH(NOLOCK);
GO

Msg 4138, Level 16, State 1, Line 22

Conflicting locking hints are specified for table “dbo.view_me”. This may be caused by a conflicting hint specified for a view.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Should Query Store Also Capture Blocking And Deadlocks?

Big Ideas


The more I used third party monitoring tools, the more annoyed I get. So much is missing from the details, configurability, and user experience.

I often find myself insisting on also having Query Store enabled. As much as I’d love other improvements, I think it’s also important to have a centralized experience for SQL Server users to track down tricky issues.

There are so many views and metrics out there, it would be nice to have a one stop shop to see important things.

Among those important things are blocking and deadlocks.

Deadlockness


Deadlocks are perhaps the more obvious choice, since they’re already logged to the system health extended event session.

Rather than leave folks with a bazillion scripts and stored procedures to track them down, Query Store should add a view to pull data from there.

If Microsoft is embarrassed by how slow it is to grab all that session data, and they should be, perhaps that’s a reasonable first step to having Query Store live up to its potential.

Most folks out there have no idea where to look for that stuff, and a lot of scripts that purport to get you detail are either wildly outdated, or are a small detail away from turning no results and leaving them frustrated as hell.

I know because I talk to them.

Blockhead


Blocking, by default, is not logged anywhere at all in SQL Server.

If you wanna get that, you have to be ready for it, and turn on the Blocked Process Report:

sp_configure 
    'show advanced options', 
    1;  
GO  
RECONFIGURE;  
GO  
sp_configure 
    'blocked process threshold', 
    10;  
GO  
RECONFIGURE;  
GO

Of course, from there you have to… do more to get the data.

Michael J Swart has a bunch of neat posts on that. For my part, I wrote sp_HumanEvents to help you spin up an Extended Event session to capture that.

Awful lot of prep work to catch blocking in a database with a pessimistic isolation level on by default, eh?

Left Out


If you want to take this to the next level, it could also grab CPU from the ring buffer, file stats, and a whole lot more. Basically everything other than PLE.

Never look at PLE.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Unkillable Sessions and Undetected Deadlocks

I recently experienced a blocking issue in a production environment which had been going on for hours. As a responsible DBA, I tried to kill the head of the blocking chain. Unexpectedly, killing the session seemed to have no effect. The blocking continued and the session seemed to stick around.

How to Create an Unkillable Session

It’s surprisingly easy to create a session that seemingly can’t be killed. Use good judgment in picking an environment to run this code against. First you’ll need to enable the “Ad Hoc Distributed Queries” configuration option if you haven’t done so already:

sp_configure 'Ad Hoc Distributed Queries', 1;
RECONFIGURE;
GO
sp_configure;

You can then run the following code after replacing the SERVER_NAME and INSTANCE_NAME placeholder values with the server name and instance name that you’re running against. In other words, you want the OPENROWSET call to point at the same server that you’re executing the query against:

BEGIN TRANSACTION;

SELECT TOP (1) [name]
FROM master.dbo.spt_values WITH (TABLOCKX);

SELECT TOP (1) d.[name]
FROM OPENROWSET('SQLNCLI', '{{SERVER_NAME}}\{{INSTANCE_NAME}};Trusted_Connection=yes;', master.dbo.spt_values) AS d;

This code seems to run forever. Trying to kill the session is not effective, even though there’s no error message and a line written to the error log claiming that the session was killed. Running kill with the STATUSONLY option gives us the following:

SPID 56: transaction rollback in progress. Estimated rollback completion: 0%. Estimated time remaining: 0 seconds.

Trying to cancel the query using SSMS also seems to have no effect as well.

The Undetected Deadlock

Running sp_whoisactive reveals a pretty alarming state:

Session 56 (the one that I can see in SSMS) is waiting on an OLEDB wait. Based on the documentation, I assume that you’ll get this wait type while waiting for the OPENROWSET call to complete:

Includes all connection information that is required to access remote data from an OLE DB data source. This method is an alternative to accessing tables in a linked server and is a one-time, ad hoc method of connecting and accessing remote data by using OLE DB.

I did get callstacks for the OLEDB wait using the wait_info_external extended event, but there were two problems. The first problem is that wait_info_external fires after the wait is complete and the OLEDB wait won’t finish. The second problem is that the callstacks that I did get weren’t very interesting.

Getting back on topic, I believe that the OLEDB wait for session 56 can only end once session 66 completes its work. Session 66 is the session that was created by the OPENROWSET call. However, session 66 needs an IS lock on the spt_values table. That is not compatible with the exclusive lock held on that object by session 56. So session 56 is waiting on session 66 but session 66 is waiting on session 56. This is an undetected deadlock. The deadlock can be resolved by killing session 66 (the OPENROWSET query).

Note that if you are crazy enough to try this on your own, you may see a MEMORY_ALLOCATION_EXT wait instead of an OLEDB wait. I don’t believe that there’s any meaningful distinction there, but don’t assume that a long running OLEDB wait is required for this problem to occur.

Final Thoughts

This blocking issue was unusual in that killing the blocking session doesn’t resolve the issue. I had to kill the session getting blocked instead. Thanks for reading!

Is Read Uncommitted (NOLOCK) An Optimistic Or Pessimistic Isolation Level?

Poll Worker


I ran a Twitter poll recently, because as I was writing another blog post, I wasn’t quite sure how to group uncommitted/nolock between pessimistic or optimistic isolation levels.

On the one hand, locking and blocking does still come into play with them. anyone who has seen locking on schema stability can attest to that. On the other hand, they will read just about anything you want. That’s sort of optimistic in spirit, but maybe not in letter.

Most of you kind folks out there called it optimistic, though some others had more colorful language to describe it:

At the end of the day, it’s probably not the right isolation to use, no matter what you choose to call it.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

How To Tell You Need An Optimistic Isolation Level In SQL Server

Into The Yonder


When you create a database in SQL Server (everything except Azure SQL DB), you get this garbage isolation level called Read Committed.

That isolation level, along with others like Repeatable Read and Serializable, are considered pessimistic. Though Repeatable Read and Serializable are less garbage, it comes with a strictness of locking that most applications don’t need across the board. They may need it for certain niche activities, but you know…

Then there are isolation levels that are quite useful for most isolation levels, and they’re called optimistic isolation levels. In SQL Server, they’re Snapshot Isolation (SI), and Read Committed Snapshot Isolation (RCSI).

I think they are very much not-garbage, and so do other major database platforms that use specific implementations of MVCC (Multi Version Concurrency Control) by default. There may be some historical reason for SQL Server not doing it by default, which is also garbage.

Differences


There are some differences between the two optimistic isolation levels, which makes them useful in different situations. Most people don’t need both turned on, which is something I see quite a bit, but there just might be someone out there who turns on and actually uses both.

To generalize a little bit:

  • SI is good when you only want certain queries to read versioned data
  • RCSI is good when you want every query to read versioned data

What’s versioned data? You can think of it like the “last known good” version of a row before a modification started.

When an update or a delete starts to change data, SQL Server will send those last known good versions up to tempdb for read queries to grab what they need rather than getting blocked. Inserts are a little different, because they are the only known good version of a row.

There are some other differences, too.

SI:

  • Can be turned on without exclusive access to the database
  • Queries all read data as it looked at the beginning of a transaction

RCSI:

  • Does need exclusive access to the database, but it’s not as bad as it sounds
  • Reads data as it looked when each query in a transaction starts

Getting exclusive access to the database can be done without the single-user/multi-user dance:

ALTER DATABASE YourDatabase
    SET READ_COMMITTED_SNAPSHOT ON
WITH ROLLBACK IMMEDIATE;

If You Know You Know


The reasons why you might want to turn these on are when your application performance suffers because of locking or deadlocking.

If you want some quick and dirty queries to figure out if you’ve got those happening, you can run these queries.

/*Lock waits?*/
SELECT
    dows.wait_type,
    dows.waiting_tasks_count,
    dows.wait_time_ms,
    dows.max_wait_time_ms,
    dows.signal_wait_time_ms
FROM sys.dm_os_wait_stats AS dows
WHERE dows.wait_type LIKE 'LCK%'
AND   dows.waiting_tasks_count > 0
ORDER BY dows.wait_time_ms DESC;

/*Deadlocks?*/
SELECT 
    p.object_name,
    p.counter_name,
    p.cntr_value
FROM sys.dm_os_performance_counters p
WHERE TRIM(p.counter_name) = 'Number of Deadlocks/sec'
AND   TRIM(p.instance_name) = '_Total';

If you need deeper analysis of waits or deadlocks, I’d suggest you use sp_BlitzFirst or sp_BlitzLock.

What you want to look for in general are when readers and writers are interfering with each other. If your blocking or deadlocking problems are between any kind of exclusive locks, optimistic isolation levels won’t help you.

Wait stats from readers will generally have an “S” in them, like LCK_M_S. The same goes for deadlocks, where the lock mode will have an S in either the owner or the waiter.

Reader Writer Fighter


It’s important to keep in mind that it’s not just writers that block readers, or writers that can deadlock.

This is where the “Shared” lock and lock mode stuff comes into play. Again, if all your locks and deadlocks are between modification queries — locks and lock modes with X (exclusive) or U (update) — they’ll still block each other.

There’s a lot more details at the linked posts above, but that’s the general pattern. Another pattern to look for is if your developers keep adding more and more NOLOCK hints to “fix performance issues”.

A lot of times they’re just covering up other issues with indexing or the way queries are written, or they’re totally misunderstood. I’ve said it before, but it doesn’t mean your query doesn’t take any locks, it means that your query doesn’t respect locks taken by other queries.

That often comes as a surprise to people when I tell them, so I say it whenever I write about it. But that’s where the bad reputation comes from — it can read all sorts of in-flight data that may not reflect reality.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Blocking on Columnstore Indexes that RCSI and NOLOCK Don’t Resolve

I recently ran into a production issue where a SELECT query that referenced a NOLOCK-hinted table was hitting a 30 second query timeout. Query store wait stats suggested that the issue was blocking on a table with a nonclustered columnstore index (NCCI). This was quite unexpected to me and I was eventually able to produce a reproduction of the issue. I believe this to be a bug in SQL Server that’s present in both RTM and the current CU as of this blog post (CU14). The issue also impacts CCIs as well but I did significantly less testing with that index type.

The Setup

First I’ll create a heap with 500k rows. All of the rows have the same value. I’ll also create an NCCI on the single column of the table.

DROP TABLE IF EXISTS dbo.TEST_NCCI_1;
CREATE TABLE dbo.TEST_NCCI_1 (
ID VARCHAR(10) NOT NULL
);

INSERT INTO dbo.TEST_NCCI_1
SELECT TOP (500000) '1'
FROM master..spt_values t1
CROSS JOIN master..spt_values t2;

CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI ON dbo.TEST_NCCI_1 (ID) WITH (MAXDOP = 1);

The NCCI has a single compressed rowgroup with no rows in the delta store or delete buffer. It’s an honest table and I’m not trying to trick you:

We also need a table to join to. A heap with a single row will serve that need:

DROP TABLE IF EXISTS dbo.JOIN_TO_ME;
CREATE TABLE dbo.JOIN_TO_ME (
ID VARCHAR(10) NOT NULL
);

INSERT INTO dbo.JOIN_TO_ME VALUES ('1');

Suppose a different session has a long held exclusive table lock on TEST_NCCI_1. For example, the code below could be running in a different session with a yet to be committed transaction:

BEGIN TRANSACTION;

SELECT TOP (1) *
FROM dbo.TEST_NCCI_1 WITH (TABLOCKX);

Would you expect the following query to be blocked by the open transaction?

SELECT COUNT_BIG(*)
FROM dbo.TEST_NCCI_1 a WITH (NOLOCK)
Left Outer Join dbo.JOIN_TO_ME b WITH (NOLOCK) ON a.ID = b.ID;

Test Results

I ran 45 tests in total by varying the isolation level (SET TRANSACTION ISOLATION LEVEL) and the locking hint at the table level. You may be wondering why I bothered to do so. According to the documentation, a table level locking hint overrides the isolation level for read operations:

Only one of the isolation level options can be set at a time, and it remains set for that connection until it is explicitly changed. All read operations performed within the transaction operate under the rules for the specified isolation level unless a table hint in the FROM clause of a statement specifies different locking or versioning behavior for a table.

This is not what I observed which is part of why I believe that the current behavior in SQL Server should be considered a bug. In the table below, “YES” means that the query was blocked and “NO” means the query was able to execute. I used red font for the behavior which I believe to be in error based on my understanding of expected locking behavior in SQL Server.

To summarize the results:

  • The query is always blocked under the read committed, repeatable read, and serializable isolation levels. The table locking hint does not matter.
  • Blocking behavior works as expected under the read uncommitted and snapshot isolation levels.
  • RCSI does not help here. With RCSI, to avoid getting blocked you need to set the transaction isolation level to READ UNCOMMITTED.

I was able to use the lock_acquired extended event to see the problematic requested lock. In the screenshot below, the results are filtered to the object level. The first set of locks that occurred around the 11 second mark were under the read uncommitted isolation level. The second set of locks at around the 21 second mark where under the read committed isolation level. A table level NOLOCK hint was included for both queries.

To be clear, it is unexpected to see an IS object lock for a SELECT query with a table level NOLOCK hint. As far as I can tell, there are no interesting locks taken after the IS lock is acquired:

I attempted to investigate further by getting callstacks before the IS lock is acquired:

sqlmin.dll!XeSqlPkg::lock_acquired::Publish+0x228
sqlmin.dll!lck_lockInternal+0x1139
sqlmin.dll!LockAndCheckState+0x2c5
sqlmin.dll!GetHoBtLockInternal+0x445
sqlmin.dll!ColumnDataSetSession::WakeUp+0xa86
sqlmin.dll!NormalColumnDataSet::WakeUp+0xe
sqlmin.dll!ColumnDataSetSession::Create+0x183
sqlmin.dll!ColumnsetSS::WakeUpInternal+0x1ec
sqlmin.dll!ColumnsetSS::WakeUp+0x15e
sqlmin.dll!CreateDictionaryRowsetHelper+0x250
sqlmin.dll!CBpLocalDictionaryManager::GetSEStringDictionaryRowset+0x416
sqlmin.dll!CBpLocalDictionaryManager::FGetData+0xd1
sqlmin.dll!CBpDeepDataContextForBatch::GetStringValue+0x1d
sqlTsEs.dll!CTEsHashMultiData<167,1>::MdEsIntrinFn<CMDPureInput,CMDIteratorAllPureInputs>+0x13d
sqlTsEs.dll!CEsMDIntrinsicWrapper::UnaryImpl<&CTEsHashMultiData<167,1>::MdEsIntrinFn<CMDPureInput,CMDIteratorAllPureInputs>,&CTEsHashMultiData<167,1>::MdEsIntrinFn<CMDImpureInput,CMDIterator> >+0xd1
sqlTsEs.dll!CEsExecMultiData::GeneralEval+0x188
sqlTsEs.dll!CMultiDataEsRuntime::Eval+0x247
sqlmin.dll!CBpComputeMulticolumnHashRunTime::Eval+0x5b7
sqlmin.dll!CBpPartialJoin::ProcessProbeSide+0x445
sqlmin.dll!CBpQScanHashJoin::Main+0x148
sqlmin.dll!CBpQScanHashJoin::BpGetNextBatch+0x28
sqlmin.dll!CBpQScan::GetNextBatch+0x6f
sqlmin.dll!CBpChildIteratorScanner::BpGetNextBatch+0x12
sqlmin.dll!CBpQScanHashAggNew::ProcessInput+0x8b

I’m not terribly surprised to see a reference to a dictionary in the callstacks because I’m not able to reproduce the issue with an INT or BIGINT column. However, I have no idea where to go from here.

Workarounds

Adding an IGNORE_NONCLUSTERED_COLUMNSTORE_INDEX query hint was an effective workaround for all of my test cases. I experienced the expected locking behavior for all test cases with that hint, so it is apparent to me that the issue is caused by the query plan reading from the NCCI.

Disabling batch mode using the DISALLOW_BATCH_MODE use hint seems to be an effective workaround as well, but I did not test this as thoroughly.

Changing the isolation level may also be an acceptable way to avoid the unexpected blocking for some scenarios. For example, if you already have a NOLOCK table hint in place then changing the transaction isolation level to read uncommitted may not introduce any new data correctness issues.

Final Thoughts

For some queries, a simple join on a VARCHAR column between a table with a columnstore index and another table can lead to unexpected blocking, even with a table level NOLOCK hint or with read committed snapshot isolation enabled for the database. This is frustrating and disappointing behavior from a concurrency perspective. It is an odd scenario where enabling snapshot isolation offers a significant benefit over enabling read committed snapshot isolation. I hope that this locking behavior changes in a future version of SQL Server. Thanks for reading!

My SQL Server Query Ran For A Long Time But Didn’t Use A Lot Of CPU: What Happened?

Of Walls And Clocks


No one ever says a broken record is right twice a day, perhaps because DJs are far more replaceable than clock makers.

I say that only to acknowledge that I may sound like a broken record when I say that when you’re tuning a query, it’s quite important to compare wall clock time and duration. Things you should note:

  • If CPU and duration were about equal in a serial plan, that’s normal
  • If CPU is much higher than duration in a parallel plan, that’s normal
  • If duration and CPU are about equal in a parallel plan, you’ve got yourself a situation
  • If duration is much higher than CPU in any plan, something else held your query up

In this post, I’m going to outline a non-exhaustive list of reasons why that last bullet point just might be.

Hammer Time


Big Data: One common reason you may run into is that you’re returning a large result set, either locally to SSMS, or to an app server that is either overloaded or underpowered. It’s also possible that something is happening on the application side that’s slowing things down. In these cases, you’ll usually see a lot of ASYNC_NETWORK_IO waits. To better test the speed of the actual query, you can dump the results into a #temp table.

Blocking: Another quite common issue is that the query you’re running is getting blocked. Before you go reaching for that NOLOCK hint, make sure you know what it does. Blocking is each to check on with sp_WhoIsActive. If you see your query waiting on waits that start with LCK_ Some common ones are LCK_M_SCH_S, LCK_M_SCH_M, LCK_M_S, LCK_M_U, LCK_M_X, LCK_M_IS, LCK_M_IU, LCK_M_IX. While your query is being blocked, it’s just gonna rack up wall clock time, while using zero CPU.

Stats updates: Once in a while I’ll run into a query that runs slowly the first time and fast the second time because behind the scenes the query had to wait on stats to update. It’s a bit hard to figure out unless you’re on SQL Server 2019, but it can totally make your query look like you sat around doing nothing for a chunk of time, especially if you’re waiting on large tables, or a bunch of smaller updates.

Reading from disk: If the tables or indexes you’re reading are bigger than your buffer pool, your queries are gonna eat it going to disk to read data in. If this is your limitation, you’ll see a lot of PAGIOLATCH_SH or PAGEIOLATCH_EX waits, depending on if your query is reading data, writing data, or both.

Waiting for a worker thread: When your queries can’t get a worker thread to run on, they end up waiting on THREADPOOL. While some waits on it are to be expected, you don’t want queries to wait long periods of time on this. It’s a serious sign your server is jammed way up.

Waiting for memory: When your queries can’t get the memory grant they want, they sit around waiting on RESOURCE_SEMAPHORE. Just like above, it’s a sure sign your server is having problems if it’s a wait you see occurring a lot, or for long average periods of time.

Waiting for a query plan: I know, this sounds like a long compilation time — and it sort of is — but only because the query is waiting a long time to get memory to compile an execution plan. This wait is going to be RESOURCE_SEMAPHORE_QUERY_COMPILE, instead.

Query Plan Compilation: Sometimes the optimizer gets a bit carried away and spend a whole bunch of time in search2 trying to reorder joins, and do other tricks to make your query faster. Under extreme circumstances, you might wait a really long time for that query plan. There’s no wait stats to tell you that, but if you look at your query plan’s properties (F4 key on the root operator), you can see the compile time.

SQL Server Query Plan
yeesh.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Steps For Getting Rid Of NOLOCK Hints In SQL Server Queries

Way Out


Whenever I see people using NOLOCK hints, I try to point out that they’re not a great idea, for various reasons explained in detail all across the internet.

At minimum, I want them to understand that the hint name is the same as setting the entire transaction isolation level to READ UNCOMMITTED, and that the hint name is quite misleading. It doesn’t mean your query takes no locks, it means your query ignores locks taken by other queries.

That’s how you can end up getting incorrect results.

That warning often comes with a lot of questions about how to fix blocking problems so you can get rid of those hints.

After all, if you get rid of them, your SQL Server queries will (most likely) go back to using the READ COMMITTED isolation level and we all know that read committed is a garbage isolation level, anyway.

Cause and Wrecked


An important thing to understand is why the hint was used in the first place. I’ve worked with some nice developers who slapped it on every query just in case.

There was no blocking or deadlocking. They just always used it, and never stopped.

Not that I blame them; the blocking that can occur under read committed the garbage isolation level is plain stupid, and no respectable database platform should use it as a default.

In many ways, it’s easier for a user to re-run a query and hopefully get the right result and shrug and mumble something about computers being awful, which is also true.

So, first step: ask yourself if there was ever really a blocking problem to begin with.

Bing Tutsby


Next, we need to understand where the blocking was coming from. Under read committed the garbage isolation level, writers can block readers, and readers can block writers.

In most cases though, people have added the hint to all of their queries, even ones that never participated in blocking.

  • If the problem was writers blocking writers, no isolation can help you.
  • If the problem was readers blocking writers, you may need to look at long running queries with Key Lookups

If the problem was writers blocking readers, you’d have to look at a few things:

If you have query store enabled, you can use sp_QuickieStore to search it for queries that do a lot of writes. If you don’t, you can use sp_BlitzCache to search the plan cache for them.

Best Case


Of course, you can avoid all of these problems, except for writers blocking writers, by using an optimistic isolation level like Read Committed Snapshot Isolation or Snapshot Isolation.

In the past, people made a lot of fuss about turning these on, because

  • You may not have tempdb configured correctly
  • You have queue type code that relied on blocking for correctness

But in reasonably new versions of SQL Server, tempdb’s setup is part of the install process, and the wacky trace flags you used to have to turn on are the default behavior.

If you do have code in your application that processes queues and relies on locking to correctly process them, you’re better off using locking hints in that code, and using an optimistic isolation level for the rest of your queries. This may also be true of triggers that are used to enforce referential integrity, which would need READCOMMITTEDLOCK hints.

The reason why they’re a much better choice than using uncommitted isolation levels is because rather than get a bunch of dirty reads from in-flight changes, you read the last known good version of the row before a modification started.

This may not be perfect, but it will prevent the absolute majority of your blocking headaches. It will even prevent deadlocks between readers and writers.

No, Lock


If your code has a lot of either NOLOCK hints or READ UNCOMITTED usage, you should absolutely be worried about incorrect results.

There are much better ways to deal with blocking, and I’ve outlined some of them in this post.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.