Server Tuning – Darling Data

How To Find Poorly Performing SQL Server Queries To Tune Using Query Store

Posted on April 11, 2023March 12, 2023 by Erik Darling

Stick And Move

If you take performance for your SQL Servers seriously, you should be using Query Store for your business critical databases.

I used to say this about third party monitoring tools, but the landscape for those has really tanked over the last few years. I used to really love SQL Sentry, but it has essentially become abandonware since SolarWinds bought SentryOne.

At this point, I’m happier to enable query store, and then use a couple extended events to capture blocking and deadlocks. While it would be stellar if Query Store also did that, for now life is easy enough.

To analyze blocking and deadlock Extended Events, I use:

sp_BlitzLock for deadlocks
sp_HumanEventsBlockViewer for blocking

This won’t capture absolutely everything, but that’s okay. We can usually get enough to go on with those three things. If you have bad blocking and deadlocking problems, you should start there.

But once you turn on Query Store, where do you go?

Gooey

If you’re okay with all the limitations of the GUI, you can tweak a few things to get more useful information out of it.

I usually start with the Top Resource Consuming Queries view, since, uh… those are usually good things to tune.

SQL Server Query Store — top resource consuming plans

But the crappy bar graph that Query Store defaults to is not what you want to see. There’s way too much jumping around and mousing over things to figure out what’s in front of you.

I like switching to the grid format with additional details view, by clicking the blue button like so:

But we’re not done yet! Not by a long shot. The next thing we wanna do is hit the Configure button, and change what we’re looking at. See, the other crappy thing is that Query Store defaults to showing you queries by total duration.

What ends up being in here is a bunch of stuff that runs a lot, but tends to run quickly. You might get lucky and find some quick wins here, but it’s usually not where the real bangers live.

To get to those, we need to hit the Configure button and make a couple tweaks to look at queries that use a lot of CPU on average, and push the time back from only showing the last hour to the last week or so.

You can go back further, but usually the further you go back, the longer it takes to get you results.

The problem here is that you can often get back quite a bit of noise that you can’t filter out or ignore. Here’s what mine looks like:

We don’t really need to know that creating indexes took a long time. Substitute those with queries you don’t necessarily care about fixing, and you get the point.

You can sort of control this by only asking for queries with a certain number of plans to come back, but if your queries aren’t parameterized and you have a lot of “single use” execution plans, you’ll miss out on those in the results.

This filter is available under the Configuration settings where we changes the CPU/Average/Dates before.

The major limitation of Query Store’s GUI is that you can’t search through it for specific problems. It totally could and should be in there, but as of this writing, it’s not in there.

That’s where my stored procedure sp_QuickieStore comes in.

Scripted, For Your Pleasure

The nice thing about sp_QuickieStore is that it gets rid of a lot of the click-clacking around to get things set up. You can’t save your Query Store GUI layout to open up and show you what you want every time, you have to redo it.

To get us to where we were with the settings above, all we have to do is this:

EXEC sp_QuickieStore
    @execution_count = 5;

By default, sp_QuickieStore will already sort results by average CPU for queries executed over the last week of Query Store data. It will also filter out plans for stuff we can’t really tune, like creating indexes, updating statistics, and waste of time index maintenance.

You’ll get results that look somewhat like so:

There are a number of things you can do with to include or ignore only certain information, too:

@execution_count: the minimum number of executions a query must have
@duration_ms: the minimum duration a query must have
@execution_type_desc: the type of execution you want to filter
@procedure_schema: the schema of the procedure you’re searching for
@procedure_name: the name of the programmable object you’re searching for
@include_plan_ids: a list of plan ids to search for
@include_query_ids: a list of query ids to search for
@ignore_plan_ids: a list of plan ids to ignore
@ignore_query_ids: a list of query ids to ignore
@include_query_hashes: a list of query hashes to search for
@include_plan_hashes: a list of query plan hashes to search for
@include_sql_handles: a list of sql handles to search for
@ignore_query_hashes: a list of query hashes to ignore
@ignore_plan_hashes: a list of query plan hashes to ignore
@ignore_sql_handles: a list of sql handles to ignore
@query_text_search: query text to search for

You straight up can’t do any of that with Query Store’s GUI. I love being able to focus in on all the plans for a specific stored procedure.

Thanks for reading!

Going Further

If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that, and need to solve database performance problems quickly. You can also get a quick, low cost health check with no phone time required.

SQL Server Community Tools: The Wrap Up And Combined Link

Posted on October 3, 2022September 21, 2022 by Erik Darling

I am a heading

Over the past month (plus or minus a couple days), I’ve shown you in a series of quick posts how I use different SQL Server Community Tools that are free and open source to troubleshoot SQL Server issues.

Here’s the full list of posts.

Thanks for reading!

Going Further

SQL Server Community Tools: Why Does sp_PressureDetector Make A Big Deal About The Remote DAC?

Posted on September 16, 2022August 25, 2022 by Erik Darling

Subject Matter

I’m gonna throw this at you fast, because it’s Friday. If you’re using sp_PressureDetector in a SQL Server performance crisis, there are two things you don’t want:

Slow results from running diagnostic queries
Diagnostic queries waiting in the same CPU and memory lines as user queries

That’s why one of the first things does is see if the Remote DAC is enabled, and urge you to turn it on, like so:

EXEC sp_configure 
    'remote admin connections', 
    1; 

RECONFIGURE;

Why bother? Well, the Remote DAC is sort of like a VIP entrance to Club SQL Server. You get your own little connection and set of resources that are unfettered by lowly user queries that are most-definitely-not-on-the-guest-list.

There are a couple caveats to being on the DAC list:

You have to use an admin account to access it
Only one person can use it at a time

So what is it, and how do you use it?

Connectivity

To use the DAC, all you have to do is add the prefix “ADMIN:” to your SSMS connection:

SQL Server DAC Connection SSSMS — i disconnect from you

Once that’s done, you just connect as normal and you’re good to go. There was a longstanding bug fixed recently in SSMS, where an error message would show up that said you’d not connected using the Remote DAC, even though you were.

This makes running diagnostic queries when your server is Having A Bad Day© much less prone to either adding to or being trampled by performance issues.

And also, uh, it’s just generally a good thing to have enabled for situations where you can’t connect to SQL Server in any of the normal ways, and you maybe want to try to see what the problem is rather that just crawl out of bed to restart the server.

So yeah. Good, that. Enable it.

Thanks for reading!

Going Further

SQL Server Community Tools: sp_PressureDetector Doesn’t Show You Irrelevant Waits

Posted on September 15, 2022September 15, 2022 by Erik Darling

Which Wait Did They Go?

I tried to really focus sp_PressureDetector on things that really matter to overall SQL Server performance, and specifically related to CPU and memory pressure.

It seemed easier to include rather than exclude waits, since the list always seems to keep growing. I also wanted to include a little decoder ring to help you understand what each wait means.

CMEMTHREAD: Tasks waiting on memory objects
CXCONSUMER: Parallelism
CXPACKET: Parallelism
CXSYNC_CONSUMER: Parallelism
CXSYNC_PORT: Parallelism
PAGEIOLATCH_EX: Modifications reading pages from disk into memory
PAGEIOLATCH_SH: Selects reading pages from disk into memory
RESOURCE_SEMAPHORE: Queries waiting to get memory to run
RESOURCE_SEMAPHORE_QUERY_COMPILE: Queries waiting to get memory to compile
SOS_SCHEDULER_YIELD: Query scheduling
THREADPOOL: Worker thread exhaustion

I know this seems really limited, but… Uh. What else would you want, here? I could show locking waits, but they aren’t really related to CPU or memory pressure.

Sure, if enough (probably parallel) queries get blocked, it can lead to THREADPOOL waits, but if that’s the cause then you’ll see all the blocking going on in the running queries results.

Seeing a bunch of lock waits up front is just a distraction from the CPU and memory pressure issues I’m trying to surface, and there’s really nothing specifically hardware related to fixing locking problems.

But anyway, if you feel like there are relevant waits missing from the list, feel free to leave a comment or open an issue on GitHub.

Going Further

SQL Server Community Tools: sp_PressureDetector Shows You Running Queries Taking Up CPU And Memory

Posted on September 14, 2022August 25, 2022 by Erik Darling

One For The Money

This post describes a little more of the philosophy behind sp_PressureDetector more than anything technical about it. Maybe.

I write this at the outset, but who knows where it’ll end up. Might even delete the whole thing and look at expensive cheese.

ANYWAY! If you’ve been following my posts this week, you’ll be painfully aware that sp_PressureDetector looks at things in your SQL Server that could indicate CPU or memory pressure.

Of course, very few servers will experience either of those issues without queries running, and so my dear friend sp_PressureDetector will show you which queries are in the danger zone.

And since it may be different queries causing problems, I show you two different sets of queries in two different ways.

Two For The Memory

The memory grant section shows you queries ordered by highest current memory grant. It will show you stuff you’re accustomed to, like session id, database, query text, query plan, and all that stuff, but it’ll also break down important stuff about what’s going on with memory.

SQL Server Memory Grant Query — fourquery

There are four queries running on my demo VM. Three of them asked for and received memory grants, but one of them didn’t!

The first three queries have all requested and been granted memory and are slowly using it
The fourth query hasn’t gotten a grant yet, but is sitting in a queue waiting for memory to free up

While that fourth query waits for memory, it’s sitting around on RESOURCE_SEMAPHORE.

You may not always see this happening, but even without hitting the global limit for memory grants, you can still get pretty valuable data about queries running asking for memory.

Three To Get CPU Ready

Just like the memory grant query output, the CPU section will show you familiar and helpful columns to identify and analyze the query, but the section that will show you which ones are chewing up CPU is here:

In this cap, there are five queries running (all of them are parallel). You can see associated waits, cpu time, elapsed time, and other resource usage.

You can also see just how parallel a query went. Because there is only one “branch” in this query plan, DOP and parallel worker count are equal. In larger plans, worker count can be a multiple of DOP.

SQL Server Parallel Query Plan — woah mama

In this query plan, there are four parallel exchanges, which means four branches, so we get four x dop 8 parallel worker threads, totaling 32. Only 8 can be active concurrently, but all 32 are reserved by the query until it completes

On really overworked servers, SQL Server will downgrade DOP to preserve worker threads. Screenshots don’t really do that justice, but I demo it at the main page for sp_PressureDetector in the video.

Thanks for reading!

Going Further

SQL Server Community Tools: Detecting Memory Pressure In SQL Server With sp_PressureDetector

Posted on September 13, 2022August 25, 2022 by Erik Darling

Sponsor

People are always surprised by how SQL Server uses, and misuses, memory. First, there are misconceptions about how much memory Standard Edition can use — everyone thinks it’s 128GB — but that’s just for the buffer pool.

Sometimes they have no idea that…

SQL Server caches data pages in memory (buffer pool)
SQL Server doesn’t work with pages on disk (except to read them into memory)
SQL Server queries can ask for TONS of memory that gets pulled from the buffer pool
SQL Server uses memory to manage other things, like locks, caching query plans, compressed backups, and more

So that’s the type of stuff that I thought would be good to expose with sp_PressureDetector.

To look at just memory stuff with it, run this:

EXEC sp_PressureDetector
    @what_to_check = 'memory';

Simple as.

Trudging

When SQL Server is under memory pressure, it can show up in a few different places, just like CPU pressure.

The arrows are pointing to two wait types that are directly related to queries waiting on memory. Read the description column for more detail ;^}

But that’s not the whole story, because like I said above, SQL Server uses memory for a bunch of stuff. If there’s memory pressure on the server, you’ll also see SQL Server having to go out to disk a whole bunch.

Stolen

SQL Server surfaces what things are consuming memory, too. I show that early on in the results.

My server isn’t very busy at the moment, so these numbers are fairly low, but this gives you a pretty good breakdown of:

How much memory is dedicated to the buffer pool
How much memory is stolen from the buffer pool
How much memory other consumers are taking up

You can match some of that up with other results that get returned that show more detail on query memory grants:

It doesn’t match exactly because there’s a little time between when each query runs, but the granted_memory and used_memory columns are close-enough to the MEMORYCLERK_SQLQERESERVATIONS and Stolen Server Memory (KB) lines in the other results shown here.

If you’re paying really close attention, you may notice that 17GB of memory has been given to three queries, and 197 queries are waiting on memory.

In tomorrow’s post, we’ll look at how sp_PressureDetector surfaces queries that are most likely involved in CPU and memory pressure.

Thanks for reading!

Going Further

SQL Server Community Tools: Detecting CPU Pressure In SQL Server With sp_PressureDetector

Posted on September 12, 2022August 25, 2022 by Erik Darling

Playing Favorites

I absolutely adore sp_PressureDetector. It’s short, it’s sweet, and it returns so many great details about what sort of pressure a SQL Server is under.

Today, we’re going to look at various ways that CPU pressure can be exposed.

You know, those expensive things that you license from Microsoft that make your database run?

They seem important.

Sizzling

There are precious few parameters to sp_PressureDetector. The only one you might use is @what_to_check.

EXEC sp_PressureDetector
    @what_to_check = 'cpu';

By default, the value is “both” — meaning you check CPU and memory — but you can choose to check one or the other.

Running that is going to show you all of the following things, as long as you’re on the latest version.

First, you might see signs in wait stats:

My demo VM hasn’t been up terribly long, and I threw a ridiculous CPU workload at it. Basically one parallel query that exhausts worker threads.

All of the waits there can be signs that your server CPU is overworked. They’re not too bad here, but if the hours_wait_time column is much greater than the hours_uptime column, that could be a pretty good indication.

Of course, because I’m throwing a horrible parallel workload at it, some of the other sections are gonna have really obvious problems.

Take this section, for instance.

The negative available_threads column, plus the high runnable columns. Having lots of runnable queries means you have a lot of queries waiting to get on/back on a CPU.

Long lines there can mean that your CPUs are way too busy.

When things are really bad, you might see a bunch of queries that are waiting a really long time to get a CPU, resulting in gobs of THREADPOOL waits.

These are the places that signs of CPU pressure can prop up. If you need help fixing that, young and good looking consultants are standing by.

Thanks for reading!

Going Further

Why Parallel Queries are Sometimes Slow In SQL Server

Posted on August 24, 2022August 24, 2022 by Joe Obbish

You may have noticed large variations in elapsed time for parallel queries while performing query tuning or observing a production workload. This post reviews some of the possible explanations for those variations. I’m testing on SQL Server 2022 RC0 but this behavior can be observed on all currently supported versions of SQL Server.

The Setup

The test data is very simple. Insert ten million sequential integers into a single column table with a clustered index:

CREATE TABLE dbo.ParallelTest (
Id BIGINT NOT NULL,
CONSTRAINT PK_ParallelTest PRIMARY KEY (Id)
);

INSERT INTO dbo.ParallelTest (Id)
SELECT [value]
FROM GENERATE_SERIES(1, 10000000);

Now consider the following query:

SELECT SUM(a.ID + b.ID) s
FROM dbo.ParallelTest a
INNER JOIN dbo.ParallelTest b ON a.ID = b.ID
OPTION (USE HINT ('DISALLOW_BATCH_MODE'));

The query optimizer naturally picks a merge join for this query. The self-joined tables obviously have the same row count and the data is already sorted. That makes this query an ideal candidate for a serial merge join from the optimizer’s point of view . The USE HINT in the query isn’t particularly important and the example works without it, but its presence will be explained later. Here’s what the query plan looks like for me:

The query takes 2721 ms of CPU time and elapsed time to execute on my machine. It’s reasonable for CPU and elapsed time to match here because there’s nothing for the query to wait on. It’s the only user query running against the SQL instance. Next I’ll force a parallel merge join query plan by changing the query hints to the following:

SELECT SUM(a.ID + b.ID) s
FROM dbo.ParallelTest a
INNER JOIN dbo.ParallelTest b ON a.ID = b.ID
OPTION (MAXDOP 8, MERGE JOIN, USE HINT('ENABLE_PARALLEL_PLAN_PREFERENCE', 'DISALLOW_BATCH_MODE'));

I chose MAXDOP 8 because my test machine has 8 physical CPU cores. This is an unnatural query plan with a higher parallel cost than serial cost. The query optimizer only picked it because it was forced to. Note the presence of the exchange operators with a relatively high estimated cost:

The parallel query takes 6465 ms of CPU time and 1723 ms of elapsed time to execute on my machine. It finished about 33% faster than the serial query but CPU time more than doubled. The exchange operators added a large overhead for this query. Next I’ll run the same query again but I’ll make one of my CPUs busier than before. Consider the following infinite loop coded as a stored procedure:

CREATE OR ALTER PROCEDURE #p AS
SET NOCOUNT ON;
DECLARE @i INT;
WHILE 1 = 1
SET @i = 0;
GO

EXEC #p;

This code does not accomplish anything besides efficiently using CPU resources, which is exactly what I need for my test. I’m running the merge query at MAXDOP 8 on a machine with 8 CPU cores so I’m pretty much guaranteed to see scheduler contention for some of the workers of the parallel query. They will compete for CPU resources with the infinite loop.

Take a moment to predict the performance penalty that the parallel query will experience due to the busy scheduler. Will it be 2X slower? 3X slower? Worse? For a few hints, recall that hash distributed exchange operators will generally try to force an equal amount of work to each thread. Also recall that a parallel merge join requires order-preserving exchange operators. I’ve included Erik running at MAXDOP 8 to hopefully prevent you from accidentally seeing the answer before you’ve finalized your prediction.

With the busy scheduler, the query now takes 6818 ms of CPU time and 31745 ms of elapsed time to execute on my machine. The same code is over 18X slower simply due to the highly efficient stored procedure that’s running on the same SQL instance. Here are the query wait stats in case they are of interest:

<Wait WaitType=”CXPACKET” WaitTimeMs=”454673″ WaitCount=”5262″ />
<Wait WaitType=”CXSYNC_PORT” WaitTimeMs=”32539″ WaitCount=”41″ />
<Wait WaitType=”CXSYNC_CONSUMER” WaitTimeMs=”43″ WaitCount=”14″ />

scheduler_ring_buffer_recorded

This is an inappropriately brief introduction to a complex topic, but workers cooperate with each other by sharing time on schedulers. Under normal conditions, a worker will execute code on the scheduler for up to 4 milliseconds before yielding. The worker yields in case there’s another worker waiting to execute on the scheduler. When there’s not the original worker immediately starts executing code again. An example of when a worker yields early is if it needs to wait for an I/O operation.

scheduler_ring_buffer_recorded is the name of a debug extended event that you should probably never use. I often don’t follow my own advice, so I used it to get an idea of how often workers were yielding to the scheduler for the merge join queries. I only looked at action types of “SCHEDULER_NONPREEMPTIVE_RESUME” and I admit that this analysis is quite approximate. For the MAXDOP 1 query, there was a total of 705 events over 2872 milliseconds. In other words, the average time spent on the scheduler before yielding is close to the 4 ms soft maximum. This should be expected for the serial query if all of the data is in the buffer pool. There is no memory grant for the query (the reason for the USE HINT) and there’s not really any reason to yield early.

The parallel query had significantly more yields. Looking at one worker thread as an example, it had 14862 events over 1714 milliseconds. In other words, it had around 35 times the number of scheduler yields compared to the MAXDOP 1 query. The higher number of yields is a performance liability when competing with workers from other queries on the same scheduler. The infinite loop stored procedure dominates the workers from the parallel and gets around 97% of the cpu time on the scheduler. That’s a big part of why the parallel query is 18X slower than before.

Note that you can also get this information by looking at the yield_count column of the sys.dm_of_schedulers DMV, but those numbers are server wide and aren’t isolated to one query. With that said, I did get reasonably similar results when comparing that column to the extended event results.

To summarize, parallel query performance overhead isn’t as simple as just the direct overhead of the exchange operators. A query changing from serial to parallel can result in completely different interactions between the workers and the scheduler. The serial query executed its code 4 milliseconds at a time. The parallel query had to yield significantly more often. This type of change isn’t something that you can directly observe in a query plan.

Why Parallel Queries are Sometimes Slow

There are many reasons why a parallel query may not be as efficient as you hoped. I find these to be the most common:

1) There is an overhead associated with parallel exchange operators which is not present in serial queries. Moving data between worker threads requires additional CPU time which adds to the overall execution time of the query. I believe this is what is described by the following quote from the documentation:

Simple queries, frequently found in transaction processing applications, find the additional coordination required to execute a query in parallel outweigh the potential performance boost.

2) There’s some kind of additional coordination required between threads which delays query completion. For example, an order-preserving repartition streams operator is generally worse for performance than one that doesn’t need to preserve order. A different type of example can be seen in my blog post about slow parallel scans caused by cloud storage.

3) Work may be unevenly distributed between worker threads. For example, an eager index spool can only be built with one thread. Other threads may be idle while that temporary index is built. For a different example, a filtered parallel scan of a table may not output the same number of rows to all threads. All else being equal, a parallel query that cannot use all of the CPUs allowed by MAXDOP is not as efficient as one that can.

4) Work may be evenly distributed between worker threads but some of the CPUs are busier than others. Some parallel queries are as slow as their busiest CPU, even when queries are evenly sharing time on the scheduler. Of course, dramatic performance penalties can be seen when the parallel query gets an uneven amount of time on the scheduler (like in the demo in this blog post).

Mitigations

Individual parallel query performance can degrade due to contention on a single scheduler that’s used by one or more of its workers. You are more likely to run into problems with higher DOP or a higher number of concurrent queries. Lowering MAXDOP decreases the odds that you’ll run into the worst type of contention because the query uses fewer schedulers. Decreasing the number of queries decreases contention per scheduler which also helps.

To explain it in a different way, the simplest way to reduce problems with parallel queries is to have fewer worker threads per scheduler. This is of course a balancing act, but it’s one that you must perform for some workloads. Set Cost Threshold for Parallelism and server/database/resource governor MAXDOP appropriately. When possible, don’t run too many concurrent queries for your hardware.

For very important queries for which performance is critical, consider changing to a type of parallelism that isn’t as susceptible to scheduler contention issues. The two most common methods are a parallel apply pattern, perhaps using demand partitioning, or batch mode execution.

Batch mode is yet another complicated topic that I’m not going to explain here, but it uses a fundamentally different method of parallelism compared to row-mode execution which is generally much more robust against problems caused by busy schedulers. Below is a comparison of elapsed times with and without the infinite loop stored procedure running:

The elapsed time of the parallel batch mode query only changes slightly. This is also a nice property when considering overall server scalability.

Final Thoughts

I was going to originally blog about DOP feedback, but it’s taking a while for my feedback to kick in. Thanks for reading!

Updates To sp_PressureDetector

Posted on July 22, 2022July 4, 2022 by Erik Darling

All Good Things

I haven’t run into a bug with this thing in just about… Well, ever. This proc has been pretty solid.

As I use it though, especially in front of clients, I find myself wanting some additional data points all in one place. With this release, I’ve added a few small things.

Total physical memory in the server (not just max server memory)
CPU details (NUMA nodes, schedulers; if they’re off line, etc.)
A debug mode for the dynamic SQL

The CPU details column is an XML clickable that will look something like this:

<cpu_details>
  <offline_cpus>0</offline_cpus>
  <cpu_count>8</cpu_count>
  <hyperthread_ratio>8</hyperthread_ratio>
  <softnuma_configuration_desc>OFF</softnuma_configuration_desc>
  <socket_count>1</socket_count>
  <cores_per_socket>4</cores_per_socket>
  <socket_count>1</socket_count>
</cpu_details>

This is an example from my test VM. Please don’t think less of me.

You can find out more, and download the script over here.

Thanks for reading!

Going Further

What’s The Point Of DOP Feedback In SQL Server 2022?

Posted on June 21, 2022June 29, 2022 by Erik Darling

Loud Engine

If you’re ever on a long flight and want something to fall asleep to, ask a DBA how to set MAXDOP.

Sometimes I even ask myself that question when I’m tossing and turning at night.

There are a lot of things to consider when fixing settings globally across a workload. For parallelism, it’s less about individual query performance, and more about overall server/workload performance

After all, letting every query go as absolutely parallel as possible is only good up to a point; that point is usually when you start regularly running out of worker threads, or your CPUs could double as crematoriums.

Setting MAXDOP is about limiting the damage that a parallel workload can do to a server. The expectation is that a query running at DOP 8 will run 8x faster than a query running at DOP 1.

But setting MAXDOP for every query isn’t something you catch even the most persnickety performance tuners doing. Perhaps some of the more critical ones, but you know…

Let Me Rust

I’m not going to demo DOP feedback in this post, I’m just going to show you the situation that it hopes to improve upon.

To do that, I’m going to run a simple aggregation query at different degrees of parallelism, and show you the changes in query timing.

At DOP 1:

The query runs for 1.1 seconds, with 886ms consumed while scanning the Posts table.

At DOP 2:

The query runs just about twice as fast, starting with the scan of the Posts table taking about half as long. This is good scaling. Add one CPU, go twice as fast as you did with one CPU.

At DOP 4:

The gets about twice as fast again! The scan of the Posts table is now down to 263ms, and the query in total is at 330ms. Adding in two more cores seems a good choice, here.

At DOP 8:

The query no longer continues to get 2x faster. This isn’t a knock against DOP 8 in general; my query just happens to hit a wall around DOP 4. With 4 additional CPUs, we only save ~130ms at the end of the day.

Why This Is Cool

This new feature will help DBAs have to worry less about getting MAXDOP absolutely right across the board. Who knows, we may even see a day where MAXDOP is left at zero.

But you’d never skip that installer step, would you?

Anyway, for anyone out there who is paranoid about setting DOP too high, this should help your queries find a more approximately-good middle ground.

Hopefully it works as advertised.

Thanks for reading!