Spills Week: When Sort Spills Start To Hurt SQL Server Performance

Imbalance


In yesterday’s post, we looked at a funny situation where a query that spilled was about 5 seconds faster than one that didn’t.

Here’s what the query looked like:

SELECT x.PostId
FROM (
SELECT v.PostId, 
       ROW_NUMBER() OVER ( ORDER BY v.PostId DESC ) AS n
FROM dbo.Votes AS v
) AS x
WHERE x.n = 1;

Now, I can add more columns in, and the timing will hold up:

SELECT x.Id, 
       x.PostId, 
	   x.UserId, 
	   x.BountyAmount, 
	   x.VoteTypeId, 
	   x.CreationDate
FROM (
SELECT v.Id, 
       v.PostId, 
	   v.UserId, 
	   v.BountyAmount, 
	   v.VoteTypeId, 
	   v.CreationDate,
       ROW_NUMBER() OVER ( ORDER BY v.PostId DESC ) AS n
FROM dbo.Votes AS v
) AS x
WHERE x.n = 1;
SQL Server Query Plan
Gas Pedal

They both got slower, the non-spill plan by about 2.5s, and the spill plan by about 4.3s.

But the spill plan is still 3s faster. With fewer columns it was 5s faster, but hey.

No one said this was easy.

Fully comparing things from yesterday, when memory is capped at 0.0, the query takes much longer now, with more columns:

SQL Server Query Plan
Killing Time

To compare the “fast” spills, here’s yesterday and today’s warnings.

SQL Server Query Plan
More Pages, More Problems

With one integer column, we spilled 100k pages.

With five integer columns and one datetime column, we spill 450k pages.

That’s a non-trivial amount. That’s like every column adding 75k pages to the spill.

If you’re really worried about spills: STOP SELECTING SO MANY COLUMNS.

For The Worst


I promised to show you things going quite downhill, and for the spill query to no longer be faster.

To do that, we need a different table.

I’m going to use the Comments table, because it has a column called Text in it, which is an NVARCHAR(700).

Very few comments are 700 characters long. The majority are < 120 or so.

SQL Server Query Results
5-7-9

This query looks about like so:

SELECT x.Id, 
       x.CreationDate, 
	   x.PostId, 
	   x.Score, 
	   x.Text, 
	   x.UserId
FROM (
SELECT c.Id, 
       c.CreationDate, 
	   c.PostId, 
	   c.Score, 
	   c.Text, 
	   c.UserId,
       ROW_NUMBER() 
           OVER ( ORDER BY c.PostId DESC ) AS n
FROM dbo.Comments AS c
) AS x
WHERE x.n = 1

And the results are… icky.

SQL Server Query Plan
Gigs To Spare?

The top query asks for 9.7GB of RAM. That’s as much as my laptop can give out.

It still spills. Nearly 10GB of memory grant, and it still spills.

If you care about spills: STOP OVERSIZING STRING COLUMNS:

SQL Server Query Plan
Billy Budd

Apparently only spilling 1mm pages is a lot faster than spilling 2.5mm pages.

But still much slower than not spilling string columns.

Who knew?

Matters of Whale


I was using the Stack Overflow 2013 database for that, which is fairly big relative to the 64GB of RAM my laptop has.

If I go back to using the 2010 version, we can get a better comparison, because the first query won’t spill anymore.

SQL Server Query Plan
It’s like what all those query tuners keep telling you.

Some points to keep in mind here:

  • I’m testing with (very fast) local storage
  • I don’t have tempdb contention

But still, it seems like spilling out non-string columns is significantly less painful than spilling out string columns.

Ahem.

“Seems.”

I’ll reiterate two points:

  • Stop selecting so many columns
  • Stop oversizing string columns

In the next two posts, we’ll look at hash match and hash join spills under similar circumstances.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.