Four Non Mods
This post isn’t about modifying data. I don’t believe in doing that; get your data right the first time and you’ll avoid a lot of issues. You’ll also never need to worry about Merge.
I kid, of course. You’re doing great with it. Good for you.
This post also isn’t about the oh-look-how-technically-correct-I-am things that may also write data via workfiles or worktables:
- Many to many merge joins
- Hash joins and aggregates
- A bunch of other stuff that won’t make you popular to know about
This post is about one of my query plan frenemies: Spools. Spools of all varieties, lazy and eager:
- Index spools
- Table spools (but not Stack Spools)
There are other kinds of spools too, like rowcount spools and window spools, but they’re different enough that I can’t generalize them in with other types.
For example, rowcount spools only keep a count; they don’t track a full data set. Sure, you may be able to rewrite queries when you see them, but this is about how they operate.
Ditto window spools. I typically don’t sweat those unless someone uses a window function and doesn’t specify ROWS in the OVER clause. The default is RANGE, which uses a far less efficient disk-based spool.
With that out of the way, let’s party on with the other two.
What’s A Spool To Do
Spools are temporary structures that get stuck over in tempdb. They’re a bit like temp tables, though they don’t have any of the optimizations and enhancements. For example, loading data into a spool is a row-by-row operation.
The structure that spools use varies a bit. Table spools use a “clustered index”, but it’s not built on any of the columns in your data. Index spools use the same thing, but it’s defined on columns in your data that the optimizer thinks would make some facet of the query faster.
In both cases, these spools are used in an attempt to do less work on the inner side of a nested loops join, either by:
- Table Spool: Reducing how many times the branch executes by only running for unique values
- Index Spool: Creating a more opportune index structure to seek to rows in
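To make the index spool idea concrete, here is a toy model in Python. It is a sketch, not how SQL Server implements anything: the function names are made up, a dictionary stands in for the spool's index (the real thing is a b-tree style structure in tempdb), and "rows read" is just a counter. It compares scanning the inner side once per outer row against building an index once and seeking into it.

```python
from collections import defaultdict


def nested_loops_scan(outer_keys, inner_rows):
    """No spool: scan every inner row once per outer row."""
    rows_read = 0
    matches = []
    for key in outer_keys:
        for row in inner_rows:
            rows_read += 1
            if row[0] == key:
                matches.append(row)
    return matches, rows_read


def nested_loops_index_spool(outer_keys, inner_rows):
    """Toy eager index spool: read the inner side once, then seek."""
    spool = defaultdict(list)  # stand-in for the spool's index
    rows_read = 0
    for row in inner_rows:  # loaded one row at a time
        spool[row[0]].append(row)
        rows_read += 1
    matches = []
    for key in outer_keys:
        matches.extend(spool.get(key, []))  # seek instead of scan
    return matches, rows_read  # the spool is discarded on return


outer_keys = [1, 2, 3, 2]
inner_rows = [(1, "a"), (2, "b"), (2, "c"), (4, "d")]

scan_matches, scan_reads = nested_loops_scan(outer_keys, inner_rows)
spool_matches, spool_reads = nested_loops_index_spool(outer_keys, inner_rows)
print(scan_reads, spool_reads)
```

Both functions return the same rows, but the spooled version reads the inner rows once instead of once per outer row. The catch is that the index gets rebuilt every time the query runs.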
Lazy v. Eager
Lazy spools load data as requested, and then truncate themselves to honor a new request (except Lazy Index Spools, which don’t truncate).
In Spool operator properties, you’ll see things like “Rewinds” and “Rebinds”. You can think of rewinds like reusing data in the spool, and rebinds like putting a new set of data in. You can sometimes judge the efficacy of a Lazy Table Spool by looking at actual rebinds vs. rewinds.
If rebinds and rewinds are close in count, it may not have been an effective spool. These numbers for Lazy Index Spools are almost useless. Don’t look at them.
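If rewinds and rebinds are hard to picture, this small Python sketch of a lazy table spool may help. It is a simplified model under loud assumptions: the names are hypothetical, the spool only remembers the most recent value, and duplicates only pay off when they arrive next to each other. SQL Server's real counters have more nuance than this.

```python
_NOTHING_YET = object()  # sentinel: the spool has not been loaded


def run_with_lazy_table_spool(outer_values, inner_side):
    """Replay cached inner rows while the outer value stays the same."""
    current = _NOTHING_YET
    spooled = []
    rebinds = 0
    rewinds = 0
    results = []
    for value in outer_values:
        if current is _NOTHING_YET or value != current:
            # Rebind: truncate the spool and run the inner side again.
            spooled = list(inner_side(value))
            current = value
            rebinds += 1
        else:
            # Rewind: reuse what is already in the spool.
            rewinds += 1
        results.extend(spooled)
    return results, rebinds, rewinds


def inner_side(value):
    """Hypothetical expensive inner branch."""
    return [(value, value * 10)]


# Lots of adjacent duplicates: the spool saves work.
_, good_rebinds, good_rewinds = run_with_lazy_table_spool(
    [1, 1, 1, 1, 2, 2, 2, 2], inner_side
)

# Mostly unique values: nearly every row reloads the spool.
_, bad_rebinds, bad_rewinds = run_with_lazy_table_spool(
    [1, 2, 3, 4, 5, 5], inner_side
)
print(good_rebinds, good_rewinds, bad_rebinds, bad_rewinds)
```

In the first call the inner side runs twice for eight outer rows. In the second it runs five times for six rows, so the spool is mostly overhead.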
Eager spools load all the data at once. Where you have to be careful with them is when you see Eager Index spools on large tables.
- No missing index request
- Data is loaded on a single thread even in a parallel plan
- Data is loaded row by row
- Index is thrown out when the query finishes
Look, these are bad traits. They’re so bad I’ve dedicated a lot of blog space to writing about them.
Thanks for reading!