The Hardest Part Of Query Tuning: Logical Equivalence

Grey Matter

Often when query tuning, I’ll try a change that I think makes sense, only to have it backfire.

It’s not that the query got slower, it’s that the results that came back were wrong different.

Now, this can totally happen because of a bug in previously used logic, but that’s somewhat rare.

And wrong different results make testers nervous. Especially in production.

Here’s a Very Cheeky™ example.

Spread’em

This is my starting query. If I run it enough times, I’ll get a billion missing index requests.

For the sake of argument, I’ll add them all. Here they are:

With all those indexes, the query is still dog slow.

Maybe It’s Me

I’ll take my own advice. Let’s break the query up a little bit.

Still dog slow.

Variability

Alright, I’m desperate now. Let’s try this.

Let’s get some worst practices involved. That always goes well.

Except here.

Getting the right results seemed like it was destined to be slow.

Differently Resulted

At this point, I tried several rewrites that were fast, but wrong.

What I had missed, and what Joe Obbish pointed out to me, is that I needed a cross join and some math to make it all work out.

This finishes instantly, with the correct results.

The value of a college education!

Realizations and Slowness

After thinking about Joe’s rewrite, I had a terrible thought.

All the rewrites that were correct but slow had gone parallel.

“Parallel”

Allow me to illustrate.

In a row?

Repartition Streams usually does the opposite.

But here, it puts all the rows on a single thread.

“For correctness”

Which ends up in a 236 million row parallel-but-single-threaded-cross-hash-join.

Summary Gates Are On My Side

Which, of course, is nicely summarized by P. White.

SQL Server uses the correct join (inner or outer) and adds projections where necessary to honour all the semantics of the original query when performing internal translations between apply and join.

The differences in the plans can all be explained by the different semantics of aggregates with and without a group by clause in SQL Server.

What’s amazing and frustrating about the optimizer is that it considers all sorts of different ways to rewrite your query.

In milliseconds.

It may have even thought about a plan that would have been very fast.

But we ended up with this one, because it looked cheap.

Untuneable

The plan for Joe’s version of the query is amazingly simple.

Bruddah.

Sometimes giving the optimizer a different query to work with helps, and sometimes it doesn’t.

Rewriting queries is tough business. When you change things and still get the same plan, it can be really frustrating.

Just know that behind the scenes the optimizer is working hard to rewrite your queries, too.

If you really want to change the execution plan you end up with, you need to present the logic to the optimizer in different ways, and often with different indexes to use.

Other times, you just gotta ask Joe.

Thanks for reading!

1 thought on “The Hardest Part Of Query Tuning: Logical Equivalence”

Leave a Reply

Your email address will not be published. Required fields are marked *