What Indexes Don’t Do: Redux

This is a re-post of What Indexes Don’t Do, since apparently WordPress is showing different versions to different people based on the eternal whatever

You Fine And All

You can read a lot about how indexes might improve queries. But there are certain query mechanics that indexes don’t solve for, and you can not only get stuck with an index that isn’t really helping, but all the same query problems you were looking to solve.

Part Of The Problem

Indexes don’t always understand your queries. There are very definite ways that you can encourage them to, and probably even more definite ways for you to discourage them from understanding your queries. Even with syntax that seems completely transparent, you can get into trouble. Take the simple example of two date columns: there’s nothing in your index that tracks how two dates in a row relate to each other. Nothing tracks how many years, months, weeks, days, hours, minutes, seconds, milliseconds, or whatever magical new unit Extended Events has to make itself less usable. Heck, it doesn’t even track is one is greater or less than another.

Soggy Flautas

Ah, heck, let’s stick with Stack Overflow. Let’s even create an index. Now let’s look at these super important queries. How much work will the optimizer think it has to do, here? How many rows will it estimate? How will it treat different queries, well, differently? If you said “it won’t”, you’re a smart cookie. Aside from the “Actual rows”, each plan has the same attributes across.
And I’m Homosapien like you
And we’re Homosapien too

Treachery

Neither of these plans is terrible on its own. The problem is really in larger plans, where bad decisions like these have their way with other parts of a query plan. Nasty little lurkers, they are, because you expect things to be fine when creating indexes, and writing sargable predicates. Yet for both queries, SQL Server does the same thing, based on the same guesses on the perceived number of rows at play. It’s one of those quirky things — if it’s a data point we care about, then it’s one we should express somehow. A computed column might work here: Which means we’d have to change our queries a bit, since expression matching has a hard time reasoning this one out:
Better
Butter

What Would Be Better?

Practically speaking, it isn’t the job of an index (or statistic) to track these things. Though neither of these plans is terrible in isolation, bad guesses like these flow all through plans. It’s one of those sort of terrible lurkers just waiting to turn an otherwise good query into that thing you regard with absolute loathing every time it shows up in [your favorite monitoring tool]. Knowing this, we can start to design our data to better reflect data points we care about. A likely hero here is a computed column to return some value based on which is greater, or a DATEDIFF of the two columns.

Wilcox Hotel

Sorry if you started to read this, and realized about half of the post was missing. WordPress decided to eat half of it randomly.

5 thoughts on “What Indexes Don’t Do: Redux”

  1. Interesting post, Erik. When I was reading it, I also thought of using a computed column with an index. I then wondered if an appropriate check constraint would do something similar:

    CONSTRAINT CK_the_dates CHECK (CreationDate ClosedDate. Unfortunately, I couldn’t come up with a way to make the check constraint “short circuit” the query execution and bypass the table/index scan.

    1. Yeah, you could put some more logic in a udf, but then you’ll make everything horrible by putting a udf in your table.

  2. Great post Erik!

    Am I correct that you’re using the ” AND 1 = (SELECT 1);” for forcing a FULL optimization level? I remember to a post writen by you but can’t find it…

    1. Hi Lori!

      Yep, exactly right. I’ve mentioned it in passing in a bunch of posts, but never written specifically about it.

      Thanks!

Leave a Reply

Your email address will not be published. Required fields are marked *