SARGability – Darling Data

Software Vendor Mistakes With SQL Server: Using Date Functions On Columns Or Local Variables

Posted on February 1, 2022May 16, 2022 by Erik Darling

Reusable Logic

In most programming languages, it’s quite sensible to create a variable or parameter, use some predefined logic to assign it a value, and then keep reusing it to prevent having to execute the same code over and over again.

But those languages are all procedural, and have a bit of a different set of rules and whatnot. In SQL Server, there are certainly somewhat procedural elements.

Functions
Control-flow logic
Cursors
While loops
Maybe the inner side of Nested Loops joins

You may be able to name some more, if you really get to thinking about it. That should be a common enough list, though.

Reusable Problems

SQL Server has a wonderful optimizer. It’s capable of many things, but it also has some problems.

Many of those problems exist today for “backwards compatibility”. In other words: play legacy games, win legacy performance.

Lots of people have found “workarounds” that rely on exploiting product behavior, and taking that away or changing it would result in… something else.

That’s why so many changes (improvements?) are hidden behind trace flags, compatibility levels, hints, batch mode, and other “fences” that you have to specifically hop to see if the grass is greener.

One of those things is the use of local variables. The linked post details how lousy those can be.

In this post, I show how you’re better off using the date math expressions instead.

And in this post, I show how you’re better off doing date math on parameters rather than on columns.

Let’s bring all that together!

Reusable Solutions

In SQL Server, context is everything. By context, I mean the way different methods of query execution are able to accept arguments from others.

You’ll sometimes hear this referred to as scope, too. Usually people will say inner context/scope and outer context/scope, or something similar.

What that means is something like this, if we’re talking about stored procedures:

CREATE PROCEDURE
    dbo.InnerContext
(
    @StartDate datetime,
    @EndDate datetime
)
AS
BEGIN
SET NOCOUNT, XACT_ABORT ON;

    SELECT
        C.PostId,
    	Score = 
    	    SUM(C.Score)
    FROM dbo.Comments AS C
    JOIN dbo.Votes AS V
        ON C.PostId = V.PostId
    WHERE C.CreationDate >= @StartDate
    AND   c.CreationDate <  @EndDate
    GROUP BY c.PostId;

END;
GO 

CREATE PROCEDURE
    dbo.OuterContext
(
    @StartDate datetime,
    @EndDate datetime
)
AS
BEGIN
SET NOCOUNT, XACT_ABORT ON;

IF @StartDate IS NULL
BEGIN
   SELECT
       @StartDate = GETDATE();
END;

IF @EndDate IS NULL
BEGIN
   SELECT
       @EndDate = DATEADD(DAY, 30, GETDATE());
END;

EXEC dbo.InnerContext
    @StartDate = @StartDate,
    @EndDate = @EndDate;

END;

If you’re okay using dynamic SQL, and really, you should be because it’s awesome when you’re not bad at it, you can do something like this:

CREATE PROCEDURE
    dbo.OuterContext
(
    @StartDate datetime,
    @EndDate datetime
)
AS
BEGIN
SET NOCOUNT, XACT_ABORT ON;

IF @StartDate IS NULL
BEGIN
   SELECT
       @StartDate = GETDATE();
END;

IF @EndDate IS NULL
BEGIN
   SELECT
       @EndDate = DATEADD(DAY, 30, GETDATE());
END;

DECLARE 
    @sql nvarchar(MAX) = N'
    /*dbo.OuterContext*/
    SELECT
        C.PostId,
    	Score = 
    	    SUM(C.Score)
    FROM dbo.Comments AS C
    JOIN dbo.Votes AS V
        ON C.PostId = V.PostId
    WHERE C.CreationDate >= @StartDate
    AND   c.CreationDate <  @EndDate
    GROUP BY c.PostId;		
    ';

EXEC sys.sp_executesql
    @sql,
  N'@StartDate datetime, 
    @EndDate datetime',
    @StartDate,
    @EndDate;

END;

Which will achieve the same thing.

Thanks for reading!

Going Further

If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Software Vendor Mistakes With SQL Server: Using Functions In Join Or Where Clauses

Posted on January 31, 2022May 16, 2022 by Erik Darling

Easy Thinking

To start with, let’s classify functions into two varietals:

Ones built-in to SQL Server
Ones that developers write

Of those, built-in functions are pretty much harmless when they’re in the select list. Classifying things a bit further for the ones user writes, we have:

Out of the four up there, only the last one doesn’t have a reference link. Why? Because I don’t write C# — I’m not that smart — if you need someone smart about that, go read my friend Josh’s blog. He’s quite capable.

If you’re too lazy to go read the three reference links:

Scalar User Defined Functions generally wreck performance
Multi-Statement Table Valued Functions stand a high chance of generally wrecking performance
Inline Table Valued Functions are okay as long as you don’t do anything awful in them

Smart Thinking

The important thing to understand is that using any of these functions, let’s call it below the belt, can really mess things up for query performance in new and profound ways compared to what they can do in just the select list.

To be more specific for you, dear ~~developer~~ reader, let’s frame below the belt as anything underneath the from clause. Things here get particularly troublesome, because much of the activity here is considered relational, whereas stuff up above is mostly just informational.

Why is the relational stuff a much bigger deal than the informational stuff? Because that’s where all the math happens in a query plan, and SQL Server’s optimizer decides on all sorts of things at compile-time, like:

Which indexes to use
Join order
Join types
Memory grants
Parallelism
Seeks and Scans
Aggregate types
Much, much more!

Those things are partially based on how well it’s able to estimate the number of rows that join and where conditions will produce.

Sticking functions in the way of those join and where conditions is a bit like putting a blindfold on SQL Server’s optimization and cardinality estimation process and asking it to swing a bowling ball sharp saber at a piece of confetti at 100 paces.

In other words, don’t complain when your query plans suck and your queries run slow. You’re doing the hobbling, you dirty bird.

Future Thinking

If you want your customers, users, or whatever you want to call them, to be reliably happy in the future, even as their database sizes grow beyond your wildest imagination, and your application gets used in ways that would make Caligula blush, you need to start by obeying the first law of database physics: thou shalt not get in the way of the optimizer.

Going back to a couple other laws of database physics that cannot be ignored:

Anything that makes your job easier makes the optimizer’s job harder
Store data the way you query it, and query data the way you store it

If it makes you feel better, stick a few thous and shalls or shalt nots or whences or whenceforths in there. It might make you happier.

It will make your customers, users, or whatever you want to call them happier, if you listen to me.

Going Further

The Right Way To Check For NULLs In SQL Server Queries

Posted on November 8, 2021May 16, 2022 by Erik Darling

101ers

This is still one of the most common problems I see in queries.

People are terrified of NULLs. People are afraid to merge on freeways in Los Angeles.

What results is this endless stream of poorly performing queries, and some surprise logic bugs along the way.

I don’t have much more else of an intro. The TL;DR is that you should use natural expressions like IS NULL or IS NOT NULL, rather than any of the built in functions available to you in SQL Server, like ISNULL, COALESCE, et al. which are presentation layer functions with no relational meaning whatsoever.

From here on out, we’ll be calling them unnatural expressions. Perhaps that will get through to you.

Tuning Wizard

First is something I’ve covered before, but when you use unnatural expressions, the optimizer will not give you feedback about useful indexes.

The first query generates a missing index request, the second one does not. The optimizer has abandoned all hope with the use of an unnatural expression.

Lethargy

The other issue with unnatural expressions comes down to implicit conversion.

Take this, for instance.

DECLARE 
    @i int = 0;

SELECT 
    c = 
        CASE ISNULL(@i, '') 
             WHEN '' 
             THEN 1 
             ELSE 0 
        END;

This will return a 1, because 0 and ” can be implicitly converted.

Perhaps less obvious, and more rare, is this:

DECLARE 
    @d datetime = '19000101';

SELECT 
    c = 
        CASE ISNULL(@d, '') 
             WHEN '' 
             THEN 1 
             ELSE 0 
        END;

Which will also return 1.

Not many databases have stuff going back to 1900, but I do see people using that as a canary value often enough.

Perfidy

If that’s not enough to get you off the idea, let’s look at how this stuff plays out in the real world.

First, let’s get ourselves an index. Without that, there’s fundamentally no difference in performance.

CREATE INDEX v ON dbo.Votes
    (BountyAmount);

Our gold standard will be these two queries:

SELECT
    c = COUNT_BIG(*)
FROM dbo.Votes AS v
WHERE v.BountyAmount IS NULL;

SELECT
    c = COUNT_BIG(*)
FROM dbo.Votes AS v
WHERE v.BountyAmount IS NOT NULL;

The first one that checks for NULL values returns a count of 182,348,084.

The second one that checks for NOT NULL values returns a count of 344,070.

Keep those in mind!

The query plans for them both look like this:

Which run, respectively (and respectably), in 846ms and 26ms. Obviously the query with the more selective predicate will have a time advantage, here.

Wrongly

Here’s where things start to go wrong.

This query returns incorrect results, but you’re probably used to that because of all the NOLOCK hints in your queries anyway.

SELECT
    c = COUNT_BIG(*)
FROM dbo.Votes AS v
WHERE ISNULL(v.BountyAmount, '') = '';

A count of 182,349,088 is returned rather than 182,348,084, because there are 1004 rows with a bounty of 0.

Even though we have an empty string in our query, it’s implicitly converted to 0.

And you thought you were so clever.

Badly

The exercises in futility that I see people carrying on with often look make use of ISNULL, COALESCE, and CASE expressions.

It’s worth noting here that COALESCE is only a CASE expression underneath, anyway. They are interchangeable in this respect.

For findings NULLs, people will screw up and do this:

SELECT
    c = COUNT_BIG(*)
FROM dbo.Votes AS v
WHERE ISNULL(v.BountyAmount, -1) = -1;

SELECT
    c = COUNT_BIG(*)
FROM dbo.Votes AS v
WHERE COALESCE(v.BountyAmount, -1) = -1;

We can use -1 here because it doesn’t naturally occur in the data. Results are correct for both, but performance is comparatively horrible.

We’re looking at 2.5 seconds compared to 900ms. This situation gets worse with the more selective predicates, too.

These both take roughly the same time as the other unnatural forms of this query, but recall the natural version of this query finished in under 30ms.

Deadly

I hope I don’t have to write about this anymore, but at the rate I see people doing this stuff, I kind of doubt it.

Broken Record Enterprises, it feels like sometimes.

I’m not sure why anyone thinks this is a good idea. I’ve heard rumors that it comes from application developers who are used to NULLs throwing errors writing SQL queries, where they don’t pose the same threat.

Who knows, though. Maybe people just really like the festive pink text color that functions turn in SSMS.

Thanks for reading!

Going Further

Inline Table Valued Functions And Catch All Queries In SQL Server

Posted on October 25, 2021May 16, 2022 by Erik Darling

This Is Not A Solution

Right off the bat, I want you to know that this is not a solution, and I’ll get to why in a minute. I’m writing this mainly because every once in a while I’ll try something different to get this working, and it always ends up disappointing.

I wish I had better news for you, here. Hell, I wish I had better news for me here. But alas we’re at the mercy of parameters.

And yeah, I know, recompile, recompile, recompile. All the live long day. But I’ve seen some weird stuff happen with that too under high concurrency.

So what’s the point? Let’s talk about that.

Dot Dot Dot

CREATE INDEX p1 ON dbo.Posts(OwnerUserId, CreationDate);

CREATE INDEX p2 ON dbo.Posts(Score, LastActivityDate);

We need some indexes. That’s a fact. I’m intentionally creating them in this way to show you that SQL Server can sometimes be smart about catch all queries.

And here’s the inline table valued function we’ll be working with:

CREATE OR ALTER FUNCTION
    dbo.kitchen_sink
(
    @OwnerUserId int,
    @CreationDate datetime,
    @Score int,
    @LastActivityDate datetime
)
RETURNS table
AS
RETURN
SELECT
    c = COUNT_BIG(*)
FROM dbo.Posts AS p
WHERE 
        (p.OwnerUserId = @OwnerUserId OR @OwnerUserId IS NULL)
    AND (p.CreationDate >= @CreationDate OR @CreationDate IS NULL)
    AND (p.Score >= @Score OR @Score IS NULL)
    AND (p.LastActivityDate >= @LastActivityDate OR @LastActivityDate IS NULL);

This pattern usually eats the optimizer alive, and there’s a lot of posts about using dynamic SQL to fix it.

But when we call this function with literal values, it does just fine.

SELECT
    ks.c
FROM dbo.kitchen_sink(22656, '20130101', NULL, NULL) AS ks;

SELECT
    ks.c
FROM dbo.kitchen_sink(NULL, NULL, 100, '20130101') AS ks;

SELECT
    ks.c
FROM dbo.kitchen_sink(22656, NULL, NULL, '20130101') AS ks;

SELECT
    ks.c
FROM dbo.kitchen_sink(NULL, '20131225', NULL, '20131225') AS ks;

SELECT
    ks.c
FROM dbo.kitchen_sink(22656, NULL, NULL, '20131215') AS ks;

Das Plan

You can run those all yourself and look at the plans. I’m just gonna throw a couple of the more interesting examples in the post, though.

The first two queries do exactly what we’d hope to see.

We use the right indexes, we get seeks. Cardinality estimation is about as reliable as ever with the “””””default””””” estimator in place 🙄

And at one point, we even get a really smart index intersection plan where the optimizer uses both of our nonclustered indexes.

Parameter Problem

The problem is that no one really makes database calls like that.

If you’re using an ORM, you could intentionally not parameterize your queries and get this to “work”, but there are downsides to that around the plan cache. Being honest, most plan caches are useless anyway.

Long Live Query Store, or something.

Most people have their catch all code parameterized, so the query looks like what’s in the function. I’m going to throw the function in a stored procedure now.

CREATE OR ALTER PROCEDURE
    dbo.kitchen_wrapper
(
    @OwnerUserId int,
    @CreationDate datetime,
    @Score int,
    @LastActivityDate datetime
)
AS
BEGIN
SET NOCOUNT, XACT_ABORT ON;

    SELECT
        ks.c
    FROM dbo.kitchen_sink
    (
        @OwnerUserId, 
        @CreationDate, 
        @Score, 
        @LastActivityDate
    ) AS ks;

END;

If we execute the proc like this, everything goes to hell rather quickly.

EXEC dbo.kitchen_wrapper
    @OwnerUserId = 22656,          
    @CreationDate = '20131215',    
    @Score = NULL,                  
    @LastActivityDate = NULL;

EXEC dbo.kitchen_wrapper
    @OwnerUserId = NULL,
    @CreationDate = NULL, 
    @Score = 100, 
    @LastActivityDate = '20131215';

Baywatch

The first execution uses the “right” index, but we lose our nice index seek into the p1 index.

We also end up with Predicates on the Key Lookup, just in case they end up not being NULL. And boy, when they end up not being NULL, we end up with a really slow query.

We re-use the execution plan we saw before, because that’s how SQL Server works. But since we don’t filter any rows from p1 since those parameters are NULL now, we pass all 17 million rows to the key lookup to filter them there, but since it’s a Nested Loops Join, we do it… one row at a time.

Fun.

Floss Too Much

There’s no great fix for this, either. This is a problem we’re stuck with when we write queries this way without using dynamic SQL, or a recompile hint.

I’ve seen people try all sorts of things to “fix” this problem. Case expressions, ISNULL and COALESCE, magic values, and more. They all have this exact same issue.

And I know, recompile, recompile, recompile.

Thanks for reading!

Going Further

A SARGability Riddle In SQL Server: Why Do Some Queries Seek and Some Queries Scan?

Posted on October 20, 2021May 16, 2022 by Erik Darling

Use The Force

I had a RABID FAN ask me an interesting question about a query. I can’t use theirs, but I can repro the question.

The question was: if using ISNULL in a where clause isn’t SARGable, how come I can use a FORCESEEK hint in this query?

Formatting and linking my own, of course.

The query looked something like this:

SELECT
    c = 
        COUNT_BIG(*)
FROM dbo.Users AS u WITH(FORCESEEK)
WHERE ISNULL(u.Id, 0) = 22656;
GO

What Gives?

The first thing you should notice is that the optimizer throws out ISNULL, here.

Why? Because the Id column isn’t NULL-able, and since ISNULL is Microsoft’s special non-ANSI baby, it can do this one special thing.

If we use COALESCE instead, we’ll get an error.

SELECT
    c = 
        COUNT_BIG(*)
FROM dbo.Users AS u WITH(FORCESEEK)
WHERE COALESCE(u.Id, 0) = 22656;
GO

Msg 8622, Level 16, State 1, Line 8
Query processor could not produce a query plan because of the hints defined in this query. Resubmit the query without specifying any hints and without using SET FORCEPLAN.

And if we try to use ISNULL on a NULL-able column like Age, we’ll get the same error:

SELECT
    c = 
        COUNT_BIG(*)
FROM dbo.Users AS u WITH(FORCESEEK)
WHERE ISNULL(u.Age, 0) = 22656;
GO

Coacase? Caselesce?

Under the covers, COALESCE is just a ~~crappy band~~ CASE expression.

Without the FORCESEEK hint, we can get the query to actually run.

SELECT
    c = 
        COUNT_BIG(*)
FROM dbo.Users AS u
WHERE COALESCE(u.Id, 0) = 22656;
GO

And ISNULL is just… ISNULL.

SELECT
    c = 
        COUNT_BIG(*)
FROM dbo.Users AS u
WHERE ISNULL(u.Age, 0) = 22656;
GO

Thanks for reading!

Going Further

SARGability Week: Wrap Up

Posted on September 3, 2021May 16, 2022 by Erik Darling

One Metric Week

This went on a bit longer than I thought it would, but to someone who tries to blog five days a week, that’s a good thing.

While this isn’t the most glamorous subject in the world, it seems it’s still a necessary one to write about. I don’t quite know how to feel about that, as I’ve read and written about it many times over the past 10 or so years.

Thanks for reading!

Going Further

SARGability Week: Why Implicit Conversions Aren’t SARGable

Posted on September 2, 2021May 16, 2022 by Erik Darling

President Precedence

Data types are one of those core things you need to get right. Whether it’s matching types between join columns, or between parameters and predicates, not doing so can really squash application performance in quite similar ways to writing non-SARGable predicates.

That’s because — wait for it — a lot of the stuff we’ve talked about over the last week that can happen with poorly written predicates can happen with poorly matched data types, too.

The main thing to remember, is that aside from max datatypes, what matters most in a situation with implicit conversion is that it doesn’t take place on a column. If it happens on a parameter or variable it’s far less of an issue, but it can still cause oddities in query plans, and with cardinality estimation.

Speak And Spell

Most of the problems I see with implicit conversion is with other datatypes being compared to nchar/nvarchar types, but not always.

For example, this query results in an index seek despite a variable declared as nvarchar(11).

DECLARE 
    @i nvarchar(11) = N'22656';

SELECT
   u.*
FROM dbo.Users AS u
WHERE u.AccountId = @i;
GO

Note that the convert_implicit function is applied to the variable, and not the AccountId column.

Cahn

To show an example of when implicit conversions act the same way was non-SARGable predicates, let’s make a crappy copy of some columns from the Users table.

DROP TABLE IF EXISTS
    dbo.UsersBad;

SELECT u.Id,
       ISNULL
       (
           CONVERT
           (
               varchar(40), 
               u.DisplayName
           ), 
           ''
       ) AS DisplayName,
       ISNULL
       (
           CONVERT
           (
               nvarchar(11), 
               u.Reputation
           ), 
           ''
       ) AS Reputation
INTO dbo.UsersBad
FROM dbo.Users AS u;

ALTER TABLE dbo.UsersBad 
    ADD CONSTRAINT PK_UsersBad_Id 
        PRIMARY KEY CLUSTERED (Id);

CREATE INDEX ur ON dbo.UsersBad(Reputation);
CREATE INDEX ud ON dbo.UsersBad(DisplayName);

Here, we’re converting DisplayName from nvarchar, and Reputation from an int. We’re also creating some indexes that will be rendered nearly useless by implicit conversions.

CREATE OR ALTER PROCEDURE 
    dbo.BadUsersQuery
(
    @DisplayName nvarchar(40),
    @Reputation  int
)
AS
BEGIN

    SELECT 
        u.DisplayName
    FROM dbo.UsersBad AS u
    WHERE u.DisplayName = @DisplayName;

    SELECT
        Reputation = 
            MAX(u.Reputation)
    FROM dbo.UsersBad AS u
    WHERE u.Reputation = @Reputation;

END;
GO 

EXEC dbo.BadUsersQuery 
    @DisplayName = N'Eggs McLaren',
    @Reputation = 1787;

Imaging

For both of these, the convert_implicit winds up on the column rather than on the parameter.

And that’s what makes for the type of implicit conversion that causes most performance problems I see.

This is a relatively small table, so the hit isn’t too bad, but comparatively it’s much worse, like in all the other examples of SARGability we’ve seen lately.

Aware

I think most people who do performance tuning regularly are quite aware of this problem. There’s even a query plan warning about it, for those who don’t.

It’s definitely a good problem to solve, but it often leads to finding other problems. For example, we get the seek predicate warning regardless of if there’s an index we can seek to. If I drop all the indexes off of the UsersBad table and run a query like this, we still get a warning:

DECLARE
    @DisplayName nvarchar(40);

SELECT 
    u.DisplayName
FROM dbo.UsersBad AS u
WHERE u.DisplayName = @DisplayName;

It’s also worth noting that getting rid of the implicit conversion — much like fixing other non-SARGable predicates — may reveal missing index requests that weren’t there before.

DECLARE
    @DisplayName nvarchar(40) = N'Eggs McLaren';

SELECT 
    u.*
FROM dbo.UsersBad AS u
WHERE u.DisplayName = @DisplayName;
GO 

DECLARE
    @DisplayName varchar(40) = 'Eggs McLaren';

SELECT 
    u.*
FROM dbo.UsersBad AS u
WHERE u.DisplayName = @DisplayName;
GO

Haunch

Solving implicit conversion issues is just as important (and often easier) than solving other issues with SARGable predicates, and just as important.

Even though it’s one of the first performance problems people learn about, I still see it out there enough to write about it. I think a lot of the reason that it still crops up is because ORMs leave developers detached from the queries, and they don’t see how parameter types end up getting inferred when not strongly typed.

Anyway, that’s it for this series. Tomorrow’s post will be a wrap up with links.

Thanks for reading!

Going Further

SARGability Week: Using Indexed Views To Make Predicates SARGable

Posted on September 1, 2021May 16, 2022 by Erik Darling

Boniface

There are some things that, in the course of normal query writing, just can’t be SARGablized. For example, generating and filtering on a windowing function, a having clause, or any other runtime expression listed here.

There are some interesting ways to use indexed views to our advantage for some of those things. While windowing functions and having clauses can’t be directly in an indexed view, we can give an indexed view a good definition to support them.

I Don’t Care For It

Starting with these indexes to help things along, they don’t really do as much as we’d hope.

CREATE INDEX c
ON dbo.Comments (PostId);

CREATE INDEX v
ON dbo.Votes (PostId);

CREATE INDEX p
ON dbo.Posts (Id, OwnerUserId, Score);

This query has to process a ton of rows, and no matter what we set the having expression to, the entire result set has to be generated before it can be applied. We could set it to > 0 or > infinity and it would take the same amount of time to have a working set to apply it to.

SELECT 
    p.OwnerUserId,
    TotalScore = 
        SUM(ISNULL(p.Score * 1., 0.)),
    records = 
        COUNT_BIG(*)
FROM dbo.Posts AS p
JOIN  dbo.Comments AS c
    ON c.PostId = p.Id
JOIN dbo.Votes AS v
    ON v.PostId = p.Id
GROUP BY 
    p.OwnerUserId
HAVING 
    SUM(ISNULL(p.Score * 1., 0.)) > 5000000.
ORDER BY
    TotalScore DESC;

Limited Liability

I know that having clause looks funny there, but it’s not my fault. The sum of Score ends up being a really big integer, and overflows the regular sized integers unless you explicitly convert it to a bigint or implicitly convert it to something floaty. The isnull is there because the column is NULLable, which is unacceptable to an indexed view.

So, here we are, forced to write something weird to conform.

Sizzling. Sparkling.

Ignoring the woefully misaligned and misleading operator times, we can see in the query plan that again(!) a late Filter operator is applied that aligns with the predicate in our having clause.

Sarging Ahead

Let’s work some magic, here.

CREATE VIEW 
    dbo.BunchaCrap
WITH 
    SCHEMABINDING
AS
SELECT 
    p.OwnerUserId,
    TotalScore = 
        SUM(ISNULL(p.Score * 1., 0.)),
    records = 
        COUNT_BIG(*)
FROM dbo.Posts AS p
JOIN  dbo.Comments AS c
    ON c.PostId = p.Id
JOIN dbo.Votes AS v
    ON v.PostId = p.Id
GROUP BY 
    p.OwnerUserId;
GO 

CREATE UNIQUE CLUSTERED INDEX bc ON dbo.BunchaCrap(OwnerUserId);

This gives us an indexed view with the TotalScore expression materialized, which means we can search on it directly now without all the 50 some-odd seconds of nonsense leading up to it.

At The Mall

Our options now are either to query the indexed view directly with a noexpand hint, or to run the query as initially designed and rely on expression matching to pick up on things.

SELECT 
    bc.*
FROM dbo.BunchaCrap AS bc WITH (NOEXPAND)
WHERE bc.TotalScore > 5000000.
ORDER BY bc.TotalScore DESC;

In either case, we’ll get this query plan now:

Which looks a whole heck of a lot nicer.

Tomorrow, we’ll look at how implicit conversion can look a lot like non-SARGable predicates.

Thanks for reading!

Going Further

SARGability Week: Rewriting Scalar User Defined Functions To Make Them SARGable

Posted on August 31, 2021May 16, 2022 by Erik Darling

Cheap Replica

The bottom line on scalar UDFs is that they’re poison pills for performance.

They’re bad enough in the select list, but they get even worse if they appear in join or where clause portions of the query.

The example query we’re going to use doesn’t even go out and touch other tables, which can certainly make things worse. It does all its processing “in memory”.

CREATE FUNCTION dbo.nonsargable(@d datetime)
RETURNS bit
WITH SCHEMABINDING,
     RETURNS NULL ON NULL INPUT
AS 
BEGIN

RETURN
(
    SELECT 
        CASE 
            WHEN DATEDIFF
                 (
                     DAY, 
                     @d, 
                     GETDATE()
                 ) > 30
            THEN 1
            ELSE 0
        END
);

END;
GO

Much Merch

When we run this query, the plan is a messmare.

SELECT
    c = COUNT_BIG(*)
FROM dbo.Users AS u
WHERE u.Reputation > 100
AND   dbo.nonsargable(u.LastAccessDate) = 1;

The Filter operator is a familiar face at this point.

The function has to run once per row that leaves the Users table, and the predicate has to be calculated and evaluated later at the filter operation.

Twisty

If we look in dm_exec_function_stats, we can see just how many times, and how much CPU the function used when we ran just this one query:

If you pay extra special attention, you’ll noticed that the execution_count here exactly matches the number of rows that pass through the Filter operator above.

The way to fix this is to rewrite the function as an inline table valued function.

CREATE FUNCTION dbo.nonsargable_inline(@d datetime)
RETURNS TABLE
WITH SCHEMABINDING
AS 
RETURN

    SELECT 
        b = 
        CASE 
            WHEN DATEDIFF
                 (
                     DAY, 
                     @d, 
                     GETDATE()
                 ) > 30
            THEN 1
            ELSE 0
        END;

GO

Now we don’t have all those scalar problems.

Save The Wheels

We have to call our function a little bit differently, but that’s far less of a big deal.

SELECT
    c = COUNT_BIG(*)
FROM dbo.Users AS u
WHERE u.Reputation > 100
AND   (
          SELECT
              * 
          FROM dbo.nonsargable_inline(u.LastAccessDate)
       ) = 1;

And our query can go parallel, and take way less than 10 seconds.

While the predicate isn’t exactly ideal, in this case I’ll deal with it, and maybe even add some better indexes if I need to. At 183 milliseconds, I don’t need to do that right now.

Save The Feels

Scalar UDFs are still the downfall of many queries, and because this one has getdate in it, it can’t be inlined automatically in SQL Server 2019.

With simple enough functions, rewrites to inline table valued functions are easy enough. The more nonsense you put into the body of a function, the harder quick fixes get. I’ve spent days rewriting some that ran into the thousands of lines, and it’s not a pretty process.

Tomorrow we’re going to look at how indexed views can help you solve SARGability issues.

Thanks for reading!

Going Further

SARGability Week: Max Data Type Columns And Predicates Aren’t SARGable

Posted on August 30, 2021May 16, 2022 by Erik Darling

Stucco

When you sit down to choose data types, max really should be last on your list. There are very few legitimate uses for them.

Start by asking yourself if someone may ever throw in a string over 4000 or 8000 characters, you’re storing XML or JSON, or some other foolish endeavor.

If the answer is “no”, or “just to be safe“, stop right there. Put that max datatype down. You’re really gonna hate when they end up doing to your memory grants.

But you’ll also hate what they do to queries that try to filter on them.

Columns

No matter how much you scream, holler, and curse, when you try to filter data in a column with a max type, that predicate can’t be pushed to when you touch the index.

Leaving aside that max data types can’t be in the key of an index, because that would be insane, even if you stick them in the includes you’re looking at a lot of potential bulk, and not the good kind that makes you regular.

Let’s look at these two queries. The first one hits the Body column, and the second one hits the Title column.

SELECT
    c = COUNT_BIG(*)
FROM dbo.Posts AS p
WHERE p.Body = N'A';

SELECT
    c = COUNT_BIG(*)
FROM dbo.Posts AS p
WHERE p.Title = N'A';

Both of these queries suck because we have to scan the entire Posts table, but at least the second one would be indexable if we cared enough to add one.

Arguments

Starting off clear: This will happen regardless of if your search argument is a variable or a parameter, regardless of recompile hints.

DECLARE @A nvarchar(MAX) = N'A';
SELECT
    c = COUNT_BIG(*)
FROM dbo.Posts AS p
WHERE p.Title = @A;
GO 

DECLARE @A nvarchar(MAX) = N'A';
SELECT
    c = COUNT_BIG(*)
FROM dbo.Posts AS p
WHERE p.Title = @A
OPTION(RECOMPILE);
GO

I see this quite often in ORMs where people don’t explicitly define datatypes, and stored procedures where people are being daft.

Surgeons

This is the sort of stuff you have to deal with when you use max data types.

They really are a pain, and the larger your tables are, the harder it can be to make changes later. Add in any sort of data synchronization and it all gets much worse.

In tomorrow’s post, we’ll look at how user defined functions can make things horrible.

As usual.

Thanks for reading!