Query Tuning – Darling Data

How To Find Poorly Performing SQL Server Queries To Tune Using Query Store

Posted on April 11, 2023March 12, 2023 by Erik Darling

Stick And Move

If you take performance for your SQL Servers seriously, you should be using Query Store for your business critical databases.

I used to say this about third party monitoring tools, but the landscape for those has really tanked over the last few years. I used to really love SQL Sentry, but it has essentially become abandonware since SolarWinds bought SentryOne.

At this point, I’m happier to enable query store, and then use a couple extended events to capture blocking and deadlocks. While it would be stellar if Query Store also did that, for now life is easy enough.

To analyze blocking and deadlock Extended Events, I use:

sp_BlitzLock for deadlocks
sp_HumanEventsBlockViewer for blocking

This won’t capture absolutely everything, but that’s okay. We can usually get enough to go on with those three things. If you have bad blocking and deadlocking problems, you should start there.

But once you turn on Query Store, where do you go?

Gooey

If you’re okay with all the limitations of the GUI, you can tweak a few things to get more useful information out of it.

I usually start with the Top Resource Consuming Queries view, since, uh… those are usually good things to tune.

SQL Server Query Store — top resource consuming plans

But the crappy bar graph that Query Store defaults to is not what you want to see. There’s way too much jumping around and mousing over things to figure out what’s in front of you.

I like switching to the grid format with additional details view, by clicking the blue button like so:

But we’re not done yet! Not by a long shot. The next thing we wanna do is hit the Configure button, and change what we’re looking at. See, the other crappy thing is that Query Store defaults to showing you queries by total duration.

What ends up being in here is a bunch of stuff that runs a lot, but tends to run quickly. You might get lucky and find some quick wins here, but it’s usually not where the real bangers live.

To get to those, we need to hit the Configure button and make a couple tweaks to look at queries that use a lot of CPU on average, and push the time back from only showing the last hour to the last week or so.

You can go back further, but usually the further you go back, the longer it takes to get you results.

The problem here is that you can often get back quite a bit of noise that you can’t filter out or ignore. Here’s what mine looks like:

We don’t really need to know that creating indexes took a long time. Substitute those with queries you don’t necessarily care about fixing, and you get the point.

You can sort of control this by only asking for queries with a certain number of plans to come back, but if your queries aren’t parameterized and you have a lot of “single use” execution plans, you’ll miss out on those in the results.

This filter is available under the Configuration settings where we changes the CPU/Average/Dates before.

The major limitation of Query Store’s GUI is that you can’t search through it for specific problems. It totally could and should be in there, but as of this writing, it’s not in there.

That’s where my stored procedure sp_QuickieStore comes in.

Scripted, For Your Pleasure

The nice thing about sp_QuickieStore is that it gets rid of a lot of the click-clacking around to get things set up. You can’t save your Query Store GUI layout to open up and show you what you want every time, you have to redo it.

To get us to where we were with the settings above, all we have to do is this:

EXEC sp_QuickieStore
    @execution_count = 5;

By default, sp_QuickieStore will already sort results by average CPU for queries executed over the last week of Query Store data. It will also filter out plans for stuff we can’t really tune, like creating indexes, updating statistics, and waste of time index maintenance.

You’ll get results that look somewhat like so:

There are a number of things you can do with to include or ignore only certain information, too:

@execution_count: the minimum number of executions a query must have
@duration_ms: the minimum duration a query must have
@execution_type_desc: the type of execution you want to filter
@procedure_schema: the schema of the procedure you’re searching for
@procedure_name: the name of the programmable object you’re searching for
@include_plan_ids: a list of plan ids to search for
@include_query_ids: a list of query ids to search for
@ignore_plan_ids: a list of plan ids to ignore
@ignore_query_ids: a list of query ids to ignore
@include_query_hashes: a list of query hashes to search for
@include_plan_hashes: a list of query plan hashes to search for
@include_sql_handles: a list of sql handles to search for
@ignore_query_hashes: a list of query hashes to ignore
@ignore_plan_hashes: a list of query plan hashes to ignore
@ignore_sql_handles: a list of sql handles to ignore
@query_text_search: query text to search for

You straight up can’t do any of that with Query Store’s GUI. I love being able to focus in on all the plans for a specific stored procedure.

Thanks for reading!

Going Further

If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that, and need to solve database performance problems quickly. You can also get a quick, low cost health check with no phone time required.

Why SQL Server’s OPTIMIZE FOR UNKNOWN Hint Hurts Performance

Posted on April 6, 2023March 7, 2023 by Erik Darling

“Best Practice”

It’s somewhat strange to hear people carry on about best practices that are actually worst practices.

One worst practice that has strong staying power is the OPTIMIZE FOR UNKNOWN hint, which we talked about yesterday.

It probably doesn’t help that Microsoft has products (I’m looking at you, Dynamics) which have a setting to add the hint to every query. Shorter: If Microsoft recommends it, it must be good.

Thanks, Microsoft. Dummies.

Using the OPTIMIZE FOR UNKNOWN hint, or declaring variables inside of a code block to be used in a where clause have the same issue, though: they make SQL Server’s query optimizer make bad guesses, which often lead to bad execution plans.

You can read great detail about that here.

Mistakenly

We’re going to create two indexes on the Posts table:

CREATE INDEX
    p0
ON dbo.Posts
(
    OwnerUserId
)
WITH
(
    SORT_IN_TEMPDB = ON,
    DATA_COMPRESSION = PAGE
);
GO

CREATE INDEX
    p1
ON dbo.Posts
(
    ParentId,
    CreationDate,
    LastActivityDate
)
INCLUDE
(
    PostTypeId
)
WITH
(
    SORT_IN_TEMPDB = ON,
    DATA_COMPRESSION = PAGE
);
GO

The indexes themselves are not as important as how SQL Server goes about choosing them.

Support Wear

This stored procedure is going to call the same query in three different ways:

One with the OPTIMIZE FOR UNKNOWN hint that uses parameters
One with local variables set to parameter values with no hints
One that accepts parameters and uses no hints

CREATE OR ALTER PROCEDURE
    dbo.unknown_soldier
(
    @ParentId int,
    @OwnerUserId int
)
AS
BEGIN
SET NOCOUNT, XACT_ABORT ON;

    SELECT TOP (1)
        p.*
    FROM dbo.Posts AS p
    WHERE p.ParentId = @ParentId
    AND   p.OwnerUserId = @OwnerUserId
    ORDER BY
        p.Score DESC,
        p.Id DESC
    OPTION(OPTIMIZE FOR UNKNOWN);

    DECLARE
        @ParentIdInner int = @ParentId,
        @OwnerUserIdInner int = @OwnerUserId;

    SELECT TOP (1)
        p.*
    FROM dbo.Posts AS p
    WHERE p.ParentId = @ParentIdInner
    AND   p.OwnerUserId = @OwnerUserIdInner
    ORDER BY
        p.Score DESC,
        p.Id DESC;

    SELECT TOP (1)
        p.*
    FROM dbo.Posts AS p
    WHERE p.ParentId = @ParentId
    AND   p.OwnerUserId = @OwnerUserId
    ORDER BY
        p.Score DESC,
        p.Id DESC;

END;
GO

Placebo Effect

If we call the stored procedure with actual execution plans enabled, we get the following plans back.

EXEC dbo.unknown_soldier 
    @OwnerUserId = 22656, 
    @ParentId = 0;

SQL Server Query Plan With Optimize For Unknown Hint — Not a good guess.

The assumed selectivity that the OPTIMIZE FOR UNKNOWN hint produces as a cardinality estimate is way off the rails.

SQL Server thinks three rows are going to come back, but we get 6,050,820 rows back.

We get identical behavior from the second query that uses variables declared within the stored procedure, and set to the parameter values passed in.

SQL Server Query Plan With Local Variables — release me

Same poor guesses, same index choices, same long running plan.

Parameter Effect

The query that accepts parameters and doesn’t have any hints applied to it fares much better.

In this case, we get an accurate cardinality estimate, and a more suitable index choice.

Note that both queries perform lookups, but this one performs far fewer of them because it uses an index that filters way more rows out prior to doing the lookup.

The optimizer is able to choose the correct index because it’s able to evaluate predicate values against the statistics histograms rather than using the assumed selectivity guess.

Thanks for reading!

Going Further

A Quirk When Rewriting Scalar UDFs In SQL Server

Posted on March 28, 2023March 4, 2023 by Erik Darling

Back To Business

I think I have probably spent 500 hours of my life rewriting T-SQL Scalar UDFs to avoid all the performance problems associated with them.

The obvious choice is the Inline Table Valued Function, which has fewer performance issues baked in. For the kids out there: they don’t spill trauma.

But getting the rewrite right can be tricky, especially when it’s possible for the function to return NULL values.

I’m going to walk through a simple example, and show you how to get the results you want, without adding abusing your developers.

What is not covered in this post are all the performance issues caused by UDFs. If you want to get into that, click the training link at the bottom of this post.

The Problem

Here’s the function we need to rewrite. It returns a simple bit value if a particular user was active after a certain date:

CREATE OR ALTER FUNCTION
    dbo.rewrite
(
    @UserId int,
    @LastAccessDate datetime
)
RETURNS bit
AS
BEGIN
    DECLARE
        @b bit = 0,
        @d datetime = GETDATE(); /*NOFROID4U*/
    
    SELECT
        @b = 
            CASE
                WHEN u.Id IS NOT NULL
                THEN 1
                ELSE 0
            END
    FROM dbo.Users AS u
    WHERE u.Id = @UserId
    AND   u.LastAccessDate > @LastAccessDate;

    RETURN
        @b;
END;
GO

Since I’m using SQL Server 2022 in compatibility level 160, I’m declaring a useless datetime parameter and using GETDATE() to set it to a value to avoid scalar UDF inlining.

We can call it about like so (again, this query is too trivial to suffer any performance issues), and get some reasonable-looking results back.

SELECT TOP (10)
    p.OwnerUserId,
    p.LastActivityDate,
    thing0 = 
        dbo.rewrite
        (
            p.OwnerUserId, 
            GETDATE()
        )
FROM dbo.Posts AS p
WHERE p.Score = 1;

Writeable Media

Rewriting this function looks straightforward. All we need to do is Robocop a few parts and pieces and badabing badaboom we’re done.

Note that to really complete this, we’d also need to add a convert to bit to avoid SQL Server implicitly converting the output of the case expression to a (potentially) different datatype, but we’ll fix that in the final rewrite.

CREATE OR ALTER FUNCTION
    dbo.the_rewrite
(
    @UserId int,
    @LastAccessDate datetime
)
RETURNS table
WITH SCHEMABINDING
AS
RETURN    
    SELECT
        b = 
            CASE
                WHEN u.Id IS NOT NULL
                THEN 1
                ELSE 0
            END
    FROM dbo.Users AS u
    WHERE u.Id = @UserId
    AND   u.LastAccessDate > @LastAccessDate;
GO

Of course, this alters how we need to reference the function in the calling query. Inline table valued functions are totally different types of objects from scalar UDFs.

SELECT TOP (10)
    p.OwnerUserId,
    p.LastActivityDate,
    thing1 = 
        (
            SELECT 
                t.b 
            FROM dbo.the_rewrite
            (
                p.OwnerUserId, 
                GETDATE()
            ) AS t
        )
FROM dbo.Posts AS p
WHERE p.Score = 1;

But the results are disappointing! Where we once had perfectly formed zeroes, now we have a bunch of NULLs that severely harsh our mellow.

This can obviously cause problems for whomever or whatever is ingesting the result set.

Expectations: 1 or 0
Reality: NULL

Shame, that.

Changing The Query

Many developers will attempt something like this first, to replace NULLs in the calling query:

SELECT TOP (10)
    p.OwnerUserId,
    p.LastActivityDate,
    thing1 = 
        (
            SELECT 
                ISNULL
                (
                    t.b, 
                    0
                ) 
            FROM dbo.the_rewrite
            (
                p.OwnerUserId, 
                GETDATE()
            ) AS t
        )
FROM dbo.Posts AS p
WHERE p.Score = 1;

But this will still produce NULL realities where we have zeroed expectations. We could take a step way back and do something like this:

SELECT TOP (10)
    p.OwnerUserId,
    p.LastActivityDate,
    thing1 = 
        ISNULL
        (
            (
                SELECT 
                    t.b
                FROM dbo.the_rewrite
                (
                    p.OwnerUserId, 
                    GETDATE()
                ) AS t
            ),
            0
        )
FROM dbo.Posts AS p
WHERE p.Score = 1;
GO

But this is an ugly and annoying thing to remember. Imagine having to explain this to someone reading or trying to incorporate our beautiful new function into a query.

We should fix this inside the function.

Fixer Upper

I’m not going to pretend this is the only way to do this. You can likely figure out half a million ways to pet this cat. It’s just easy.

CREATE OR ALTER FUNCTION
    dbo.the_inner_rewrite
(
    @UserId int,
    @LastAccessDate datetime
)
RETURNS table
WITH SCHEMABINDING
AS
RETURN    
    
    SELECT
        b = 
            CONVERT
            (
                bit,
                MAX(x.b)
            )
    FROM
    (
        SELECT
            b = 
                CASE
                    WHEN u.Id IS NOT NULL
                    THEN 1
                    ELSE 0
                END
        FROM dbo.Users AS u
        WHERE u.Id = @UserId
        AND   u.LastAccessDate > @LastAccessDate
        
        UNION ALL
        
        SELECT
            b = 0
    ) AS x;
GO

We have:

Our original query, which may return 1 or 0 based on existence
A union all to a zero literal so that a result is guaranteed to be produced
An outer max to get the higher value between the two inner selects

And this will produce expected results, with the final output converted to a bit.

Thanks for reading!

Going Further

Why You’re Testing SQL Server Code For Performance The Wrong Way

Posted on October 12, 2022October 4, 2022 by Erik Darling

Turning And Turning

In yesterday’s post, I showed you a function rewrite from Scalar to Inline Table Valued, and gave you a couple example calls.

Now, if this is all you’re doing with a function, there’s absolutely no need to rewrite them.

SELECT
    cl.*
FROM dbo.CountLetters('1A1A1A1A1A') AS cl;

SELECT 
    CountLetters = 
        dbo.CountLetters_Bad('1A1A1A1A1A');

If you’re doing something like this, and maybe assigning it to a variable or using it to guide some branching logic, don’t you sweat it for one single solitary second.

You may want to make sure whatever code inside the function runs well, but changing the type of function here isn’t going to improve things.

More realistically, though, you’re going to be calling functions as part of a larger query.

Second To None

Let’s say you’re doing something a bit like this:

SELECT
    u.DisplayName,
    TotalScore = 
        SUM(p.Score * 1.),
    Thing = 
        dbo.CountLetters_Bad(u.DisplayName)
FROM dbo.Users AS u
JOIN dbo.Posts AS p
    ON p.OwnerUserId = u.Id
WHERE EXISTS
(
    SELECT
        1/0
    FROM dbo.Votes AS v
    WHERE p.Id = v.PostId
)
AND u.Reputation >= 100000
GROUP BY 
    u.DisplayName
ORDER BY
    TotalScore DESC;

It’s a far different scenario than just assigning the output of a Scalar UDF to a variable or using it to guide some branching logic.

Brooklyn Zoo

A few minor syntax changes to the function and to how the query calls it can make a big difference.

SELECT
    u.DisplayName,
    TotalScore = 
        SUM(p.Score * 1.),
    Thing = 
        (SELECT * FROM dbo.CountLetters(u.DisplayName))
FROM dbo.Users AS u
JOIN dbo.Posts AS p
    ON p.OwnerUserId = u.Id
WHERE EXISTS
(
    SELECT
        1/0
    FROM dbo.Votes AS v
    WHERE p.Id = v.PostId
)
AND u.Reputation >= 100000
GROUP BY 
    u.DisplayName
ORDER BY
    TotalScore DESC;

Since this is a table valued function, you have to ask for results from it like you’d ask for results from a table.

HD Video

Whether you like competitive sports, good vs. evil, Great British Bake Off, or watching election results, you can appreciate the magnificence of a Flawless Victory©.

As long as someone wins who you’re rooting for. But here, no one’s rooting for scalar functions. They’re just unlovable.

I’m going to show you the very end of these plans to see the timing differences.

The Scalar UDF plan takes ~23 seconds, and the inline TVF plan takes 7.5 seconds.

And this is why testing certain linguistic elements in SQL needs to be done realistically. Just testing a single value would never reveal performance issues.

Thanks for reading!

Going Further

SQL Server Community Tools: Capturing Query Wait Stats With sp_HumanEvents

Posted on September 2, 2022August 22, 2022 by Erik Darling

Paladin

I have sort of a love/hate relationship with wait stats scripts and analysis. Sometimes they’re great to correlate with larger performance problems or trends, and other times they’re totally useless.

When you’re looking at wait stats, some important things to figure out are:

How much of a wait happened compared to uptime
If the waits lasted a long time on average

And you can do that out of the box with SQL Server. What you can’t get are two very important things:

When the waits happened
Which queries caused the waits

This stuff is vitally important for figuring out if wait stats are benign overall to the workload.

For example, let’s say your server has been up for 100 hours, and you spent 50 hours waiting on PAGEIOLATCH_SH waits. Normally I’d be pretty worried about that, and I’d be looking at if the server has enough memory, if queries are asking for big memory grants, if important queries are missing any indexes, etc.

But if we knew that all 50 of those hours were outside of normal use activity, and maybe even happened in a separate database for warehousing or archiving off data, we might be able to ignore it and focus on other portions of the workload.

Dorking

With sp_HumanEvents, you can get all of those things!

EXEC sp_HumanEvents 
    @event_type = 'waits',
    @seconds_sample = 60;

When this finishes running, you’ll get three results back:

Overall wait stats for the period of time
Wait stats broken down by database for the period of time
Wait stats broken down by database and query for the period of time

And because I don’t want to leave you hanging, you’ll also get details about the waits themselves, like

How much of a wait happened compared to sampled time
How long the waits lasted on average in the sampled time

If you need to figure out which queries are causing wait stats that you’re worried about, this is a great way to get started with that investigation.

Thanks for reading!

Going Further

SQL Server Community Tools: Capturing Query Performance Problems With sp_HumanEvents

Posted on August 30, 2022August 23, 2022 by Erik Darling

Cumulative Noopdate

In yesterday’s post, I talked through how I capture blocking using sp_HumanEvents. Today I’m going to talk about a couple different ways I use it to capture query performance issues.

One thing I want to stress is that you shouldn’t use yesterday’s technique to gather query performance issues. One thing sp_HumanEvents does is capture actual execution plans, and that can really bog a server down if it’s busy.

I tend to use it for short periods of time, or for very targeted data collection against a single stored procedure or session id running things.

I’ve occasionally toyed with the idea of adding a flag to not get query plans, or to use a different event to get them.

I just don’t think there’s enough value in that to be worthwhile since the actual execution plan has so many important details that other copies do not.

So anyway, let’s do a thing.

Whole Hog

You can totally use sp_HumanEvents to grab absolutely everything going on like this:

EXEC sp_HumanEvents 
    @event_type = 'query', 
    @query_duration_ms = 5000, 
    @seconds_sample = 20;

You may need to do this in some cases when you’re first getting to know a server and need to get a feeling for what’s going on. This will show you any query that takes 5 seconds or longer in the 20 second window the session is alive for.

If you’re on a really busy server, it can help to cut down on how much you’re pulling in:

EXEC sp_HumanEvents 
    @event_type = 'query', 
    @query_duration_ms = 5000, 
    @seconds_sample = 20,
    @sample_divisor = '5';

This will only pull in data from sessions if their spid is divisible by 5. The busier your server is, the weirder you might want to make this number, like 15/17/19 or something.

Belly

Much more common for me is to be on a development server, and want to watch my spid as I execute some code:

EXEC sp_HumanEvents
    @event_type = 'query',                   
    @query_duration_ms = 10000,               
    @session_id = N'58',                    
    @keep_alive = 1;

This is especially useful if you’re running a big long stored procedure that calls a bunch of other stored procedures, and you want to find all the statements that take a long time without grabbing every single query plan.

If you’ve ever turned on Actual Execution Plans and tried to do this, you might still be waiting for SSMS to become responsive again. It’s really painful.

By only grabbing query details for things that run a long time, you cut out all the little noisy queries that you can’t really tune.

I absolutely adore this, because it lets me focus in on just the parts that take a long time.

Shoulder

One pretty common scenario is for clients to give me a list of stored procedures to fix. If they don’t have a monitoring tool, it can be a challenge to understand things like:

How users call the stored procedure normally
If the problem is parameter sniffing
Which queries in the stored procedure cause the biggest problems

We can do that like so:

EXEC sp_HumanEvents 
    @event_type = 'query', 
    @query_duration_ms = 5000, 
    @keep_alive = 1,
    @object_schema = 'dbo',
    @object_name = 'TheWorstStoredProcedureEverWritten';

This will only collect sessions executing a single procedure. I’ll sometimes do this and work through the list.

Hoof

There are some slight differences in how I call the procedure in different circumstances.

When I use the @seconds_sample parameter, sp_HumanEvents will run for that amount of time and then spit out a result
When I use the @keep_alive parameter, all that happens is a session gets created and you need to go watch live data like this:

Just make sure you do that before you start running your query, or you might miss something important.

Thanks for reading!

Going Further

SQL Server 2022 CTP 2.1 Improvements To Parameter Sensitive Plan Optimization

Posted on August 15, 2022August 1, 2022 by Erik Darling

Hop On Your Good Foot

Several weeks back, I blogged about a missed opportunity with the new parameter sensitive plan feature. At the time, I thought that there was indeed sufficient skewness available to trigger the additional plan variants, and apparently some nice folks at Microsoft agreed.

If we step back through the old demo, we’ll get different results.

EXEC dbo.VoteSniffing 
    @VoteTypeId = 4;

EXEC dbo.VoteSniffing 
    @VoteTypeId = 2;

I’m not gonna go into a lot of depth here, because I’m writing this while traveling, but we get the (mostly) desired outcome of two different plans being generated. Each plan is more suitable to the amount of data that the query has to process.

I’d consider this a success, even if it adds more queries to the unidentifiable corpse-pile.

Thanks for reading!

Going Further

SQL Server 2022: Introduces the DATETRUNC Function, So You Can Truncate Your Dates And Stuff

Posted on August 11, 2022July 30, 2022 by Erik Darling

Underscoring The Importance

When I first sat down to write about this, I made a funny mistake: I kept writing DATE_TRUNC over and over again.

In SQL Server it’s DATETRUNC.

Why? Because that’s the way it’s implemented in Postgres and DB2. Oracle, of course, just calls it TRUNC.

So, while it’s nice to have (what appears to be) the same behavior, it doesn’t exactly help to not have a 1:1 calling equivalent to other platforms.

I assume most of these additions to T-SQL are for cross-platform development and migrations.

Of course, Microsoft being so gosh darn late to this game means folks have probably been rolling-their-own versions of these functions for years.

If they went and called their system function DATE_TRUNC or even TRUNC, they might have some object naming issues to contend with.

Well, okay. But how does it work?

Childish Games

Here are some quick examples of how you call it.

SELECT TOP (10)
    u.DisplayName,
    year = 
        DATETRUNC(YEAR, u.LastAccessDate),
    quarter = 
        DATETRUNC(QUARTER, u.LastAccessDate),
    month = 
        DATETRUNC(MONTH, u.LastAccessDate),
    dayofyear = 
        DATETRUNC(DAYOFYEAR, u.LastAccessDate),
    day = 
        DATETRUNC(DAY, u.LastAccessDate),
    week = 
        DATETRUNC(WEEK, u.LastAccessDate),
    iso_week = 
        DATETRUNC(ISO_WEEK, u.LastAccessDate),
    hour = 
        DATETRUNC(HOUR, u.LastAccessDate),
    minute = 
        DATETRUNC(MINUTE, u.LastAccessDate),
    second = 
        DATETRUNC(SECOND, u.LastAccessDate),
    millisecond = 
        DATETRUNC(MILLISECOND, u.LastAccessDate),
    microsecond = 
        DATETRUNC(MICROSECOND, u.LastAccessDate) /*Doesn't work with datetime because there are no microseconds*/
FROM dbo.Users AS u;

And here are the results:

The thing to note here is that there’s no rounding logic involved. You just go to the start of whatever unit of time you choose. Of course, this doesn’t seem to do anything to the millisecond portion of DATETIME, because it’s not precise enough.

But for anyone out there who was hoping for a SOMONTH function to complement the EOMONTH function, you get this instead.

Works well enough!

But does it perform, Darling?

UnSARGable?

To make any test like this worthwhile, we need an index to make data searchable.

CREATE INDEX
    v
ON dbo.Votes
    (CreationDate)
WITH
(
    SORT_IN_TEMPDB = ON,
    DATA_COMPRESSION = PAGE
);

You know, because that’s what they do. To make searching faster. Hello.

So look, under these perfect circumstances, everything performs well. But we have to do a lot of typing.

SELECT
    c = COUNT_BIG(*)
FROM dbo.Votes AS v
WHERE v.CreationDate >= DATETRUNC(YEAR, CONVERT(datetime, '20130101 00:00:00.000'));

Note here that we’re working on a literal value, not a column value, and we have to tell the datetrunc function which type we want via the convert function so that we get a simple seek plan:

Without all that, we get a dynamic seek plan:

SELECT
    c = COUNT_BIG(*)
FROM dbo.Votes AS v
WHERE v.CreationDate >= DATETRUNC(YEAR, '20130101 00:00:00.000');

This has some… obvious performance issues compared to the above plan with correct data types.

Query Pranks

Frequent blog readers will not be surprised that wrapping a table column in the new DATETRUNC function yields old performance problems:

SELECT
    c = COUNT_BIG(*)
FROM dbo.Votes AS v
WHERE DATETRUNC(YEAR, v.CreationDate) >= CONVERT(datetime, '20130101 00:00:00.000');

This is particularly annoying because we’re truncating the column to the beginning of the year, which should be searchable in the index since that’s the sort order of the data in the index.

Like most functions, these are fine in the presentation layer, but terrible in the relational layer. There’s no warning about this performance degradation in the documentation, either at the example of using the function in a where clause, or in the final closing remarks.

But that’s par for the course with any of these built-in functions.

Thanks for reading!

Going Further

SQL Server 2017 CU 30: The Real Story With SelOnSeqPrj Fixes

Posted on July 19, 2022July 17, 2022 by Erik Darling

I am a heading

Thanks for watching! Demo scripts below.

Demo Scripts

USE StackOverflow2013;
EXEC dbo.DropIndexes;
SET NOCOUNT ON;
DBCC FREEPROCCACHE;
GO 


CREATE INDEX 
   chunk 
ON dbo.Posts 
    (OwnerUserId, Score DESC) 
INCLUDE  
    (CreationDate, LastActivityDate)
WITH
    (MAXDOP = 8, SORT_IN_TEMPDB = ON, DATA_COMPRESSION = PAGE);
GO 

CREATE OR ALTER VIEW 
    dbo.PushyPaul
WITH SCHEMABINDING
AS
    SELECT 
        p.OwnerUserId,
        p.Score,
        p.CreationDate,
        p.LastActivityDate,
        PostRank = 
            DENSE_RANK() OVER
            ( 
               PARTITION BY 
                  p.OwnerUserId 
               ORDER BY     
                  p.Score DESC 
            )
    FROM dbo.Posts AS p;
GO 

SELECT 
    p.* 
FROM dbo.PushyPaul AS p
WHERE p.OwnerUserId = 22656;
GO 

CREATE OR ALTER PROCEDURE 
    dbo.StinkyPete 
(
    @UserId int
)
AS 
SET NOCOUNT, XACT_ABORT ON;
BEGIN
    SELECT 
        p.* 
    FROM dbo.PushyPaul AS p
    WHERE p.OwnerUserId = @UserId;
END;
GO 

EXEC dbo.StinkyPete 
    @UserId = 22656;



/*Start Here*/

ALTER DATABASE 
    StackOverflow2013 
SET PARAMETERIZATION SIMPLE;

DBCC TRACEOFF
(
    4199, 
    -1
);

ALTER DATABASE SCOPED CONFIGURATION 
    SET QUERY_OPTIMIZER_HOTFIXES = OFF;


SELECT 
    p.* 
FROM dbo.PushyPaul AS p
WHERE p.OwnerUserId = 22656
AND   1 = (SELECT 1); /*Avoid trivial plan/simple parameterization*/


/*Let's cause a problem!*/
ALTER DATABASE 
    StackOverflow2013 
SET PARAMETERIZATION FORCED;

SELECT 
    p.* 
FROM dbo.PushyPaul AS p
WHERE p.OwnerUserId = 22656
AND   1 = (SELECT 1); /*Avoid trivial plan/simple parameterization*/


/*Can we fix the problem?*/
DBCC TRACEON
(
    4199, 
    -1
);


SELECT 
    p.* 
FROM dbo.PushyPaul AS p
WHERE p.OwnerUserId = 22656
AND   1 = (SELECT 1); /*Avoid trivial plan/simple parameterization*/


/*That's kinda weird...*/
DBCC FREEPROCCACHE;


SELECT 
    p.* 
FROM dbo.PushyPaul AS p
WHERE p.OwnerUserId = 22656
AND   1 = (SELECT 1); /*Avoid trivial plan/simple parameterization*/


/*Turn Down Service*/
DBCC TRACEOFF
(
    4199, 
    -1
);

SELECT 
    p.* 
FROM dbo.PushyPaul AS p
WHERE p.OwnerUserId = 22656
AND   1 = (SELECT 1); /*Avoid trivial plan/simple parameterization*/


/*Okay then.*/


/*I'm different.*/
ALTER DATABASE SCOPED CONFIGURATION 
    SET QUERY_OPTIMIZER_HOTFIXES = ON;


SELECT 
    p.* 
FROM dbo.PushyPaul AS p
WHERE p.OwnerUserId = 22656
AND   1 = (SELECT 1); /*Avoid trivial plan/simple parameterization*/



/*Cleanup*/
ALTER DATABASE 
    StackOverflow2013 
SET PARAMETERIZATION SIMPLE;

ALTER DATABASE SCOPED CONFIGURATION 
    SET QUERY_OPTIMIZER_HOTFIXES = OFF;

DBCC TRACEOFF
(
    4199, 
    -1
);

Going Further

SQL Server 2022 Parameter Sensitive Plan Optimization: Does Not Care To Fix Your Local Variable Problems

Posted on July 8, 2022June 24, 2022 by Erik Darling

–To Fix Parameter Sniffing

There are some code comments you see that really set the stage for how tuning a query is going to go.

Usually one misgiving about how SQL Server works gives way to a whole levee-breaking bevy of other ones and three days later you can’t feel your legs but dammit it’s done.

Okay, maybe it was three hours, but it felt like three days. Something about the gravitation pull of these black hole queries.

One fix I’ve been wishing for, or wish I’ve been fixing for, is a cure for local variables. I’d even be cool if Forced Parameterization was that cure, but you know…

Time will tell.

Husk

Let’s say we’ve got this stored procedure, which does something similar to the “I’m gonna fix parameter sniffing with a local variable hey why is everything around me turning to brimstone before my very eyes?” idea, but with… less of an end-of-times vibe.

CREATE OR ALTER PROCEDURE 
    dbo.IndexTuningMaster
( 
    @OwnerUserId int,
    @ParentId int, 
    @PostTypeId int 
)
AS
BEGIN
SET NOCOUNT, XACT_ABORT ON;

    /*Someone passed in bad data and we got a bad query plan,
      and we have to make sure that doesn't happen again*/
    
    DECLARE 
        @ParentIdFix int = 
            CASE 
                WHEN @ParentId < 0 
                THEN 0 
                ELSE @ParentId 
            END;
    
    SELECT TOP (1) 
        p.*
    FROM dbo.Posts AS p
    WHERE p.ParentId = @ParentIdFix
    AND   p.PostTypeId = @PostTypeId
    AND   p.OwnerUserId = @OwnerUserId
    ORDER BY 
        p.Score DESC, 
        p.Id DESC;

END;

How bad could a top 1 query be, anyway?

Fortune Teller

When we run this query like so and so:

EXEC dbo.IndexTuningMaster 
    @OwnerUserId = 22656, 
    @ParentId = 0, 
    @PostTypeId = 1;

EXEC dbo.IndexTuningMaster 
    @OwnerUserId = 22656, 
    @ParentId = 184618, 
    @PostTypeId = 2;

We come up with zip zero zilch none nada:

We get a super low guess for both. obviously that guess hurts a large set of matched data far worse than a small one, but the important thing here is that both queries receive the same bad guess.

This is a direct side effect of the local variable’s poor estimate, which PSP isn’t quite yet ready to go up against.

Thanks for reading!