SQL Server 2022 Parameter Sensitive Plan Optimization: Sometimes There’s Nothing To Fix

Best Intentions


After seeing places where the Parameter Sensitive Plan (PSP) optimization quite stubbornly refuses to kick in, it’s somewhat amusing to see it kick in where it can’t possibly have any positive impact.

Even though some parameters are responsible for filtering on columns with highly skewed data, certain other factors may be present that don’t allow for the type of plan quality issues you might run into under normal parameter sensitivity scenarios:

  • Adequate indexing
  • Row goals
  • Other filtering elements

This isn’t to say that they can always prevent problems, but they certainly tend to reduce risks much of the time.

If only everything were always ever perfect, you know?

Setup


Let’s start by examining some data in the Posts table.

First, PostTypeIds:

SQL Server Query Results
resultant

Questions and answers are the main types of Posts. The data is clearly skewed, here, and in my testing this does qualify for PSP on its own.
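
If you want to check that skew yourself, a grouped count is all it takes. Here’s a minimal sketch, assuming the standard Stack Overflow schema, where PostTypeId 1 is Questions and 2 is Answers:

/*Rough distribution check against the Posts table*/
SELECT
    p.PostTypeId,
    records = COUNT_BIG(*)
FROM dbo.Posts AS p
GROUP BY 
    p.PostTypeId
ORDER BY 
    records DESC;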

The thing is, there are several attributes that Answers can have that Questions can’t. One of those is a ParentId. Here’s how the top 15 or so of those counts break down:

SQL Server Query Results
hitherto

Okay, so! Wikis don’t have ParentIds, neither do Moderator Nominations. More importantly, Questions don’t.

The Question with the most answers is Id 184618, with 518. A far cry from the next-nearest Post Types, and light years from the number of Questions with a ParentId of zero.

More important than loving your data is knowing your data.

To Query A Butterfly


Let’s say we have this query:

SELECT TOP (5000)
    p.Id,
    p.OwnerUserId,
    p.Score
FROM dbo.Posts AS p
WHERE p.PostTypeId = @po
AND   p.ParentId = @pa
ORDER BY 
    p.Score DESC;

The three things we care about getting done are:

  • Filtering to PostTypeId
  • Filtering to ParentId
  • Ordering by Score

Either of these indexes would be suitable for that:

CREATE INDEX 
    popa
ON dbo.Posts
(
    PostTypeId,
    ParentId,
    Score DESC
)
WITH
(
    SORT_IN_TEMPDB = ON,
    DATA_COMPRESSION = PAGE
);

CREATE INDEX 
    papo
ON dbo.Posts
(
    ParentId,
    PostTypeId,
    Score DESC
)
WITH
(
    SORT_IN_TEMPDB = ON,
    DATA_COMPRESSION = PAGE
);

With No PSP At All


Under compatibility level 150, we can run the query in a variety of ways and get nearly identical performance results:

SQL Server Query Plan
PostTypeId = 1, ParentId = 0
SQL Server Query Plan
PostTypeId = 2, ParentId = 184618

There’s a 27 millisecond difference between the two to find the first 5000 rows that match both predicates. You would have to run these in a very long loop to accumulate a meaningful overall difference.

In this case, both queries use and reuse the same execution plan. You can see that in the estimates.
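
For reference, here’s roughly the harness I’m using for the two parameter pairs. The sp_executesql wrapper is my sketch, but the values match the screenshots above:

DECLARE
    @sql nvarchar(MAX) = N'
SELECT TOP (5000)
    p.Id,
    p.OwnerUserId,
    p.Score
FROM dbo.Posts AS p
WHERE p.PostTypeId = @po
AND   p.ParentId = @pa
ORDER BY 
    p.Score DESC;',
    @parameters nvarchar(MAX) = N'@po int, @pa int';

/*The big one: Questions, which all have a ParentId of zero*/
EXEC sys.sp_executesql
    @sql,
    @parameters,
    @po = 1,
    @pa = 0;

/*The small one: Answers to the most-answered Question*/
EXEC sys.sp_executesql
    @sql,
    @parameters,
    @po = 2,
    @pa = 184618;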

With All The PSP


Switching to compat level 160, the queries are injected with the PLAN PER VALUE hint.

SELECT TOP (5000)
    p.Id,
    p.OwnerUserId,
    p.Score
FROM dbo.Posts AS p
WHERE p.PostTypeId = @po
AND   p.ParentId = @pa
ORDER BY 
    p.Score DESC 
OPTION 
(
    PLAN PER VALUE
    (
        QueryVariantID = 2, 
        predicate_range
        (
            [StackOverflow2013].[dbo].[Posts].[PostTypeId] = @po, 
            100.0, 
            10000000.0
        )
    )
)

SELECT TOP (5000)
    p.Id,
    p.OwnerUserId,
    p.Score
FROM dbo.Posts AS p
WHERE p.PostTypeId = @po
AND   p.ParentId = @pa
ORDER BY 
    p.Score DESC 
OPTION 
(
    PLAN PER VALUE
    (
        QueryVariantID = 3, 
        predicate_range
        (
            [StackOverflow2013].[dbo].[Posts].[PostTypeId] = @po, 
            100.0, 
            10000000.0
        )
    )
)

The thing is, both queries end up with execution times identical to the runs where no PSP was involved at all.

In other words, there is no parameter sensitivity in this scenario, despite there being skew in the column data.

Even searching for the “big” result, Questions with a ParentId of zero, finishes in under 30 milliseconds.

Ah well. Gotta train the models somehow.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

SQL Server 2022 Parameter Sensitive Plan Optimization: Does PSP Work With Dynamic SQL?

No, Really


When I talk to clients about using dynamic SQL, they’re usually under the misconception that those plans can’t get reused.

That may be true under some circumstances when:

  • It’s not properly parameterized
  • You use EXEC only and not sp_executesql

Under more favorable circumstances, dynamic SQL gets executed, and its plans get cached and reused with the same frequency as stored procedure plans.

Now, dynamic SQL isn’t exactly the same as stored procedures. There’s a lot you can do with those that just looks a mess in dynamic SQL, especially longer bits of code.

In today’s post, we’re going to look at how the Parameter Sensitive Plan (PSP) optimization works with dynamic SQL.

Bright, Sunshiny


I just learned how to spell “sunshiny”. Don’t let anyone ever tell you there’s nothing left to learn.

To keep up the sunshiny visage of today’s post, let’s get a TL;DR here: PSP does work with parameterized dynamic SQL.

Here’s an example, using a query with a parameter eligible for the PSP optimization.

DECLARE
    @sql nvarchar(MAX) = 
        N'',
    @parameters nvarchar(MAX) = 
        N'@ParentId int';

SELECT 
    @sql += N'
SELECT
    c = COUNT_BIG(*)
FROM dbo.Posts AS p
WHERE p.ParentId = @ParentId;
';

EXEC sys.sp_executesql
    @sql,
    @parameters,
    0;

EXEC sys.sp_executesql
    @sql,
    @parameters,
    184618;

Both executions here get the option (plan per value ...) text appended at the end, which indicates that PSP kicked in, along with different query plans, as expected.

SQL Server Query Plan
end of time

Being Dense


Writing the not-good kind of dynamic SQL, like so:

SELECT 
    @sql = N'
SELECT
    c = COUNT_BIG(*)
FROM dbo.Posts AS p
WHERE p.ParentId = ' + CONVERT(nvarchar(11), 0) + ';';
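
From there, you’d execute the string without a parameter list. A minimal sketch, reusing the @sql variable declared earlier:

EXEC sys.sp_executesql
    @sql;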

You will of course get different execution plans, but you’ll get a brand new plan for every different value that gets concatenated in. You will not get the PSP optimization.

This is not a good example of how you should be writing dynamic SQL. Please don’t do this, unless you have a good reason for it.

Anyway, this is good news, especially for parameterized ORM queries that currently plague many systems in crisis that I get to see every week.

Fun.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

SQL Server 2022: Cardinality Estimation Feedback

Quiet As Kept


I’ve been trying to take the general temperature when it comes to SQL Server 2022. At least from a performance perspective, some interesting things have been introduced so far.

There have been a few neat things:

  • Parameter Sensitive Plan optimizations
  • Query Store Hints
  • Memory Grant Feedback improvements
  • DOP Feedback
  • Cardinality Estimation Feedback

I’m not seeing a whole lot out there. I’m not sure why. I follow quite a few SQL bloggers via Feedly.

Perhaps it’s just too new. Maybe everyone is waiting for CTP SP1.

Well, anyway. In this post I want to talk a little bit about what Cardinality Estimation Feedback can do, and what it can’t do.

What It Do


First, you need Query Store enabled to get this to work. It relies on the Query Store Plan hints also introduced for SQL Server 2022.
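
A minimal sketch of those prerequisites, using the StackOverflow2010 database from the demo below; the CE_FEEDBACK configuration name is what I see in SQL Server 2022, but verify it on your build:

/*Query Store has to be on and writable*/
ALTER DATABASE StackOverflow2010 
SET QUERY_STORE = ON (OPERATION_MODE = READ_WRITE);
GO

/*CE Feedback rides along with compatibility level 160*/
ALTER DATABASE StackOverflow2010 
SET COMPATIBILITY_LEVEL = 160;
GO

/*It can be toggled per database if you need to rule it out*/
USE StackOverflow2010;
GO

ALTER DATABASE SCOPED CONFIGURATION 
SET CE_FEEDBACK = ON;
GO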

For queries that execute frequently and retain cached plans, the optimizer will look at some of the assumptions that get made under different Cardinality Estimation models.

Things like:

  • Row Goals
  • Predicate independence/correlation
  • Join containment being simple or base

What each of those things means isn’t terribly important to the post, but all of them are things that are influenced by using the legacy or default cardinality estimators.

As I understand it, this is a bit like Memory Grant Feedback. If estimation issues are detected, a different plan will be attempted. If that plan corrects a performance issue, then the hint will get persisted in Query Store.

Pretty cool, but…

What It Don’t Do


It doesn’t fix things while they’re running, like Adaptive Joins can do. That’s sort of unfortunate! Hear me out on why.

Often, when the model’s assumptions are incorrect, queries run for a long time. Particularly when row goals are introduced, query plans are quite sensitive to those goals not being met quickly.

It’d be really unfortunate for you to sit around waiting for 15-16 executions of a poorly performing query to finish before an intervention happens.

I would have either:

  • Reduced this threshold, or made it configurable
  • Been more aggressive about introducing Adaptive Joins when CE models influence plan choices

After all, Adaptive Joins help queries at runtime rather than waiting for an arbitrary number of executions and then stepping in.

Perhaps there was a good reason for not doing this, but those were the first two things to cross my mind when looking into the feature.

How It Do


I was able to get the feature to kick in using a familiar query.

Here’s the setup script:

DBCC FREEPROCCACHE;
ALTER DATABASE 
    StackOverflow2010 
SET 
    QUERY_STORE CLEAR;
GO

CREATE INDEX whatever 
    ON dbo.Votes(CreationDate, VoteTypeId, PostId);

CREATE INDEX apathy
    ON dbo.Posts (PostTypeId)
        INCLUDE (OwnerUserId, Score, Title);
GO

SELECT TOP (2500) 
    p.OwnerUserId, 
    p.Score, 
    p.Title, 
    v.CreationDate,
    ISNULL(v.BountyAmount, 0) AS BountyAmount
FROM dbo.Posts AS p
JOIN dbo.Votes AS v
    ON  p.Id = v.PostId
WHERE v.VoteTypeId = 1
AND   p.PostTypeId = 1
ORDER BY v.CreationDate DESC;
GO 17

SELECT qspf.* FROM sys.query_store_plan_feedback AS qspf;

SELECT qsqh.* FROM sys.query_store_query_hints AS qsqh;

For the first 16 runs, we get the same query plan that takes about 2 seconds.

SQL Server Query Plan
if you got a problem

Then, magically, on run #17, we get a different query plan!

SQL Server Query Plan
yo i’ll solve it

Pretty cool! The plan totally changed, and clearly got better. I am happy about this. Not so happy that it would have taken 16 executions of a Potentially Painful© query to get here, but you know.

Here we are.

In Query Store


There are a couple views that will detail where hints came from and which were applied:

SQL Server Query Results
clowny clown clown

Since I just cleared out query store prior to this running, we can infer some things:

  • CE Feedback kicked in and gave us a new plan with a hint to disable row goals
  • The second plan generated was identified by the engine as needing memory grant feedback

I suppose this is a good reason to do some work on sp_QuickieStore so y’all can see this stuff in action.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

What’s The Point Of DOP Feedback In SQL Server 2022?

Loud Engine


If you’re ever on a long flight and want something to fall asleep to, ask a DBA how to set MAXDOP.

Sometimes I even ask myself that question when I’m tossing and turning at night.

There are a lot of things to consider when fixing settings globally across a workload. For parallelism, it’s less about individual query performance, and more about overall server/workload performance.

After all, letting every query go as absolutely parallel as possible is only good up to a point; that point is usually when you start regularly running out of worker threads, or your CPUs could double as crematoriums.

Setting MAXDOP is about limiting the damage that a parallel workload can do to a server. The expectation is that a query running at DOP 8 will run 8x faster than a query running at DOP 1.

But setting MAXDOP for every query isn’t something you catch even the most persnickety performance tuners doing. Perhaps some of the more critical ones, but you know…

Let Me Rust


I’m not going to demo DOP feedback in this post, I’m just going to show you the situation that it hopes to improve upon.

To do that, I’m going to run a simple aggregation query at different degrees of parallelism, and show you the changes in query timing.
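
The exact query matters less than the scaling pattern, but for something concrete, here’s a sketch of the sort of aggregation I mean; the grouping column is my choice, and the MAXDOP hint is the only thing that changes between runs:

SELECT
    p.OwnerUserId,
    total_score = SUM(p.Score),
    total_posts = COUNT_BIG(*)
FROM dbo.Posts AS p
GROUP BY 
    p.OwnerUserId
OPTION(MAXDOP 1); /*Repeat with MAXDOP 2, 4, and 8*/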

At DOP 1:

The query runs for 1.1 seconds, with 886ms consumed while scanning the Posts table.

SQL Server Query Plan
DOPPER DON

At DOP 2:

The query runs just about twice as fast, starting with the scan of the Posts table taking about half as long. This is good scaling. Add one CPU, go twice as fast as you did with one CPU.

SQL Server Query Plan
Rip Van Winkle

At DOP 4:

The query gets about twice as fast again! The scan of the Posts table is now down to 263ms, and the query in total is at 330ms. Adding in two more cores seems like a good choice, here.

SQL Server Query Plan
Bed Rock

At DOP 8:

The query no longer continues to get 2x faster. This isn’t a knock against DOP 8 in general; my query just happens to hit a wall around DOP 4. With 4 additional CPUs, we only save ~130ms at the end of the day.

SQL Server Query Plan
Anubis

Why This Is Cool


This new feature will let DBAs worry less about getting MAXDOP absolutely right across the board. Who knows, we may even see a day where MAXDOP is left at zero.

But you’d never skip that installer step, would you?

Anyway, for anyone out there who is paranoid about setting DOP too high, this should help your queries find a more approximately-good middle ground.

Hopefully it works as advertised.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Finding Query Store Queries With Missing Index Requests In SQL Server 2019

All The Pretty Little Caveats


I’ve said quite a bit about missing index request utility generally in SQL Server, even as recently as last week!

But then I got a user question about using Query Store to do something similar, so here goes.

If you need a pre-2019 way to do this with Query Store, Kendra Little has a blog post about that here.

WITH
    queries AS
(
    SELECT TOP (100)
        parent_object_name = 
            ISNULL
            (
                OBJECT_NAME(qsq.object_id),
                'No Parent Object'
            ),
        qsqt.query_sql_text,
        query_plan = 
            TRY_CAST(qsp.query_plan AS xml),
        qsrs.first_execution_time,
        qsrs.last_execution_time,
        qsrs.count_executions,
        qsrs.avg_duration,
        qsrs.avg_cpu_time,
        qsp.query_plan_hash,
        qsq.query_hash
    FROM sys.query_store_runtime_stats AS qsrs
    JOIN sys.query_store_plan AS qsp
        ON qsp.plan_id = qsrs.plan_id
    JOIN sys.query_store_query AS qsq
        ON qsq.query_id = qsp.query_id
    JOIN sys.query_store_query_text AS qsqt
        ON qsqt.query_text_id = qsq.query_text_id
    WHERE qsrs.last_execution_time >= DATEADD(DAY, -7, SYSDATETIME())
    AND   qsrs.avg_cpu_time >= (10 * 1000)
    AND   qsq.is_internal_query = 0
    AND   qsp.is_online_index_plan = 0
    ORDER BY qsrs.avg_cpu_time DESC
)
SELECT
    qs.*
FROM queries AS qs
CROSS APPLY
(
    SELECT TOP (1)
        gqs.*
    FROM sys.dm_db_missing_index_group_stats_query AS gqs
    WHERE qs.query_hash = gqs.query_hash
    AND   qs.query_plan_hash = gqs.query_plan_hash
    ORDER BY
        gqs.last_user_seek DESC,
        gqs.last_user_scan DESC
) AS gqs
ORDER BY qs.avg_cpu_time DESC
OPTION(RECOMPILE);

I don’t love this query, because I don’t love querying Query Store views. That’s why I wrote sp_QuickieStore to make it a whole lot easier.

But anyway, this will get you a similar bunch of information.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Comparing Query CPU And Logical Reads Using The Plan Cache In SQL Server

Let’s Not Kid Ourselves


There are lots of things that can go wrong with SQL Server, performance-wise. Just about anything can fly off the handle.

Recently, I was handing out a passing thought on Twitter that caught the attention of the beloved and venerable Michael J Swart (b|t):

We got to Twalking a little bit about why I said that, too, and he found what I usually find when I start comparing things.

Scripty Kid


I decided to expand on some scripts to look at how queries use CPU and perform reads, and found some really interesting stuff. I’ll talk through some results and how I’d approach tuning them afterwards.

Here are the queries:

/*Queries that do no logical reads, but lots of CPU work*/
SELECT TOP (100)
    total_logical_reads = 
        FORMAT(qs.total_logical_reads, 'N0'), 
    total_worker_time_ms = 
        FORMAT(qs.total_worker_time / 1000., 'N0'),
    execution_count = 
        FORMAT(qs.execution_count, 'N0'),
    query_text = 
        SUBSTRING
        (
            st.text,
            qs.statement_start_offset / 2 + 1,
            (
                CASE qs.statement_end_offset
                     WHEN -1 
                     THEN DATALENGTH(st.text)
                     ELSE qs.statement_end_offset
                END - qs.statement_start_offset
            ) / 2 + 1
        ),
    qp.query_plan
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) AS qp
WHERE qs.total_logical_reads = 0
AND   qs.total_worker_time > 5000
ORDER BY qs.total_worker_time DESC;

/*Queries that do 2x more reads than CPU work*/
SELECT TOP (100)
    total_logical_reads = 
        FORMAT(qs.total_logical_reads, 'N0'), 
    total_worker_time_ms = 
        FORMAT(qs.total_worker_time / 1000., 'N0'),
    execution_count = 
        FORMAT(qs.execution_count, 'N0'),
    query_text = 
        SUBSTRING
        (
            st.text,
            qs.statement_start_offset / 2 + 1,
            (
                CASE qs.statement_end_offset
                     WHEN -1 
                     THEN DATALENGTH(st.text)
                     ELSE qs.statement_end_offset
                END - qs.statement_start_offset
            ) / 2 + 1
        ),
    qp.query_plan
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) AS qp
WHERE qs.total_logical_reads > (qs.total_worker_time * 2)
ORDER BY qs.total_logical_reads DESC;

/*Queries that do 4x more CPU work than reads*/
SELECT TOP (100)
    total_logical_reads = 
        FORMAT(qs.total_logical_reads, 'N0'), 
    total_worker_time_ms = 
        FORMAT(qs.total_worker_time / 1000., 'N0'),
    execution_count = 
        FORMAT(qs.execution_count, 'N0'),
    query_text = 
        SUBSTRING
        (
            st.text,
            qs.statement_start_offset / 2 + 1,
            (
                CASE qs.statement_end_offset
                     WHEN -1 
                     THEN DATALENGTH(st.text)
                     ELSE qs.statement_end_offset
                END - qs.statement_start_offset
            ) / 2 + 1
        ),
    qp.query_plan
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) AS qp
WHERE qs.total_worker_time > (qs.total_logical_reads * 4)
ORDER BY qs.total_worker_time DESC;

Resultant


A quick note about these is that the comparison between CPU and logical reads happens in the where clause, and I convert CPU time to milliseconds in the select list.

That might make the numbers look a little funny, but it makes them somewhat easier to understand than microseconds in the grand scheme of things.

First, queries that do no logical reads but use CPU time:

SQL Server Query Results
ouchies

A lot of these were functions that processed input but didn’t touch data. Assembling and splitting string lists, XML, other variable assignment tasks, and occasionally DMV queries.

The “easy” button here is to stop using scalar and multi-statement functions so much. Those execution counts are hideous.

Second, queries that do 2x more reads than CPU work:

SQL Server Query Results
telling myself

I only found six of these, while the other two categories easily hit the 100 row goal.

The queries in here largely seemed to either be:

  • Insert queries to temporary objects
  • Queries with parameter sniffing issues

Looking at these, the problem was largely the optimizer choosing Nested Loops joins when it really shouldn’t have. The worst part was that it wasn’t an indexing issue — every single one of these queries was doing seeks across the board — they were just happening in a serial plan, and happening way more than the optimizer estimated they would. Perhaps this is something that Adaptive Joins or Batch Mode more generally could have intervened in.

Third, queries that do 4x more CPU work than reads:

SQL Server Query Results
we are something

These queries were far more interesting from a tuning perspective, because there were obvious inefficiencies:

  • No good indexes to use
  • Large scans because of non-SARGable predicates
  • Predicate Key Lookups

But the important thing here is that these queries were able to do a lot of logical reads quickly — data they needed was already in memory — and just push the hell out of CPUs.

These are the queries you can have a field day fixing and making people happy.

Residuals


This selection of query results is why I tend to ignore logical reads and focus on CPU. I do still look at things like physical reads, and select queries that do suspicious amounts of writes.

  • Physical reads means going to disk, and disk is your mortal enemy
  • Select queries doing writes often indicate spools and spills, which can also be pretty bad
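
If you want to hunt for those two patterns with the same plan cache approach as above, here’s a minimal sketch; the filters are deliberately loose, so tighten them to taste:

/*Queries doing physical reads, or doing writes (think spools and spills)*/
SELECT TOP (100)
    total_physical_reads = 
        FORMAT(qs.total_physical_reads, 'N0'),
    total_logical_writes = 
        FORMAT(qs.total_logical_writes, 'N0'),
    total_worker_time_ms = 
        FORMAT(qs.total_worker_time / 1000., 'N0'),
    execution_count = 
        FORMAT(qs.execution_count, 'N0'),
    st.text,
    qp.query_plan
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) AS qp
WHERE qs.total_physical_reads > 0
OR    qs.total_logical_writes > 0
ORDER BY qs.total_physical_reads DESC;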

You may not like it, but this is what peak performance tuner looks like.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

What Do Missing Index Requests Really Mean In SQL Server?

Scripts Ahoy


There are a million scripts out there that will give you all of the missing index requests for a database (or even a whole server).

Some will even try to prioritize based on the metrics logged along with each request.

Right now, most of you get:

  • Uses: How many times a query compiled that could have used the index
  • Average Query Cost: A unit-less cost used by the optimizer for choosing a plan
  • Impact: A metric relative to the unit-less cost of the operator the index will help

Breaking each of those down, the only one that has a concrete meaning is Uses, but that of course doesn’t mean that a query took a long time or is even terribly inefficient.

That leaves us with Average Query Cost, which is the sum of each operator’s estimated cost in the query plan, and Impact.
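
For reference, here’s a minimal sketch of the kind of query most of those scripts are built around, pulling the raw numbers those labels come from; which column gets called “Uses” varies from script to script:

SELECT TOP (50)
    mid.statement,
    mid.equality_columns,
    mid.inequality_columns,
    mid.included_columns,
    migs.unique_compiles,
    migs.user_seeks,
    avg_query_cost = migs.avg_total_user_cost,
    impact = migs.avg_user_impact
FROM sys.dm_db_missing_index_groups AS mig
JOIN sys.dm_db_missing_index_group_stats AS migs
    ON migs.group_handle = mig.index_group_handle
JOIN sys.dm_db_missing_index_details AS mid
    ON mid.index_handle = mig.index_handle
ORDER BY migs.avg_user_impact DESC;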

But where does Impact come from?

Impactful


Let’s look at a query plan with a missing index request to figure out what the Impact metric is tied to.

Here’s the relevant part of the plan:

SQL Server Query Plan
sticky kid

And here’s the missing index request:

/*
The Query Processor estimates that implementing the following index could improve the query cost by 16.9141%.
*/

/*
USE [StackOverflow2013]
GO
CREATE NONCLUSTERED INDEX [<Name of Missing Index, sysname,>]
ON [dbo].[Comments] ([Score])
INCLUDE ([PostId])
GO
*/

Here’s the breakdown:

  • The optimizer estimates that hitting the Comments table will cost 762 query bucks, which is 17% of the total plan cost
  • The optimizer further estimates that hitting the Comments table with the suggested index will reduce the total plan cost by 16.9%

Here are the relevant properties from the scan of the Comments table:

SQL Server Query Plan
Indecisive

What I want you to take away from this is that, while hitting the Comments table may be 17% of the plan’s total estimated cost, the time spent scanning that index is not 17% of the plan’s total execution time, either in CPU or duration.

You can see in the screenshot above that it takes around 450ms to perform the full scan of 24,534,730 rows.

Doubtful


In full, this query runs for around 23 seconds:

SQL Server Query Plan
outta here

The estimated cost of hitting the Comments table is not 17% of the execution time. That time lives elsewhere, which we’ll get to.

In the meantime, there are two more egregious problems to deal with:

  1. The optimizer severely miscalculates the cost of scanning the Posts table at 70% (note the 860ms time here):
SQL Server Query Plan
oh no no no

2. It buries other missing index requests in the properties of the root operator:

SQL Server Missing Index Request
train tracks

Now, there are two other missing index requests listed here that are a) of higher “impact” b) not ordered by that impact number and c) even if both a and b were true, we know that adding those indexes would not substantially reduce the overall runtime of the stored procedure.

Assuming that we added every single missing index here, at best we would reduce the estimated cost of the plan by 87%, while only reducing the actual execution time of the plan by about 1.3 seconds out of 23 seconds.

Not a big win, here.

Hurtful


Examining where time is spent in this plan, this branch will stick out as the dominating factor:

SQL Server Query Plan
baby don’t do it

Some interesting things to note here, while we’re talking about interesting things:

  • The scan of the Badges table takes 1.4 seconds, and has an estimated cost of 1%
  • The estimated cost of the eager index spool is 11%, but accounts for 20 seconds of elapsed time (less the 1.4 seconds for the scan of Badges)
  • There was no missing index request generated for the Badges table, despite the optimizer creating one on the fly

This is a bit of the danger in creating missing index requests without first validating which queries generated them, and where the benefit in having them would be.

In tomorrow’s post, we’ll look at how SQL Server 2019 makes figuring this stuff out easier.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Proving That SQL Server T-SQL Scalar UDFs Execute Once Per Row

Scripted Content


If you use this script, you can follow along with the results.

The sys.dm_exec_function_stats DMV is only available in SQL Server 2016+, so if you’re on an earlier version than that, you should focus your energies on upgrading.

I’m sure you’ve just been busy for the last five years.

/*Context is everything*/
USE master;
GO 

/*Piece of crap function*/
CREATE OR ALTER FUNCTION 
    dbo.useless_function(@dbid int)
RETURNS sysname
AS
BEGIN
    DECLARE /*This is to ruin scalar UDF inlining*/
        @gd datetime = GETDATE();

    RETURN DB_NAME(@dbid);
END;
GO 

/*Just in the select list*/
SELECT
    database_name = 
        dbo.useless_function(d.database_id),
    total_databases = 
        (SELECT c = COUNT_BIG(*) FROM sys.databases AS d2)
FROM sys.databases AS d;
GO 

/*Executions here should match the count above*/
SELECT
    object_name =
        OBJECT_NAME(defs.object_id),
    defs.execution_count
FROM sys.dm_exec_function_stats AS defs
WHERE defs.object_id = OBJECT_ID('dbo.useless_function');

/*Reset the counters*/
DBCC FREEPROCCACHE;
GO 

/*Now in the where clause*/
SELECT
    mf.name,
    total_database_files = 
        (SELECT c = COUNT_BIG(*) FROM sys.master_files AS mf2)
FROM sys.master_files AS mf
WHERE mf.name = dbo.useless_function(mf.database_id)
GO 

/*Executions here should match the count above*/
SELECT
    object_name =
        OBJECT_NAME(defs.object_id),
    defs.execution_count
FROM sys.dm_exec_function_stats AS defs
WHERE defs.object_id = OBJECT_ID('dbo.useless_function');

Data Driven


On my server, I have 9 total databases and 42 total database files.

The results, therefore, look precisely and consistently like so:

SQL Server Query Results
You Probably Won’t Like This About Functions

Assumption


For the select list, T-SQL scalar UDFs will execute once per row projected by the query, i.e. the final resulting row count, under… every circumstance I’ve ever seen.

In SQL Server. Of course.

As a couple of easy-to-digest examples, let’s say you execute a query that returns 100 rows:

  • Your T-SQL scalar UDF is referenced once in the select list, so it’ll run 100 times
  • Your T-SQL scalar UDF is referenced twice in the select list, so it’ll run 200 times

For T-SQL scalar UDFs in other parts of a query, like:

  • Where Clause
  • Join Clause

They will execute for as many rows as need to be filtered when those parts of the query are executed, multiplied by however many individual references to the function there are in the query.

The results here may vary, depending on if there are any other predicates involved that may filter out other rows.

As a couple of easy-to-digest examples:

  • If you use a T-SQL scalar UDF as a predicate on a million row table, it’ll execute a million times to produce a result and apply the predicate
  • If you do the same thing as above, but there’s another part of the where clause that filters out 500k rows, the function will only execute 500k times

All sorts of other things might change this, like if the other predicate(s) can be pushed to when the data is accessed, and if there are multiple invocations of the function.

You can see an edge case where that’s not true in this post:

https://erikdarling.com/sql-server/a-hidden-value-of-apply/

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Putting Spinlocks In Perspective In SQL Server

Get Around


The more people want to avoid fixing what’s really wrong with their server, the more they go out and find all the weird stuff that they can blame on something else (usually the product), or keep doing the same things that aren’t fixing the problem.

Spinlocks are one of those things. People will measure them, stare at them, Google them, and have no idea what to make of them, all while being sure there’s something going on with them.

I don’t want to discount when spinlocks can actually cause a problem, but you shouldn’t treat every performance problem like it’s a bridge too far from what you can solve.

Before you go ignoring all the other things that are going wrong, here’s a simple script to give you some idea if you need to follow the spinlock trail any further. Note that it might also be worth replacing the s.spins sort order with s.spins_per_collision, too.

Which sort order you choose long-term will depend on which yields numbers of interest on your system, which I can’t predict. Sorry about that.

SELECT TOP (20)
    days_uptime = 
        CONVERT(decimal(38,2), d.seconds_uptime / 86400.),
    rundate = 
        SYSDATETIME(),
    s.name,
    s.collisions,
    collisions_per_second = 
        CONVERT(bigint, s.collisions / d.seconds_uptime),
    s.spins,
    spins_per_second = 
        CONVERT(bigint, s.spins / d.seconds_uptime),
    s.spins_per_collision,
    spins_per_collision_per_second = 
        CONVERT(decimal(38,6), s.spins_per_collision / d.seconds_uptime),
    s.sleep_time,
    sleep_time_per_second = 
        CONVERT(bigint, s.sleep_time / d.seconds_uptime),
    s.backoffs,
    backoffs_per_second = 
        CONVERT(bigint, s.backoffs / d.seconds_uptime)
FROM sys.dm_os_spinlock_stats AS s 
CROSS JOIN
(
    SELECT 
        seconds_uptime = 
            DATEDIFF
            (
                SECOND, 
                d.sqlserver_start_time, 
                SYSDATETIME()
            )
    FROM sys.dm_os_sys_info AS d
) AS d
ORDER BY s.spins DESC;

Telltale


I understand that some spinlocks tend to happen in storms, and that this isn’t going to help to illuminate many situations when run in isolation. Bursty workloads, or workloads that only hit some crazy amount of spinlocks during shorter periods of high activity might escape it.

It can help you put the number of spinlocks you’re hitting in perspective compared to uptime, though.

If you see any numbers in the results that still make you say the big wow at your screen, you can easily log the output to a table every X minutes to gather more detail on when it’s happening.
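
A minimal sketch of what that logging could look like, with a table name and columns of my own choosing; schedule the insert from an Agent job or whatever scheduler you have handy:

CREATE TABLE dbo.spinlock_snapshots
(
    snapshot_time datetime2(0) NOT NULL
        DEFAULT SYSDATETIME(),
    name nvarchar(256) NOT NULL,
    collisions bigint NOT NULL,
    spins bigint NOT NULL,
    spins_per_collision real NOT NULL,
    sleep_time bigint NOT NULL,
    backoffs bigint NOT NULL
);
GO

/*Run this on a schedule; diff the snapshots to find the spikes*/
INSERT dbo.spinlock_snapshots
    (name, collisions, spins, spins_per_collision, sleep_time, backoffs)
SELECT
    s.name,
    s.collisions,
    s.spins,
    s.spins_per_collision,
    s.sleep_time,
    s.backoffs
FROM sys.dm_os_spinlock_stats AS s;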

Once you figure out when any potentially large spikes in spinlocks are occurring, you can match that up with:

  • Any independent query logging you’re doing
  • The plan cache, if it’s reliable
  • Query Store, if you’re smart enough to turn it on
  • Your monitoring tool data

Which should tell you which queries were executing at the time. I’d probably look for any high CPU effort queries, since those tend to be the spinlockiest in my experience.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Using Views To Reduce Memory Grants In SQL Server

We All Have It


You know those tables, right? The ones where developers went and got lazy or didn’t know any better and decided every string column was going to be gigantic.

They may have read, of course, that SQL Server’s super-smart variable length data types only consume necessary space.

It’s free real estate.

Except it isn’t, especially not when it comes to query memory grants.

The bigger a string column’s defined byte length is, the bigger the optimizer’s memory grant for it will be.

Memory Grant Primer


In case you need some background, the short story version is:

  • All queries ask for some memory for general execution needs
  • Sorts, Hashes, and Optimized Nested Loops ask for additional memory grants
  • Memory grants are decided based on things like number of rows, width of rows, and concurrently executing operators
  • Memory grants are divided by DOP, not multiplied by DOP
  • By default, any query can ask for up to 25% of max server memory for a memory grant
  • Approximately 75% of max server memory is available for memory grants at once

Needless to say, memory grants are very sensitive to misestimates by the optimizer. Going over can be especially painful, because that memory will most often get pulled from the buffer pool, and queries will end up going to disk more.

Underestimates often mean spills to disk, of course. Those are usually less painful, but can of course be a problem when they’re large enough. In particular, hash spills are worth paying extra attention to.
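
If you want to watch grants while queries are running, sys.dm_exec_query_memory_grants shows the requested, granted, and used amounts; a minimal sketch:

SELECT
    deqmg.session_id,
    deqmg.requested_memory_kb,
    deqmg.granted_memory_kb,
    deqmg.used_memory_kb,
    deqmg.ideal_memory_kb,
    deqmg.query_cost,
    deqmg.plan_handle
FROM sys.dm_exec_query_memory_grants AS deqmg
ORDER BY deqmg.requested_memory_kb DESC;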

Memory grant feedback does supply some relief under modern query execution models. That’s a nice way of saying it’s probably not what you have going on.

Query Noogies


Getting back to the point: It’s a real pain in the captain’s quarters to modify columns on big tables, even if it’s reducing the size.

SQL Server’s storage engine has to check page values to make sure you’re not gonna lose any data fidelity in the process. That’s a nice way of saying you’re not gonna truncate any strings.

But if you do something cute like run a MAX(LEN(StringCol)) and see what you’re up against, you can use a view on top of your table to assuage SQL Server’s concerns about such things.

After all, functions are temporary. Data types are forever (usually).
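
That check is about as simple as it sounds; a quick sketch against the column this post picks on:

/*How long is the longest value actually stored?*/
SELECT
    max_length = MAX(LEN(p.Body))
FROM dbo.Posts AS p;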

An easy way to illustrate what I mean is to look at the details of these two queries:

SELECT TOP (1000)
    p.Body
FROM dbo.Posts AS p
ORDER BY p.Score DESC
OPTION(RECOMPILE);

SELECT TOP (1000)
    Body = 
        CONVERT
        (
            nvarchar(100), 
            p.Body
        )
FROM dbo.Posts AS p
ORDER BY p.Score DESC
OPTION(RECOMPILE);

Some of this working is dependent on the query plan, so let’s look at those.

Pink Belly Plans


You can ignore the execution times here. The Body column is not a good representation of an oversized column.

It’s defined as nvarchar(max), but (if I’m remembering my Stack lore correctly) is internally limited to 30k characters. Many questions and answers are longer than 100 characters anyway, but on to the plans!

SQL Server Query Plan
janitor

In the plan where the Body column isn’t converted to a smaller string length, the optimizer asks for a 16GB memory grant, and in the second plan the grant is reduced to ~3.5GB.

This is dependent on the Compute Scalar occurring prior to the Top N Sort operator, of course. This is where the convert function is applied to the Body column, and why the grant is reduced.

If you were to build a view on top of the Posts table with this conversion, you could point queries to the view instead. That would get you the memory grant reduction without the pain of altering the column, or moving the data into a new table with the correct definition.
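
Something like this would do it; the view name is made up, and nvarchar(100) stands in for whatever width your MAX(LEN()) check says is safe:

CREATE OR ALTER VIEW dbo.PostsTrimmed
AS
SELECT
    p.Id,
    p.OwnerUserId,
    p.Score,
    Body = 
        CONVERT
        (
            nvarchar(100), 
            p.Body
        )
FROM dbo.Posts AS p;
GO

/*Point queries here instead of at dbo.Posts*/
SELECT TOP (1000)
    p.Body
FROM dbo.PostsTrimmed AS p
ORDER BY p.Score DESC
OPTION(RECOMPILE);

Since the convert happens inside the view, queries against it get the reduced grant the same way the second query above does, without anyone having to touch the base table.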

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.