T-SQL – Page 4 – Darling Data

An Annoying OPTIMIZE FOR Limitation In SQL Server

Posted on October 12, 2021May 16, 2022 by Erik Darling

Infrequent

I have occasionally cheated a little and used OPTIMIZE FOR some_value to fix a parameter sniffing issue that didn’t have any other viable options available to it.

This is a pretty rare situation, but there’s a place for everything. Keep in mind that I’m not talking about UNKNOWN here. I’m talking about a real value.

Recently I had to fix a specific problem where cardinality estimates for datetime values would get completely screwed up if they were older than a day.

You’d be tempted to call this an ascending key problem, but it was really an ascending key solution. Whenever a query got an off histogram estimate, it chose a good plan — when it got a histogram step hit, the estimate was high by several million rows, and the plan looked like someone asked for all the rows in all the databases in all the world.

So, you go through the usual troubleshooting steps:

More frequent stats updates: uh oh, lots of recompiles
Stats updates with fullscan during maintenance: crapped out during the day
Various trace flags and acts of God: Had the opposite effect
Is my query doing anything dumb? Nope.
Are my indexes eating crayons? Nope.

Drawing Board

The problem with OPTIMIZE FOR is that… it’s picky. You can’t just optimize for anything.

For example, you can’t do this:

DECLARE 
    @s datetime = '19000101'''

SELECT 
    c = COUNT_BIG(*)
FROM dbo.Users AS u
WHERE u.CreationDate >= @s
OPTION
(
    OPTIMIZE FOR (@s = GETDATE())
);

And you can’t do this:

DECLARE 
    @s datetime = '19000101',
    @d datetime = GETDATE()

SELECT 
    c = COUNT_BIG(*)
FROM dbo.Users AS u
WHERE u.CreationDate >= @s
OPTION
(
    OPTIMIZE FOR (@s = @d)
);

We get a nasty message.

Msg 320, Level 15, State 1, Line 26
The compile-time variable value for ‘@s’ in the OPTIMIZE FOR clause must be a literal.

Ever Heard Of A Chef Who Can’t Cook?

The solution is, as usual, dynamic SQL, but there’s a catch. Because there’s always a catch.

For example this works, but if you run it a minute or so apart, you get multiple plans in the cache.

DBCC FREEPROCCACHE;

DECLARE
    @sql nvarchar(MAX) = N'',
    @s datetime = '19000101',
    @d datetime = GETDATE(),
    @i int = 0;

WHILE @i < 10
BEGIN
    
    SELECT
       @sql = N'
        SELECT 
            c = COUNT_BIG(*)
        FROM dbo.Users AS u
        WHERE u.CreationDate >= @s
        OPTION
        (
            OPTIMIZE FOR (@s = ''' + CONVERT(nvarchar(30), @d) + ''')
        );     
       ';
    
    EXEC sys.sp_executesql
        @sql,
        N'@s datetime',
        @s;
    
    SELECT 
        @i += 1;
    
END

EXEC sp_BlitzCache 
    @DatabaseName = 'StackOverflow';

Are You Ready For Some Date Math?

Depending on how we want to address this, we can either:

Do some date math to go to the beginning of the current day
Do some date math to go to the end of the current day
Set the value to the furthest possible date in the future

The first two cases should generally be fine. Saying the quiet part out loud, not a lot of plans survive a long time, either due to plan cache instability or other recompilation events, like from stats updates.

If you’re super-picky about that, go with the third option. This may also be considered the safest option because a stats update might give you a histogram for today’s value. The datetime max value will theoretically “never” be a histogram step value, but that depends on if you let users do Stupid Things™

DBCC FREEPROCCACHE;

DECLARE
    @sql nvarchar(MAX) = N'',
    @s datetime = '19000101',
    @d datetime = '99991231',
    @i int = 0;

WHILE @i < 10
BEGIN
    
    SELECT
       @sql = N'
        SELECT 
            c = COUNT_BIG(*)
        FROM dbo.Users AS u
        WHERE u.CreationDate >= @s
        OPTION
        (
            OPTIMIZE FOR (@s = ''' + CONVERT(nvarchar(30), @d) + ''')
        );     
       ';
    
    EXEC sys.sp_executesql
        @sql,
        N'@s datetime',
        @s;
    
    SELECT 
        @i += 1;
    
END

No matter how many times you run this, the plan will get reused and you’ll always have the off-histogram step.

Qualifying Events?

This is one of those “good ideas” I have for a specific circumstance without any other easy workarounds. I don’t suggest it as a general practice, and it certainly has some drawbacks that would make it dangerous in other circumstances.

I can’t easily reproduce the problem this solved locally, but I can show you why you probably don’t want to make it a habit.

In cases where you are able to get good cardinality estimates, this will hamper it greatly.

So, you know, last resort.

Thanks for reading!

Going Further

If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Be Careful Where You Call Inline Table Valued Functions In SQL Server Queries

Posted on October 11, 2021May 16, 2022 by Erik Darling

Keep’em Comin!

While helping someone tame a bunch of rather unfortunate scalar valued functions, we eventually hit a point where they were able to do some of the rewrites themselves. During testing, they ran into a situation where performance got worse when they made the switch over, and it wasn’t because an Eager Index Spool popped up.

I was able to come up with a demo that shows a reasonable enough performance difference in a couple queries using the same technique as I gave them to fix things.

So uh, here goes that.

Muppet

The query they were tuning had a couple OUTER APPLYs already written into it, and so they added the function on as another.

SELECT
    u.DisplayName,
    b.Name,
    nbi.AvgPostsPerDay
FROM dbo.Users AS u
OUTER APPLY
(
    SELECT 
        CreationDate = 
            MIN(p.CreationDate)
    FROM dbo.Posts AS p
    JOIN dbo.Votes AS v
        ON  v.PostId = p.Id
        AND v.VoteTypeId = 1
    WHERE p.OwnerUserId = u.Id
) AS p
OUTER APPLY
(
    SELECT TOP (1)
        b.Name
    FROM dbo.Badges AS b
    WHERE b.UserId = u.Id
    ORDER BY b.Date DESC
) AS b
OUTER APPLY dbo.no_bueno_inline(u.Id, p.CreationDate) AS nbi
WHERE u.Reputation >= 100000
ORDER BY u.Reputation DESC;

Since they didn’t want to lose rows to the function, they couldn’t use CROSS APPLY. Good enough.

Moutarde

But what they really wanted was to move the function up into the select list, like this:

SELECT
    u.DisplayName,
    b.Name,
    AvgPostsPerDay = 
    (
        SELECT 
            nbi.AvgPostsPerDay 
        FROM dbo.no_bueno_inline
        (
            u.Id, 
            p.CreationDate
        ) AS nbi
    )
FROM dbo.Users AS u
OUTER APPLY
(
    SELECT 
        CreationDate = 
            MIN(p.CreationDate)
    FROM dbo.Posts AS p
    JOIN dbo.Votes AS v
        ON  v.PostId = p.Id
        AND v.VoteTypeId = 1
    WHERE p.OwnerUserId = u.Id
) AS p
OUTER APPLY
(
    SELECT TOP (1)
        b.Name
    FROM dbo.Badges AS b
    WHERE b.UserId = u.Id
    ORDER BY b.Date DESC
) AS b
WHERE u.Reputation >= 100000
ORDER BY u.Reputation DESC;

That way you don’t lose any rows like you could with CROSS APPLY, and the optimizer is more likely to holler at the function later on in the query plan, since the values from it are only being projected — that’s fancy for selected.

Mapperoo

The full query plan is a bit much to untangle quickly in this post, but the timing difference is noticeable enough for my purposes:

So if you ever end up rewriting a scalar valued function as an inline table valued function, make sure you test calling it in the same way. Moving query syntax around may produce logically equivalent results, but won’t always produce equivalent performance.

Thanks for reading!

Going Further

How OPTIMIZE FOR UNKNOWN Makes Troubleshooting SQL Server Performance Problems Harder

Posted on October 8, 2021May 16, 2022 by Erik Darling

Detained

Despite the many metric tons of blog posts warning people about this stuff, I still see many local variables and optimize for unknown hints. As a solution to parameter sniffing, it’s probably the best choice 1/1000th of the time. I still end up having to fix the other 999/1000 times, though.

In this post, I want to show you how using either optimize for unknown or local variables makes my job — and the job of anyone trying to fix this stuff — harder than it should be.

Passenger

Like most things, we’re going to start with an index:

CREATE INDEX r ON dbo.Users(Reputation);
GO

I’m going to have a stored procedure that uses three different ways to pass a value to a where clause:

CREATE OR ALTER PROCEDURE 
    dbo.u 
(
    @r int, 
    @u int
)
AS
BEGIN

    /* Regular parameter */
    SELECT
        c = COUNT_BIG(*)
    FROM dbo.Users AS u
    WHERE u.Reputation = @r
    AND   u.UpVotes = @u;

    /* Someone who saw someone else do it at their last job */
    DECLARE 
        @LookMom int = @r,
        @IDidItAgain int = @u;
    
    SELECT
        c = COUNT_BIG(*)
    FROM dbo.Users AS u
    WHERE u.Reputation = @LookMom
    AND   u.UpVotes = @IDidItAgain;

    /* Someone who read the blog post URL wrong */
    SELECT
        c = COUNT_BIG(*)
    FROM dbo.Users AS u
    WHERE u.Reputation = @r
    AND   u.UpVotes = @u
    OPTION(OPTIMIZE FOR UNKNOWN);

END;
GO

First Way

The best case is we run this for a small number of rows, and no one really notices. Even though we get bad guesses for the second two queries, it’s not a huge deal.

When you run procs like this, SQL Server doesn’t cache the compile time values the same way it does when you use parameters. Granted, this is because it technically shouldn’t matter, but if you’re looking for a way to execute the proc again to reproduce the issue, it’s up to you to go figure out what someone did.

Since I’m getting the actual plans here, I get the runtime values for both, but those don’t show up in the plan cache or query store version of plans.

That’s typically a huge blind spot when you’re trying to fix performance issues of any kind, but it’s up to you to capture that stuff.

Just, you know, good luck doing it in a way that doesn’t squash performance.

Second Way

In this example, our index is only on the Reputation column, but our where clause is also on the UpVotes column.

In nearly every situations, it’s better to have your query do all the filtering it can from one index source — there are obviously exceptions — but the point here is that the optimizer doesn’t bother with a missing index request for the second two queries, only for the first one.

That doesn’t matter a ton if you’re looking at the query and plan right in front of you, but if you’re also using the missing index DMVs to get some idea about how useful overall a new index might be, you’re out of luck.

In this case, the optimizer doesn’t think the second two plans are costly enough to warrant anything, but it does for the first plan.

I’m not saying that queries with local variables or optimize for unknown hints always do this, or that parameterized plans will always ask for (good) indexes. There are many issues with costing and SARGability that can prevent them from showing up, including getting a trivial plan.

This is just a good example of how Doing Goofy Things™ can backfire on you.

Thanks for reading!

Going Further

UDF Inlining And String Aggregations In SQL Server 2019

Posted on October 4, 2021May 16, 2022 by Erik Darling

Quatro Enemigos

This post is really four different posts. Maybe five. The main points are:

STRING_AGG can’t be inlined if a UDF
Getting DISTINCT values into STRING_AGG is more obtuse than it should be
The documentation for UDF inlining is misleading
XML PATH can be inlined in a UDF

Let’s start at the top, because tops are top.

Strung Out

If you’re like me and you got excited by the induction of STRING_AGG into the T-SQL Lexicon because of all the code odd-balling it would replace, you were likely also promptly disappointed for a few reasons.

Needing to convert the element to be AGGed to a MAX to avoid errors
- STRING_AGG aggregation result exceeded the limit of 8000 bytes. Use LOB types to avoid result truncation
The WITHIN GROUP ordering is clunky compared to an outer ORDER BY (but hey, logical query processing…)
No support for DISTINCT in the function, and an outer DISTINCT tries to make the wrong thing DISTINCT (see above)

And of course, it’s a breaking limitation for UDF inlining.

The UDF does not reference the STRING_AGG function

Let’s look at all of that stuff at once, in one function.

CREATE OR ALTER FUNCTION 
    dbo.IsStringAgg (@UserId int)
RETURNS 
    nvarchar(max)
WITH 
    RETURNS NULL ON NULL INPUT, 
    SCHEMABINDING
AS
BEGIN
RETURN
(
    SELECT
        STRING_AGG
        (
            CONVERT
            (
                nvarchar(MAX),
                b2.Name
            ), 
            N', '
        )
        WITHIN GROUP 
        (
            ORDER BY 
                b2.Name
        )
    FROM
    (
        SELECT DISTINCT 
            b.Name
        FROM dbo.Badges AS b
        WHERE b.UserId = @UserId
    ) AS b2
);
END;
GO

Not exactly a thing of beauty, is it? Let’s hold onto that for one second, though.

XML > JSON

Okay, so let’s get back to that UDF documentation.

The UDF does not reference XML methods

That sort of reads like NOXML4U, which is my friend Jeremiah’s license plate. In reality though, it means all the .method thingies, like value, node, query, etc.

So if you were to try to follow my prior advice on string concatenation, the function couldn’t be inlined.

SELECT
   x = 
   (
       SELECT 
           [text()] = 
               b.Name
       FROM dbo.Badges AS b
       WHERE b.Id = 100564
       FOR XML 
           PATH(''),
           TYPE
   ).value
     (
         './text()[1]', 
         'nvarchar(max)'
     );

Having the .value breaks inlining. So there’s that. I believe this is where Reddit users post something like le sigh with some poorly drawn stick figure.

Del The Funktion

We can write the function semi-correctly like so:

CREATE OR ALTER FUNCTION 
    dbo.NotStringAgg (@UserId int)
RETURNS 
    nvarchar(max)
WITH 
    RETURNS NULL ON NULL INPUT, 
    SCHEMABINDING
AS
BEGIN
RETURN
    STUFF
    (
        (
            SELECT
                N', ' + 
                b.Name
            FROM dbo.Badges AS b
            WHERE b.UserId = @UserId
            GROUP BY b.Name
            ORDER BY b.Name
            FOR XML PATH (N'')
        ), 
        1,
        2,
        N''
    );
END;
GO

Alright, now let’s BRING THAT CODE BACK (air horn)

Sea Vessel Soliloquy

If we compare the execution plans for these two functions, the XML one gets inlined, and the STRING_AGG one does not.

SELECT TOP (5000)
    u.DisplayName,
    ThanksForAllYourHardWork = 
        dbo.NotStringAgg(u.Id)
FROM dbo.Users AS u
ORDER BY u.Reputation DESC;
GO 

SELECT TOP (5000)
    u.DisplayName,
    ThanksForAllYourHardWork = 
        dbo.IsStringAgg(u.Id)
FROM dbo.Users AS u
ORDER BY u.Reputation DESC;
GO

Here’s the plan for the inlined function:

Here’s the plan for the non-inlined function:

Stay Thirsty

The inlined function finishes about twice as fast, though one may pause for a moment to consider whether the 400ms difference is an earth shattering kaboom in this case.

Of course, the real kicker is when scalar UDFs are invoked as part of larger queries where parallelism, etc. is important.

Thanks for reading!

Going Further

On Entity Framework And SQL Server

Posted on August 20, 2021May 16, 2022 by Erik Darling

Specificity

I know there are other ORMs in the world, but since my working life is spent performance fixing SQL Server, I end up seeing Entity Framework the most.

This isn’t going to be an Entity Framework bash-fest (that’s what Twitter is for), it’s more a critique of the way developers blindly trust this abstraction layer to always do the smart and sensible thing.

After all, I’ve seen applications using Entity Framework quite successfully. Part of what made it successful was the developers being comfortable with the database, figuring out how what they do with code gets translated into a query, and when that ends up going terribly long.

One of the big lightbulb moments they had was realizing that maybe doing it all in one big query isn’t always the best option.

Until It Doesn’t

On small databases, perhaps with uncomplicated schema and application requirements, you can get by without giving much of a care about these things.

However, as databases grow up, more tables get added, and all that, you need to start paying more attention. I’ve said it before: abstraction isn’t magic, and that holds true for Entity Framework, too.

I totally understand: you might not know anything about databases, might not have any interest in learning more about databases, and your job is to focus on the code to develop new features. Because of that, you put your faith and trust into Entity Framework to do things the best, most correct way.

That isn’t always what happens, though. It might work, but it might not work well. Not being an Entity Framework expert, I’m not always sure if the disconnect is in the way Entity Framework is designed, or in the way the developer used it.

Funny Valentine

For transactional queries, you’ll generally be okay so long as you obey common database decency rules around table and index design.

What is usually overly-ambitious is reporting and dashboard population queries. They’re often interpreted poorly, with dozens of left joins and nested derived tables.

And look, no, I don’t expect someone coding those types of queries in Entity Framework to be able to write better T-SQL queries. They probably have even less training and experience there. I’m basically repeating myself: abstraction isn’t magic.

If you’re going to work heavily with Entity Framework code that hits SQL Server, you need to:

Get into databases
Get someone who’s into databases

You need someone who can get in there, find problem queries, review indexes, and help track down which sections of the code generate them.

Whether some portions of the application need to be replaced with stored procedures, or you write custom SQL that can take advantage of a different approach to accessing the data, you need someone with the skills to write that T-SQL well, or you’ll just end up with the same problem in a different way.

Thanks for reading!

Going Further

Starting SQL: Why SELECT * Is a Bad Idea In SQL Server Queries

Posted on July 12, 2021May 16, 2022 by Erik Darling

Starting SQL: Why SELECT * Is a Bad Idea

Going Further

Starting SQL: Query Execution In SQL Server

Posted on July 6, 2021May 16, 2022 by Erik Darling

Starting SQL: Query Execution

Going Further

Don’t use ISNUMERIC: Use TRY_CONVERT or a CASE Expression In SQL Server

Posted on July 2, 2021May 16, 2022 by Erik Darling

Kid Anymore

A while back I blogged about how ISNUMERIC can give you some wacky results.

Working with a client… Well, it’s hard to know based on my publishing schedule. But they were using ISNUMERIC all over the place, and getting the same wacky results, and even errors.

Here’s a short setup:

SELECT
    x.n
INTO #t
FROM
(
    SELECT '$'AS n
    UNION ALL 
    SELECT ','AS n
    UNION ALL
    SELECT '.'AS n
    UNION ALL
    SELECT ',1,'AS n
    UNION ALL
    SELECT '-'AS n
    UNION ALL
    SELECT '+'AS n
    UNION ALL
    SELECT CHAR(9)AS n 
    UNION ALL
    SELECT CHAR(10)AS n 
    UNION ALL
    SELECT CHAR(11)AS n
    UNION ALL
    SELECT CHAR(12)AS n 
    UNION ALL
    SELECT CHAR(13)AS n
) AS x;

We’re definitely not dumping anything in that table that’s a number.

Competitors

Let’s see what happens when we run this query:

SELECT
    t.n,
    i = ISNUMERIC(t.n),
    t = TRY_CONVERT(bigint, t.n),
    c = CASE 
            WHEN t.n NOT LIKE '%[^0-9]%'
            THEN 1
            ELSE 0
        END
FROM #t AS t

The first column is what each of the next three columns are evaluating:

You can see ISNUMERIC stink it up every time. TRY_CONVERT mostly does okay, but I’m not sure how I feel about + and – being converted to zero.

That case expression is the only thing that seems reliable.

Of course, integers are stodgy. Stiff. Maybe we need something a little more floaty.

SELECT
    t.n,
    i = ISNUMERIC(t.n),
    t = TRY_CONVERT(float, t.n),
    c = CASE 
            WHEN t.n NOT LIKE '%[^0-9]%'
            THEN 1
            ELSE 0
        END
FROM #t AS t

What happens?

Well, ISNUMERIC still stinks, but at least TRY_CONVERT does better.

Majority

No one should be using ISNUMERIC, period. Depending on your query needs, you can certainly use TRY_CONVERT in most situations, assuming you’re on SQL Server 2016+.

For everyone else, and even maybe people on 2016+, that CASE expression works really well for rooting out things that aren’t numbers.

Thanks for reading!

Going Further

A Query To Help You Find Plans With Eager Index Spools In SQL Server’s Plan Cache

Posted on June 28, 2021May 16, 2022 by Erik Darling

Yawn And Yawn

I’ve written a bunch about Eager Index Spools, and how to fix them, but I’ve always sort of left the “finding” part up to you, or pointed curious tuners to tools like sp_BlitzCache.

Recently though, I worked with a client who had Eager Index Spools so frequently that we needed to track them down specifically.

This is the plan cache query that I used to do it — they didn’t have Query Store enabled — and I wanted to share it.

WITH 
    XMLNAMESPACES 
('http://schemas.microsoft.com/sqlserver/2004/07/showplan' AS x),
    plans AS 
(
    SELECT TOP (10)
        deqs.query_plan_hash,
        sort = 
            SUM(deqs.total_worker_time / deqs.execution_count)
    FROM sys.dm_exec_cached_plans AS decp
    JOIN sys.dm_exec_query_stats AS deqs
        ON decp.plan_handle = deqs.plan_handle
    CROSS APPLY sys.dm_exec_query_plan(decp.plan_handle) AS deqp
    CROSS APPLY deqp.query_plan.nodes('//x:RelOp') AS r (c)
    WHERE  r.c.exist('//x:RelOp[@PhysicalOp="Index Spool" and @LogicalOp="Eager Spool"]') = 1
    AND    EXISTS
           (      
               SELECT 
                   1/0
               FROM sys.dm_exec_plan_attributes(decp.plan_handle) AS pa 
               WHERE pa.attribute = 'dbid'
               AND   pa.value > 4
           )   
    GROUP BY deqs.query_plan_hash
    ORDER BY sort DESC
)
SELECT
    deqp.query_plan,
    dest.text,
    avg_worker_time = 
        (deqs.total_worker_time / deqs.execution_count),
    deqs.total_worker_time,
    deqs.execution_count
FROM sys.dm_exec_cached_plans AS decp
JOIN sys.dm_exec_query_stats AS deqs
    ON decp.plan_handle = deqs.plan_handle
CROSS APPLY sys.dm_exec_query_plan(decp.plan_handle) AS deqp    
CROSS APPLY sys.dm_exec_sql_text(deqs.sql_handle) AS dest
WHERE EXISTS
(
    SELECT
        1/0
    FROM plans AS p
    WHERE p.query_plan_hash = deqs.query_plan_hash
)
ORDER BY avg_worker_time DESC
OPTION(RECOMPILE, MAXDOP 1);

It’s maybe not the prettiest thing in the world, but it got the job done.

Thanks for reading!

Going Further

Common Table Expressions Are Useful For Rewriting Scalar Functions In SQL Server

Posted on June 23, 2021May 16, 2022 by Erik Darling

Punching

Thanks for watching!