
Happy New Year, From Darling Data

Up All Night


I don’t wanna. Neither do you. Have a good one.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that, and need to solve database performance problems quickly. You can also get a quick, low cost health check with no phone time required.

The Art Of The SQL Server Stored Procedure: Parameter Sniffing

Responsible Reading


One of the most fun parts of my job is telling people that parameter sniffing is a good thing. They’re often shocked, because the phrase has such ominous connotations to developers.

The parameters! They get sniffed! Performance! Something! Stuff!

Parameter sniffing does not mean the sky is falling. Like I’ve said before, if it were such a bad idea, it would have been abandoned.

What you have to be mindful of is how data gets distributed over time, and doing a few things to make sure the optimizer has as few reasonable plan choices as possible.

  • Index keys that cover all where clause predicates, at minimum
  • Index includes that cover all select list columns, in special circumstances
  • Queries written to best take advantage of those indexes (SARGability, etc.)
  • Using temporary objects to isolate critical parts of queries into manageable chunks

That will help cut down on how disastrous plan sharing is between queries that need to process very different amounts of data.

Under really crazy circumstances (and I know, I’ve linked to this before), I use dynamic SQL to fix parameter sensitivity issues.

 

SQL Server 2022 Not To The Rescue


SQL Server 2022 has a feature that’s supposed to help with parameter sensitivity issues. It’s okay sometimes, but the current set of limitations is:

  • Only equality predicate parameters are evaluated for sensitivity issues
  • Only one parameter is selected for managing sensitivity
  • Only three query plan choices are generated for different estimation buckets

I have seen it be helpful in a few cases, but there are times when it should obviously kick in and doesn’t.

There is currently no query hint to force the feature to kick in when it doesn’t (and it should).

Just so you know how I set up for this:

ALTER DATABASE StackOverflow2013 
SET COMPATIBILITY_LEVEL = 160;

Here are my indexes:

CREATE INDEX
    beavis
ON dbo.Posts
    (OwnerUserId, PostTypeId)
WHERE
    (PostTypeId = 1)
WITH
    (MAXDOP = 8, SORT_IN_TEMPDB = ON, DATA_COMPRESSION = PAGE);

CREATE INDEX
    butthead
ON dbo.Votes
    (VoteTypeId, UserId, PostId)
INCLUDE
    (BountyAmount, CreationDate)
WITH
    (MAXDOP = 8, SORT_IN_TEMPDB = ON, DATA_COMPRESSION = PAGE);

CREATE INDEX
    stewart
ON dbo.Badges
    (UserId)
WITH
    (MAXDOP = 8, SORT_IN_TEMPDB = ON, DATA_COMPRESSION = PAGE);
GO

Here are the checks that validate options, etc.

SELECT
    d.name,
    d.compatibility_level,
    d.is_query_store_on
FROM sys.databases AS d
WHERE d.database_id = DB_ID();

SELECT
    dsc.configuration_id,
    dsc.name,
    dsc.value,
    dsc.is_value_default
FROM sys.database_scoped_configurations AS dsc
WHERE dsc.name = N'PARAMETER_SENSITIVE_PLAN_OPTIMIZATION';
[screenshot: way to be]
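
If that check shows the configuration turned off (it defaults to on under compatibility level 160), flipping it back is a one-liner. Hedged example, only needed if it’s actually been disabled:

ALTER DATABASE SCOPED CONFIGURATION
SET PARAMETER_SENSITIVE_PLAN_OPTIMIZATION = ON;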

With That Out Of The Way


Here’s the stored procedure that causes us grief.

CREATE OR ALTER PROCEDURE
   dbo.VoteSniffing
(
    @VoteTypeId integer
)
AS
BEGIN
SET XACT_ABORT, NOCOUNT ON;

    SELECT
        UserId  = 
            ISNULL(v.UserId, 0),
        Votes2013  = 
            SUM
            (
                CASE
                    WHEN 
                    (
                            v.CreationDate >= '20130101'
                        AND v.CreationDate <  '20140101'
                    )
                    THEN 1
                    ELSE 0
                END
            ),
        TotalBounty  = 
            SUM
            (
                CASE
                    WHEN v.BountyAmount IS NULL
                    THEN 0
                    ELSE 1
                END
            ),
        PostCount  = 
            COUNT(DISTINCT v.PostId),
        VoteTypeId  = 
            @VoteTypeId
    FROM dbo.Votes AS v
    WHERE v.VoteTypeId = @VoteTypeId
    AND   NOT EXISTS
    (
        SELECT
            1/0
        FROM dbo.Posts AS p
        JOIN dbo.Badges AS b
          ON b.UserId = p.OwnerUserId
        WHERE  p.OwnerUserId = v.UserId
        AND    p.PostTypeId = 1
    )
    GROUP BY 
        v.UserId
    ORDER BY
        PostCount DESC;
END;
GO

The good news is that all formatting checks pass here. Thank you, thank you, thank you. You’re far too kind.

Execution Problems


There’s only one parameter. It’s an equality predicate. And data in the VoteTypeId column is heavily skewed.

[screenshot: oh dear me]

You would think that with data distributions that look this way, there would be some attempt to remedy the situation by the Parameter Sensitive Plan Optimization feature.

Unfortunately, there is not.

Little Big Plan


When executed first with 4 as the VoteTypeId, the plan is very fast. Of course it is. We’re finding 733 rows to work with.

EXEC dbo.VoteSniffing
    @VoteTypeId = 4;
[query plan screenshot: simple as]

I have no qualms with this plan, other than the erroneous No Join Predicate warning. There is a join predicate, and you can see it in the Seek operators.

Heck, you can even see it in the query text. It’s pretty stupid.

When the plan is repeated for VoteTypeId 2, it is a disaster.

EXEC dbo.VoteSniffing
    @VoteTypeId = 2;
[query plan screenshot: not so hot]

A Seek takes 9.5 seconds, a Nested Loops join occurs for 5 seconds, and a spill occurs for 22 seconds.

After the spill, another ~22 seconds is spent in various operators until it’s complete.

In other words, we still have our work cut out for us to tune this thing. If we go back to our list:

  • Index keys that cover all where clause predicates, at minimum ✅
  • Index includes that cover all select list columns, in special circumstances ✅
  • Queries written to best take advantage of those indexes (SARGability, etc.) ✅
  • Using temporary objects to isolate critical parts of queries into manageable chunks ❌

In this case, if we use a #temp table and isolate the portion of the query that evaluates a parameter (just getting data from the Votes table), we can solve 99% of the parameter sniffing problems between the majority of the VoteTypeIds.
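
Here’s a minimal sketch of that isolation step, hedged and simplified rather than a full rewrite of the procedure:

/*A hedged, simplified sketch; in the procedure, 4 would be @VoteTypeId*/
SELECT
    v.UserId,
    v.PostId,
    v.BountyAmount,
    v.CreationDate
INTO #votes
FROM dbo.Votes AS v
WHERE v.VoteTypeId = 4;

/*The aggregates and the NOT EXISTS then read from #votes, which gives the
  optimizer real row counts and a histogram for the rest of the plan*/
SELECT
    UserId = ISNULL(v.UserId, 0),
    PostCount = COUNT_BIG(DISTINCT v.PostId)
FROM #votes AS v
GROUP BY v.UserId;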

However, VoteTypeId 2 is still a special flower. This is a good place to pause and ask ourselves if repeatedly filling a temporary object with 37 million rows is a good use of time.

This is when some techniques we’ve discussed before would come in handy:

  • Dynamic SQL to treat VoteTypeId 2 differently from the others
  • A separate stored procedure to intercept VoteTypeId 2 executions
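
And a rough sketch of the dynamic SQL option, simplified down to a count so it stays short: any difference in the query text, even a comment, gets the troublesome value its own cached plan.

/*A hedged, simplified sketch, not the full VoteSniffing query*/
DECLARE
    @VoteTypeId integer = 2,
    @sql nvarchar(MAX) = N'
SELECT
    c = COUNT_BIG(*)
FROM dbo.Votes AS v
WHERE v.VoteTypeId = @VoteTypeId';

IF @VoteTypeId = 2
BEGIN
    /*Different query text means a separate cached plan for this value*/
    SET @sql += N'
/*VoteTypeId 2 gets its own plan*/';
END;

SET @sql += N';';

EXEC sys.sp_executesql
    @sql,
    N'@VoteTypeId integer',
    @VoteTypeId;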

Game Change


The reason this is a more sensible arrangement is that when the plan is compiled initially for VoteTypeId 2, it runs just fine.

It’s also true that when VoteTypeId 4 uses the plan for VoteTypeId 2, it goes from taking 0 seconds to taking a couple of seconds. Sharing is not caring.

This is the fast plan for VoteTypeId 2. It has a lot of operators with additional startup costs that make grabbing small amounts of data slow.

[query plan screenshot: good for some]

Parameter sensitivity is a tough thing at times. Managing multiple plans for the same query can end up being quite complicated.

There are absolutely times when you’re going to have to throw in the towel and use a statement-level recompile hint to make things function appropriately.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that, and need to solve database performance problems quickly. You can also get a quick, low cost health check with no phone time required.

The Art Of The SQL Server Stored Procedure: Temporary Objects

And Then You Were Gone


I probably overuse temp tables. I’m the first one to admit it. But often when I choose to use one, it’s a choice between:

  • Spending a long time trying to understand business logic for a company I don’t work for
  • Tinkering with query syntax and using really weird tricks and hints to get different plan shapes
  • Waiting a long time to create or tweak indexes on huge tables (usually on “dev” hardware)

And you know what? If I had to work anywhere full time, and not balance performance tuning requests from a bunch of people, I just might spend more time on those things.

Don’t tell anyone, but sometimes if I’m really busy, I have no problem waiting a long time for an index change to go through, so I can look at something else for a while.

In general, temp tables offer a convenient way to focus on one problem part of the query, without the noise of a bazillion other joins and applys and subqueries and everything else going on.

They also offer a familiar paradigm to anyone looking at the work later, that doesn’t involve SELECT TOP (2147483647) UNION ALL SELECT TOP (1) type syntax, which can really raise some eyebrows on unfamiliar faces.

A Recent Example


I’m using Stack Overflow in my query, but a recent client engagement had a problem query that looked about the same.

WITH
    top5 AS
(
    SELECT
        p.*,
        n = 
            ROW_NUMBER() OVER
            (
                PARTITION BY
                    p.OwnerUserId
                ORDER BY
                    p.Score DESC
            )
    FROM dbo.Posts AS p
    WHERE p.OwnerUserId = 22656
    AND   p.PostTypeId = 1
)
SELECT
    u.DisplayName,
    t1.Title,
    t2.Title,
    t3.Title,
    t4.Title,
    t5.Title
FROM dbo.Users AS u
LEFT JOIN top5 AS t1
  ON  t1.OwnerUserId = u.Id
  AND t1.n = 1
LEFT JOIN top5 AS t2
  ON  t2.OwnerUserId = u.Id
  AND t2.n = 2
LEFT JOIN top5 AS t3
  ON  t3.OwnerUserId = u.Id
  AND t3.n = 3
LEFT JOIN top5 AS t4
  ON  t4.OwnerUserId = u.Id
  AND t4.n = 4
LEFT JOIN top5 AS t5
  ON  t5.OwnerUserId = u.Id
  AND t5.n = 5
WHERE t1.OwnerUserId IS NOT NULL;

The problems with this contain multitudes, but let’s let the query plan do the talking:

[query plan screenshot: oh, that.]

In the words of Drake: “Take Care”

If you’re the type of developer who insists that common table expressions are some sort of magic potion, I do hope that this is your hangover.

Practical Magic


Temporary tables are the cure for common table expression hangovers, for the most part.

Let’s see how we do with one.

WITH
    top5 AS
(
    SELECT
        p.*,
        n = 
            ROW_NUMBER() OVER
            (
                PARTITION BY
                    p.OwnerUserId
                ORDER BY
                    p.Score DESC
            )
    FROM dbo.Posts AS p
    WHERE p.OwnerUserId = 22656
    AND   p.PostTypeId = 1
)
SELECT
    t.OwnerUserId,
    t.Title,
    n
INTO #top5
FROM top5 AS t
WHERE t.n <= 5;

SELECT
    u.DisplayName,
    t1.Title,
    t2.Title,
    t3.Title,
    t4.Title,
    t5.Title
FROM dbo.Users AS u
LEFT JOIN #top5 AS t1
  ON  t1.OwnerUserId = u.Id
  AND t1.n = 1
LEFT JOIN #top5 AS t2
  ON  t2.OwnerUserId = u.Id
  AND t2.n = 2
LEFT JOIN #top5 AS t3
  ON  t3.OwnerUserId = u.Id
  AND t3.n = 3
LEFT JOIN #top5 AS t4
  ON  t4.OwnerUserId = u.Id
  AND t4.n = 4
LEFT JOIN #top5 AS t5
  ON  t5.OwnerUserId = u.Id
  AND t5.n = 5
WHERE t1.OwnerUserId IS NOT NULL;

Here’s the new query plan:

[query plan screenshot: hi mom]

Without changing indexes, we get our query from 26 seconds to 1.5 seconds.

Really puts the “common” in “common table expressions”, doesn’t it?

Argue


While the “should I use common table expressions or temp tables?” question is largely a settled one for me, one thing that always comes up in seemingly chaotic practice is temp tables vs. table variables.

Table variables certainly have their place in the world, though often not without compromises or alternatives.

Many developers will use table variables for the purpose of passing them directly to another stored procedure. That’s fine and all, but you can just use #temp tables and reference them directly in another stored procedure, too.
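
A hedged sketch of what that looks like, with made-up procedure names:

/*Hedged sketch: both procedure names are hypothetical*/
CREATE OR ALTER PROCEDURE
    dbo.InnerThing
AS
BEGIN
    SET NOCOUNT ON;

    /*#ids doesn't exist yet, but deferred name resolution lets this compile;
      at run time it sees the copy created by the calling procedure*/
    SELECT
        i.Id
    FROM #ids AS i;
END;
GO

CREATE OR ALTER PROCEDURE
    dbo.OuterThing
AS
BEGIN
    SET NOCOUNT ON;

    CREATE TABLE
        #ids (Id integer NOT NULL);

    INSERT
        #ids (Id)
    VALUES
        (1), (2), (3);

    EXEC dbo.InnerThing;
END;
GO

EXEC dbo.OuterThing;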

Other times, which one gets used seems to be a question of the moods and fancies of the developer currently working on something.

Some time ago I talked about when table variables are most useful in SQL Server practice.

And I still stand by the majority of what’s discussed in it. Hopefully you find it enlightening, even if you’ve already seen it.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that, and need to solve database performance problems quickly. You can also get a quick, low cost health check with no phone time required.

The Art Of The SQL Server Stored Procedure: Dynamic SQL

Time Well Spent


I’ve spent a lot of time on this blog telling you different reasons why you should, when you should, and how you should use dynamic SQL.

In this post, I’m going to go into some of the finer points of how I approach dynamic SQL to avoid issues — not performance issues — more procedural issues.

Things you should use to protect yourself from wonky object names, string truncation, object identification, and more.

There are many things that I wish Microsoft would do with dynamic SQL to make working with it easier and safer. For one, there should be a specific data type for it with built-in safeguards against SQL injection. Sort of like a template that would make parameter substitution a lot easier, and allow queries in strings to be written without a ton of escape quotes so that they’d be easier to debug and troubleshoot.

It’s a dream that will never come true, but hey we got ledger tables that will eat up plenty of disk space in Azure, so that’s just great.

Quote And Quote Alike


The first and most obvious rule I have with dynamic SQL is that if I need to accept object names, I’m using QUOTENAME.

That goes for any object. Server, database, schema, table, view, procedure, function, you get the idea.

Why? Because developers don’t always do nice things when they name things. The most common one that I see is putting spaces in names, but over the years I’ve run into unprintable characters, brackets, symbols like @, #, $, and &, and… unicode characters.

If you look in any of the stored procedures I write, you’ll see how heavily I use QUOTENAME to save myself headaches later.

Now, you could write your own brackets into strings, but they don’t offer the same level of protection against SQL injection.

One thing I will say is smart to do is to keep two copies of any name you’re quoting. One to use in the dynamic SQL, and one to use for other information you may need to retrieve about the thing you’ve quoted.

As an example, if you accept a database name as a parameter, and then you add quotes to it, it’ll be awfully hard to find any information about it in sys.databases, because database_name will not equal [database_name].
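
Here’s a hedged sketch of what I mean, using a database name (swap in whichever database you’re actually working with):

DECLARE
    @database_name sysname = N'StackOverflow2013',
    @database_name_quoted nvarchar(258),
    @sql nvarchar(MAX) = N'';

/*One plain copy for catalog lookups, one quoted copy for dynamic SQL*/
SELECT
    @database_name_quoted = QUOTENAME(@database_name);

/*Plain copy: matches what's stored in sys.databases*/
SELECT
    d.compatibility_level,
    d.is_query_store_on
FROM sys.databases AS d
WHERE d.name = @database_name;

/*Quoted copy: safe to splice into the string*/
SET @sql = N'
USE ' + @database_name_quoted + N';
SELECT database_name = DB_NAME();';

EXEC sys.sp_executesql
    @sql;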

There you go. Your mind is blown.

Always Unicode


Whenever I see parameters or variables created to hold dynamic SQL with varchar (and not nvarchar) as the type, I get real nervous.

Why? Because I know some lazy bones is about to concatenate a bunch of user input into their strings and then:

EXEC (@sql);

And that is unsafe, my friends. Very unsafe. This is where unfunny people will tell an unfunny joke about Bobby Tables.

When you use sp_executesql to write and run the safe, parameterized dynamic SQL that you should, it expects unicode inputs for both the query that you execute and the parameters you declare for it.

DECLARE
    @s varchar(MAX) = 'SELECT d.* FROM sys.databases AS d WHERE d.database_id = @d;',
    @p varchar(MAX) = '@d integer',
    @d integer = 1;

EXEC sys.sp_executesql
    @s,
    @p,
    @d;

This will fail with the error:

Msg 214, Level 16, State 2, Procedure sys.sp_executesql, Line 1 [Batch Start Line 0]

Procedure expects parameter ‘@statement’ of type ‘ntext/nchar/nvarchar’.

And then with the error:

Msg 214, Level 16, State 3, Procedure sys.sp_executesql, Line 1 [Batch Start Line 0]

Procedure expects parameter ‘@params’ of type ‘ntext/nchar/nvarchar’.

Because we did not follow instructions.

This, however, will run wonderfully, and keep us safe from unfunny jokes about Bobby Tables.

DECLARE
    @s nvarchar(MAX) = N'SELECT d.* FROM sys.databases AS d WHERE d.database_id = @d;',
    @p nvarchar(MAX) = N'@d integer',
    @d integer = 1;

EXEC sys.sp_executesql
    @s,
    @p,
    @d;

And besides, it’s good form. You never know when something unicodey may sneak into your string, and you don’t want to end up with question marks.

Concatenation Street


One big problem, and one that can happen suddenly, out of nowhere, without any rhyme or reason, and only on one server, and when you test it on another server it won’t happen, is string truncation because of implicit conversion.

See, when you put strings together, even if you declare the base parameter or variable to hold the string as a max data type, shorter strings will somehow cause SQL Server to decide that the result will be… something else.

Even more annoying is that even if you put an outer CONVERT(nvarchar(max), everything) around the whole ordeal, one string concatenation inside will still mess up the whole works.

If you look at the dynamic SQL that I write, you’ll notice that certain shorter strings are surrounded with converts to a max type.

This is why you’ll see code that looks like this:

+
CONVERT
(
    nvarchar(MAX),
    CASE @new
         WHEN 1
         THEN
N'
qsp.plan_forcing_type_desc,
qsp.force_failure_count,
qsp.last_force_failure_reason_desc,
w.top_waits,'
         ELSE
N''
    END
)
+

This is also why I go heavy on debug output that prints the string out: you can usually tell where the implicit conversion happened by which part of the string gets cut off.

Formation


Formatting dynamic SQL is especially important. Remember when I blogged about formatting T-SQL?

That goes doubly-double for dynamic SQL. I’ll often write and format the query first, and then make it dynamic so that SQL Prompt can work its 80% magic.

That dynamic SQL I showed you up there? I’d never put that out into the world for people to deal with. Rude.

It would look way more like this:

DECLARE
    @s nvarchar(MAX) = N'
SELECT 
/*I came from Erik Darling''s awesome stored procedure*/
    d.* 
FROM sys.databases AS d 
WHERE d.database_id = @d;
',
    @p nvarchar(MAX) = N'@d integer',
    @d integer = 1;

RAISERROR(@s, 0, 1) WITH NOWAIT;

EXEC sys.sp_executesql
    @s,
    @p,
    @d;

  1. It’s formatted nicely
  2. It’s aligned so that it pretty prints in RAISERROR
  3. There are leading and trailing new lines so it separates nicely from other returned messages
  4. There’s a comment to tell me where it came from

Now if only there were a real dynamic SQL type so I didn’t have to use two apostrophes in there.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that, and need to solve database performance problems quickly. You can also get a quick, low cost health check with no phone time required.

The Art Of The SQL Server Stored Procedure: Wrapper Stored Procedures

Happenstance


Wrapper stored procedures are useful for things like:

  • Transforming declared local variables into parameters
  • Preventing code from compiling when it isn’t used
  • Generating different query plans to deal with parameter sniffing
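
A minimal sketch of that last bullet, with hypothetical procedure names for the two branches:

/*A hedged sketch; both inner procedures are made up*/
CREATE OR ALTER PROCEDURE
    dbo.VoteSniffing_Wrapper
(
    @VoteTypeId integer
)
AS
BEGIN
    SET NOCOUNT, XACT_ABORT ON;

    IF @VoteTypeId = 2
    BEGIN
        /*The troublesome value gets its own procedure, and its own plans*/
        EXEC dbo.VoteSniffing_BigValues
            @VoteTypeId = @VoteTypeId;
    END
    ELSE
    BEGIN
        EXEC dbo.VoteSniffing_SmallValues
            @VoteTypeId = @VoteTypeId;
    END;
END;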

The upside of using this over dynamic SQL is that you have a convenient object name attached to the code.

The downside is that if stored procedures share code logic, you now have more to maintain. Likewise, if you have many IF...ELSE branches, you’ll have many more stored procedures to dig around in (though they’ll be a lot easier to identify in the plan cache and query store).

There are other handy things about stored procedures that dynamic SQL makes iffy, like permissions, and making sure developers don’t write the kind of code that makes headlines.

Stored procedures also aren’t a good use case for all the “kitchen sink” queries that use a lot of optional parameters. Dynamic SQL is king here, because creating a stored procedure for every permutation of parameter combinations is a bigger no than well done steak.

To learn how to deal with those, watch these videos.

 

Thanks for reading (and watching)!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that, and need to solve database performance problems quickly. You can also get a quick, low cost health check with no phone time required.

Merry Christmas, From Darling Data

Nope, Not Today


Today you do other things. Go. Goodbye.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that, and need to solve database performance problems quickly. You can also get a quick, low cost health check with no phone time required.

The Art Of The SQL Server Stored Procedure: Local Variables

Nein Nein Nein


Just like you can’t fix a broken bone with good vibes, you can’t fix parameter sniffing with idiocy. More on that later, though.

Like many other things we’ve discussed thus far, local variables are a convenience to you that have behavior many people are still shocked by.

You, my dear and constant reader, may not be shocked, but the nice people who pay me money to fix things seem quite astounded by what happens when you invoke local variables.

So I find myself in a difficult position: do I dredge up more red meat for the millions of die-hard SQL Server performance nuts who come here for the strange and outlandish, or produce evergreen content for people who pay my substantial bar tabs?

Coin, tossed.

Corn, popped.

Greenery


Local variables present an attractive proposition for many reasons. For instance, you can:

  • Set a constant date/time variable
  • Assign values from complex subqueries to a single variable
  • Increment values in a loop to batch modifications

And here’s the thing: I’m with you on the need and convenience for all those things. It’s just how you usually end up using them that I disagree with.

Many developers are under the impression that parameter sniffing is a bad thing; it is not. If it were, modern database systems would have thrown the whole idea out ages ago.

Constantly generating execution plans is not a good use of your CPU’s brain power. The real problem is parameter sensitivity.

I’m going to say this as emphatically as I can for those of you who call using local variables and optimize for unknown a best practice:

Local variables use an estimate derived from the total rows in the table multiplied by the assumed uniqueness of a column’s data. More often than not, that is a very small number.
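
If you want to see where that guess comes from, here’s a hedged example using the density vector. The statistics object named here is the ParentId index created a little further down in this post; “All density” multiplied by the table’s row count is roughly the estimate a local variable gets.

/*Hedged example: multiply "All density" by the table's row count*/
DBCC SHOW_STATISTICS('dbo.Posts', p)
WITH DENSITY_VECTOR;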

If you have the kind of incredibly skewed data that is sensitive to parameters being sniffed, and plans being generated based on those initial estimates, it is highly unlikely that plans derived from fuzzy math will be suitable for general execution.

There may be exceptions to this, of course.

Every database is subject to many local factors that can make arrangements outside the norm sensible.

Drudgery


What you need to do as a developer is test these assumptions of yours. I spend a lot of my time fixing performance problems that arise from developers not testing assumptions.

A common feedback loop occurs when testing code against very small data sets that don’t expose the types of performance problems that arise from larger ones, or testing against data sets that may be similar in terms of volume, but not in terms of distribution.

I am being charitable here, though. Most code is only tested for result correctness. That’s a fine starting point.

After all, most developers are paid to develop, and are not performance tuning experts. It would be nice if performance tuning were part of the development process, but… If you’re testing against wack data, it’s impossible.

Timelines are short, demands are substantial, and expectations are based mostly around time to deliver.

Software development against databases is rarely done by developers who are knowledgeable about databases, and more often than not by developers who are fluent in the stack used for the application.

Everything that happens between front and back may as well be taking place in Narnia or Oz or On A Planet Far, Far Away.

This section ended up being way more philosophical than I had intended, but it’s Côte-Rôtie season.

Let’s move on!

Assumptive


When you want to test whether or not something you’re working on performs best, you need to understand which options are available to you.

There are many ways to pet a dog when working with SQL Server. Never assume the way you’re doing it is the best, and will always be the best.

One assumption you should absolutely never make is that the way you see everyone around you doing something is the best way to do something.

That’s how NOLOCK turned into an epidemic. And table variables. And Common Table Expressions. And Scalar UDFs. And Multi-Statement UDFs. And sticking ISNULL in every join and where clause.

And, of course, local variables.

One way to test your assumptions about things is to create a temporary stored procedure and test things in different ways.

CREATE OR ALTER PROCEDURE
    #p
(
    @ParentId integer
)
AS
BEGIN
    SET NOCOUNT ON;

    DECLARE
        @ParentIdInner integer = @ParentId;

    SELECT
        c = COUNT_BIG(*)
    FROM dbo.Posts AS p
    WHERE ParentId = 0;

    SELECT
        c = COUNT_BIG(*)
    FROM dbo.Posts AS p
    WHERE ParentId = @ParentId;

    SELECT
        c = COUNT_BIG(*)
    FROM dbo.Posts AS p
    WHERE ParentId = @ParentIdInner;
END;
GO

CREATE INDEX
    p
ON dbo.Posts
    (ParentId)
WITH
    (SORT_IN_TEMPDB = ON, DATA_COMPRESSION = PAGE);

EXEC dbo.#p 
    @ParentId = 0;

We’re using:

  1. A literal value
  2. An actual parameter
  3. A declared variable

With an example this simple, the local variable won’t:

  • Slow things down
  • Change the query plan
  • Prevent a seek

But it will throw cardinality estimation in the toilet and flush twice.

[screenshot: grimy]

These are the kinds of things you need to test and look at when you’re figuring out the best way to pet SQL Server and call it a very good dog.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that, and need to solve database performance problems quickly. You can also get a quick, low cost health check with no phone time required.

The Art Of The SQL Server Stored Procedure: Conditional Logic

Other Thing


There are two forms of conditional logic that I often have to fix in stored procedures:

  1. Branching to run different queries at different times
  2. Complicated join and where clause logic

The problems with both are similar in terms of performance. You see, when smart people tell you that SQL is a declarative language, and not a procedural language, they’re usually trying to get you to stop using cursors.

And that’s not always wrong or bad advice, trust me. But it also applies here.

When you’re developing stored procedures, the thing you need to understand is that SQL Server builds query plans for everything in them on the first compile, and then again after anything causes a recompile.

It does not compile for just the branch of logic that gets explored on compilation. No no. That would be too procedural. Procedural we are not.

There are two exceptions to this rule:

  1. When the branches execute dynamic SQL
  2. When the branches execute stored procedures

If this sounds familiar to you, you’ve probably heard me talk about parameter sniffing, local variables, SARGability, and… well, more things dealing with SQL Server performance.

Hm.

Problem 1: IF Branching


Like I mentioned above, the only way to get IF branching to only compile plans for explored branches is to tuck them away.

Probably the easiest way to demonstrate this is to create a stored procedure with logical branching that accesses an object that doesn’t even pretend to exist.

CREATE OR ALTER PROCEDURE
    dbo.i_live
(
    @decider bit = NULL
)
AS
BEGIN
    SET NOCOUNT, XACT_ABORT ON;
    
    IF @decider = 'true'
    BEGIN
        SELECT
            dp.*
        FROM dbo.DinnerPlans AS dp;
    END;
    
    IF @decider = 'false'
    BEGIN
        SELECT
            dp.*
        FROM dbo.DinnerPlans AS dp WITH(INDEX = 2);
    END;
    
    IF @decider IS NULL
    BEGIN
        SELECT
            result = 'please make a decision.'
    END;
END;
GO 

EXEC dbo.i_live 
    @decider = 'true';

If you run this, you’ll get an error saying the index doesn’t exist, even though the code branch doesn’t run.

Where things get even weirder (though it’s well beside the point of this post): if you execute a stored procedure that references a table that doesn’t exist, but not in the branch that executes, no plan will be cached for it.

You can see a stupid demo of that here. It’s probably not something you’ll run into a whole lot, but it’s probably worth noting.
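
Here’s a hedged sketch of the tuck-away fix using dynamic SQL (sub-procedures work the same way). DinnerPlans is the same demo table as above, and the procedure name is made up:

CREATE OR ALTER PROCEDURE
    dbo.i_live_tucked
(
    @decider bit = NULL
)
AS
BEGIN
    SET NOCOUNT, XACT_ABORT ON;

    IF @decider = 'true'
    BEGIN
        /*Branches tucked into dynamic SQL only compile when they actually run*/
        EXEC sys.sp_executesql
            N'SELECT dp.* FROM dbo.DinnerPlans AS dp;';
    END;

    IF @decider = 'false'
    BEGIN
        /*The bogus index hint no longer errors unless this branch executes*/
        EXEC sys.sp_executesql
            N'SELECT dp.* FROM dbo.DinnerPlans AS dp WITH(INDEX = 2);';
    END;

    IF @decider IS NULL
    BEGIN
        SELECT
            result = 'please make a decision.';
    END;
END;
GO

EXEC dbo.i_live_tucked
    @decider = 'true';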

If you’d like to read more about the performance problems IF branching doesn’t solve, you can click on these links:

Problem 2: Conditional Logic In Your Predicates


You may love a big, messy, sloppy join or where clause, but SQL Server’s query optimizer hates it. This may be disappointing to hear, but query optimization is a really hard job.

Feeding in a bunch of runtime complexity and expecting consistently good results is a shamefully naive approach to query writing.

The query optimizer is quite good at applying its craft to a variety of queries. At the base of things, though, it is a computer program written by people. When you think carefully about the goal of a generalized query optimizer, it has to:

  • Come up very quickly with a good enough execution plan
  • Regardless of the surrounding hardware
  • Respecting the logic of the query
  • Within the confines of available indexes, constraints, hints, and settings

As you add complexity to queries, various things become far harder to forecast and plan for in a generalized way.

Think of it like planning a car trip. The more stops you add, the harder it is to find the fastest route. Then throw in all the unexpecteds — traffic, construction, weather, people randomly gluing themselves to the road, breakdowns — and what have you got?

Chaos. Pure chaos.

While the idealism of writing everything in one big query seems attractive to SQL developers — stacking common table expressions, nesting view upon view and subquery upon subquery, and adding in all the OR logic one can possibly surmise — it only opens the optimizer up to error, mis-estimates, and even missed opportunities.

The reality is that query optimizers across all database platforms have plenty of issues, blind spots, and shortcomings. Sometimes you need to write queries in a way that is less convenient to you in order to avoid them.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that, and need to solve database performance problems quickly. You can also get a quick, low cost health check with no phone time required.

The Art Of The SQL Server Stored Procedure: Data Types

Mix And Match


There are only a few data types that make me nervous when I see them:

  • MAX strings, or approaching the upper byte limit (except for dynamic SQL)
  • XML
  • sql_variant

It doesn’t matter if they’re stored procedure parameters, or if they’re declared as local variables. When they show up, I expect something bad to happen.

One thing that makes me really nervous about string data specifically, is that many developers don’t pay close attention to varchar vs. nvarchar.

This doesn’t just apply to stored procedures. Many ORMs suffer the same issue with data types not being strongly typed, so you sometimes end up with all varchar(8000) or nvarchar(4000) input parameters, and other times end up with n/varchar strings with lengths inferred at compile-time based on the length of the string passed in. That means that if you have an ORM query that takes, let’s say, a name as input, it might infer the string as unicode when it shouldn’t, and if you were to pass in different names for different executions, you’d get all different plans, too.

  • Erik: nvarchar(4)
  • Kendra: nvarchar(5)
  • Al: nvarchar(2)
  • Tom: nvarchar(3)

You get the picture. It’s a nutty nightmare, and it’s made worse if the name column you’re searching is a varchar data type, regardless of length. But those are ORM problems, and we wield mighty stored procedures like sane and rational people.

Let’s play a game called pattern and anti-pattern.

Anti-Pattern: One parameter for searching many columns


The sheer number of times I’ve seen something like this justifies most of the self-medicating I apply to myself.

CREATE OR ALTER PROCEDURE
    dbo.BadIdea
(
    @SearchString nvarchar(whatever)
)
AS
BEGIN
    SET @SearchString = N'%' + @SearchString + N'%';

    SELECT
        p.*
    FROM dbo.Posts AS p
    WHERE p.Id LIKE @SearchString
    OR    p.OwnerUserId LIKE @SearchString
    OR    p.AcceptedAnswerId LIKE @SearchString
    OR    p.CreationDate LIKE @SearchString
    OR    p.LastActivityDate LIKE @SearchString
    OR    p.Tags LIKE @SearchString
    OR    p.Title LIKE @SearchString
    OR    p.Body LIKE @SearchString
    ORDER BY
        p.ViewCount DESC;
END;

All sorts of bad things happen when you do this. You can’t index for this in any meaningful way, and comparing non-string data types (numbers, dates, etc.) with a double wildcard string means implicit conversion hell.

You don’t want to do this. Ever.

Unless you want to hire me.

Pattern: Properly typed parameters for each column


Rather than get yourself into that mess, create your procedure with a parameter for each column, with the correct data type.

Next, don’t fall into the trap where you do something like (p.OwnerUserId = @OwnerUserId OR @OwnerUserId IS NULL), or anything resembling that.

It’s a rotten idea. Instead, watch this video to learn how to write good dynamic SQL to deal with code like this:
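
The general shape of that approach, as a hedged sketch with a hypothetical procedure and just two optional parameters: only append the predicates for parameters that were actually supplied, and keep the values parameterized through sp_executesql.

/*A hedged sketch; dbo.PostSearch is a made-up procedure name*/
CREATE OR ALTER PROCEDURE
    dbo.PostSearch
(
    @OwnerUserId integer = NULL,
    @PostTypeId integer = NULL
)
AS
BEGIN
    SET NOCOUNT, XACT_ABORT ON;

    DECLARE
        @sql nvarchar(MAX) = N'
    SELECT
        p.Id,
        p.Title,
        p.Score
    FROM dbo.Posts AS p
    WHERE 1 = 1';

    /*Only add predicates for parameters that were actually supplied*/
    IF @OwnerUserId IS NOT NULL
    BEGIN
        SET @sql += N'
    AND   p.OwnerUserId = @OwnerUserId';
    END;

    IF @PostTypeId IS NOT NULL
    BEGIN
        SET @sql += N'
    AND   p.PostTypeId = @PostTypeId';
    END;

    SET @sql += N';';

    EXEC sys.sp_executesql
        @sql,
        N'@OwnerUserId integer, @PostTypeId integer',
        @OwnerUserId,
        @PostTypeId;
END;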

Anti-Pattern: Passing Lists In Parameters And…


Splitting the string directly in your join or where clause.

When you do this, SQL Server can’t make good cardinality estimates, because it has no idea what’s in the list. Worse, I usually see generic string split functions that can deal with any data type as input and output.

You might be outputting wonky data types that compare to column(s) of different data type(s), and suffering implicit conversion woes.

A cleaner option all around is to use Table Valued Parameters, backed by User Defined Table Types with the correct data types.

You may still need to follow my advice below, but it’s a bit easier to manage.

Pattern: Passing Lists In Parameters And…


Dumping the parsed output into a #temp table with the right column data types.

When you do that, two nice things happen:

  • SQL Server builds a histogram on the values
  • You can index it for large lists

I find myself changing to this pattern quite a bit when tuning code for clients. They’re often surprised by what a difference it makes.

Even when using Table Valued Parameters, I find myself doing this. Since Table Valued parameters are read only, you don’t need to worry about the contents changing even if you pass them to other procedures.
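
A hedged sketch of the pattern, using STRING_SPLIT and a made-up list of ids (a Table Valued Parameter would get dumped into the #temp table the same way):

/*Hedged sketch: parse the list once, into correctly typed, indexable storage*/
DECLARE
    @UserIds nvarchar(MAX) = N'22656,1034435,8101';

CREATE TABLE #user_ids
(
    UserId integer NOT NULL PRIMARY KEY
);

INSERT
    #user_ids (UserId)
SELECT
    UserId = CONVERT(integer, ss.value)
FROM STRING_SPLIT(@UserIds, N',') AS ss;

/*Now joins get a histogram and an index to work with*/
SELECT
    c = COUNT_BIG(*)
FROM dbo.Users AS u
JOIN #user_ids AS ui
  ON ui.UserId = u.Id;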

Anti-Pattern: Using Unnecessary MAX Types


I’ve blogged before about why you shouldn’t declare parameters or variables as MAX types, but the issue is mainly that they can’t be used to seek into an index.

Because of the internals of MAX parameters and variables, you’ll see a filter after data is acquired in the query plan, which is usually much less efficient than filtering out data when a table or index is initially accessed.

It’s also a bad idea for columns in tables for similar reasons. I understand that there is some necessity for them, but you should avoid them for searches as much as possible, and make them retrieve-only in your queries.

A good example is an Entity Attribute Value table, where developers allow searches on the Value column, which is either sql_variant, or nvarchar(max) so that it can accommodate any contents that need to be stored there.

Pattern: Using Reasonable Data Types


The number of times that I’ve seen MAX types used for anything involved in searches that actually needed to be MAX is relatively small compared to the number of times I’ve checked the max length of data and found it to be around 100 bytes or so.

Making significant changes like that to large tables is sometimes painful. Often, it’s easier to add computed columns in various ways to allow searching and indexes to be easier:

  • TRY_CAST or TRY_CONVERT to integers, dates, etc.
  • SUBSTRING to an appropriate string type with a reasonable length
  • Hashing the contents of the column to make binary searches possible

As long as you don’t persist the computed columns, the table isn’t locked when they’re added. However, you do need to index them to make them useful for searching. That will only be painful if you don’t pay Microsoft enough money.
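
A hedged sketch of what that can look like, with a hypothetical EAV table and column names:

/*Hedged sketch: table and column names are made up*/
ALTER TABLE dbo.EntityAttributeValue
ADD ValueAsInteger AS
        TRY_CONVERT(integer, Value);

/*Not persisted, so adding the column is a metadata-only change;
  the index is what makes it searchable*/
CREATE INDEX
    value_as_integer
ON dbo.EntityAttributeValue
    (ValueAsInteger);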

Plans Within Plans


As you review stored procedure code, keep an eye out for these anti-patterns. Fixing small things can have big downstream effects.

While not every change will yield many seconds or minutes of performance improvements, it helps to follow the rules as well as possible to clarify what the real issues are.

Getting a stored procedure to the point where you’re sure exactly what needs to change to improve performance can be a tough task.

Having a mental (or written) checklist of things that you know to fix makes it faster and easier.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that, and need to solve database performance problems quickly. You can also get a quick, low cost health check with no phone time required.

The Art Of The SQL Server Stored Procedure: Isolation Levels

Dread Pirate


I’ve talked about isolation levels a bit lately because I need you all to understand that no isolation level is perfect, and that most everyone is completely wrong about how they really work.

For a very high percentage of workloads, Read Committed Snapshot isolation is the best choice. Why?

Because most developers would expect:

  • Read queries to not block with write queries
  • Read queries to not deadlock with write queries
  • Read queries to return correct results from committed transactions

You only get *one* of those things from Read Committed, but you get all of those things from Read Committed Snapshot Isolation.

Sure, there’s a thing you don’t get from it, but if you want that thing, you have to put up with read queries blocking and deadlocking with write queries.

That thing is: Reading only the most current version of row values, which is the only meaningful guarantee that Read Committed will give you.

If you’ve been battling stupid locking problems for a long time, you’ve probably got NOLOCK hints everywhere, which means you’re not getting that anyway.

You’re getting back garbage.

So hear me out: If you’ve got some queries that require the most current version of row values to work correctly, you have the READCOMMITTEDLOCK table hint to save you.
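
As a minimal, hedged example of the hint in action, using the DinnerPlans demo table from later in this post:

/*Under RCSI, this one query goes back to taking shared locks
  and reading only the current, committed version of each row*/
SELECT
    dp.id,
    dp.is_free
FROM dbo.DinnerPlans AS dp WITH (READCOMMITTEDLOCK)
WHERE dp.is_free = 1;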

What Read Committed Doesn’t Get You


To illustrate the concurrency issues that can arise under Read Committed, here are some slides I made for a thing:

[slides: read committed: not great]

If any of these surprise you, you’re not allowed to argue with me about Read Committed Snapshot Isolation.

Read Committed Snapshot Isolation And Dumb Queries


Queries that are susceptible to race conditions with optimistic isolation levels are queries that are written in stupid ways.

One example is a query like this:

BEGIN TRANSACTION;
     DECLARE
         @name nvarchar(40) = 'Erik';
   
    UPDATE dp
      SET
          dp.name = @name,
          dp.is_free = 0
    OUTPUT
        Inserted.*
    FROM dbo.DinnerPlans AS dp
    WHERE EXISTS
    (
        SELECT
            1/0
        FROM dbo.DinnerPlans AS dp2
        WHERE dp.id = dp2.id
        AND   dp2.is_free = 1
    );
COMMIT;

It will behave differently under optimistic vs. pessimistic isolation levels. Let’s pretend that two people try to book the very last seat at the very last time.

  • Under Read Committed, the update to DinnerPlans would block things so that the read in the exists subquery would wait for it to complete, and find no rows.
  • Under Read Committed Snapshot Isolation, the update to DinnerPlans would generate a row version, and the read in the exists subquery would read that version where it would find a row.

This is, of course, a very stupid query. If you’re just using direct updates, you won’t have problems:

UPDATE
    dp
  SET
    dp.name = N'Erik',
    dp.is_free = 0
FROM dbo.DinnerPlans AS dp
WHERE is_free = 1;

For Every Occasion


There are likely times when each and every isolation level is appropriate, or even required, for parts of a workload to function correctly.

Just like settings for parallelism, max server memory, and many other things in SQL Server, it’s your job to set them as appropriate guardrails for the workload as a whole.

Most workloads work better using Read Committed Snapshot Isolation. If there are specific queries in your workload with different needs, you have many wonderful options to fix them.

In some cases the READCOMMITTEDLOCK hint may be the minimum effective dose. You may also read this post and realize that you need a stronger isolation level with better guarantees, like Repeatable Read or Serializable.

Many people are surprised that Repeatable Read only takes its special locks on rows as it’s reading them, and that changes ahead of where the reads have occurred can still occur, and inserts can still occur between rows that have been read.

Like I said before, no isolation level is perfect, and many developers are surprised by the details of each one.

Most people think that Read Committed works the way Serializable does, where the set of rows you’ve read and are yet to read are somehow a golden copy of the data. That is not true.

The misunderstandings usually arise from a lack of testing for expected results under high concurrency.

Tools like SQL Query Stress and ostress can be invaluable for making sure you’re getting what you want from whatever isolation level you’re using.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that, and need to solve database performance problems quickly. You can also get a quick, low cost health check with no phone time required.