In yesterday’s post, I showed you a function rewrite from Scalar to Inline Table Valued, and gave you a couple example calls.
Now, if this is all you’re doing with a function, there’s absolutely no need to rewrite it.
SELECT
    cl.*
FROM dbo.CountLetters('1A1A1A1A1A') AS cl;

SELECT
    CountLetters =
        dbo.CountLetters_Bad('1A1A1A1A1A');
If you’re doing something like this, and maybe assigning it to a variable or using it to guide some branching logic, don’t you sweat it for one single solitary second.
You may want to make sure whatever code inside the function runs well, but changing the type of function here isn’t going to improve things.
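For context, the actual function definitions from yesterday’s post aren’t included here, so here’s a minimal sketch of the two shapes, with a made-up body that just counts As. The real functions differ; the containers are the point:

/*A sketch, not the exact functions from yesterday's post*/
CREATE OR ALTER FUNCTION
    dbo.CountLetters_Bad
(
    @s nvarchar(40)
)
RETURNS bigint
AS
BEGIN
    /*Scalar UDF: evaluated row by row, and it inhibits parallelism*/
    RETURN LEN(@s) - LEN(REPLACE(@s, 'A', ''));
END;
GO

CREATE OR ALTER FUNCTION
    dbo.CountLetters
(
    @s nvarchar(40)
)
RETURNS table
AS
RETURN
/*Inline TVF: the optimizer expands this into the calling query*/
SELECT
    CountLetters =
        LEN(@s) - LEN(REPLACE(@s, 'A', ''));
GO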
More realistically, though, you’re going to be calling functions as part of a larger query.
Second To None
Let’s say you’re doing something a bit like this:
SELECT
    u.DisplayName,
    TotalScore =
        SUM(p.Score * 1.),
    Thing =
        dbo.CountLetters_Bad(u.DisplayName)
FROM dbo.Users AS u
JOIN dbo.Posts AS p
    ON p.OwnerUserId = u.Id
WHERE EXISTS
(
    SELECT
        1/0
    FROM dbo.Votes AS v
    WHERE p.Id = v.PostId
)
AND u.Reputation >= 100000
GROUP BY
    u.DisplayName
ORDER BY
    TotalScore DESC;
It’s a far different scenario than just assigning the output of a Scalar UDF to a variable or using it to guide some branching logic.
Brooklyn Zoo
A few minor syntax changes to the function and to how the query calls it can make a big difference.
SELECT
    u.DisplayName,
    TotalScore =
        SUM(p.Score * 1.),
    Thing =
        (SELECT * FROM dbo.CountLetters(u.DisplayName))
FROM dbo.Users AS u
JOIN dbo.Posts AS p
    ON p.OwnerUserId = u.Id
WHERE EXISTS
(
    SELECT
        1/0
    FROM dbo.Votes AS v
    WHERE p.Id = v.PostId
)
AND u.Reputation >= 100000
GROUP BY
    u.DisplayName
ORDER BY
    TotalScore DESC;
Since this is a table valued function, you have to ask for results from it like you’d ask for results from a table.
That’s fine, as long as whoever you’re rooting for wins. And here, no one’s rooting for scalar functions. They’re just unlovable.
I’m going to show you the very end of these plans to see the timing differences.
1945
The Scalar UDF plan takes ~23 seconds, and the inline TVF plan takes 7.5 seconds.
And this is why testing language features in SQL Server needs to be done realistically, against realistic amounts of data. Testing a single value would never reveal performance issues like this.
Thanks for reading!
Imagine you have a rather complicated query that you want to abstract into a simple query for your less-than-with-it end users.
A view is probably a pretty good way of doing that, since you can shrink your preposterously-constructed tour through every table in the schema down to a simple select-from-one-object.
The problem is that now everyone expects it to perform well throughout all time, under any circumstances, come what may. It’s sort of like how your parents expect dinner to be $20 and tips to be 20% regardless of where they go or what they order.
Lobster? $5.
Steak? $5.
Bottle of wine? $5.
Any dessert you can imagine? $5.
Tip? Gosh, mister, another $5?
I sincerely apologize to anyone who continues to live in, or who moved to, Europe to avoid tipping.
If you’d like some roommates, I have some parents you’d get along with.
Viewfinder
Creating a view in SQL Server doesn’t do anything special for you, outside of not making people remember your preposterously-constructed query.
You can put all manner of garbage in your view, make it reference another half dozen views full of garbage, and expect sparkling clean query performance every time.
Guess what happens?
Reality.
When you use views, the only value is abstraction. You still need to be concerned with how the query is written, and if the query has decent indexes to support it. In other words, you can’t just write a view and expect the optimizer to do anything special with it.
SQL Server doesn’t cache results; it only caches raw data. If you want the results of a view to be saved, you need to index it.
Take these two dummy queries, one against a created view, and the other an ad hoc query identical to what’s in the view:
CREATE OR ALTER VIEW
    dbo.just_a_query
WITH SCHEMABINDING
AS
SELECT
    p.OwnerUserId,
    TotalScore =
        ISNULL
        (
            SUM(p.Score),
            0
        ),
    TotalPosts =
        COUNT_BIG(*)
FROM dbo.Posts AS p
WHERE EXISTS
(
    SELECT
        1/0
    FROM dbo.Votes AS v
    WHERE v.PostId = p.Id
)
GROUP BY
    p.OwnerUserId;
GO

SELECT
    p.OwnerUserId,
    TotalScore =
        ISNULL
        (
            SUM(p.Score),
            0
        ),
    TotalPosts =
        COUNT_BIG(*)
FROM dbo.Posts AS p
WHERE EXISTS
(
    SELECT
        1/0
    FROM dbo.Votes AS v
    WHERE v.PostId = p.Id
)
AND p.OwnerUserId = 22656
GROUP BY
    p.OwnerUserId;
GO

SELECT
    jaq.*
FROM dbo.just_a_query AS jaq
WHERE jaq.OwnerUserId = 22656;
GO
The plans are identical, and identically bad. Why? Because I didn’t try very hard, and there are no good indexes for them.
Remember when I said that’s important?
avenues lined with trees
Keep in mind this is a query with some batch mode involved, so it could be a lot worse. But both instances complete within a second or so of each other.
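For what it’s worth, if you wanted to give these queries a fair shot, something like the indexes below would be the starting point. This is a sketch of the shape the query wants, not part of the original demo:

/*Not part of the original demo: indexes to support the aggregation and the exists*/
CREATE INDEX
    p
ON dbo.Posts
    (OwnerUserId)
INCLUDE
    (Score)
WITH
    (SORT_IN_TEMPDB = ON, DATA_COMPRESSION = PAGE);

CREATE INDEX
    v
ON dbo.Votes
    (PostId)
WITH
    (SORT_IN_TEMPDB = ON, DATA_COMPRESSION = PAGE);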
The horrible thing is that indexed views are so strict in SQL Server that we can’t even create one on the view in question. That really sucks. We get this error.
CREATE UNIQUE CLUSTERED INDEX
    cuqadoodledoo
ON dbo.just_a_query
(
    OwnerUserId
)
WITH
(
    SORT_IN_TEMPDB = ON,
    DATA_COMPRESSION = PAGE
);

Msg 10127, Level 16, State 1, Line 95
Cannot create index on view “StackOverflow2013.dbo.just_a_query” because it contains one or more subqueries.
Consider changing the view to use only joins instead of subqueries. Alternatively, consider not indexing this view.
Alternatively, go screw yourself. Allowing joins but not exists is somewhat baffling, since the two aren’t interchangeable: joins allow multiple matches, but exists does not. We’d have to do a lot of fancy grouping footwork to get equivalent results with a join, since distinct isn’t allowed in an indexed view in SQL Server either.
We could also pull the exists out of the view, add the Id column to the select list, group by that and OwnerUserId, index both of them, and… yeah nah.
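To make the footwork concrete, here’s a sketch of a join version that SQL Server would happily let you index. The view name and shape are my assumption, not code from the post, and note that it isn’t equivalent: a post with five votes adds its Score to the SUM five times, which is exactly the multiple-match problem that exists avoids.

/*A sketch: indexable, but not equivalent to the exists version*/
CREATE OR ALTER VIEW
    dbo.joins_only
WITH SCHEMABINDING
AS
SELECT
    p.OwnerUserId,
    TotalScore =
        SUM(p.Score),
    TotalRows =
        COUNT_BIG(*)
FROM dbo.Posts AS p
JOIN dbo.Votes AS v
    ON v.PostId = p.Id
GROUP BY
    p.OwnerUserId;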
I have no idea who’s in charge of indexed views in the product at this point, but a sufficiently lubricated republic would likely come calling with tar and feathers in the face of this injustice.
This is basic query syntax. It’s not like uh… min, max, sum, avg, except, intersect, union, union all, cross apply, outer apply, outer joins, or um, hey, is it too late for me to change careers?
The Pain In Pain Falls Painly On The Pain
You may have ended up here looking to learn all the minute differences between views and indexed views in SQL Server.
You may be disappointed in reading this post, but I can assure you that you’re not nearly as disappointed in this post as I am with indexed views in SQL Server.
They’re like one of those articles about flying cars where you read the headline and you’re like “woah, I’m living in the future”, but then three paragraphs in you find out the cars don’t really fly or drive and they might actually just be igloos that are only big enough for an Italian Greyhound or a paper plane that the author’s kid glued wheels to.
When you’re trying to figure out how to store string data, it often seems easiest to just choose an extra long — even MAX — data type to avoid future truncation errors.
Even if you’re storing strings with a known, fixed length, developers may choose not to enforce that in the application, whether via a drop-down menu or some other form of validation.
And so to avoid errors when users try to put their oh-so-important data in their oh-so-expensive database, we get columns added to tables that can fit a galaxy of data in them, when we only need to store an ashtray worth of data.
While getting data into those columns is relatively easy — most application inserts are single rows — getting data out of those columns can be quite painful, whether it’s searching or just presenting in the select portion of a query.
Let’s look at a couple simple examples of how that happens.
Search Engine
Let’s take a query like this one:
SELECT TOP (20)
    p.Id,
    p.Title,
    p.Body
FROM dbo.Posts AS p
WHERE p.Body LIKE N'SQL Server%';
The Body column in the Posts table is nvarchar(max), but the same thing would happen with a varchar(max) column.
If you need a simple way to remember how to pronounce those data types, just remember to Pahk yah (n)vahcah in Hahvahd Yahd.
Moving on: while much has been written about leading wildcard searches (ones that start with a % sign), that’s not what we’re doing here. Also, in general, using charindex or patindex instead of a leading wildcard like search won’t buy you all that much (if anything at all).
Anyway, since you can’t put a MAX data type in the key of an index, part of the problem with them is that there’s no way to efficiently organize the data for searching. Included columns don’t do that, and so we end up with a query plan that looks something like this:
ouch in effect
We spend ~13.5 seconds scanning the clustered index on the Posts table, then about two minutes and twenty seven seconds (minus the original 13.5) applying the predicate looking for posts that start with SQL Server.
That’s a pretty long time to track down and return 19 rows.
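If you’ve never bumped into the key column limit, here’s a sketch; the index names are made up, and the first statement fails with an error telling you the column type is invalid for use as a key column:

/*Fails: MAX data types can't be index key columns*/
CREATE INDEX
    bad_idea
ON dbo.Posts
    (Body);

/*Works, but an included column doesn't order anything, so it can't help the search*/
CREATE INDEX
    slightly_less_bad_idea
ON dbo.Posts
    (Id)
INCLUDE
    (Body);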
Let’s change the query a little bit and look at how else big string columns can cause problems.
Memory Bank
Rather than search on the Body column, let’s select some values from it ordered by the Score column.
Since Score isn’t indexed, it’s not sorted in the database. That means SQL Server needs to ask for memory to put the data we’re selecting in the order we’re asking for.
SELECT TOP (200)
    p.Body
FROM dbo.Posts AS p
ORDER BY p.Score DESC;
The plan for this query asks for a 5GB memory grant:
quietly
I know what you’re thinking: the Body column probably has some pretty big data in it, and you’re right. In this case, it’s the right data type to use.
The bad news is that SQL Server makes the same memory grant estimate based on the declared size of the data we need to sort, whether or not the data type is a good fit for what’s actually stored.
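If you want to see the declared size driving the grant, shrink it and compare. This is a sketch to illustrate the estimate, not a fix, because it truncates any Body longer than 100 characters:

/*A sketch: a smaller declared type asks for a much smaller sort grant.
  Not a fix, since longer values get truncated.*/
SELECT TOP (200)
    Body =
        CONVERT(nvarchar(100), p.Body)
FROM dbo.Posts AS p
ORDER BY p.Score DESC;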
Several weeks back, I blogged about a missed opportunity with the new parameter sensitive plan feature. At the time, I thought that there was indeed sufficient skewness available to trigger the additional plan variants, and apparently some nice folks at Microsoft agreed.
I’m not gonna go into a lot of depth here, because I’m writing this while traveling, but we get the (mostly) desired outcome of two different plans being generated. Each plan is more suitable to the amount of data that the query has to process.
SQL Server 2022 adds a DATETRUNC function. Why does it work the way it does? Most likely because that’s the way it’s implemented in Postgres and DB2. Oracle, of course, just calls it TRUNC.
So, while it’s nice to have (what appears to be) the same behavior, it doesn’t exactly help to not have a 1:1 calling equivalent to other platforms.
I assume most of these additions to T-SQL are for cross-platform development and migrations.
Of course, Microsoft being so gosh darn late to this game means folks have probably been rolling-their-own versions of these functions for years.
If they went and called their system function DATE_TRUNC or even TRUNC, they might have some object naming issues to contend with.
Well, okay. But how does it work?
Childish Games
Here are some quick examples of how you call it.
SELECT TOP (10)
    u.DisplayName,
    year =
        DATETRUNC(YEAR, u.LastAccessDate),
    quarter =
        DATETRUNC(QUARTER, u.LastAccessDate),
    month =
        DATETRUNC(MONTH, u.LastAccessDate),
    dayofyear =
        DATETRUNC(DAYOFYEAR, u.LastAccessDate),
    day =
        DATETRUNC(DAY, u.LastAccessDate),
    week =
        DATETRUNC(WEEK, u.LastAccessDate),
    iso_week =
        DATETRUNC(ISO_WEEK, u.LastAccessDate),
    hour =
        DATETRUNC(HOUR, u.LastAccessDate),
    minute =
        DATETRUNC(MINUTE, u.LastAccessDate),
    second =
        DATETRUNC(SECOND, u.LastAccessDate),
    millisecond =
        DATETRUNC(MILLISECOND, u.LastAccessDate),
    microsecond =
        DATETRUNC(MICROSECOND, u.LastAccessDate) /*Doesn't work with datetime because there are no microseconds*/
FROM dbo.Users AS u;
And here are the results:
workin’
The thing to note here is that there’s no rounding logic involved. You just go to the start of whatever unit of time you choose. Of course, this doesn’t seem to do anything to the millisecond portion of DATETIME, because it’s not precise enough.
But for anyone out there who was hoping for a SOMONTH function to complement the EOMONTH function, you get this instead.
Works well enough!
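In case the SOMONTH idea isn’t obvious, here’s a quick sketch with an arbitrary date:

/*Start of month via DATETRUNC, end of month via EOMONTH*/
SELECT
    start_of_month =
        DATETRUNC(MONTH, CONVERT(datetime, '20130615')), /*2013-06-01 00:00:00.000*/
    end_of_month =
        EOMONTH(CONVERT(date, '20130615')); /*2013-06-30*/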
But does it perform, Darling?
UnSARGable?
To make any test like this worthwhile, we need an index to make data searchable.
CREATE INDEX
    v
ON dbo.Votes
    (CreationDate)
WITH
(
    SORT_IN_TEMPDB = ON,
    DATA_COMPRESSION = PAGE
);
You know, because that’s what they do. To make searching faster. Hello.
So look, under these perfect circumstances, everything performs well. But we have to do a lot of typing.
SELECT
    c = COUNT_BIG(*)
FROM dbo.Votes AS v
WHERE v.CreationDate >= DATETRUNC(YEAR, CONVERT(datetime, '20130101 00:00:00.000'));
Note here that we’re working on a literal value, not a column value, and we have to tell the datetrunc function which type we want via the convert function in order to get a simple seek plan. Here’s what happens if we skip the convert:
SELECT
    c = COUNT_BIG(*)
FROM dbo.Votes AS v
WHERE v.CreationDate >= DATETRUNC(YEAR, '20130101 00:00:00.000');
This has some… obvious performance issues compared to the above plan with correct data types.
query pan
Query Pranks
Frequent blog readers will not be surprised that wrapping a table column in the new DATETRUNC function yields old performance problems:
SELECT
    c = COUNT_BIG(*)
FROM dbo.Votes AS v
WHERE DATETRUNC(YEAR, v.CreationDate) >= CONVERT(datetime, '20130101 00:00:00.000');
This is particularly annoying because we’re truncating the column to the beginning of the year, which should be searchable in the index since that’s the sort order of the data in the index.
shined out
Like most functions, these are fine in the presentation layer, but terrible in the relational layer. There’s no warning about this performance degradation in the documentation, either in the example of using the function in a where clause, or in the final closing remarks.
But that’s par for the course with any of these built-in functions.
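If you do need to filter a column to a whole year, the usual sargable rewrite applies: truncate the boundary values, not the column. A sketch:

/*Sargable sketch: compute the boundaries, leave the column alone*/
DECLARE
    @d datetime = '20130613';

SELECT
    c = COUNT_BIG(*)
FROM dbo.Votes AS v
WHERE v.CreationDate >= DATETRUNC(YEAR, @d)
AND   v.CreationDate <  DATEADD(YEAR, 1, DATETRUNC(YEAR, @d));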
Thanks for reading!
USE StackOverflow2013;
EXEC dbo.DropIndexes;
SET NOCOUNT ON;
DBCC FREEPROCCACHE;
GO
CREATE INDEX
chunk
ON dbo.Posts
(OwnerUserId, Score DESC)
INCLUDE
(CreationDate, LastActivityDate)
WITH
(MAXDOP = 8, SORT_IN_TEMPDB = ON, DATA_COMPRESSION = PAGE);
GO
CREATE OR ALTER VIEW
dbo.PushyPaul
WITH SCHEMABINDING
AS
SELECT
p.OwnerUserId,
p.Score,
p.CreationDate,
p.LastActivityDate,
PostRank =
DENSE_RANK() OVER
(
PARTITION BY
p.OwnerUserId
ORDER BY
p.Score DESC
)
FROM dbo.Posts AS p;
GO
SELECT
p.*
FROM dbo.PushyPaul AS p
WHERE p.OwnerUserId = 22656;
GO
CREATE OR ALTER PROCEDURE
dbo.StinkyPete
(
@UserId int
)
AS
SET NOCOUNT, XACT_ABORT ON;
BEGIN
SELECT
p.*
FROM dbo.PushyPaul AS p
WHERE p.OwnerUserId = @UserId;
END;
GO
EXEC dbo.StinkyPete
@UserId = 22656;
/*Start Here*/
ALTER DATABASE
StackOverflow2013
SET PARAMETERIZATION SIMPLE;
DBCC TRACEOFF
(
4199,
-1
);
ALTER DATABASE SCOPED CONFIGURATION
SET QUERY_OPTIMIZER_HOTFIXES = OFF;
SELECT
p.*
FROM dbo.PushyPaul AS p
WHERE p.OwnerUserId = 22656
AND 1 = (SELECT 1); /*Avoid trivial plan/simple parameterization*/
/*Let's cause a problem!*/
ALTER DATABASE
StackOverflow2013
SET PARAMETERIZATION FORCED;
SELECT
p.*
FROM dbo.PushyPaul AS p
WHERE p.OwnerUserId = 22656
AND 1 = (SELECT 1); /*Avoid trivial plan/simple parameterization*/
/*Can we fix the problem?*/
DBCC TRACEON
(
4199,
-1
);
SELECT
p.*
FROM dbo.PushyPaul AS p
WHERE p.OwnerUserId = 22656
AND 1 = (SELECT 1); /*Avoid trivial plan/simple parameterization*/
/*That's kinda weird...*/
DBCC FREEPROCCACHE;
SELECT
p.*
FROM dbo.PushyPaul AS p
WHERE p.OwnerUserId = 22656
AND 1 = (SELECT 1); /*Avoid trivial plan/simple parameterization*/
/*Turn Down Service*/
DBCC TRACEOFF
(
4199,
-1
);
SELECT
p.*
FROM dbo.PushyPaul AS p
WHERE p.OwnerUserId = 22656
AND 1 = (SELECT 1); /*Avoid trivial plan/simple parameterization*/
/*Okay then.*/
/*I'm different.*/
ALTER DATABASE SCOPED CONFIGURATION
SET QUERY_OPTIMIZER_HOTFIXES = ON;
SELECT
p.*
FROM dbo.PushyPaul AS p
WHERE p.OwnerUserId = 22656
AND 1 = (SELECT 1); /*Avoid trivial plan/simple parameterization*/
/*Cleanup*/
ALTER DATABASE
StackOverflow2013
SET PARAMETERIZATION SIMPLE;
ALTER DATABASE SCOPED CONFIGURATION
SET QUERY_OPTIMIZER_HOTFIXES = OFF;
DBCC TRACEOFF
(
4199,
-1
);
In the release notes for SQL Server 2017 CU30, there’s a note that it fixes a problem where parameters can’t be pushed past Sequence Project operators:
“In Microsoft SQL Server 2017, running parameterized queries skips the SelOnSeqPrj rule. Therefore, pushdown does not occur.” But it doesn’t actually do that.
Here are the good and bad plans, comparing a literal value vs. a parameterized value:
dunksville
In the plan with a literal value, the predicate is applied at the index seek, and the filtering is really quick.
In the plan with a parameterized value, the index is scanned, and the predicate is applied at a Filter operator way later in the query plan.
This is where the SelOnSeqPrj rule comes in: The parameter can’t be pushed past the Sequence Project operator like the literal value can.
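The repro isn’t included in this excerpt, so here’s a sketch of the general shape that produces a Sequence Project with a predicate on top; the view name is made up, and the tables are the same Stack Overflow ones used throughout:

/*A sketch: the window function compiles to a Sequence Project operator*/
CREATE OR ALTER VIEW
    dbo.SeqPrjDemo
WITH SCHEMABINDING
AS
SELECT
    p.OwnerUserId,
    p.Score,
    n =
        ROW_NUMBER() OVER
        (
            PARTITION BY
                p.OwnerUserId
            ORDER BY
                p.Score DESC
        )
FROM dbo.Posts AS p;
GO

/*Literal: the predicate can be pushed below the Sequence Project*/
SELECT
    spd.*
FROM dbo.SeqPrjDemo AS spd
WHERE spd.OwnerUserId = 22656;

/*Parameterized: the SelOnSeqPrj rule is skipped, and we filter late*/
EXEC sys.sp_executesql
    N'
SELECT
    spd.*
FROM dbo.SeqPrjDemo AS spd
WHERE spd.OwnerUserId = @UserId;',
    N'@UserId int',
    @UserId = 22656;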
Thanks for reading!
I can’t have my dear friend Brent being all distraught with all those fast cars around. That’s how accidents happen, and I fear he might leave the Blitz scripts to me in his will or something.
In Paul’s post, he talks about using undocumented trace flag 8666 to get additional details about Sort operators.
Let’s do that. Paul is smart, though he is always completely wrong about which season it is.
DROP TABLE IF EXISTS
    dbo.Votes_CCI;

SELECT
    v.*
INTO dbo.Votes_CCI
FROM dbo.Votes AS v;
I’m using the Votes table because it’s nice and narrow and I don’t have to tinker with any string columns.
Strings in databases were a mistake, after all.
DBCC TRACEON(8666);

CREATE CLUSTERED COLUMNSTORE INDEX
    vcci
ON dbo.Votes_CCI
ORDER (PostId);

DBCC TRACEOFF(8666);
Here’s what we get back in the query plan:
Tainted Sort
We’ve got a Soft Sort! What does our seasonally maladjusted friend say about those?
A “soft sort” uses only its primary memory grant and never spills. It doesn’t guarantee fully-sorted output. Each sort run using the available memory grant will be sorted. A “soft sort” represents a best effort given the resource available. This property can be used to infer that a Sort is implemented with CQScanPartitionSortNew without attaching a debugger. The meaning of the InMemory property flag shown above will be covered in part 2. It does not indicate whether a regular sort was performed in memory or not.
Well, with that attitude, it’s not surprising that there are so many overlapping buckets in the column store index. If it’s not good enough, what can you do?
Building the index with the Soft Sort here also leads to things being as bad as they were in Brent’s post.
Insert Debugging Here
Alas, there’s (almost) always a way. Microsoft keeps making these trace flag things.
There are a bunch of different ways to track them down, but figuring out the behavior of random trace flags that you may find just by enabling them isn’t easy.
One way to tie a trace flag to a behavior is to use WinDbg to step through different behaviors in action, and see if SQL Server checks to see if a trace flag is enabled when that behavior is performed.
If you catch that, you can be reasonably sure that the trace flag will have some impact on the behavior. Not all trace flags can be enabled at runtime. Some need to be enabled as startup options.
Sometimes it’s hours and hours of work to track this stuff down, and other times Paul White (b|t) already has notes on helpful ones.
The trace flag below, 2417, is present going back to SQL Server 2014, and can help with the Soft Sort issues we’re seeing when building ordered clustered column store indexes today.
Here’s the index build again, with both trace flags in play:
DBCC TRACEON(8666, 2417);

CREATE CLUSTERED COLUMNSTORE INDEX
    vcci
ON dbo.Votes_CCI
ORDER (PostId)
WITH (MAXDOP = 1);

DBCC TRACEOFF(8666, 2417);
The MAXDOP 1 hint isn’t strictly necessary. With a parallel plan, you may see up to DOP overlapping row groups.
community service
That’s why it was a popular maneuver to emulate this behavior by creating a clustered row store index, and then create a clustered column store index over it with drop existing and a MAXDOP 1 hint.
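That maneuver looked something like this, sketched against this post’s table; the rowstore index sorts the data, and the conversion at DOP 1 preserves the order:

/*The old workaround: sort with a rowstore clustered index first,
  then convert it in place at DOP 1*/
CREATE CLUSTERED INDEX
    vcci
ON dbo.Votes_CCI
    (PostId)
WITH (SORT_IN_TEMPDB = ON);

CREATE CLUSTERED COLUMNSTORE INDEX
    vcci
ON dbo.Votes_CCI
WITH (DROP_EXISTING = ON, MAXDOP = 1);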
At DOP 1, you don’t see that overlap. It takes a lot longer, of course (3 minutes instead of 30 or so seconds), which is a real bummer. But without it, you could see up to DOP overlapping rowgroups.
If you want All The Pretty Little Rowgroups, this is what you have to do.
Anyway, the result using sp_BlitzIndex looks a lot better now:
EXEC sp_BlitzIndex
@TableName = 'Votes_CCI';
capture the flag
How nice.
You can also use undocumented and unsupported trace flag 11621, which is:
[A] feature flag for the ‘partition sort on column store order’, so the end result is similar, but via a different mechanism to 2417.
A partition sort is useful in general to prevent unnecessary switching between partitions. If you sort the stream by partition, you process all the rows for one before moving on to the next. A soft sort is ok there because it’s just a performance optimization. Worst case, you end up switching between partitions quite often because the sort ran out of memory, but correct results will still occur.
Chain Gang
A “reasonable” alternative to trace flags may be to adjust the index create memory configuration option. If we set it down to the minimum value, we get a “helpful” error message:
This index operation requires 123208 KB of memory per DOP.
The total requirement of 985800 KB for DOP of 8 is greater than the sp_configure value of 704 KB set for the advanced server configuration option “index create memory (KB)”.
Increase this setting or reduce DOP and rerun the query.
If you get the actual execution plan for the clustered column store index create or rebuild with the Soft Sort disabled and look at the memory grant, you get a reasonable estimate for what to set index create memory to.
Changing it does two things:
It avoids the very low memory grant that Soft Sorts receive, which is what causes the uneven row groups
The Soft Sort keeps the index create from going above that index create memory number
Setting index create memory for this particular index creation/rebuild to 5,561,824 gets you the nice, even row groups (at MAXDOP 1) that we saw when disabling the Soft Sort entirely.
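If you want to try that route, it’s a plain sp_configure change. The 5,561,824 number comes from this particular index build’s memory grant, so treat it as an example rather than a recommendation:

/*index create memory is an advanced option*/
EXEC sys.sp_configure
    'show advanced options', 1;
RECONFIGURE;

EXEC sys.sp_configure
    'index create memory (KB)', 5561824;
RECONFIGURE;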
Bottom line here is that uneven row groups happen with column store indexes when there’s a:
Parallel create/rebuild
Low memory grant create/rebuild
If this sort of thing is particularly important to you, you could adjust index create memory to a value that allows the Soft Sort adequate memory.
But that’s a hell of a lot of work, and I hope Microsoft just fixes this in a later build.
The bits for this were technically available in SQL Server 2019 as well, but I’m not telling you how to do that. It’s not supported, and bad things might happen if you use it.
I mean, bad things happen in SQL Server 2022 where it’s supported unless you use an undocumented trace flag, but… Uh. I dunno.
This trace flag seems to set things back to how things worked in the Before Times, though, which is probably how they should have stayed.
Thanks for reading!
There are some code comments you see that really set the stage for how tuning a query is going to go.
Usually one misgiving about how SQL Server works gives way to a whole levee-breaking bevy of other ones and three days later you can’t feel your legs but dammit it’s done.
Okay, maybe it was three hours, but it felt like three days. Something about the gravitational pull of these black hole queries.
One fix I’ve been wishing for, or wish I’ve been fixing for, is a cure for local variables. I’d even be cool if Forced Parameterization was that cure, but you know…
Time will tell.
Husk
Let’s say we’ve got this stored procedure, which does something similar to the “I’m gonna fix parameter sniffing with a local variable hey why is everything around me turning to brimstone before my very eyes?” idea, but with… less of an end-of-times vibe.
CREATE OR ALTER PROCEDURE
    dbo.IndexTuningMaster
(
    @OwnerUserId int,
    @ParentId int,
    @PostTypeId int
)
AS
BEGIN
    SET NOCOUNT, XACT_ABORT ON;

    /*Someone passed in bad data and we got a bad query plan,
      and we have to make sure that doesn't happen again*/
    DECLARE
        @ParentIdFix int =
            CASE
                WHEN @ParentId < 0
                THEN 0
                ELSE @ParentId
            END;

    SELECT TOP (1)
        p.*
    FROM dbo.Posts AS p
    WHERE p.ParentId = @ParentIdFix
    AND   p.PostTypeId = @PostTypeId
    AND   p.OwnerUserId = @OwnerUserId
    ORDER BY
        p.Score DESC,
        p.Id DESC;
END;
We get a super low guess for both executions. Obviously, that guess hurts a large set of matched data far worse than a small one, but the important thing here is that both queries receive the same bad guess.
This is a direct side effect of the local variable’s poor estimate, which PSP isn’t quite yet ready to go up against.
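One common workaround, not shown in the post: a statement-level recompile lets the optimizer see the local variable’s runtime value, at the cost of compiling on every execution. Inside the procedure, the select would look like this:

    /*A sketch: OPTION (RECOMPILE) makes the variable's value visible at compile time*/
    SELECT TOP (1)
        p.*
    FROM dbo.Posts AS p
    WHERE p.ParentId = @ParentIdFix
    AND   p.PostTypeId = @PostTypeId
    AND   p.OwnerUserId = @OwnerUserId
    ORDER BY
        p.Score DESC,
        p.Id DESC
    OPTION (RECOMPILE);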
Thanks for reading!
While the Parameter Sensitive Plan (PSP) optimization won’t fix every problem with IF branching in stored procedures, it can fix some of them in very specific circumstances, assuming:
The parameter is eligible for PSP
The parameter is present across IF branches
We’re going to use a simple one parameter example to illustrate the potential utility here.
After all, if I make these things too complicated, someone might leave a question in the comments.
The horror
IFTTT
Here’s the procedure we’re using. The point is to execute one branch if the @Reputation parameter is equal to one, and another branch if it’s equal to anything else.
In the bad old days, both queries would get a plan optimized at compile time, and neither one would get the performance boost that you hoped for.
In the good new days that you’ll probably get to experience around 2025, things are different!
CREATE OR ALTER PROCEDURE
    dbo.IFTTT
(
    @Reputation int
)
AS
BEGIN
    SET NOCOUNT, XACT_ABORT ON;
    SET STATISTICS XML ON;

    IF @Reputation = 1
    BEGIN
        SELECT
            u.Id,
            u.DisplayName,
            u.Reputation,
            u.CreationDate
        FROM dbo.Users AS u
        WHERE u.Reputation = @Reputation;
    END;

    IF @Reputation > 1
    BEGIN
        SELECT
            u.Id,
            u.DisplayName,
            u.Reputation,
            u.CreationDate
        FROM dbo.Users AS u
        WHERE u.Reputation = @Reputation;
    END;

    SET STATISTICS XML OFF;
END;
GO
Johnson & Johnson
If we execute these queries back to back, each one gets a new plan:
EXEC dbo.IFTTT
@Reputation = 1;
GO
EXEC dbo.IFTTT
@Reputation = 2;
GO
psychic driving
Optimize For You
The reason why is in the resulting queries, as usual. The Reputation column has enough skew present to trigger the PSP optimization, so executions with differently-bucketed parameter values end up with different plans.
And of course, each plan has different compile and runtime values:
care
If I were to run this demo in a compatibility level under 160, this would all look totally different.
This is one change I’m sort of interested to see play out.
Thanks for reading!
Going Further
If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.