Even If SQL Server Table Variables Were Always In Memory, It Wouldn’t Make Them Better Than Temp Tables

Past Perfect


You know all those pesky developer myths that never go away?

  • CTEs and Views can’t use indexes: WRONG
  • Subqueries are slower than joins: WRONG
  • Temp tables are bad for performance: WRONG
  • Table Variables are always in memory: WRONG

Okay, that’s not a complete list, but a complete list would fill the internet to its very brim and cause all sorts of overflow errors.

My job as a performance tuning consultant is to teach folks when they’ve been led astray.

To confuse things a bit further, Microsoft allows for in-memory @table variables, but you have to do a lot of work to get set up to use them.

They’re not the ones you get out of the box.
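
For the curious, here’s a minimal sketch of what that setup looks like. The database, filegroup, file path, and type names below are all stand-ins, not anything prescribed:

ALTER DATABASE StackOverflow
ADD FILEGROUP imoltp_fg
CONTAINS MEMORY_OPTIMIZED_DATA;
GO

ALTER DATABASE StackOverflow
ADD FILE
    (NAME = 'imoltp_f', FILENAME = 'D:\SQLData\imoltp_f') --path is an example
TO FILEGROUP imoltp_fg;
GO

CREATE TYPE dbo.imoltp_tv AS TABLE
(
    id int NOT NULL,
    PRIMARY KEY NONCLUSTERED (id) --memory-optimized table types require an index
)
WITH (MEMORY_OPTIMIZED = ON);
GO

DECLARE @t dbo.imoltp_tv; --this one actually lives in memory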

Future Mediocre


Let’s say for the sake of argument that @table variables were always more “in-memory” than #temp tables (they’re not). They’d still have a lot of the same fundamental problems:

  • No column-level statistics, so the optimizer mostly guesses at what’s inside them
  • Modifications (inserts, updates, and deletes) can’t use parallel plans
  • Cardinality guesses that don’t change between executions, even with SQL Server 2019’s deferred compilation fixing up the initial row count

With all that stuff in mind: who cares if they’re only in memory? There are so many downsides for most query patterns, they could all happen in quantum space and performance would still stink.

I know, they have their uses:

  • They don’t trigger recompiles the way #temp tables can
  • They’re the backbone of table-valued parameters
  • Their contents survive transaction rollbacks, which is handy for logging

But most of y’all aren’t using them that wisely.

And here’s the other thing: Most of y’all don’t have enough memory to begin with, and the memory you do have you misuse in a million ways. Why do you wanna make that worse?

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that, and need to solve database performance problems quickly. You can also get a quick, low cost health check with no phone time required.

Software Vendor Mistakes With SQL Server: Misusing @Table Variables

The Miseducation Of Everyone


Temporary objects are one of those tricky things. You probably know you should be using them for certain things, but which one to use is a constant source of trial, error, and coin-tosses.

In these videos from my training, I’m going to go through the downsides of table variables. There’s one free video from YouTube at the end about when you should use them, too.


Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Using Batch Mode for the “Matching Supply With Demand” Challenge

Itzik Ben-Gan posted an interesting T-SQL challenge at SQLPerformance.com. I’m writing up my solution in my own blog post because I have a lot to say, and getting code formatting right can be tricky in blog post comments. For reference, my test machine uses SQL Server 2019 CU14 with an i7-9700K CPU @ 3.60 GHz. The baseline cursor solution completes in 8465 ms on my machine.

Running Totals

A simple way to solve this problem is to calculate running totals of quantity separately for supply and demand, treat the resulting rows as intervals, and find the intersections between supply and demand intervals. It is fairly straightforward and fast to calculate the interval start and end points and load them into temp tables, so I’ll omit that code for now. We can find the intersecting intervals with the query below (which you should not run):

SELECT
    d.ID AS DemandId,
    s.ID AS SupplyID,
    CASE
        WHEN d.IntervalEnd >= s.IntervalEnd
        THEN s.IntervalEnd - CASE WHEN s.IntervalStart >= d.IntervalStart THEN s.IntervalStart ELSE d.IntervalStart END
        ELSE d.IntervalEnd - CASE WHEN s.IntervalStart >= d.IntervalStart THEN s.IntervalStart ELSE d.IntervalStart END
    END AS TradeQuantity
FROM #demand_intervals AS d
CROSS JOIN #supply_intervals AS s
WHERE d.IntervalEnd > s.IntervalStart
AND   d.IntervalStart < s.IntervalEnd
OPTION (QueryRuleOff BuildSpool);

Performance is terrible for this query because there is no possible index that will make that join condition fast. The query takes 955 seconds of CPU time and 134 seconds of elapsed time at MAXDOP 8.

This blog post is about batch mode, so we need to turn that nested loop join into a hash join. It is important to recall that a hash join requires at least one equality condition in the join clause.

Exploring the Data

It’s important to Know Your Data while query writing, especially if you want to cheat in a performance competition. Taking a quick look at the Auctions table with the 400k row configuration:
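
A quick aggregate query along these lines does the trick; Code ('S' or 'D') and Quantity are the columns the challenge schema uses for auction type and amount:

SELECT
    a.Code,
    auction_count = COUNT_BIG(*),
    avg_quantity = AVG(a.Quantity),
    max_quantity = MAX(a.Quantity)
FROM dbo.Auctions AS a
GROUP BY a.Code;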

The things that stand out to me are that there are an equal number of supply and demand rows, both supply and demand have nearly the same average value, and the maximum quantity for both supply and demand is under 100. We can exploit the relatively low maximum quantity value to improve performance. A supply interval with an end point that is more than 100 units away from the demand end point cannot possibly intersect it. This is one of those things that feels intuitively correct, but I’ll go ahead and prove it by contradiction anyway.

Suppose that 100 is the maximum interval length, that [d_start, d_end] and [s_start, s_end] overlap, and that s_end is more than 100 units away from d_end.

  1. The distance between end points implies that d_end < s_end - 100
  2. If they overlap, then s_start < d_end
  3. This implies that s_start < d_end < s_end - 100
  4. This implies that s_start < s_end - 100
  5. This implies that s_end - s_start > 100

The final statement is impossible because 100 is the maximum interval length. You can do the same proof in the other direction. Therefore, it should be safe to add some filters to our original query:

SELECT
    d.ID AS DemandId,
    s.ID AS SupplyID,
    CASE
        WHEN d.IntervalEnd >= s.IntervalEnd
        THEN s.IntervalEnd - CASE WHEN s.IntervalStart >= d.IntervalStart THEN s.IntervalStart ELSE d.IntervalStart END
        ELSE d.IntervalEnd - CASE WHEN s.IntervalStart >= d.IntervalStart THEN s.IntervalStart ELSE d.IntervalStart END
    END AS TradeQuantity
FROM #demand_intervals AS d WITH (TABLOCK)
CROSS JOIN #supply_intervals AS s WITH (TABLOCK)
WHERE d.IntervalEnd > s.IntervalStart
AND   d.IntervalStart < s.IntervalEnd
AND   s.IntervalEnd >= d.IntervalEnd - 100
AND   s.IntervalEnd <= d.IntervalEnd + 100;

Now we can get a proper index seek on the s.IntervalEnd column. Performance is significantly better with this additional filter clause. The total runtimes for all steps, including the omitted temp table creation, are 1030 ms of CPU time and 345 ms of elapsed time. We are of course not using batch mode though:

Buckets

Time for more math. Starting with the following filter conditions:

AND s.IntervalEnd >= d.IntervalEnd - 100
AND s.IntervalEnd <= d.IntervalEnd + 100;

I can divide both sides by 100 and the condition will still be true:

AND s.IntervalEnd / 100 >= d.IntervalEnd / 100 - 1
AND s.IntervalEnd / 100 <= d.IntervalEnd / 100 + 1

You’ll have to take my word for it that the inequalities still hold if we truncate everything down to integers:

AND FLOOR(s.IntervalEnd / 100) >= FLOOR(d.IntervalEnd / 100) - 1
AND FLOOR(s.IntervalEnd / 100) <= FLOOR(d.IntervalEnd / 100) + 1

Now we’re dealing with integers, so we can express this as the following:

FLOOR(s.IntervalEnd / 100) IN (FLOOR(d.IntervalEnd / 100) - 1, FLOOR(d.IntervalEnd / 100) + 0, FLOOR(d.IntervalEnd / 100) + 1)

Rewriting once again:

FLOOR(s.IntervalEnd / 100) = FLOOR(d.IntervalEnd / 100) - 1
OR FLOOR(s.IntervalEnd / 100) = FLOOR(d.IntervalEnd / 100) + 0
OR FLOOR(s.IntervalEnd / 100) = FLOOR(d.IntervalEnd / 100) + 1

That’s still not eligible for a hash join, but it is if we change it to a UNION ALL:

... JOIN ON ...
FLOOR(s.IntervalEnd / 100) = FLOOR(d.IntervalEnd / 100) - 1

UNION ALL

... JOIN ON ...
FLOOR(s.IntervalEnd / 100) = FLOOR(d.IntervalEnd / 100) + 0

UNION ALL

... JOIN ON ...
FLOOR(s.IntervalEnd / 100) = FLOOR(d.IntervalEnd / 100) + 1

We have successfully transformed a single BETWEEN join filter into three joins with equality filters. All three of those joins are eligible for batch mode.

A visual representation may be helpful. Suppose we have two demand intervals that end in 150.2 and 298.2. We need to match those intervals with 8 supply intervals with end points that range from 1.1 to 399.1. To do so, we can divide the intervals into buckets of 100 width and join on matching bucket values as well as the immediately adjacent bucket values:

The demand interval that ends in 150.2 has a bucket value of 1, so it is matched with supply buckets 0, 1, and 2. In this way, we can guarantee that any supply interval that’s 100 or fewer units away ends up getting matched with the right demand intervals. The supply buckets with a background of red are only matched to the demand interval of 150.2, blue is only matched to the demand interval of 298.2, and purple is matched to both. The buckets are overly inclusive of course. The actual rows that might match based on distance have their font color changed in the same way, but filtering out extra rows later will be no problem. The important thing is that we have an equality condition to work with. Now that we can finally perform hash joins, it’s time to work out all of the details.
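
If you want to sanity-check the bucket assignments from that picture, the math is just integer truncation of IntervalEnd / 100. The decimal typing below matches the temp table definitions later on:

SELECT
    d.IntervalEnd,
    IntervalBucket = CAST(d.IntervalEnd / 100 AS bigint) --truncation matches FLOOR for non-negative values
FROM
(
    VALUES
        (CAST(150.2 AS decimal(19, 6))),
        (CAST(298.2 AS decimal(19, 6)))
) AS d (IntervalEnd);
--150.2 falls in bucket 1, so it is matched with supply buckets 0, 1, and 2
--298.2 falls in bucket 2, so it is matched with supply buckets 1, 2, and 3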

Gotta Go Fast

We need to perform three operations:

  1. Calculate the maximum interval length.
  2. Calculate running totals for supply and demand, their bucket values, and load them into temporary objects.
  3. Query the temp tables to load the final results into another table.

For step 1, we simply need an index on the Quantity column to make the query fast. We also need to get an exclusive lock and hold it until we’re finished with all of the steps. Otherwise the max length could change in the middle of our code’s execution.

For step 2, the best performing options are either a MAXDOP 1 query with an ordered index scan and a batch mode window aggregate or a parallel query with a batch mode sort, parallel batch mode window aggregate, and parallel insert. An NCCI on the Auctions table is helpful in getting batch mode. In order to make the code go as fast as possible, I elected to use the parallel option at MAXDOP 4 even though it uses significantly more CPU time than the MAXDOP 1 option. DOP is at 4 instead of 8 due to contention caused by the NESTING_TRANSACTION_FULL latch. Here’s an estimated plan picture in case you don’t know what I mean by all of that:

For the third query, I’m using a UNION ALL of 3 joins that are eligible for batch mode like I said earlier. I added some query hints to reduce compile time.

As a step 0 that I forgot to mention earlier, create the following indexes on the Auctions table:

CREATE INDEX IX_Auctions__Code_ID__INCLUDE_Quantity ON Auctions (Code, ID) INCLUDE (Quantity);

CREATE INDEX IX_Auctions__Quantity ON Auctions (Quantity);

CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_Auctions ON Auctions (Code) WITH (MAXDOP = 1);

You can find the full code below for all three steps:

SET NOCOUNT ON;

DROP TABLE IF EXISTS #PairingsCursor2;
CREATE TABLE #PairingsCursor2
(
DemandID INT NOT NULL,
SupplyID INT NOT NULL,
TradeQuantity DECIMAL(19, 6) NOT NULL
);

DROP TABLE IF EXISTS #demand_intervals;
CREATE TABLE #demand_intervals (
ID INT NOT NULL,
IntervalStart DECIMAL(19, 6) NOT NULL,
IntervalEnd DECIMAL(19, 6) NOT NULL,
IntervalBucket BIGINT NOT NULL
);

DROP TABLE IF EXISTS #supply_intervals;
CREATE TABLE #supply_intervals (
ID INT NOT NULL,
IntervalStart DECIMAL(19, 6) NOT NULL,
IntervalEnd DECIMAL(19, 6) NOT NULL,
IntervalBucket BIGINT NOT NULL
);

DECLARE @MaxQuantityRange DECIMAL(19, 6);

BEGIN TRANSACTION;

SELECT @MaxQuantityRange = MAX(Quantity) - MIN(Quantity)
FROM Auctions WITH (TABLOCKX);

INSERT INTO #demand_intervals WITH (TABLOCK)
SELECT ID, rt - Quantity IntervalStart, rt IntervalEnd, CAST(rt / @MaxQuantityRange AS BIGINT) AS IntervalBucket
FROM
(
SELECT a.ID, Quantity, SUM(Quantity) OVER (ORDER BY a.ID ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) rt
FROM dbo.Auctions a WITH (TABLOCK)
WHERE Code = 'D'
) q
OPTION (USE HINT('ENABLE_PARALLEL_PLAN_PREFERENCE'), MAXDOP 4);

INSERT INTO #supply_intervals WITH (TABLOCK)
SELECT ID, rt - Quantity IntervalStart, rt IntervalEnd, CAST(rt / @MaxQuantityRange AS BIGINT) AS IntervalBucket
FROM
(
SELECT a.ID, Quantity, SUM(Quantity) OVER (ORDER BY a.ID ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) rt
FROM dbo.Auctions a WITH (TABLOCK)
WHERE Code = 'S'
) q
OPTION (USE HINT('ENABLE_PARALLEL_PLAN_PREFERENCE'), MAXDOP 4);

/*
-- prevents temp table caching, slight performance overhead in last query, but avoids "expensive" stats gathering for uncached object scenario
CREATE STATISTICS s0 ON #demand_intervals (IntervalBucket) WITH SAMPLE 0 ROWS;
CREATE STATISTICS s1 ON #supply_intervals (IntervalBucket) WITH SAMPLE 0 ROWS;
*/

INSERT INTO #PairingsCursor2 WITH (TABLOCK)
SELECT d.ID DemandId, s.ID as SupplyID,
CASE
WHEN d.IntervalEnd >= s.IntervalEnd THEN s.IntervalEnd - CASE WHEN s.IntervalStart >= d.IntervalStart THEN s.IntervalStart ELSE d.IntervalStart END
ELSE d.IntervalEnd - CASE WHEN s.IntervalStart >= d.IntervalStart THEN s.IntervalStart ELSE d.IntervalStart END
END TradeQuantity
FROM #demand_intervals d WITH (TABLOCK)
INNER JOIN #supply_intervals s WITH (TABLOCK) ON d.IntervalBucket = s.IntervalBucket
WHERE d.IntervalEnd > s.IntervalStart AND d.IntervalStart < s.IntervalEnd

UNION ALL

SELECT d.ID DemandId, s.ID as SupplyID, CASE WHEN d.IntervalEnd >= s.IntervalEnd THEN s.IntervalEnd - CASE WHEN s.IntervalStart >= d.IntervalStart THEN s.IntervalStart ELSE d.IntervalStart END
ELSE d.IntervalEnd - CASE WHEN s.IntervalStart >= d.IntervalStart THEN s.IntervalStart ELSE d.IntervalStart END
END TradeQuantity
FROM #demand_intervals d WITH (TABLOCK)
INNER JOIN #supply_intervals s WITH (TABLOCK) ON s.IntervalBucket = d.IntervalBucket - 1
WHERE d.IntervalEnd > s.IntervalStart AND d.IntervalStart < s.IntervalEnd

UNION ALL

SELECT d.ID DemandId, s.ID as SupplyID, CASE WHEN d.IntervalEnd >= s.IntervalEnd THEN s.IntervalEnd - CASE WHEN s.IntervalStart >= d.IntervalStart THEN s.IntervalStart ELSE d.IntervalStart END
ELSE d.IntervalEnd - CASE WHEN s.IntervalStart >= d.IntervalStart THEN s.IntervalStart ELSE d.IntervalStart END
END TradeQuantity
FROM #demand_intervals d WITH (TABLOCK)
INNER JOIN #supply_intervals s WITH (TABLOCK) ON s.IntervalBucket = d.IntervalBucket + 1
WHERE d.IntervalEnd > s.IntervalStart AND d.IntervalStart < s.IntervalEnd
OPTION (USE HINT('ENABLE_PARALLEL_PLAN_PREFERENCE'), MAXDOP 8, HASH JOIN, CONCAT UNION, FORCE ORDER, NO_PERFORMANCE_SPOOL); -- reduce compile time

COMMIT TRANSACTION;

For the first execution, CPU time is generally around 1245 ms and elapsed time is around 500 ms. For subsequent executions, CPU time is around 634 ms and elapsed time is around 168 ms. Here is an actual execution plan for step 3:

Not surprisingly, the join for matching buckets returns significantly more rows than the joins for adjacent buckets.

REEEEEEEEEEEEECOMPILES

You may be wondering why the first execution takes so much longer than subsequent executions. The problem is the creation of temp table statistics. For some reason, SQL Server issues four StatMan queries per statistic that it creates:

Some of them run at MAXDOP 1 and some of them run at MAXDOP 8 and all of that time adds up. After the first execution the creation of statistics no longer happens, I assume due to statistics caching for temporary tables. Even adding an explicit FULLSCAN create statistics statement doesn’t avoid this problem for some reason. You can create 0 row statistics if you like (see the commented out code), but I’m going to declare a moral victory here instead of digging into it further. If the biggest performance problem for your query is statistics creation on a pair of 200k row temp tables then you probably have pretty efficient code.

With that said, table variables are an interesting alternative here. I get consistent runtimes of 550 ms of CPU time and 230 ms of elapsed time using table variables. The deferred compilation feature introduced in SQL Server 2019 is important to get the right query plan. We lose a bit of runtime for the MAXDOP 1 table variable inserts, and the final insert is slightly less efficient as well due to the missing statistics. Still, the runtimes are consistent between the first and second executions, and overall CPU usage is down. Also, you have to admit that seeing a batch mode table variable scan with a correct cardinality estimate (without RECOMPILE) is pretty cool.
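
If you want to try the swap yourself, it’s mostly a find-and-replace of the temp tables. A minimal sketch, assuming the same column definitions (and a database at compatibility level 150 for deferred compilation):

DECLARE @demand_intervals TABLE (
    ID INT NOT NULL,
    IntervalStart DECIMAL(19, 6) NOT NULL,
    IntervalEnd DECIMAL(19, 6) NOT NULL,
    IntervalBucket BIGINT NOT NULL
);
--@supply_intervals gets the same definition, and the inserts and
--final query keep the same shape as the temp table version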

Query Scaling

The performance of the batch mode query greatly depends on the maximum interval length. Query runtime increases in a roughly linear fashion as the max length increases:

The linear pattern is broken for the final test case because all of the rows are in a single bucket at that point, so two out of three of the joins don’t do much of anything. Is this type of query safe to use in production? Sure, as long as you’re okay with significantly worse performance if a single outlier row happens to appear. Possible defenses against that include additional constraints on the data, knowing your data well, or writing a fallback algorithm if the maximum interval length is too long for the batch mode bucketizing approach to perform well.

Final Thoughts

This blog post shows how batch mode hash joins can be used as an efficient solution to find intersecting intervals, provided that the maximum interval length is sufficiently small. The important thing is everyone involved had fun. Thanks for reading!

How #Temporary Tables Can Cause Plan Cache Pollution In SQL Server

I Love #Temp Tables


I solve a lot of problems with #temp tables, indeed I do. And I hate people who are reflexively anti-#temp table.

If you’re into jokes (I’m definitely not into jokes; SQL is serious business), you could even call them #foolish.

Get it?

Ahem 🎤👈

However (!) I learned a lesson recently about how using them in certain ways can cause weird plan cache pollution. When you’re hitting the issue, the optional_spid column in dm_exec_plan_attributes will be populated with a non-zero value. You can use this query to quickly check for that happening on your system:

SELECT 
    pa.attribute, 
    pa.value, 
    decp.refcounts,
    decp.usecounts,
    decp.size_in_bytes,
    decp.memory_object_address,
    decp.cacheobjtype,
    decp.objtype,
    decp.plan_handle
FROM sys.dm_exec_cached_plans AS decp
CROSS APPLY sys.dm_exec_plan_attributes (decp.plan_handle)  AS pa
WHERE pa.attribute = N'optional_spid'
AND   pa.value > 0;

Let’s talk about the scenarios that cause it!

Creating Across Stored Procedure Executions


Check out this piece of code:

CREATE OR ALTER PROCEDURE
    dbo.no_spid
AS
BEGIN
    SET NOCOUNT ON;

    CREATE TABLE #t (id int);
    INSERT #t (id) VALUES (1);

    EXEC dbo.a_spid; --Hi

END;
GO 


CREATE OR ALTER PROCEDURE
    dbo.a_spid
AS
BEGIN
    SET NOCOUNT ON;

    CREATE TABLE #t (id int);
    INSERT #t (id) VALUES (2);

END;
GO

In the first proc, we create a #temp table, and insert a row, then execute another proc, where we create a #temp table with the same name and definition and insert a row.

Using the above query, we’ll see this:

[image: polluted]

And if we run sp_BlitzCache, we’ll indeed see multiple plans for a_spid, though no_spid seems to get plans associated with it because the plans are hashed to the same value. Heh. That plan cache… 🙄

[image: diamonds are forever]

Referencing Across Stored Procedure Executions


Check out this code:

CREATE OR ALTER PROCEDURE
    dbo.internal
(
    @c bigint
)
AS
BEGIN
SET NOCOUNT ON;

    CREATE TABLE #t(id int);
    INSERT #t (id) VALUES (1);

    SELECT 
        @c = COUNT_BIG(*)
    FROM #t AS t
    WHERE 1 = (SELECT 1);

    EXEC dbo.not_internal 0; --Hi

END;
GO 

CREATE OR ALTER PROCEDURE
    dbo.not_internal
(
    @c bigint
)
AS
BEGIN

    INSERT #t (id) VALUES (2);

    SELECT 
        @c = COUNT_BIG(*)
    FROM #t AS t
    WHERE 1 = (SELECT 1);

END;
GO

We’re creating a #temp table in one stored procedure, and then executing another stored procedure that references the same #temp table this time.

Just like above, if we execute the procs across a couple different SSMS tabs, we’ll see this:

[image: scope]

And from the plan cache:

[image: heearghh]

Same thing as last time. Multiple plans for not_internal. In both cases, the outer stored procedure has an optional_spid of 0, but the inner procedure has the spid that executed it attached.

Dynamic SQL


My fellow blogger Joe Obbish came up with this one, which is really interesting. It’s necessary to point out that this is Joe’s code, so no one asks me why the formatting is so ugly 😃

CREATE OR ALTER PROCEDURE no_optional_spid AS
BEGIN
    CREATE TABLE #obj_count (
    [DB_NAME] SYSNAME NOT NULL,
    OBJECT_COUNT BIGINT NOT NULL
    );

    DECLARE @db_name SYSNAME = 'Crap';

    DECLARE @sql NVARCHAR(4000) = N'SELECT @db_name, COUNT_BIG(*)
    FROM ' + QUOTENAME(@db_name) + '.sys.objects';

    INSERT INTO #obj_count
    EXEC sp_executesql @sql, N'@db_name SYSNAME', @db_name = @db_name;
END;
GO 

CREATE OR ALTER PROCEDURE has_optional_spid AS
BEGIN
    CREATE TABLE #obj_count (
    [DB_NAME] SYSNAME NOT NULL,
    OBJECT_COUNT BIGINT NOT NULL
    );

    DECLARE @db_name SYSNAME = 'Crap';

    DECLARE @sql NVARCHAR(4000) = N'INSERT INTO #obj_count
    SELECT @db_name, COUNT_BIG(*)
    FROM ' + QUOTENAME(@db_name) + '.sys.objects';

    EXEC sp_executesql @sql, N'@db_name SYSNAME', @db_name = @db_name;

END;
GO 

In this case, we have two procs that create a #temp table with the same definition, and insert into them with dynamic SQL. I have a feeling that this would also occur under other circumstances where you use the INSERT…EXEC paradigm, e.g. a stored procedure.

Same deal here, if we look at the same things, except that it’s more helpful to look at the execution_count column in sp_BlitzCache.

[image: BOBBY]

And…

[image: SUZIE]

Everything has 200 executions, except the internal parameter table scan that does the #temp table insert:

[image: fodder]

5k Finisher


This post explores a few scenarios where the optional_spid cache pollution does happen. There are likely more, and I’m happy to add scenarios if any readers out there have them.

There are plenty of scenarios where this doesn’t happen, too. I don’t want you to think it’s universal. Using #temp tables with the same name but different definitions, or without the cross-referencing, etc. won’t cause this issue.

I tried a bunch of stuff that I thought would cause the problem, but didn’t.

So yeah. Love your #temp tables, too.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

LOB Data, Recompile, And Too Much tempdb Usage In SQL Server

Stumper


I got a very interesting issue about sp_HumanEvents recently, and I couldn’t quite figure out why things were going wonky.

The problem was that at some point when monitoring blocking long-term, something was taking up a whole lot of tempdb.

I’m gonna say up front that the problem was using a recompile hint with LOB variable assignment in a loop. That doesn’t mean you should avoid any one of those things, but be careful when you use them in concert.

This post wouldn’t be possible without my friends Joe and Paul, who helped me track down the issue and understand the internals of it.

Thumper


Rather than make you run the blocking monitoring yourself, the issue can be reproduced with this demo:

DECLARE @lob nvarchar(max) = N'x';
DECLARE @x xml = N'<x>' + REPLICATE(@lob, 1024 * 1024) + N'</x>';
DECLARE @loop integer = 0;
DECLARE @t table (x nvarchar(1));

WHILE @loop < 5
BEGIN

    INSERT 
        @t
    (
        x
    )
    SELECT 
        x = @x.value('(./x/text())[1]', 'nvarchar(1)')
    OPTION (RECOMPILE);

    DELETE @t;

    SELECT 
        ddtsu.internal_objects_alloc_page_count, 
        ddtsu.internal_objects_dealloc_page_count,
        internal_object_reserved_page_count = 
        (
            SELECT 
                SUM(internal_object_reserved_page_count)
            FROM tempdb.sys.dm_db_file_space_usage
        )
    FROM sys.dm_db_task_space_usage AS ddtsu
    WHERE ddtsu.session_id = @@SPID;

    SET @loop += 1;
END;

The trick here is to run it with the recompile hint available, and then quoted out.

Jumper


With recompile available, this is the result:

[image: denial]

The internal objects keep going up, except the deallocated number. That’s the column in the middle.

With recompile quoted out, the numbers are a lot different.

[image: reprisal]

All numbers stay stable.

Lumper


As Mr. White points out in his post, assigning to a variable prevents parameter embedding.

  • When it’s allowed, tempdb objects get cleaned up at the end of the statement.
  • When it’s not, they get cleaned up at the end of the batch.

This has been reported to Microsoft for some analysis. Hopefully there’s some remedy for it in the future.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

When Query Spills Cause tempdb Contention And Performance Issues In SQL Server

Temptables


I was helping a client troubleshoot a scalability issue recently, where during one point in their workload tempdb went out of control.

What was puzzling them was that nothing in the stored procedure did anything with temporary objects, cursors, or any of the other obvious things that go to el baño público. And the query plan itself was a pretty simple little loopy thing.

It wasn’t until I grabbed a quick sample of actual execution plans with Extended Events that the problem became more obvious.

Despite the fact that the query was only passing around 7 rows, we were getting a one row estimate, and there was a spilling Sort operator.

Chance Of Cloudballs


There were a couple compounding issues here, of course, like that their database is hosted in the cloud.

It doesn’t matter which cloud you talk about, this is a problem they all have and I see people face constantly.

I/O sucks, up there.

It was a lot harder to get this to repro on my local hardware, because I have very nice drives.

Let’s take a look!

Bleak


If I run this query, it’ll spill a little bit. I have the MAXDOP 1 hint on there to artificially slow it down. Without that, the fast query and I/O make the contention harder to hit.

SELECT TOP (1000) 
    u.Id 
FROM dbo.Users AS u
ORDER BY u.Reputation
OPTION(MAXDOP 1);

[image: SQL Server query plan (just like me)]

I stuck it in a stored procedure to make calling it from ostress a little bit easier.
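
Something like this, give or take. The procedure name and the ostress parameters here are my own illustration, not the exact ones from the client workload:

CREATE OR ALTER PROCEDURE dbo.spill_repro
AS
BEGIN
    SET NOCOUNT ON;

    SELECT TOP (1000)
        u.Id
    FROM dbo.Users AS u
    ORDER BY u.Reputation
    OPTION (MAXDOP 1);
END;
GO

/*
ostress.exe -S"." -E -d"StackOverflow" -Q"EXEC dbo.spill_repro;" -n100 -r20 -q
*/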

No Return


After running for a bit, eventually we hit GAM contention:

[image: SQL Server tempdb contention (tuesday)]

We can see all the classic signs of tempdb contention, arising from spills.

It’s worth noting that enabling in-memory tempdb seems to alleviate this issue for the most part. I’d show you a picture of that, but it’s not worth 1000 words.
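
If you want to go looking, the feature is called memory-optimized tempdb metadata. It’s instance-wide, SQL Server 2019 and up, and needs a service restart to take effect:

ALTER SERVER CONFIGURATION
SET MEMORY_OPTIMIZED TEMPDB_METADATA = ON;
--verify with SELECT SERVERPROPERTY('IsTempdbMetadataMemoryOptimized');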

Fixing The Spill


Normally, I wouldn’t bother fixing a spill of that size, but in this case it was worthwhile because it was harming concurrency.

The quick fix was to add a MIN_GRANT_PERCENT hint. The long-term fix is to add an index to support the Sort operation.

Good ol’ indexes, putting data in order.
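
Here’s roughly what each fix looks like against the demo query. The 10 percent grant is a made-up number you’d want to tune against the real spill:

SELECT TOP (1000)
    u.Id
FROM dbo.Users AS u
ORDER BY u.Reputation
OPTION (MAXDOP 1, MIN_GRANT_PERCENT = 10); --quick fix: ask for a bigger memory grant

CREATE INDEX r
ON dbo.Users (Reputation); --long-term fix: the Sort disappears entirely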

There are many reasons why a spill may occur. Poor estimates are quite common, but the reason for that poor estimate has about a billion potential root causes.

Even if your query gets enough memory, sometimes you may still see spills because certain operators had to share memory.

It’s a wild world.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

SARGability Week: Using Temp Tables To Fix Non-SARGable Query Performance Problems In SQL Server

International


This post is especially interesting because it highlights the oddball performance situations you can run into when you write the type of All-In-One™ queries that the optimizer is typically not very good at handling, and also rely on expressions calculated at runtime as predicates.

I mean, it’s especially interesting if you’re into this sort of thing. If you’re not into this sort of thing, you’ll probably find it as interesting as I find posts about financial responsibility or home cooking.

I’ve seen query patterns like this while working with clients, and they’ve always ended poorly.

Anyway, on with the post!

Skeletons


To make sure we have a good starting point, and you can’t tell me that “scanning the clustered index is bad”, let’s create an index:

CREATE INDEX p
ON dbo.Posts 
    (OwnerUserId, Score DESC)
INCLUDE 
    (PostTypeId)
WHERE PostTypeId IN (1, 2);

Now let’s take a look at this query, and what gets weird with it.

WITH top_questions AS
(
    SELECT
        p.OwnerUserId,
        QuestionScore = 
            p.Score,
        tq = 
            ROW_NUMBER() OVER
            (
                PARTITION BY
            	    p.OwnerUserId
            	ORDER BY p.Score DESC
            )
    FROM dbo.Posts AS p
    WHERE p.PostTypeId = 1
    AND   p.Score > 1
),
     top_answers AS
(
    SELECT
        p.OwnerUserId,
        AnswerScore = 
            p.Score,
        ta = 
            ROW_NUMBER() OVER
            (
                PARTITION BY
            	    p.OwnerUserId
            	ORDER BY p.Score DESC
            )
    FROM dbo.Posts AS p
    WHERE p.PostTypeId = 2
    AND   p.Score > 1
)
SELECT TOP (1000)
    tq.OwnerUserId,
    tq.QuestionScore,
    ta.AnswerScore
FROM top_questions AS tq
JOIN top_answers AS ta
    ON  tq.OwnerUserId = ta.OwnerUserId
    AND tq.tq = ta.ta
ORDER BY
    tq.QuestionScore DESC,
    ta.AnswerScore DESC;

The non-SARGable portion is, of course, generating and joining on the row_number function. Since it’s an expression that gets calculated at runtime, we have to do quite a bit of work to execute this query.

Community Board


The query plan for this is all over the place, and also bad. Parallel merge joins were a mistake.

[image: SQL Server query plan (planetary)]

The portions of the query plan that are particularly interesting — again, if you’re into this sort of thing — are the four Repartition Streams operators, all of which spill. Like I said above, this is the sort of thing you open yourself up to when you write queries like this.

In all, the query runs for about 50 seconds. This can be avoided by hinting a hash join, of course, for reasons explained here.
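
If you want to test that yourself, the hint goes on the final SELECT; the two CTEs stay exactly the same:

SELECT TOP (1000)
    tq.OwnerUserId,
    tq.QuestionScore,
    ta.AnswerScore
FROM top_questions AS tq
JOIN top_answers AS ta
    ON  tq.OwnerUserId = ta.OwnerUserId
    AND tq.tq = ta.ta
ORDER BY
    tq.QuestionScore DESC,
    ta.AnswerScore DESC
OPTION (HASH JOIN); --forces hash joins plan-wide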

But good luck figuring out why this thing runs for 50 seconds looking at a cached, or estimated execution plan, which doesn’t show you spills or operator times.

Breakup


One way to avoid the situation is to materialize the results of each CTE in a #temp table, and join those together.

WITH top_questions AS
(
    SELECT
        p.OwnerUserId,
        QuestionScore = 
            p.Score,
        tq = 
            ROW_NUMBER() OVER
            (
                PARTITION BY
            	    p.OwnerUserId
            	ORDER BY p.Score DESC
            )
    FROM dbo.Posts AS p
    WHERE p.PostTypeId = 1
    AND   p.Score > 1
)
SELECT
    *
INTO #top_questions
FROM top_questions;

WITH top_answers AS
(
    SELECT
        p.OwnerUserId,
        AnswerScore = 
            p.Score,
        ta = 
            ROW_NUMBER() OVER
            (
                PARTITION BY
            	    p.OwnerUserId
            	ORDER BY p.Score DESC
            )
    FROM dbo.Posts AS p
    WHERE p.PostTypeId = 2
    AND   p.Score > 1
)
SELECT
    *
INTO #top_answers
FROM top_answers;

SELECT TOP (1000)
    tq.OwnerUserId,
    tq.QuestionScore,
    ta.AnswerScore
FROM #top_questions AS tq
JOIN #top_answers AS ta
    ON tq.OwnerUserId = ta.OwnerUserId
    AND tq.tq = ta.ta
ORDER BY
    tq.QuestionScore DESC,
    ta.AnswerScore DESC;

Breakdown


The end result takes about 10 seconds, and has no exchange spills.

[image: SQL Server query plan (fully)]

Infinito


For completeness, hinting the query with a hash join results in just about the same execution time as the temp table rewrite: about 10 seconds. There are also very strong benefits to using Batch Mode: with it in play, the query as originally written, with no hints, finishes in about two seconds with no exchange spills, and I absolutely love that.

In tomorrow’s post, we’ll look at how we can sometimes adjust index key column order to solve SARGability issues.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Local Variables Get Bad Cardinality Estimates In Cursors In SQL Server, Too

Work Off


While helping a client out with weird performance issues, we isolated part of the code that was producing a whole bunch of bad plans.

At the intersection of bad ideas, there was a cursor looping over a table gathering some data points with a local variable in the where clause.

For more background on that, check out my previous posts about local variables and cardinality estimates.

One might expect the query inside the cursor to be subject to some special rules, but alas, nothing good comes of it.

Eau de Sample


To repro a little bit, we need to create a certainly suboptimal index.

CREATE INDEX p ON dbo.Posts
    (OwnerUserId);

If you have a lot of single key column indexes, you’re probably doing indexing wrong.

Full demo code is at the end because it’s a bit long, but the end result is five query plans that all share the same bad estimate based off the density vector.

The worst of them looks like this:

[image: SQL Server query plan (Occurling)]

And the best of them looks like this:

[image: SQL Server query plan (Gruntled)]

Over and Over


If you’re getting bad guesses like this over and over again in any loop-driven code, local variables might just be to blame.

That guess of 10 rows of course comes from this calculation:

SELECT 
    density = 
        (
            1 / 
            CONVERT
            (
                float, 
                COUNT(DISTINCT p.OwnerUserId)
            )
        ) *
        COUNT_BIG(*)
FROM dbo.Posts AS p;

Which, with a little bit of rounding, gets us to the estimate we see in the query plan:

[image: SQL Server query plan (hectic)]
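
If you’re stuck with the looping pattern, the usual escape hatch is a statement-level recompile, which lets the optimizer see the variable’s current value at the cost of compiling on every iteration. It’s the same query from the demo code below, plus one hint:

SELECT
    p.PostTypeId,
    c = COUNT_BIG(*)
FROM dbo.Posts AS p
WHERE p.OwnerUserId = @CurrentId
GROUP BY p.PostTypeId
ORDER BY c DESC
OPTION (RECOMPILE);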

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Demo Code


SET STATISTICS XML OFF;
SET NOCOUNT ON;

DECLARE 
    @CurrentId int;

DROP TABLE IF EXISTS #UserIds; 

CREATE TABLE #UserIds 
(
    UserId int PRIMARY KEY CLUSTERED
);

INSERT
    #UserIds WITH(TABLOCK)
(
    UserId
)
SELECT
    u.Id
FROM dbo.Users AS u
WHERE u.Reputation > 850000
ORDER BY u.Reputation DESC;


DECLARE counter CURSOR 
    LOCAL STATIC
FOR
SELECT 
    UserId 
FROM #UserIds;
 
OPEN counter;

FETCH NEXT FROM counter 
    INTO @CurrentId;

WHILE @@FETCH_STATUS = 0
BEGIN 
   
    SET STATISTICS XML ON;
    
    SELECT
        p.PostTypeId,
        c = COUNT_BIG(*)
    FROM dbo.Posts AS p
    WHERE p.OwnerUserId = @CurrentId
    GROUP BY p.PostTypeId
    ORDER BY c DESC;
    
    SET STATISTICS XML OFF;
 
FETCH NEXT FROM counter 
    INTO @CurrentId;
END;

CLOSE counter;
DEALLOCATE counter;

 

Why Expressions Are Better Than Local Variables For Performance In SQL Server Queries

Often Enough


I see people writing stored procedures that use local variables either because they read that all functions in a where clause are dangerous, or because they need to perform a calculation on one of the parameters that get passed in.

In nearly all cases, you’re better off… not doing that.

Here are a couple examples of why.

Hard Yes


If you pass in a function like GETDATE, and even pass it through another function like DATEADD, the optimizer can evaluate them to their current values and make a pretty decent guess based on them.
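
The plan below comes from a query shaped something like this. The table and column names match the stored procedure later in this post; the one-year window is my guess at the original:

SELECT
    records = 
        COUNT_BIG(*)
FROM dbo.express_yourself AS ey
WHERE ey.some_date >= DATEADD(YEAR, -1, GETDATE());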

[image: SQL Server query plan (goo)]

This is a Good Enough™ guess.

Hard Pass


Once you assign that function’s result to a variable, everything gets awkward.
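
Same query, same window, but with the date math tucked into a variable first (again, a sketch):

DECLARE
    @later datetime = DATEADD(YEAR, -1, GETDATE());

SELECT
    records = 
        COUNT_BIG(*)
FROM dbo.express_yourself AS ey
WHERE ey.some_date >= @later;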

[image: SQL Server query plan (bang bang)]

That’s a Pretty Bad™ guess, but not the end of the world. What’s bad is that the guess never changes even if you change the span of time.

Look what happens if we just add one day instead of one year.

[image: SQL Server query plan (one day at a time)]

We get the exact same guess as before — 821,584 rows. Bummer.

Storied Tradition


It’s nearly the same thing in stored procedures. What I see more often is people passing in one parameter for a start date, and then using a local variable to figure out an end date.

CREATE OR ALTER PROCEDURE dbo.dangit_bobby
(
    @start_date datetime
)
AS
BEGIN
SET NOCOUNT, XACT_ABORT ON;

DECLARE
    @later datetime = DATEADD(DAY, 1, GETDATE());

    SELECT
        records = 
            COUNT_BIG(*)
    FROM dbo.express_yourself AS ey
    WHERE ey.some_date >= @start_date
    AND   ey.some_date <  @later;

END;
GO

EXEC dbo.dangit_bobby 
    @start_date = '2021-05-19';
GO

[image: SQL Server query plan (FAR OFF DUDE)]

Advanced Calculus


Let’s change how we use the parameter, and put it into some date math in the where clause instead.

CREATE OR ALTER PROCEDURE dbo.dangit_bobby
(
    @start_date datetime
)
AS
BEGIN
SET NOCOUNT, XACT_ABORT ON;

    SELECT
        records = 
            COUNT_BIG(*)
    FROM dbo.express_yourself AS ey
    WHERE ey.some_date >= @start_date
    AND   ey.some_date <  DATEADD(DAY, 1, @start_date);

END;
GO 
EXEC dbo.dangit_bobby 
    @start_date = '2021-05-19';
GO

We get a Much Closer™ estimate. What a lovely day.

[image: SQL Server query plan (go for both)]

Hardest Part


It’s often tempting to take shortcuts like this, especially if you need to reuse the same calculation multiple times.

Now, look, if you stare closely enough at these plans, you’ll notice that they all finish quickly enough. That’s because I have a small table with a wonderful index on the column I care about.

The point here is to show you how bad estimates can turn up even in ideal circumstances when you do things The Wrong Way™.

As long as you’re not wrapping columns in functions like this, you’re probably okay.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Things SQL Server vNext Should Address: Table Variable Modification Performance

Canard


People still tell me things like “I only put 100 rows in table variables”, and think that’s the only consideration for their use.

There are definitely times when table variables can be better, but 100 rows is meaningless.

Even if you put one row in a table variable it can fudge up performance because SQL Server doesn’t know what’s in your table variable. That’s still true in SQL Server 2019, even if the optimizer knows how many rows are in your table variable.

The problem that you can run into, even with just getting 100 rows into a table variable, is that it might take a lot of work to get those 100 rows.

Bernard


I’ve blogged before about workarounds for this problem, but the issue remains that inserts, updates, and deletes against table variables aren’t naturally allowed to go parallel.

The reason why is a bit of a mystery to me, since table variables are all backed by temp tables anyway. If you run this code locally, you’ll see what I mean:

SET NOCOUNT ON;
SET STATISTICS IO ON;
DECLARE @t table(id int);
SELECT * FROM @t AS t;
SET STATISTICS IO OFF;

Over in the messages tab you’ll see something like this:

Table '#B7A53B3E'. Scan count 1, logical reads 0, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.

Now, look, I’m not asking for the update or delete portions of the query plan to go parallel, but it might be nice if other child operators could go parallel. That’s how things work with regular tables and #temp tables. It would be nice if inserts could go parallel too, but hey.
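
To see the difference for yourself, here’s a hedged sketch against the Posts table from earlier posts. Given enough rows and cost, the first insert is eligible for a parallel plan, and the second never is:

DROP TABLE IF EXISTS #t;
CREATE TABLE #t (id int NOT NULL);

--#temp table: the insert (with TABLOCK) and its child operators can go parallel
INSERT #t WITH (TABLOCK)
    (id)
SELECT
    p.Id
FROM dbo.Posts AS p;

DECLARE @t table (id int NOT NULL);

--@table variable: the whole modification plan stays serial
INSERT @t
    (id)
SELECT
    p.Id
FROM dbo.Posts AS p;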

Ardbeg


The problem that fixing this would solve is one that I see often, usually in vendor code, where the choice of which temporary object to use came down to individual developer preference, or someone fell for the meme that table variables are “in memory” or something. Maybe the choice was immaterial at first with low data volume, and over time performance slowly degraded.

If I’m allowed to change things, it’s easy enough to replace @table variables with #temp tables, or to use a workaround like the one in the post linked above to improve performance. But when I’m not, clients are often left begging vendors, who usually aren’t receptive, to make changes.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.