Software Vendor Mistakes With SQL Server: Using Date Functions On Columns Or Local Variables

Reusable Logic


In most programming languages, it’s quite sensible to create a variable or parameter, use some predefined logic to assign it a value, and then keep reusing it to prevent having to execute the same code over and over again.

But those languages are all procedural, and have a bit of a different set of rules and whatnot. In SQL Server, there are certainly somewhat procedural elements.

  • Functions
  • Control-flow logic
  • Cursors
  • While loops
  • Maybe the inner side of Nested Loops joins

You may be able to name some more, if you really get to thinking about it. That should be a common enough list, though.

Reusable Problems


SQL Server has a wonderful optimizer. It’s capable of many things, but it also has some problems.

Many of those problems exist today for “backwards compatibility”. In other words: play legacy games, win legacy performance.

Lots of people have found “workarounds” that rely on exploiting product behavior, and taking that away or changing it would result in… something else.

That’s why so many changes (improvements?) are hidden behind trace flags, compatibility levels, hints, batch mode, and other “fences” that you have to specifically hop to see if the grass is greener.

One of those things is the use of local variables. The linked post details how lousy those can be.

In this post, I show how you’re better off using the date math expressions instead.

And in this post, I show how you’re better off doing date math on parameters rather than on columns.

Let’s bring all that together!

Reusable Solutions


In SQL Server, context is everything. By context, I mean the way different methods of query execution are able to accept arguments from others.

You’ll sometimes hear this referred to as scope, too. Usually people will say inner context/scope and outer context/scope, or something similar.

What that means is something like this, if we’re talking about stored procedures:

CREATE PROCEDURE
    dbo.InnerContext
(
    @StartDate datetime,
    @EndDate datetime
)
AS
BEGIN
SET NOCOUNT, XACT_ABORT ON;

    SELECT
        C.PostId,
    	Score = 
    	    SUM(C.Score)
    FROM dbo.Comments AS C
    JOIN dbo.Votes AS V
        ON C.PostId = V.PostId
    WHERE C.CreationDate >= @StartDate
    AND   c.CreationDate <  @EndDate
    GROUP BY c.PostId;

END;
GO 

CREATE PROCEDURE
    dbo.OuterContext
(
    @StartDate datetime,
    @EndDate datetime
)
AS
BEGIN
SET NOCOUNT, XACT_ABORT ON;

IF @StartDate IS NULL
BEGIN
   SELECT
       @StartDate = GETDATE();
END;

IF @EndDate IS NULL
BEGIN
   SELECT
       @EndDate = DATEADD(DAY, 30, GETDATE());
END;

EXEC dbo.InnerContext
    @StartDate = @StartDate,
    @EndDate = @EndDate;

END;

If you’re okay using dynamic SQL, and really, you should be because it’s awesome when you’re not bad at it, you can do something like this:

CREATE PROCEDURE
    dbo.OuterContext
(
    @StartDate datetime,
    @EndDate datetime
)
AS
BEGIN
SET NOCOUNT, XACT_ABORT ON;

IF @StartDate IS NULL
BEGIN
   SELECT
       @StartDate = GETDATE();
END;

IF @EndDate IS NULL
BEGIN
   SELECT
       @EndDate = DATEADD(DAY, 30, GETDATE());
END;

DECLARE 
    @sql nvarchar(MAX) = N'
    /*dbo.OuterContext*/
    SELECT
        C.PostId,
    	Score = 
    	    SUM(C.Score)
    FROM dbo.Comments AS C
    JOIN dbo.Votes AS V
        ON C.PostId = V.PostId
    WHERE C.CreationDate >= @StartDate
    AND   c.CreationDate <  @EndDate
    GROUP BY c.PostId;		
    ';

EXEC sys.sp_executesql
    @sql,
  N'@StartDate datetime, 
    @EndDate datetime',
    @StartDate,
    @EndDate;

END;

Which will achieve the same thing.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Software Vendor Mistakes With SQL Server: Using Functions In Join Or Where Clauses

Easy Thinking


To start with, let’s classify functions into two varietals:

  • Ones built-in to SQL Server
  • Ones that developers write

Of those, built-in functions are pretty much harmless when they’re in the select list. Classifying things a bit further for the ones user writes, we have:

Out of the four up there, only the last one doesn’t have a reference link. Why? Because I don’t write C# — I’m not that smart — if you need someone smart about that, go read my friend Josh’s blog. He’s quite capable.

If you’re too lazy to go read the three reference links:

  • Scalar User Defined Functions generally wreck performance
  • Multi-Statement Table Valued Functions stand a high chance of generally wrecking performance
  • Inline Table Valued Functions are okay as long as you don’t do anything awful in them

Smart Thinking


The important thing to understand is that using any of these functions, let’s call it below the belt, can really mess things up for query performance in new and profound ways compared to what they can do in just the select list.

To be more specific for you, dear developer reader, let’s frame below the belt as anything underneath the from clause. Things here get particularly troublesome, because much of the activity here is considered relational, whereas stuff up above is mostly just informational.

Why is the relational stuff a much bigger deal than the informational stuff? Because that’s where all the math happens in a query plan, and SQL Server’s optimizer decides on all sorts of things at compile-time, like:

  • Which indexes to use
  • Join order
  • Join types
  • Memory grants
  • Parallelism
  • Seeks and Scans
  • Aggregate types
  • Much, much more!

Those things are partially based on how well it’s able to estimate the number of rows that join and where conditions will produce.

Sticking functions in the way of those join and where conditions is a bit like putting a blindfold on SQL Server’s optimization and cardinality estimation process and asking it to swing a bowling ball sharp saber at a piece of confetti at 100 paces.

In other words, don’t complain when your query plans suck and your queries run slow. You’re doing the hobbling, you dirty bird.

Future Thinking


If you want your customers, users, or whatever you want to call them, to be reliably happy in the future, even as their database sizes grow beyond your wildest imagination, and your application gets used in ways that would make Caligula blush, you need to start by obeying the first law of database physics: thou shalt not get in the way of the optimizer.

Going back to a couple other laws of  database physics that cannot be ignored:

  • Anything that makes your job easier makes the optimizer’s job harder
  • Store data the way you query it, and query data the way you store it

If it makes you feel better, stick a few thous and shalls or shalt nots or whences or whenceforths in there. It might make you happier.

It will make your customers, users, or whatever you want to call them happier, if you listen to me.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Software Vendor Mistakes With SQL Server: Writing Functions That Already Exist

Reinventing The Sequel


In my time consulting, I’ve seen dbo-prefixed functions that have internal-function names on a number of occasions, acting like a wrapper to the existing functions that come with SQL Server.

While those are fairly ridiculous, and easy to point out the ridiculousness of, there are some other functions that I see pretty often that are maybe a little less obvious. I’m not talking about functions that have been added more recently that might not be supported by all customer installations.

All Supported Versions


One of the more popular versions of this malady that I see looks something like this:

CREATE OR ALTER FUNCTION 
    dbo.ufn_FmtDt
(
    @d datetime
)
RETURNS char(11)
AS
BEGIN
    RETURN CONVERT(char(11), @d)
END;
GO

There are two big reasons this is bad:

  • SQL Server doesn’t store dates as strings, at all, ever
  • You can generate the same internal representation by converting to a date

You gain nothing by encapsulating code like this, at least not since SQL Server 2000 or so.

Do It Again


Using some lessons we learned from earlier in this series, we can keep reasonable up to date with SQL Server’s progress by rewriting the function to something like this:

CREATE OR ALTER FUNCTION 
    dbo.ifn_FmtDt
(
    @d datetime
)
RETURNS TABLE
AS
RETURN 
   
   SELECT d = 
       CONVERT(date, @d);
GO

And now when you need to chop the times off all those dates in your reporting queries, they won’t be artificially held back.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Software Vendor Mistakes With SQL Server: Not Using Inline Table Valued Functions

Swiss Army


In the last couple posts, I’ve talked about the problems caused by two types of T-SQL functions: Scalar User Defined Functions, and Multi-Statement Table Valued Functions.

I’ve also hinted around about a third type of function, called an Inline Table Valued Function. These are different because there are no built-in issues with them as a feature.

They’re only as bad as the query you put in them, and often rewriting T-SQL Scalar User Defined Functions, and Multi-Statement Table Valued Functions as Inline Table Valued Functions can fix a lot of query performance issues.

In the videos below, I’m going to show you how to rewrite T-SQL Scalar User Defined Functions, and a really cool thing they can do on top of just replacing the bad kinds of functions.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Software Vendor Mistakes With SQL Server: Multi Statement Table Valued Functions

Available Units


In T-SQL, there are two kinds of “table valued functions”. One of them returns a select as a derived table (Inline Table Valued Function), and the other returns a @table variable as a result (Multi-Statement Table Valued Functions).

I’ve written before about the problems you’ll run into with T-SQL’s @table variables, so I’m not going to go back over that here.

I will take this time to point out that they don’t behave any better when used in a function than they do when used independent of a function. If you need to stop here to watch that video, go ahead. I’ll be waiting for you.

To learn more about how T-SQL’s Multi-Statement Table Valued Functions can mess up SQL Server query performance, check out the video below. And as always, if this is the kind of thing you love learning about, hit the link below to get 75% off all of my training material.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Software Vendor Mistakes With SQL Server: Scalar User Defined Functions

The Very Worst Idea


Scalar UDFs answered a very good question: How do I ruin query performance so that two generations of consultants can make a living off of SQL Server?

In the videos below, which are part of my paid training, I’m going to show you how T-SQL Scalar UDFs ruin query performance, and one way of fixing them. If you’ve got lots of these little devils hanging around your codebase, you’ll wanna pay close attention, here to see how:

  • T-SQL Scalar UDFs force queries to run single threaded
  • T-SQL Scalar UDFs run once per row that they process
  • T-SQL Scalar UDFs hide all the work they actually do in query plans and other metrics

There’s a ton more available in the full paid training courses, so hit the link below to get 75% off the whole thing.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Software Vendor Mistakes With SQL Server: Misusing Common Table Expressions

Stinko


In this post, I’m gonna show you how stringing together a bunch of CTEs can cause performance problems with one of my paid training videos. If you like it, hit the link below to get 75% off the entire bundle.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Signs You Need More Memory In Your SQL Server

Guessotron


Most of these signs have to do with wait stats. One could venture out and say that if you have way less memory than data, you need more memory, but… If the server is sitting around bored, who cares?

If we’re going to spend money on memory, let’s make sure it’ll get used. When I’m talking to people with performance problems that memory would solve, here are some of the top reasons.

You’re In The Cloud Where Storage Sucks

Okay, okay, storage can totally suck other places, too. I’ve seen some JBOD setups that would make you cry, and some of them weren’t in the cloud. Where you need to differentiate a little bit here is that memory isn’t going to help slow writes directly. If you add a bunch more memory and free up some network bandwidth for writes by focusing the reads more from the buffer pool, it might.

Look, just avoid disk as much as possible and you’ll be happy.

You’re Using Column Store And/Or Batch Mode

Good column store compression can often rely on adequate memory, but you also need to account for the much larger memory grants that batch mode queries ask for. As more and more workloads move towards SQL Server 2019 and beyond, query memory needs are going to go up because Batch Mode On Row Store will become more common.

You’re Waiting On RESOURCE_SEMAPHORE A Lot

This wait shows up when a bunch of queries are contending for memory grants, but SQL Server has given out all it can. If you run into these a lot, it’s a pretty good sign you need more memory. Especially if you’ve already tuned queries and indexes a bunch, or you’re dealing with a vendor app where they refuse to fix anything.

Other things that might help? The MAX_GRANT_PERCENT hint or Resource Governor

You’re Waiting On RESOURCE_SEMAPHORE_QUERY_COMPILE A Lot

This is another “queue” wait, but it’s for query compilation rather than query execution. Having more memory can certainly help this quite a bit, but so can simplifying queries so that the amount of memory SQL Server has to throw at compiling them chills out a little. You can start by reconsidering those views nested 10 levels deep and the schema design that leads you to needing a 23 table join to construct one row.

You’re Waiting On PAGEIOLATCH_SH Or PAGEIOLATCH_EX A Lot

These waits show up when data pages your query needs aren’t already there. The more you see these, the more latency you’re adding to your workload by constantly shuffling out to disk to get them. Of course, there’s other stuff you can do, like clean up unused and overlapping indexes, compress your indexes, etc. But not everyone is comfortable with or able to do that.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Signs You Need Batch Mode To Make Your SQL Server Queries Faster

My Name Is


I speak with a lot of DBAs and developers who have either heard nothing about column store and batch mode, or they’ve only heard the bare minimum and aren’t sure where it can help them.

Here’s a short list of reasons I usually talk through with people.

Your Reporting Queries Hit Big Tables

The bigger your tables get, the more likely you are to benefit, especially if the queries are unpredictable in nature. If you let people write their own, or design their own reports, nonclustered column store can be a good replacement for nonclustered row store indexes that were created specifically for reporting queries.

In row store indexes, index key column order matters quite a bit. That’s not so much the case with column store. That makes them an ideal data source for queries, since they can scan and select from column independently.

Your Predicates Aren’t Always Very Selective

Picture the opposite of OLTP. Picture queries that collect large quantities of data and (usually) aggregate it down. Those are the ones that get along well with column store indexes and batch mode.

If most of your queries grab and pass around a couple thousand rows, you’re not likely to see a lot of benefit, here. You wanna target the ones with the big arrows in query plans.

Your Main Waits Are I/O And CPU

If you have a bunch of waits on blocking or something, this isn’t going to be your solve.

When your main waits are CPU, it could indicate that queries are overall CPU-bound. Batch mode is useful here, because for those “big” queries, you’re passing millions of rows around and making SQL Server send each one to CPU registers. Under batch mode, you can send up to 900 at a time. Just not in Standard Edition.

When your main waits are on I/O — reading pages from disk specifically — column store can be useful because of the compression they offer. It’s easy to visualize reading more compact structures being faster, especially when you throw in segment and column elimination.

Your Query Plans Have Some Bad Choices In Them

SQL Server 2019 (Enterprise Edition) introduced Batch Mode On Row Store, which let the optimizer heuristically select queries for Batch Mode execution. With that, you get some cool unlocks that you used to have to trick the optimizer into before 2019, like adaptive joins, memory grant feedback, etc.

While those things don’t solve every single performance issue, they can certainly help by letting SQL Server be a little more flexible with plan choices and attributes.

The Optimizer Keeps Choosing Cockamamie Parallel Sort Merge Plans That Make No Sense And Force You To Use Hash Join Hints All The Time

🤦‍♂️

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Performance Issues With NOT EXISTS Queries In SQL Server

Born Day


I’m going to start this post off sort of like Friday’s post:

Look, I really like EXISTS and NOT EXISTS. I do. They solve a lot of problems.

This post isn’t a criticism of them at all, nor do I want you to stop using them. I would encourage you to use them more, probably.

If you keep your head about you, you’ll do just fine.

The difference here is specific to NOT EXISTS, though, and it has to do with join reordering.

Or rather, the lack of join reordering.

Let’s get after it.

Happy Kids


When we write our query like so, things are fine.

The Users and Badges tables are relatively small, and a parallel hash join query makes short work of the situation.

SELECT
    c = COUNT_BIG(*)
FROM dbo.Users AS u
WHERE u.Reputation > 1000
AND   
(
    NOT EXISTS
        (
            SELECT
                1/0
            FROM dbo.Badges AS b
            WHERE b.UserId = u.Id
        )
    AND  NOT EXISTS
             (
                 SELECT
                     1/0
                 FROM dbo.Comments AS c
                 WHERE c.UserId = u.Id
             )      
);

This query finishes in a shart under a second.

SQL Server Query Plan
promised

Notice that since no rows pass the first join, the Comments table is left unscathed.

BUT THEN™

Bad Times


If we write the query like this, the optimizer leaves things alone, and we get a much worse-performing query.

SELECT
    c = COUNT_BIG(*)
FROM dbo.Users AS u
WHERE u.Reputation > 1000
AND   
(
    NOT EXISTS
        (
            SELECT
                1/0
            FROM dbo.Comments AS c
            WHERE c.UserId = u.Id
        )   
    AND NOT EXISTS
            (
                SELECT
                    1/0
                FROM dbo.Badges AS b
                WHERE b.UserId = u.Id
            )   
);

This one clocks in around 6 seconds, and complains of an excessive memory grant.

SQL Server Query Plan
hello xml!

The big time suck here is spent hitting the Comments table, which is significantly larger than the Badges table.

Totally Wired


The order that you write joins and where clause elements in generally doesn’t matter much, but in the case of NOT EXISTS, it can make a huge difference.

I realize that there are only two NOT EXISTS clauses in these examples, and that hardly makes for a compelling “always” statement. But I did a lot of experimenting with more tables involved, and it really doesn’t seem like the optimizer does any reordering of anti-semi joins.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.