Extra Documentation for KB Article 4462481

SQL Server KB 4462481 is a bit light on details:

Assume that you use Microsoft SQL Server data warehousing. When you run parallel insert queries to insert data into a clustered columnstore index, the query operation freezes.

To quote one of the Top Men of SQL Server product support:

Hope this helps!

How to freeze your query


Consider a query execution that meets all of the following criteria:

  1. A parallel INSERT INTO… SELECT into a columnstore table is performed
  2. The SELECT part of the query contains a batch mode hash join
  3. The query can’t immediate get a memory grant, hits the 25 second memory grant timeout and executes with required memory

The query may appear to get stuck. It no longer uses CPU and the parallel worker threads appear to wait on each other. Let’s go through an example on SQL Server 2017 RTM. First create all of the following tables:

CREATE PARTITION FUNCTION PART_FUN_REP_SGM(BIGINT)AS RANGE LEFTFOR VALUES (1, 2, 3);CREATE PARTITION SCHEME PART_SCHEME_REP_SGMAS PARTITION PART_FUN_REP_SGMALL TO ( [PRIMARY] );DROP TABLE IF EXISTS dbo.SOURCE_DATA_FOR_CCI;CREATE TABLE dbo.SOURCE_DATA_FOR_CCI (PART_KEY BIGINT NOT NULL,ID BIGINT NOT NULL,FAKE_COLUMN VARCHAR(4000) NULL) ON PART_SCHEME_REP_SGM (PART_KEY);INSERT INTO dbo.SOURCE_DATA_FOR_CCI WITH (TABLOCK)SELECT TOP (1048576) 1, ROW_NUMBER()OVER (ORDER BY (SELECT NULL)) % 16000, NULLFROM master..spt_values t1CROSS JOIN master..spt_values t2OPTION (MAXDOP 1);INSERT INTO dbo.SOURCE_DATA_FOR_CCI WITH (TABLOCK)SELECT TOP (1048576) 2, ROW_NUMBER()OVER (ORDER BY (SELECT NULL)) % 16000, NULLFROM master..spt_values t1CROSS JOIN master..spt_values t2OPTION (MAXDOP 1);DROP TABLE IF EXISTS dbo.LARGE_BATCH_MODE_MEMORY_REQUEST;CREATE TABLE dbo.LARGE_BATCH_MODE_MEMORY_REQUEST (ID VARCHAR(4000),INDEX CCI CLUSTERED COLUMNSTORE);INSERT INTO dbo.LARGE_BATCH_MODE_MEMORY_REQUESTWITH (TABLOCK)SELECT TOP (2 * 1048576) CAST(ROW_NUMBER()OVER (ORDER BY (SELECT NULL)) AS VARCHAR(8000))+ 'DYEL'FROM master..spt_values t1CROSS JOIN master..spt_values t2;DROP TABLE IF EXISTS dbo.CCI_SLOW_TO_COMPRESS_TARGET_1;CREATE TABLE dbo.CCI_SLOW_TO_COMPRESS_TARGET_1 (ID BIGINT NULL,INDEX CCI CLUSTERED COLUMNSTORE);DROP TABLE IF EXISTS dbo.CCI_SLOW_TO_COMPRESS_TARGET_2;CREATE TABLE dbo.CCI_SLOW_TO_COMPRESS_TARGET_2 (ID BIGINT NULL,INDEX CCI CLUSTERED COLUMNSTORE);

Consider the following query:

INSERT INTO dbo.CCI_SLOW_TO_COMPRESS_TARGET_1WITH (TABLOCK)SELECT LEFT(t1.ID, 1)FROM LARGE_BATCH_MODE_MEMORY_REQUEST t1INNER JOIN LARGE_BATCH_MODE_MEMORY_REQUEST t2ON t1.ID = t2.IDUNION ALLSELECT IDFROM dbo.SOURCE_DATA_FOR_CCIOPTION (MAXDOP 2);

Here’s what the plan looks like:a37_planI set Max Server Memory to 8000 MB and executed two queries with a maximum allowed memory grant of 60% via Resource Governor. The first query finished in about 40 seconds. The second query hasn’t finished after 30 minutes. During those 30 minutes the second query has only used 1184 ms of CPU time. The COLUMNSTORE_BUILD_THROTTLE wait type shows up in sys.dm_os_waiting_tasks:a37_waitExecution context id 2 is waiting on execution context id 1 with a wait type of HTDELETE. Execution context id 1 has a wait type of COLUMNSTORE_BUILD_THROTTLE. I don’t think that this wait is supposed to show up for parallel inserts. It can show up by design when creating or rebuilding a columnstore index in parallel:

When a columnstore index is being built, the memory grant estimate is based on a segment size of one million rows. The first segment of the index is built using a single thread so the real, required per-thread memory grant is found. Then the memory grants are given per thread and the other segments are built multi-threaded. Although all the threads for the operation are allocated at the start of the build, only one thread is used for the first segment and all the others incur a COLUMNSTORE_BUILD_THROTTLE wait.

The important point is that a wait type of COLUMNSTORE_BUILD_THROTTLE means that worker thread is waiting on another thread to do something. But so does a wait time of HTDELETE. There are only two worker threads and both of them are waiting on another thread to do something. As a result, the query makes no progress. I’ve let similar queries run for 72 hours before giving up and killing them.

How to unfreeze your query


Upgrading to SQL Server 2017 CU11 or higher fixes the issue. With that said, if you’re seeing this issue that means that you have columnstore insert queries waiting at least 25 seconds for a memory grant. That should be considered to be a problem even if the queries didn’t get stuck. If you find yourself in this situation, I strongly consider increasing memory available to SQL Server or limiting memory grants via Resource Governor or some other method.

Final Thoughts


If you have parallel insert queries into CCis that appear to be stuck with no CPU usage and long COLUMNSTORE_BUILD_THROTTLE wait times check your version of SQL Server. Upgrading to at least CU11 will resolve the immediate issue, but consider a more complete fix will include addressing the RESOURCE_SEMAPHORE waits that must be present on your system.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.