Category Archives: SQL Server 2017

Building Perfect Development Lab With SQL Server in Containers

I am heavily relying on virtualization in my home lab. I prefer to build separate SQL Server environments for different projects keeping them clean and isolated from each other. I can work with multiple SQL Server versions side by side without worrying how some components in my setup affect others.

This setup is also great for presentations and trainings I deliver. I just need to reset VM to the “starting” checkpoint and everything would be in the known and healthy state. Believe me, this saves you from a lot of pain and embarrassment when you are staying in front of people. Troubleshooting of non-working demos is not fun when you are on the stage.

There is the problem with this setup, however. It is much easier and faster to checkpoint existing VM rather than provision the new one for the new setup. In the end, unless you have self-discipline I lack, you may end up with something like shown in Figure 1 below. This is the screen shot of one of my VMs checkpoints.

01. Mess in VM configuration

On the surface, this setup may provide you required isolation between environments. However, there is the problem with maintainability. Consider how much efforts it would take to apply new SP or CU to all of your checkpoints. In the end, you will be in the situation when you always want to rebuild your lab and you never have time to do it properly.  Thus, I decided to look at other options and evaluate Docker and SQL Server in containers when I recently had to setup the new travel laptop. I found that containers are better suited to my needs comparing to VM-based virtualization and I want to share my experience with you.

As the disclaimer, I am going to focus on the infrastructure side and maintainability and not going to discuss underlying technologies and containerization. You can read about it here if you want.

While both, containers and VMs rely on virtualization, they are conceptually different on the infrastructure side. Figure 2 illustrates the general virtual machine. .

02. VM and Checkpoints

Each virtual machine may consist of multiple checkpoints/snapshots, which fork it to the multiple states. However, only one state may be active at any giving point of time. Neither could you run multiple instances of the single VM simultaneously – you’ll need to clone them creating separate VMs for that.

Containers, on the other hand, operate differently as shown in Figure 3. The environment consists of multiple images, which conceptually similar to checkpoints/snapshots. However, every image may have multiple containers (running or stopped), which are the active instances/environments. It is worth noting that Docker API allows you to checkpoint a container, but I see little benefits of doing that.

03. Images and Containers

That conceptual difference force you to use different methods to solve problems. Think about the situation when you need to setup individual development environments for each member of the team. With VM-based virtualization, you will have to create separate VMs. With containers, you need to create the base image and start multiple containers from this image.

Both approaches have pros and cons. The containers will use less disk space than VMs – you will have only one image on disk. They are also easier to maintain and refresh – you need to build the single image. VMs, on the other hand, provide true separation. If one developer is working on the complex feature in its own branch, there is no dependency on the shared image – you can refresh other VMs keeping his one intact.

Nothing, obviously, prevents you from having multiple images achieving the same degree of separation with containers. You just need to manage them a bit differently. I, personally, believe that containers will be easier to maintain and lead to the cleaner setup in the long run.

Anyway, let’s do the setup. Mind you, I do not consider myself as the Docker expert – I am sure that there are the better ways to achieve the goals. Nevertheless, I want to share several tips and tricks I did in my environment.

I am going to use SQL Server 2017 in Windows containers setting everything up on Windows 10 machine (you need to have Anniversary Update Build 1607 or later) . I am sure that you can use similar approach in different OS and/or with SQL Server in Linux.

As the first step, you need to install Docker. You can download it from here – you can read the instruction – it is straightforward. Just make sure to choose Windows Containers during setup. After Docker setup is done, you may test your setup using Hello World image as outlined here. I did not do it – after all, we are about to run SQL Server anyway.

As the next step, let’s pull SQL Server 2017 Developer Windows image. Make sure that Docker is running, open PowerShell and run the following command.

docker pull microsoft/mssql-server-windows-developer:latest

It will take some time to download but then you are pretty much done – you have SQL Server “installed”. The following command will display you the list of images you have installed on your machine:

docker image ls

Figure 4 shows the output you would see.

04. SQL Server Image

As the side note, you can remove images using one of the following commands (obviously do not run it now). In some cases, for example if you had an error when you built the image, you may end up with the images that do not have names and you’ll need to use ID to clean them up.

docker image rm <name>

docker image rm <image_id>

Now, let’s create the container with SQL Server 2017 by running the following command. This will create the container with name sqlserver.

docker run --name=sqlserver -d -p 1433:1433 -e sa_password=Pas!w0rd12345 -e ACCEPT_EULA=Y microsoft/mssql-server-windows-developer

The -p switch sets up the port mapping between local instance and the container. In our case, both ports are the same and SQL Server in container will work (and look like) as if you  have SQL Server default instance installed locally. All requests host received on port 1433 would be redirected to the container. Obviously, you need to change the port mapping if you already have SQL Server installed on your box and/or if you have multiple containers running. Also, do not forget to open the port in the firewall if you accessing the image remotely.

Now, you can connect to SQL Server by using localhost or localhost,1433 as the server name and sa/Pas!w0rd12345 for the authentication.

Alternatively, you can find IP address of the container. First, let’s run the following command to see list of active containers in the system

docker container ls -all

Figure 5 shows the output – as you see, we have just one running container in the system. You can use either name or container_id during container management.

05. Running Containers

Next, let’s run the command below to get container information.

docker inspect sqlserver

Figure 6 illustrates the partial output of the statement – the configuration is presented in JSON format. You can look at IPAddress element there.

06. Inspect Container Configuration

You can also run the following command to parse the output of docker inspect obtaining IP address without any additional information:

docker inspect --format='{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' sqlserver

In my environment, I prefer to connect to containers using IP Addresses rather than port mapping. By default, Docker assigns random IP Address every time you start container, which is inconvenient. You can workaround it by assigning specific IP to the container when you start it.

Let’s locate the subnet used by the Docker. You can do it by using the commands below. In Windows environment and with the current docker build, you should look at nat network.

docker network list

docker network inspect nat

Figure 7 shows the output of the commands. You can pick any IP address from the available range.

07. Inspect Docker Networks

Now, let’s stop and remove our container as shown below before restarting it.

docker container stop sqlserver

docker container rm sqlserver

On the side note, here are a few other commands that you can use for container management

  • docker container start allows you to start stopped container.
  • docket container pause and docker container unpause allows you to suspend and resume container execution.

Now, we can create another container with the command below. In addition to IP Address, there are a couple of other useful parameters. -m indicates how much memory you want to allocate to container. –cpu-count shows number of cores to allocate.

Docker run --name internals -d --ip 192.168.113.2 -p 1433:1433 -m 2048m --cpu-count 2 -e sa_password=s@1 -e ACCEPT_EULA=Y microsoft/mssql-server-windows-developer

As I already mentioned, you can start multiple containers from the single image. Just use different IP, Name and port mapping for each of them.

Building the Image

Obviously, in the real life, we do not work with vanilla SQL Server installation. We need to customize it by changing SQL Server settings and logins, creating and/or restoring the databases and do other actions. There are a couple of ways how you can do that.

The first approach is customizing existing container manually and creating the image from it using docker container commit command. After that, you can start the new containers from created image the same way as we already discussed. We will cover a couple ways to move data to and from containers later.

There is the better way, however. You can automate this process by utilizing docker build command. The process is very simple. You just need to define DockerFile, which contains the reference to the main image and specifies the build actions. You can copy scripts and database backups into the image, run SQLCMD, BCP and PowerShell scripts there – you, pretty much, have the full control. Internally, Docker runs every command inside deployment containers (creating and destroying them during the process) saving the final one as the target image.

Let’s look at the simple DockerFile below. As you can see, it specifies our SQL Server 2017 image as the base for the build. Next, it copies files from /Scripts folder into the image/deployment container and run Build.ps1 PowerShell script there.

# Building SQL Server Internals Precon V3 Environment
# Dmitri Korotkevitch
# http://aboutsqlserver.com
# dk@aboutsqlserver.com

FROM microsoft/mssql-server-windows-developer:latest

MAINTAINER Dmitri Korotkevitch "aboutsqlserver.com"

COPY Build.ps1 /

COPY Scripts/*.sql /

RUN PowerShell /Build.ps1

Below is the code from Build.ps1 script. As you can see, it creates a couple of folders and then runs the scripts using SQLCMD utility.

<# 
Initialization script for Docker image
#>

New-Item -path C:\ -name DB -ItemType directory
New-Item -path C:\DB -name Backups -ItemType directory
New-Item -path C:\ -name SQL -ItemType directory

Invoke-Sqlcmd -InputFile 00.ServerSetting.sql `
-Database master `
-Server . `

Invoke-Sqlcmd -InputFile 01.DBCreation.sql `
-Database master `
-Server . `

Invoke-Sqlcmd -InputFile 02.ObjectCreation.sql `
-Database master `
-Server . `

Remove-Item /*.sql

First script in my example changed several server settings.

use master
go
EXEC sys.sp_configure N'show advanced options', N'1' RECONFIGURE WITH OVERRIDE
GO
EXEC sys.sp_configure N'max degree of parallelism', N'1'
GO
EXEC sys.sp_configure N'optimize for ad hoc workloads', N'0'
GO
EXEC sp_configure 'remote admin connections', 1 ;
go
EXEC sys.sp_configure N'backup compression default', N'1'
GO
sp_configure 'blocked process threshold', 5
GO
RECONFIGURE WITH OVERRIDE
GO
EXEC sys.sp_configure N'show advanced options', N'0' RECONFIGURE WITH OVERRIDE
GO
EXEC xp_instance_regwrite N'HKEY_LOCAL_MACHINE', N'Software\Microsoft\MSSQLServer\MSSQLServer', N'BackupDirectory', REG_SZ, N'C:\DB\Backup'
GO
EXEC xp_instance_regwrite N'HKEY_LOCAL_MACHINE', N'Software\Microsoft\MSSQLServer\MSSQLServer', N'DefaultData', REG_SZ, N'C:\DB'
GO
EXEC xp_instance_regwrite N'HKEY_LOCAL_MACHINE', N'Software\Microsoft\MSSQLServer\MSSQLServer', N'DefaultLog', REG_SZ, N'C:\DB'
GO

The second script creates an empty database.

use master
go

create database [SQLServerInternals]
on primary (name=N'SQLServerInternals', filename=N'C:\DB\SqlServerInternals.mdf', size=10MB, filegrowth = 10MB),
filegroup [FASTSTORAGE] (name=N'SqlServerInternals_FAST', filename=N'C:\DB\SqlServerInternals_FAST.ndf', size=100MB, filegrowth = 100MB), 
filegroup [FG2016] (name=N'SqlServerInternals_2016', filename=N'C:\DB\SqlServerInternals_2016.ndf', size=10MB, filegrowth = 10MB),
filegroup [FG2017] (name=N'SqlServerInternals_2017', filename=N'C:\DB\SqlServerInternals_2017.ndf', size=10MB, filegrowth = 10MB),
filegroup [FG2018] (name=N'SqlServerInternals_2018', filename=N'C:\DB\SqlServerInternals_2018.ndf', size=10MB, filegrowth = 10MB)
log on (name=N'SQLServerInternals_log', filename=N'C:\DB\SqlServerInternals.ldf', size=256MB, filegrowth = 256MB);

alter database [SQLServerInternals] set recovery simple;
alter database [SQLServerInternals] modify filegroup [FASTSTORAGE] default;
go

The third creates and populates the objects.

/****************************************************************************/
/*                      SQL Server Internals v3                             */
/*                        Training Materials                                */
/*                                                                          */
/*                 Written by Dmitri V. Korotkevitch                        */
/*                    http://aboutsqlserver.com                             */
/*                      dk@aboutsqlserver.com                               */
/****************************************************************************/
/*                      Objects Creation Script                             */
/****************************************************************************/

set nocount on
go

use SQLServerInternals
go

if exists(
    select * 
    from sys.tables t join sys.schemas s on t.schema_id = s.schema_id
    where s.name = 'dbo' and t.name = 'LargeRow'
)
    drop table dbo.LargeRow;
go

create table dbo.LargeRow
(
    ID int not null,
    IntField int not null, 
    CharField char(2000) not null,
)
go

create unique clustered index IDX_LargeRow_ID
on dbo.LargeRow(ID)
with (/*drop_existing = on, */data_compression = none)
go
...

Obviously, you have all flexibility here. For example, you can copy and restore database backup instead of recreating it with scripts.

You can run docker build command as shown below from the same folder where DockerFile is located. You may notice that I am using -m parameter – by default, Docker allocates very limited amount of resources to deployment containers during the build. You may consider increasing the memory for the container to avoid any potential issues if you run heavy scripts.

docker build -t internalsimage . -m 2g

There is the small caveat here – security. Docker runs the build using Windows Administrator/sysadmin user, which will be removed after process is completed. It should not be a problem for majority of the cases, unless some of your code changes security context with EXECUTE AS OWNER clause and/or relies on the proper database ownership. As the workaround, you can create another sysadmin SQL Login in the beginning of the build process and use it when you run the scripts or restoring the databases later.

The beauty of this setup is maintainability. Consider the situation when you want to refresh the environment after Microsoft published newer SQL Server image in the repo. The only thing you need to do is changing the base image reference in DockerFile and re-run the build process. All images will be recreated with minimum work involved.

You can use your own images as the base images during the build. Consider the example with individual development environments we discussed earlier. Let’s assume that we want to refresh the image with production data on the weekly basis and support the situation when we need to keep some old images to run containers for complex feature development.

In this case, you may want to create the setup as shown in Figure 8 with one image that contains all configuration settings without the database using it as the base image for weekly refreshes.

08. Possible Images/Containers Configuration for Dev Teams

A Few Other Tips

There are a couple of other commands I want to mention.

The docker exec command allows you to run a command within the container. Listing below starts interactive PowerShell session in internals container. You can create folders, run PS commands and do whatever you need to do there.

docker exec -it internals PowerShell

You can move files in between host and container using docker cp command. This is the perfect way to restore the database from the backup in the container.

# Copy files
# From host to container
docker cp mydb.bak internals:c:\backups\

# From container to host
docker cp internals:c:\backups\mydb.bak c:\hostfolder\

Wrapping Up

While Docker-based setup works great in majority of the cases, there are a few caveats I need to mention besides security context issue during the docker build. You would also have some limitations if you rely on Windows Authentication. You cannot make containers the members of domain. You can use Group Managed Service accounts (gMSA) to handle Windows Authentication but, obviously, this setup is much more complicated. Some details are available here.

Next, this setup is not necessarily the best for performance-related demos and/or load testing. I found that under heavy OLTP load, the bottleneck shifts towards network layer invalidating the results. Also, I did not find a way to connect to in-container SQL Server performance counters with PerfMon application. You can workaround it with sys.dm_os_performance_counters view, but it is less convenient.

Neither would it work very well for High Availability-related scenarios. HA in containers world be achieved in the different way and VM-based virtualization is the better choice if you need to test AlwaysOn Availability Groups and Failover Clusters.

Fortunately, you are not limited to the single technology. You can use VMs for some and containers for other scenarios. The choice is yours.

Docker PS cheatsheet and docker build demo is available for download from here.

The Hidden Danger of Readable Secondaries in SQL Server AlwaysOn Availability Groups

A long time ago in a galaxy far, far away, I had to troubleshoot interesting performance issue in SQL Server. Suddenly, the CPU load on the server started to climb up. Nothing changed in terms of workload. The system still processed the same amount of requests. The execution plans of the critical queries stayed the same. Nevertheless, the CPU usage grew up slowly and steadily by a few percent per hour.

Eventually, we nailed it down. The problem occured in very busy OLTP system with very volatile data. We noticed that system performed much more I/O (logical and physical) than before. It was very strange, because nothing should have changed that day. Finally, we found that we have large number of deleted rows in the database that had not been cleaned up by ghost cleanup task.

It was the most impactful for the few “queue” tables in the system that handled hundreds of inserts and deletes per second. The tables had millions of data pages despite that they stored just a handful of rows. The tables supposed to be small and there were bunch of queries that performed Clustered Index Scan over them burning a lot of CPU resources to perform mullions of logical and physical reads.

The output of sys.dm_db_index_physical_stats looked similar to Figure 1 below. Very little record_count with very large number of page_count and version_ghost_record_count. The latter one indicates how many ghosted rows cannot be cleaned up due to row-versioning transactions in the system.

01. Output of sys.dm_db_index_physical_stats

We looked at the active transaction and discovered, that one of our readable secondaries ran terrible SELECT that cross-join several hundred-million row tables. Unfortunately, SQL Server allowed it to execute and this select was running for more than 12 hours deferring ghost and version store clean-up on primary node. The bottom line – non-optimized workload on the readable secondary node can affect your primary. Heh.

You can read why it happens in my recent blog post at Apress.com and in my upcoming book on locking, blocking and concurrency (should be published in October). However, I also wanted to mention it here and provide you the demo scripts and video that demonstrates it.

Remember about this overhead. Secondary nodes are great to scale your read workload but they are by no means “set it and forget about it” kind of solution.

Blocking Monitoring Framework: Capture and Analyze SQL Server Blocking and Deadlock Information With Event Notifications

If you worked with SQL Server for a while, you should have noticed how landscape changed over the years. We are dealing with the different problems now. Five years ago, majority of the issues I saw were related to non-optimized queries. There were the huge scans with a lot of  physical I/O and bad performance.

You do not see them as often nowadays. It is very cheap to solve the problems by upgrading the server. Put a couple hundred GBs of RAM and cache all the data; add more CPUs and problems magically disappear. The root-cause has not been fixed but who cares?

Surprisingly, there is one category of the issues that did not went away – concurrency. It even becomes worse. Modern servers handle more users and problems that did not exist with 50 concurrent users may put the server to its knees with 5000 users. I’ve been constantly involved in the troubleshooting of various concurrency issues and, in fact, I see more and more of them overtime.

Troubleshooting of the blocking and concurrency issues is, in the nutshells, a simple process. You need to identify the processes involved in blocking conditions or deadlocks and analyze why those processes acquire the locks on the same resources. In majority of cases, you need to analyze queries and their execution plans identifying possible inefficiencies that led to excessive number of locks being acquired.

Collecting this information is not a trivial task. The information is exposed through DMVs (you can download the set of scripts here); however, it requires you to run the queries at time when blocking occurred. Fortunately, SQL Server allows you to capture blocking and deadlock conditions with the blocked process report and deadlock graph, analyzing them later.

There is the caveat though. Neither blocked process report nor deadlock graph provide you execution plans of the statements. Nor do they always include affected statements in the plain text. You may need to query plan cache and other DMVs to get this information and longer you wait lesser is the chance that the information is available. Moreover, SQL Server may generate enormous number of blocked process reports in cases of prolonged blocking and complex blocking chains, which complicates the analysis.

This analysis may become very time consuming, especially if you need to deal with the large number of servers. Over the years, I have created the set of routines, which simplify it for me. I have been thinking to publish my scripts for a while, but I’ve never had time to polish them enough for public consumption. Until now – and I am very happy to share my collection with all of you. So allow me introduce the Blocking Monitoring Framework, which I am using with majority of my servers!

This framework is using Event Notifications. It captures blocked process report and deadlock graphs and parses them at time when event occurred and all data is still available in the system. All information is persisted in the set of tables for the further analysis.

The first version is available for download. I also promise you that I am going to support and enhance it in the future publishing the new versions on the regular basis.

Please feel free to contact me in case of any questions. I would also appreciate if you provide me any blocked process reports and deadlock graphs that framework was unable to parse. I will address the issues as quickly as I could.

Blocking Monitoring Framework

Locking in Microsoft SQL Server (Part 21) – Intro Into Transaction Management and Error Handling

As strange as it sounds, I have never considered transaction management to be the topic that requires explanation. However, it seems that some aspects of it are confusing and may benefit from the separate, long overdue, blog post.

Transaction Types

There are three types of transactions in SQL Server – explicit, autocommitted and implicit.

Explicit transactions are explicitly controlled by the code. You can start them by using BEGIN TRAN statement. They will remain active until you explicitly call COMMIT or ROLLBACK in the code.

In case, when there are no active transactions present, SQL Server would use autocommitted transactions – starting transactions and committing them for each statement it executes. Autocommitted transactions work on per-statement rather than per-module level. For example, when a stored procedure consists of five statements; SQL Server would have five autocommitted transactions executed. Moreover, if this procedure failed in the middle of execution, SQL Server would not roll back previously committed autocommitted transactions. This behavior may lead to logical data inconsistency in the system.

For the logic that includes multiple data modification statements, autocommitted transactions are less efficient than explicit transactions due to the logging overhead they introduce. In this mode, every statement would generate transaction log records for implicit BEGIN TRAN and COMMIT operations, which leads to the large amount of transaction log activity and degrade performance of the system.

There is another potential performance hit caused by excessive number of autocommitted transactions. As you know, SQL Server implements Write-Ahead Logging to support transaction durability hardening log records on-disk synchronously with data modifications. Internally, however, SQL Server batches log write operations and caches log records in memory in small 60KB structures called log buffers. COMMIT log record forces SQL Server to flush log buffers to disk introducing synchronous I/O operation.

Figure 1 illustrates this condition. INSERT_1, UPDATE_1 and DELETE_1 operations run in autocommitted transactions generating additional log records and forcing log buffer to flush on each COMMIT. Alternatively, INSERT_2, UPDATE_2 and DELETE_2 operations run in implicit transaction, which leads to more efficient logging.

01. Transaction Logging with Autocommitted and Explicit Transactions

You can run the code below to see this overhead in action. It performs INSERT/UPDATE/DELETE sequence 10,000 times in the loop in autocommitted and explicit transactions respectively, measuring execution time and transaction log throughput with sys.dm_io_virtual_file_stats view.

create table dbo.TranOverhead
(
    Id int not null,
    Col char(50) null,
    constraint PK_TranOverhead
    primary key clustered(Id)
);

-- Autocommitted transactions
declare
    @Id int = 1,
    @StartTime datetime = getDate(),
    @num_of_writes bigint,
    @num_of_bytes_written bigint

select @num_of_writes = num_of_writes, @num_of_bytes_written = num_of_bytes_written
from sys.dm_io_virtual_file_stats(db_id(),2);

while @Id < 10000
begin
    insert into dbo.TranOverhead(Id, Col) values(@Id, 'A');

    update dbo.TranOverhead set Col = 'B' where Id = @Id;

    delete from dbo.TranOverhead where Id = @Id;

    set @Id += 1;
end;

select 
    datediff(millisecond, @StartTime, getDate()) as [Exec Time ms: Autocommitted Tran]
    ,s.num_of_writes - @num_of_writes as [Number of writes]
    ,(s.num_of_bytes_written - @num_of_bytes_written) / 1024 as [Bytes written (KB)]
from
    sys.dm_io_virtual_file_stats(db_id(),2) s;
go

-- Explicit Tran
declare
    @Id int = 1,
    @StartTime datetime = getDate(),
    @num_of_writes bigint,
    @num_of_bytes_written bigint

select @num_of_writes = num_of_writes, @num_of_bytes_written = num_of_bytes_written
from sys.dm_io_virtual_file_stats(db_id(),2);

while @Id < 10000
begin
    begin tran
        insert into dbo.TranOverhead(Id, Col) values(@Id, 'A');

        update dbo.TranOverhead set Col = 'B' where Id = @Id;

        delete from dbo.TranOverhead where Id = @Id;
    commit
    set @Id += 1;
end;

select 
    datediff(millisecond, @StartTime, getDate()) as [Exec Time ms: Explicit Tran]
    ,s.num_of_writes - @num_of_writes as [Number of writes]
    ,(s.num_of_bytes_written - @num_of_bytes_written) / 1024 as [Bytes written (KB)]
from
    sys.dm_io_virtual_file_stats(db_id(),2) s;

In my environment and this test, explicit transactions are about two times faster and generated three less log activity than autocommitted ones as shown in Figure 2.

02. Explicit and Autocommitted Transaction Performance

As the side note, SQL Server 2014 and above allows you to improve transaction log throughput by using delayed durability. In this mode, SQL Server does not flush log buffers when COMMIT log records are generated. This reduces the number of disk writes at cost of potential small data loss in case of disaster.

SQL Server also supports implicit transactions, which you can enable with SET IMPLICIT_TRANSACTION ON statement. When this option is enabled, SQL Server starts the new transaction when there is no active explicit transactions present. This transaction stays active until you explicitly issue COMMIT or ROLLBACK statement.

Implicit transactions may make transaction management more complicated and they are rarely used in production. However, there is the caveat – SET ANSI_DEFAULT ON option also automatically enables implicit transactions. This behavior may lead to unexpected concurrency issues in the system.

Error Handling

The error handling in SQL Server is the tricky subject especially with transactions involved. SQL Server would handle exceptions differently depending on error severity, active transaction context and several other factors.

Let’s look how exceptions affect control flow during execution. Listing below creates two tables- dbo.Customers and dbo.Orders – and populates them with the data. Note the existence of foreign key constraint defined in dbo.Orders table.

create table dbo.Customers
(
    CustomerId int not null,
    constraint PK_Customers
    primary key(CustomerId)
);

create table dbo.Orders
(
    OrderId int not null,
    CustomerId int not null,

    constraint FK_Orders_Customerss
    foreign key(CustomerId)
    references dbo.Customers(CustomerId)
);
go

create proc dbo.ResetData
as
begin
    begin tran
        delete from dbo.Orders;
        delete from dbo.Customers;
        insert into dbo.Customers(CustomerId) values(1),(2),(3);
        insert into dbo.Orders(OrderId, CustomerId) values(2,2);
    commit
end;

exec dbo.ResetData;

Let’s run three DELETE statements in one batch as shown below. The second statement will trigger a foreign key violation error. @@ERROR system variable provides the error number for the last T-SQL statement executed (0 means no errors).

delete from  dbo.Customers where CustomerId = 1; -- Success
select @@ERROR as [@@ERROR: CustomerId = 1];
delete from  dbo.Customers where CustomerId = 2; -- FK Violation
select @@ERROR as [@@ERROR: CustomerId = 2];
delete from  dbo.Customers where CustomerId = 3; -- Success
select @@ERROR as [@@ERROR: CustomerId = 3];
go

select * from dbo.Customers;

Figure 3 illustrates the output of the code. As you can see, SQL Server continues execution after non-critical foreign key violation error deleting a row with CustomerId=3 afterwards.

03. Running Three Autocommitted Transactions in a Batch

The situation would change when you use TRY..CATCH block as shown below.

exec dbo.ResetData;
go

begin try
    delete from  dbo.Customers where CustomerId = 1; -- Success
    delete from  dbo.Customers where CustomerId = 2; -- FK Violation
    delete from  dbo.Customers where CustomerId = 3; -- Not executed
end try
begin catch
    select 
        ERROR_NUMBER() as [Error Number]
        ,ERROR_LINE() as [Error Line]
        ,ERROR_MESSAGE() as [Error Message];
end catch
go

select * from dbo.Customers;

As you can see in Figure 4, the error was caught in the CATCH block and the third deletion statement has not been executed.

04. Running Three Autocommitted Transactions in TRY..CATCH block

Remember that non-critical exceptions do not automatically rollback explicit or implicit transactions regardless if TRY..CATCH block is present. You still need to commit or rollback transaction after the error.

Depending on severity of the error, transaction in which error occurred may be committable or become uncommittable and doomed. SQL Server would not allow you to commit uncommittable transactions and you must roll it back to complete it.

The XACT_STATE() function allows you to analyze the state of transaction and it returns one of three values:

  • 0 indicates that there is no active transactions present.
  • 1 indicates that there is an active and committable transaction present. You can perform any actions and data modifications committing transactions afterwards.
  • -1 indicates that there is an active uncommittable transaction present. You cannot commit such transaction.

There is very important SET option- XACT_ABORT– which allows you to control error-handling behavior in the code. When this option is set to ON, SQL Server treats every run-time error as severe, making transaction uncommittable. This prevents you from accidentally committing transactions when some data modifications failed with non-critical errors.

When XACT_ABORT is enabled, any error would terminate the batch when TRY..CATCH block is not present. For example, if you run the code from the second code sample above again using SET XACT_ABORT ON, the third DELETE statement would not be executed and only the row with CustomerId=1 will be deleted. Moreover, SQL Server would automatically rollback doomed uncommitted transaction after the batch completes.

The code below shows this behavior. The stored procedure dbo.GenerateError sets XACT_ABORT to ON and generates an error within the active transaction. @@TRANCOUNT variable returns the nested level of transaction (more on it later) and non-zero value indicate that transaction is active.

create proc dbo.GenerateError
as
begin
    set xact_abort on
    begin tran
        delete from dbo.Customers where CustomerId = 2; -- Error
        select 'This statement will never be executed';
end
go

exec dbo.GenerateError;
select 'This statement will never be executed';
go

-- Another batch
select XACT_STATE() as [XACT_STATE()], @@TRANCOUNT as [@@TRANCOUNT];
go

Figure 5 illustrates the output of the code. As you can see, batch execution has been terminated and transaction has been automatically rolled back at the end of the batch.

505. XACT_ABORT behavior (no TRY..CATCH block)

TRY..CATCH block, however, will allow you to capture the error even with XACT_ABORT set to ON.

begin try
    exec dbo.GenerateError;
    select 'This statement will never be executed';
end try
begin catch
    select 
        ERROR_NUMBER() as [Error Number]
        ,ERROR_PROCEDURE() as [Procedure]
        ,ERROR_LINE() as [Error Line]
        ,ERROR_MESSAGE() as [Error Message];

    select 
        XACT_STATE() as [XACT_STATE()]
        ,@@TRANCOUNT as [@@TRANCOUNT];

    if @@TRANCOUNT > 0
        rollback;
end catch

As you can see in Figure 6, exception has been trapped in the CATCH block with transaction still remain active there.

06. XACT_ABORT Behavior (with TRY..CATCH block)

Consistent error handling and transaction management strategies are extremely important and allow to avoid data consistency errors and improve data quality in the system. I would recommend the following approach as the best practice:

  • Always use explicit transactions in the code during data modifications. This would guarantee data consistency in transactions that consists of multiple operations. It is also more efficient comparing to individual autocommitted transactions.
  • Set XACT_ABORT to ON before data modifications. This would guarantee “all-or-nothing” behavior of the transaction preventing SQL Server from ignoring non-severe errors and committing partially completed transactions.
  • Use proper error handling with TRY..CATCH blocks and explicitly rollback transactions in case of exceptions. This helps to avoid unforeseen side effects in case of the errors.

The choice between client-side and server-side transaction management depends on application architecture. Client-side management is required when data modifications are done in the application code, for example changes are generated by ORM frameworks. On the other hand, stored procedure-based data access tier may benefit from server-side transaction management.

The code below provides the example of the stored procedure that implements server-side transaction management.

create proc dbo.PerformDataModifications
as
begin
    set xact_abort on
    begin try
        begin tran
            /* Perform required data modifications */
        commit
    end try
    begin catch
        if @@TRANCOUNT > 0 -- Transaction is active
            rollback;
        /* Addional error-handling code */
        throw;  -- Re-throw error. Alternatively, SP may return the error code
    end catch;
end;

Nested Transactions

SQL Server technically supports nested transactions; however, they are primarily intended to simplify transaction management during nested stored procedure calls. In practice, it means that the code needs to explicitly commit all nested transactions and the number of COMMIT calls should match the number of BEGIN TRAN calls. The ROLLBACK statement, however, rolls back entire transaction regardless of the current nested level.

The code below demonstrates this behavior. As I already mentioned, system variable @@TRANCOUNT returns the nested level of the transaction.

select @@TRANCOUNT as [Original @@TRANCOUNT];
begin tran
    select @@TRANCOUNT as [@@TRANCOUNT after the first BEGIN TRAN];
    begin tran
        select @@TRANCOUNT as [@@TRANCOUNT after the second BEGIN TRAN];
    commit
    select @@TRANCOUNT as [@@TRANCOUNT after nested COMMIT];    
    begin tran
        select @@TRANCOUNT as [@@TRANCOUNT after the third BEGIN TRAN];
    rollback
select @@TRANCOUNT as [@@TRANCOUNT after ROLLBACK];
rollback; -- This ROLLBACK generates the error

You can see the output of the code in Figure 7 below.

07. Nested Transactions

You can save the state of transaction and create a savepoint by using SAVE TRANSACTION statement. This will allow you to partially rollback a transaction returning to the most recent savepoint. The transaction will remain active and needs to be completed with explicit COMMIT or ROLLBACK statement later.

It is worth noting that uncommittable transactions with XACT_STATE() = -1 cannot be rolled back to savepoint. In practice, it means that you cannot rollback to savepoint after an error if XACT_ABORT is set to ON.

The code below illustrates savepoints in action. The stored procedure creates the savepoint when it runs in active transaction and rolls back to this savepoint in case of committable error.

create proc dbo.TryDeleteCustomer
(
    @CustomerId int 
)
as
begin
    -- Setting XACT_ABORT to OFF for rollback to savepoint to work
    set xact_abort off

    declare
        @ActiveTran bit

    -- Check if SP is calling in context of active transaction
    set @ActiveTran = IIF(@@TranCount > 0, 1, 0);

    if @ActiveTran = 0
        begin tran;
    else 
        save transaction TryDeleteCustomer;

    begin try
        delete dbo.Customers where CustomerId = @CustomerId;

        if @ActiveTran = 0
            commit;
        return 0;
    end try
    begin catch
        if @ActiveTran = 0 or XACT_STATE() = -1
        begin
            -- Rollback entire transaction
            rollback tran; 
            return -1; 
        end
        else begin
                -- Rollback to savepoint 
            rollback tran TryDeleteCustomer; 
            return 1; 
        end
    end catch;
end;
go

-- Test
declare
    @ReturnCode int

exec dbo.ResetData; 

begin tran
    exec @ReturnCode = TryDeleteCustomer @CustomerId = 1;
    select 
        1 as [CustomerId]
        ,@ReturnCode as [@ReturnCode]
        ,XACT_STATE() as [XACT_STATE()];
    
    if @ReturnCode >= 0
    begin
        exec @ReturnCode = TryDeleteCustomer @CustomerId = 2;
        select 
            2 as [CustomerId]
            ,@ReturnCode as [@ReturnCode]
            ,XACT_STATE() as [XACT_STATE()];    
    end
if @ReturnCode >= 0
    commit;
else 
    if @@TRANCOUNT > 0
        rollback;
go

select * from dbo.Customers;

The test triggered foreign key violation during the second dbo.TryDeleteCustomer call. This is non-critical error and, therefore, the code is able to commit after it as shown in Figure 8.

08. Transaction Has Been Committed After Rollback to Savepoint

It is worth noting that this example is shown for demonstration purposes only. From efficiency standpoint, it would be better to validate referential integrity and existence of the orders before deletion occurred rather than catching exception and rolling back to savepoint in case of an error.

I hope that those examples provided you the good overview of transaction management and error handling strategies in the system. If you want to dive deeper, I would strongly recommend you to read the great article by Erland Sommarskog, which provides you much more details on the subject.

Source code is available for download.

Table of Context

Splitting and Merging Non-Empty Partitions in Columnstore Indexes

Last week we discussed how to implement Sliding Window pattern in the tables with columnstore indexes. As I promised, today we are going to focus on partition management in such tables.

As you will remember, SQL Server does not allow you to split and merge non-empty columnstore partitions. I personally do not consider it as the huge limitation – perhaps because I rarely have reasons to do it in columnstore tables. However, there are still some cases when it may be required.

As one of examples, think about a table, partitioned by date, which may be populated with incorrect data from the future. Perhaps due to some issues in ETL processes. Some rows may be placed to incorrect (future) partition, which would prevent its split.

The common, by the book approach recommends dropping columnstore index, splitting or merging partitions and recreating the index afterwards. As you can imagine, it would lead to extremely inefficient process with huge amount of unnecessary overhead on large tables. After all, you have to drop and recreate columnstore index, converting table to Heap, while just subset of the partitions needs to be rebuilt. Fortunately, you can minimize the overhead with simple workaround:

  1. Switch partition(s) to split or merge to the separate staging table
  2. Split or merge partition(s) in the main table. You can do that because partitions will be empty after the previous step
  3. Drop columnstore index in the staging table, split/merge partition(s) there and recreate the index afterwards
  4. Switch partition(s) back from staging to the main table.

Let’s look at the process in details. I am going to recreate the tables I used last week with the script below.

create partition function pfOrders(datetime2(0))
as range right for values
('2016-11-01','2016-12-01','2017-01-01','2017-02-01'
,'2017-03-01','2017-04-01','2017-05-01','2017-06-01'
,'2017-07-01','2017-08-01','2017-09-01','2017-10-01'
,'2017-11-01','2017-12-01','2018-01-01');

create partition scheme psOrders 
as partition pfOrders
all to ([Primary]);

create table dbo.Orders
(
        OrderDate datetime2(0) not null,
        OrderId int not null,
        Placeholder char(100),
)
on psOrders(OrderDate);

create table dbo.OrderLineItems
(
        OrderDate datetime2(0) not null,
        OrderId int not null,
        OrderLineItemId int not null,
        Placeholder char(100),
)
on psOrders(OrderDate);
go

-- Left-most and right-most are empty
;with N1(C) as (select 0 union all select 0) -- 2 rows
,N2(C) as (select 0 from N1 as t1 cross join N1 as t2) -- 4 rows
,N3(C) as (select 0 from N2 as t1 cross join N2 as t2) -- 16 rows
,N4(C) as (select 0 from N3 as t1 cross join N3 as t2) -- 256 rows
,N5(C) as (select 0 from N4 as t1 cross join N4 as t2) -- 65,536 rows
,Ids(Id) as (select row_number() over (order by (select null)) from N5)
insert into dbo.Orders(OrderDate, OrderId)
        select dateadd(day,Id % 390,'2016-11-01'), ID
        from Ids;

insert into dbo.OrderLineItems(OrderDate,OrderId, OrderLineItemId)
        select OrderDate, OrderId, OrderId
        from dbo.Orders;
go

create clustered columnstore index CCI_Orders on dbo.Orders
on psOrders(OrderDate);

create clustered columnstore index CCI_OrderLineItems on dbo.OrderLineItems
on psOrders(OrderDate);
go

select 
        t.object_id, i.index_id, i.name as [Index], p.partition_number,
        p.[Rows], p.data_compression_desc, 
        fg.name as [Filegroup],
        sum(a.total_pages) as TotalPages, 
        sum(a.used_pages) as UsedPages, 
        sum(a.data_pages) as DataPages,
        sum(a.total_pages) * 8 as TotalSpaceKB, 
        sum(a.used_pages) * 8 as UsedSpaceKB, 
        sum(a.data_pages) * 8 as DataSpaceKB
from 
        sys.tables t with (nolock) join sys.indexes i with (nolock) on
                t.object_id = i.object_id
        join sys.partitions p with (nolock) on 
                i.object_id = p.object_id AND i.index_id = p.index_id
        join sys.allocation_units a with (nolock) on 
                p.partition_id = a.container_id
        join sys.filegroups fg with (nolock) on 
                a.data_space_id = fg.data_space_id
where
        t.object_id = object_id(N'dbo.Orders')
group by
        t.object_id, i.index_id, i.name, p.partition_number,
        p.[Rows], p.data_compression_desc, fg.name
order by 
        i.index_id, p.partition_number;

Figure 1 demonstrates data distribution across partitions in the dbo.Orders table. As you can see, we have 13 non-empty partitions started from November 2016 up to end of November 2017.

01.Initial Data Distribution

Let’s assume that we would like to merge November and December 2016 together, which are partitions 2 and 3 in the tables.

As the first step, we will create the staging tables. They need to be partitioned in the same way as the main tables. It is better to use separate partition function to decouple merge and split operations in main and staging tables from each other. Remember that SQL Server would hold schema modification (Sch-M) lock on all tables that use partition function for the duration of merge and split operations.

Partition function on the staging tables does not need to include all partitions from the main tables. It just need to include partition(s) you are about it split or merge and two adjacent partitions on the left and right sides. This will guarantee that partitions in the main and staging tables are aligned before and after the operation. It would also support partition switch without requirement of creating extra CHECK constraints on the staging tables. As you can see in the code below, pfOrdersStaging function consists of four partitions:

  • Everything prior November 2016. Will be empty.
  • November 2016
  • December 2016
  • Everything on or after January 1st 2017. Will be empty
create partition function pfOrdersStaging(datetime2(0))
as range right for values
('2016-11-01','2016-12-01','2017-01-01');

create partition scheme psOrdersStaging
as partition pfOrdersStaging
all to ([Primary]);

create table dbo.OrdersStaging
(
        OrderDate datetime2(0) not null,
        OrderId int not null,
        Placeholder char(100),
)
on psOrdersStaging(OrderDate);

create clustered columnstore index CCI_OrdersStaging on dbo.OrdersStaging
on psOrdersStaging(OrderDate);

create table dbo.OrderLineItemsStaging
(
        OrderDate datetime2(0) not null,
        OrderId int not null,
        OrderLineItemId int not null,
        Placeholder char(100),
)
on psOrdersStaging(OrderDate);

create clustered columnstore index CCI_OrderLineItemsStaging on dbo.OrderLineItemsStaging
on psOrdersStaging(OrderDate);

After the staging tables are created, we can switch two partitions from the main tables there.

alter table dbo.Orders switch partition 2 
to dbo.OrdersStaging partition 2;

alter table dbo.OrderLineItems switch partition 2 
to dbo.OrderLineItemsStaging partition 2;

alter table dbo.Orders switch partition 3
to dbo.OrdersStaging partition 3;

alter table dbo.OrderLineItems switch partition 3
to dbo.OrderLineItemsStaging partition 3;

Now we can merge the partitions in the main tables. SQL Server will allow us to proceed because they are empty after the switch.

alter partition function pfOrders() merge range ('2016-12-01');

Finally, we can merge partitions in our staging tables and switch them back to the main tables as shown below.

drop index CCI_OrdersStaging on dbo.OrdersStaging;
drop index CCI_OrderLineItemsStaging on dbo.OrderLineItemsStaging;
go

-- Offline with Sch-M
alter partition function pfOrdersStaging()
merge range ('2016-12-01');
go

create clustered columnstore index CCI_OrdersStaging on dbo.OrdersStaging
on psOrdersStaging(OrderDate);

create clustered columnstore index CCI_OrderLineItemsStaging on dbo.OrderLineItemsStaging
on psOrdersStaging(OrderDate);
go

-- Switching partitions back
alter table dbo.OrdersStaging switch partition 2 
to dbo.Orders partition 2;

alter table dbo.OrderLineItemsStaging switch partition 2 
to dbo.OrderLineItems  partition 2;

As you can see in Figure 2, now partition 2 stores the data for both November and December  of 2016.

02.Data Distribution After Merge

We can use the same approach for partition split. Code below demonstrates how to split November 2017 partition using 2017-11-15 as the new range value.

-- Spliting November 2017 partition (#13)
-- Recreating Staging objects first
drop table if exists dbo.OrdersStaging;
drop table if exists dbo.OrderLineItemsStaging;
if exists(select * from sys.partition_schemes where name = 'psOrdersStaging') drop partition scheme psOrdersStaging;
if exists(select * from sys.partition_functions where name = 'pfOrdersStaging') drop partition function pfOrdersStaging;
go

create partition function pfOrdersStaging(datetime2(0))
as range right for values
('2017-11-01','2017-12-01');

create partition scheme psOrdersStaging
as partition pfOrdersStaging
all to ([Primary]);

create table dbo.OrdersStaging
(
        OrderDate datetime2(0) not null,
        OrderId int not null,
        Placeholder char(100),
)
on psOrdersStaging(OrderDate);

create clustered columnstore index CCI_OrdersStaging on dbo.OrdersStaging
on psOrdersStaging(OrderDate);

create table dbo.OrderLineItemsStaging
(
        OrderDate datetime2(0) not null,
        OrderId int not null,
        OrderLineItemId int not null,
        Placeholder char(100),
)
on psOrdersStaging(OrderDate);

create clustered columnstore index CCI_OrderLineItemsStaging on dbo.OrderLineItemsStaging
on psOrdersStaging(OrderDate);

-- Switching partitions to the staging table
alter table dbo.Orders switch partition 13
to dbo.OrdersStaging partition 2;

alter table dbo.OrderLineItems switch partition 13
to dbo.OrderLineItemsStaging partition 2;
go

-- Splitting partition in the main table 
alter partition scheme psOrders 
next used [PRIMARY];

alter partition function pfOrders()
split range ('2017-11-15');
go

-- Now spllitting partition in the Staging table
drop index CCI_OrdersStaging on dbo.OrdersStaging;
drop index CCI_OrderLineItemsStaging on dbo.OrderLineItemsStaging;
go

-- Offline with Sch-M
alter partition scheme psOrdersStaging 
next used [PRIMARY];

alter partition function pfOrdersStaging()
split range ('2017-11-15');
go

create clustered columnstore index CCI_OrdersStaging on dbo.OrdersStaging
on psOrdersStaging(OrderDate);

create clustered columnstore index CCI_OrderLineItemsStaging on dbo.OrderLineItemsStaging
on psOrdersStaging(OrderDate);

-- Switching partitions back
alter table dbo.OrdersStaging switch partition 2 
to dbo.Orders partition 13;

alter table dbo.OrdersStaging switch partition 3 
to dbo.Orders partition 14;

alter table dbo.OrderLineItemsStaging switch partition 2 
to dbo.OrderLineItems  partition 13;

alter table dbo.OrderLineItemsStaging switch partition 3
to dbo.OrderLineItems  partition 14;

As you can see, with just a few extra operators, we were able to remove the overhead of rebuilding columnstore index on entire table.

Unfortunately, this is offline approach. The data in the main tables would become inconsistent as soon as you switch partitions to the staging tables. Fortunately, if data is static, you can address it with some coding.

Instead of switching partitions to the staging tables, you can copy the data there. After split or merge is done in the staging tables, you can empty source partitions by switching them to another, temporary tables, and switch data back from the staging tables.

The code below illustrates this approach.

-- If data in the table is static 
-- Let's merge November-December 2016 and January 2017 
drop table if exists dbo.OrdersStaging;
drop table if exists dbo.OrderLineItemsStaging;
if exists(select * from sys.partition_schemes where name = 'psOrdersStaging') drop partition scheme psOrdersStaging;
if exists(select * from sys.partition_functions where name = 'pfOrdersStaging') drop partition function pfOrdersStaging;
go

create partition function pfOrdersStaging(datetime2(0))
as range right for values
('2016-11-01','2017-01-01','2017-02-01');

create partition scheme psOrdersStaging
as partition pfOrdersStaging
all to ([Primary]);

create table dbo.OrdersStaging
(
        OrderDate datetime2(0) not null,
        OrderId int not null,
        Placeholder char(100),
)
on psOrdersStaging(OrderDate);

create clustered columnstore index CCI_OrdersStaging on dbo.OrdersStaging
on psOrdersStaging(OrderDate);

create table dbo.OrderLineItemsStaging
(
        OrderDate datetime2(0) not null,
        OrderId int not null,
        OrderLineItemId int not null,
        Placeholder char(100),
)
on psOrdersStaging(OrderDate);

create clustered columnstore index CCI_OrderLineItemsStaging on dbo.OrderLineItemsStaging
on psOrdersStaging(OrderDate);

-- Copying data from Main to Staging tables
insert into dbo.OrdersStaging(OrderDate,OrderId,Placeholder)
        select OrderDate, OrderId, Placeholder
        from dbo.Orders
        where $Partition.pfOrders(OrderDate) in (2,3);

insert into dbo.OrderLineItemsStaging(OrderDate,OrderId,OrderLineItemId,Placeholder)
        select OrderDate, OrderId, OrderLineItemId, Placeholder
        from dbo.OrderLineItems
        where $Partition.pfOrders(OrderDate) in (2,3);
go

-- Merging partitions in Staging tables
drop index CCI_OrdersStaging on dbo.OrdersStaging;
drop index CCI_OrderLineItemsStaging on dbo.OrderLineItemsStaging;
go

alter partition function pfOrdersStaging()
merge range ('2017-01-01');
go

create clustered columnstore index CCI_OrdersStaging on dbo.OrdersStaging
on psOrdersStaging(OrderDate);

create clustered columnstore index CCI_OrderLineItemsStaging on dbo.OrderLineItemsStaging
on psOrdersStaging(OrderDate);
go

-- Creating temporary table to switch original
-- partitions from the main table
create partition function pfOrdersTmp(datetime2(0))
as range right for values
('2016-11-01','2017-01-01','2017-02-01');

create partition scheme psOrdersTmp
as partition pfOrdersTmp
all to ([Primary]);

create table dbo.OrdersTmp
(
        OrderDate datetime2(0) not null,
        OrderId int not null,
        Placeholder char(100),
)
on psOrdersTmp(OrderDate);

create clustered columnstore index CCI_OrdersTmp on dbo.OrdersTmp
on psOrdersTmp(OrderDate);

create table dbo.OrderLineItemsTmp
(
        OrderDate datetime2(0) not null,
        OrderId int not null,
        OrderLineItemId int not null,
        Placeholder char(100),
)
on psOrdersTmp(OrderDate);

create clustered columnstore index CCI_OrderLineItemsTmp on dbo.OrderLineItemsTmp
on psOrdersTmp(OrderDate);

-- Final steps. Let's do it in transaction
-- All operations are on metadata level
set xact_abort on
begin tran
        -- Switching original partitions out
        alter table dbo.Orders switch partition 2 
        to dbo.OrdersTmp partition 2;

        alter table dbo.OrderLineItems switch partition 2 
        to dbo.OrderLineItemsTmp partition 2;

        alter table dbo.Orders switch partition 3
        to dbo.OrdersTmp partition 3;

        alter table dbo.OrderLineItems switch partition 3
        to dbo.OrderLineItemsTmp partition 3;

        -- Merge
        alter partition function pfOrders()
        merge range ('2017-01-01');

        -- Switching partitions from the staging table
        alter table dbo.OrdersStaging switch partition 2 
        to dbo.Orders partition 2;

        alter table dbo.OrderLineItemsStaging switch partition 2 
        to dbo.OrderLineItems  partition 2;
commit;
go

-- Dropping temporary tables
drop table dbo.OrderLineItemsTmp;
drop table dbo.OrdersTmp;
drop partition scheme psOrdersTmp;
drop partition function pfOrdersTmp;

As you can see, this implementation is relatively simple and considering that data in columnstore indexes is often static, you may be able to utilize it to perform partition maintenance transparently to the users.

Source code is available for download.

Implementing Sliding Windows Data Purge Pattern with Columnstore Indexes

It’s been a while since my last blog post. Many things happened, including another great PASS Summit where I presented the session of Data Partitioning (by the way, you can download the slides and demos from my Presentation page). As usual, I ran out of time 🙂

As many of you know, Data Partitioning is very big and complex subject, which is impossible to cover even with half-day session. One of the topics I left out of scope is how it plays with columnstore indexes. Even though partitioned columnstore tables are similar to B-Tree ones, there are some differences between them.

The biggest difference resides in partition function split and merge behavior. With B-Tree indexes, you can split and merge non-empty partitions. SQL Server would split or merge the data automatically, granted with the schema-modification (Sch-M) table lock held in place. Other sessions would be unable to access the table but at least split and merge would work.

This is not the case with columnstore indexes where you would get the error when you try to split or merge non-empty partitions. There are several reasons for this limitation. Without diving very deep into columnstore index internals, I could say that the only option of doing split of merge internally is rebuilding columnstore index on affected partitions. Fortunately, you can split and merge empty columnstore partitions, which allow you to workaround the limitation and also implement Sliding Window pattern and use partitioning to purge the data.

The Sliding Window pattern stands for scenario when we want to retain the data for some period of time. Consider the situation when we need to keep last 12 months of data. In the beginning of each month, we may purge the data that is older than 12 months, basically having a window on the data that slides purging the oldest data, based on a given schedule.

Data partitioning is great in this scenario. It allows to keep the purge process on metadata- and minimally logged-levels by switching the old-data partition to staging table truncating it afterwards. This helps to avoid log-intensive DELETE operations and reduce blocking in the system.

Let’s look at implementation. First, I would create partition function and scheme considering that now is November 2017 and we would need to retain 12-month of data starting November 2016.

create partition function pfOrders(datetime2(0))
as range right for values
('2016-11-01','2016-12-01','2017-01-01','2017-02-01'
,'2017-03-01','2017-04-01','2017-05-01','2017-06-01'
,'2017-07-01','2017-08-01','2017-09-01','2017-10-01'
,'2017-11-01','2017-12-01','2018-01-01');

create partition scheme psOrders 
as partition pfOrders
all to ([Primary]);

This code will create 16 partitions. The left-most partition will host the data prior 2016-11-01 and it will be empty. The two right-most partitions will host the data for December 2017 and everything starting on 2018-01-01. They also will be empty at this point.

There is the simple reason why we want to have two right-most partitions empty. Let’s assume that we will run the monthly purge process in December and December’s partition will already store some data. As part of the purge process, we would like to create the new partition for the “next month” data. It is beneficial to perform this and split partition function on empty partition. With B-Tree tables, it is nice to have implementation. It allows to avoid any data scans and movements reducing the time for schema modification lock being held. For columnstore indexes, this is actually the must have part of design – you would be unable to split non-empty partition at all.

If you ever implemented Sliding Window pattern with B-Tree indexes, you would be aware of empty right-most partition. However, you might notice that in our example, there is also empty left-most partition present. As I already mentioned, columnstore indexes would not allow you to merge non-empty partitions and you need to have an extra empty partition to perform the merge after old data is purged.

Let’s look at the process in details. As the first step, I will create two tables that will share the partition function and scheme. We will populate them with some data and create the columnstore indexes on the tables.

create table dbo.Orders
(
	OrderDate datetime2(0) not null,
	OrderId int not null,
	Placeholder char(100),
)
on psOrders(OrderDate);

create table dbo.OrderLineItems
(
	OrderDate datetime2(0) not null,
	OrderId int not null,
	OrderLineItemId int not null,
	Placeholder char(100),
)
on psOrders(OrderDate);

-- Left-most and right-most are empty
;with N1(C) as (select 0 union all select 0) -- 2 rows
,N2(C) as (select 0 from N1 as t1 cross join N1 as t2) -- 4 rows
,N3(C) as (select 0 from N2 as t1 cross join N2 as t2) -- 16 rows
,N4(C) as (select 0 from N3 as t1 cross join N3 as t2) -- 256 rows
,N5(C) as (select 0 from N4 as t1 cross join N4 as t2) -- 65,536 rows
,Ids(Id) as (select row_number() over (order by (select null)) from N5)
insert into dbo.Orders(OrderDate, OrderId)
	select dateadd(day,Id % 390,'2016-11-01'), ID
	from Ids;

insert into dbo.OrderLineItems(OrderDate,OrderId, OrderLineItemId)
	select OrderDate, OrderId, OrderId
	from dbo.Orders;

create clustered columnstore index CCI_Orders on dbo.Orders
on psOrders(OrderDate);

create clustered columnstore index CCI_OrderLineItems on dbo.OrderLineItems
on psOrders(OrderDate);

The next listing returns data distribution and allocation information for dbo.Orders table.

select 
	t.object_id, i.index_id, i.name as [Index], p.partition_number,
	p.[Rows], p.data_compression_desc, 
	fg.name as [Filegroup],
	sum(a.total_pages) as TotalPages, 
	sum(a.used_pages) as UsedPages, 
	sum(a.data_pages) as DataPages,
	sum(a.total_pages) * 8 as TotalSpaceKB, 
	sum(a.used_pages) * 8 as UsedSpaceKB, 
	sum(a.data_pages) * 8 as DataSpaceKB
from 
	sys.tables t with (nolock) join sys.indexes i with (nolock) on
		t.object_id = i.object_id
	join sys.partitions p with (nolock) on 
		i.object_id = p.object_id AND i.index_id = p.index_id
	join sys.allocation_units a with (nolock) on 
		p.partition_id = a.container_id
	join sys.filegroups fg with (nolock) on 
		a.data_space_id = fg.data_space_id
where
	t.object_id = object_id(N'dbo.Orders')
group by
	t.object_id, i.index_id, i.name, p.partition_number,
	p.[Rows], p.data_compression_desc, fg.name
order by 
	i.index_id, p.partition_number;

Figure 1 illustrates the output of the listing. As you can see, data is distributed across 13 partitions with left-most and two right-most partitions empty.

01. Data Distribution And Allocation Information

Let’s try to split non-empty partition, just for the demo purposes

alter partition scheme psOrders next used [Primary];

alter partition function pfOrders()
split range ('2017-11-15');

As expected, it would not work with the error message shown below.

02.Split Non-Empty Partition

The merge of non-empty partition would also fail.

03.Merge Non-Empty Partition

Fortunately, we can split an empty partition. Again, let’s assume that we want to start purging process in December.  I would insert a couple rows to December’s partitions making them non-empty.

insert into dbo.Orders(OrderDate, OrderId) values('2017-12-02',100000);
insert into dbo.OrderLineItems(OrderDate,OrderId, OrderLineItemId) values('2017-12-02',100000,100000);

Let’s create another empty partition by splitting partition function. The right-most partition is still empty and operation would succeed. As result, we will have separate empty partitions for January, 2018 and another partition that will store all data starting 2018-02-01. We will split the right-most empty partition again during our January’s 2018 purge process.

alter partition scheme psOrders next used [Primary];

alter partition function pfOrders()
split range ('2018-02-01');

The process of purging old data would consist of several steps. First, we need to create the staging tables to which we will switch old November 2016 partitions. Those tables need to be empty, have exactly the same schema with the main tables and reside on the same filegroup.

create table dbo.OrdersTmp
(
	OrderDate datetime2(0) not null,
	OrderId int not null,
	Placeholder char(100),
) on [Primary];

create clustered columnstore index CCI_OrdersTmp on dbo.OrdersTmp
on [Primary];

create table dbo.OrderLineItemsTmp
(
	OrderDate datetime2(0) not null,
	OrderId int not null,
	OrderLineItemId int not null,
	Placeholder char(100),
)
on [Primary];

create clustered columnstore index CCI_OrderLineItemsTmp on dbo.OrderLineItemsTmp
on [Primary];

After tables were created, we can switch November 2016 partitions there truncating staging tables afterwards. Remember that in the main tables, November 2016 is the second left-most partition.

alter table dbo.Orders switch partition 2 
to dbo.OrdersTmp;

alter table dbo.OrderLineItems switch partition 2 
to dbo.OrderLineItemsTmp;

truncate table dbo.OrdersTmp;
truncate table dbo.OrderLineItemsTmp;

If you look at dbo.Orders allocation information again, you would see that two left-most partitions are now empty as shown in Figure 4

04. Data Distribution And Allocation Information After Purge

Now they can be merged, which will complete our implementation.

alter partition function pfOrders()
merge range('2016-11-01');

As you can see, implementation of Sliding Window pattern with columnstore indexes is very similar to B-Tree tables. The only differences are:

  • You must have empty right-most partition pre-allocated to perform the split. I’d like to reiterate that even though it is not required with B-Tree indexes, such empty partition would reduce I/O overhead and table locking during split operation there.
  • You must have another empty left-most partition to perform the merge. This is not required nor needed with B-Tree indexes.

I’d like to thank Ned Otter (blog) for his suggestions and feedback. They convinced me to blog on the subject.

Source code is available for download.

Next (2017-11-29): Splitting and Merging Non-Empty Partitions in Columnstore Indexes