Author Archives: Dmitri Korotkevitch

Locking in Microsoft SQL Server (Part 20) – Range lock (RangeS-U) deadlock due to IGNORE_DUP_KEY index option

As you know, SQL Server uses range locks to protect the range of the index keys. This usually happens in SERIALIZABLE isolation level. This level prevents phantom and non-repeatable reads phenomena and it guarantees that queries executed in transaction will always obtain the same set of data every time they were executed.

As the simple example, consider the table with ID column and two rows with ID = 1 and ID = 10. In SERIALIZABLE isolation level, the query that selects data from the table should always obtain those and only those two rows if you run it multiple times. SQL Server uses shared range lock protecting ID key range interval of (1..10), which guarantees that other transactions would not be able to update or delete existing rows nor insert any new rows into the interval.

In contrast, REPEATABLE READ isolation level uses row locks rather than range locks. They would prevent modifications of existing rows- 1 and 10- but would not prevent other transactions from inserting rows into the interval and introducing phantom read phenomena.

The range locks are usually acquired only in SERIALIZABLE isolation level; however, there is another, pretty much undocumented case, when SQL Server can use those locks. It happens even in READ UNCOMMITTED and READ COMMITTED SNAPSHOT modes when you have nonclustered indexes that have IGNORE_DUP_KEY=ON option. In that case rows with the duplicated index keys would not raise an error but rather being ignored. SQL Server would not insert then into the table.

This behavior leads to very hard to explain cases of blocking and even deadlocks in the system. Let’s look at the example and create the table with a few rows as shown below. As you see, nonclustered index on the table has IGNORE_DUP_KEY option enabled.

create table dbo.IgnoreDupKeysDeadlock
(
    CICol int not null,
    NCICol int not null
);

create unique clustered index IDX_IgnoreDupKeysDeadlock_CICol
on dbo.IgnoreDupKeysDeadlock(CICol);

create unique nonclustered index IDX_IgnoreDupKeysDeadlock_NCICol
on dbo.IgnoreDupKeysDeadlock(NCICol)
with (ignore_dup_key = on);

insert into dbo.IgnoreDupKeysDeadlock(CICol, NCICol)
values(0,0),(5,5),(10,10),(20,20);

Now let’s start transaction in READ UNCOMMITTED mode and insert the row into the table checking the locks session acquired.

set transaction isolation level read uncommitted
begin tran
    insert into dbo.IgnoreDupKeysDeadlock(CICol,NCICol) values(1,1);

    select request_session_id, resource_type, resource_description
        ,resource_associated_entity_id, request_mode, request_type, request_status
    from sys.dm_tran_locks
    where request_session_id = @SPID;

As you can see in Figure 1, INSERT statement acquired and held two exclusive (X) locks on the rows inserted into clustered and nonclustered indexes. It also obtained Range (RangeS-U) lock on nonclustered index. RangeS-U means that the key range is protected with the shared (S) lock and SQL Server uses update (U) scan within the range.

01. Locks Held by the Session

01. Locks Held by the Session

You may ask the obvious question – why the range lock is required? The reason is the way how SQL Server handles modifications of the data. The data is always inserted into or updated in the clustered index first followed by nonclustered index updates. With IGNORE_DUP_KEY=ON, SQL Server should prevent the situation when the duplicated keys were inserted into nonclustered index simultaneously after clustered index insert was done and, therefore, clustered index insert needs to be rolled back. Thus, it locks nonclustered index key range before any data modifications preventing other sessions from inserting any rows there.

You can confirm it by looking at lock_acquired xEvent events as it is show in Figure 2. As you can see, the RangeS-U lock is acquired before exclusive (X) locks on the resources.

02. lock_acquired Events During Insert

02. lock_acquired Events During Insert

The key problem here, however, is that RangeS-U lock behaves the same way as in SERIALIZABLE isolation level and it is held until the end of transaction. Moreover,  RangeS-U locks are incompatible with each other.  That can lead to very unpleasant and hard to understand deadlocks.

Let’s run the code shown below in another session. The first INSERT would succeed (it is in the different key range in the index). The second, however, would be blocked due to RangeS-U/RangeS-U lock incompatibility.

set transaction isolation level read uncommitted
begin tran
    -- Succeed
    insert into dbo.IgnoreDupKeysDeadlock(CICol,NCICol)	values(12,12);
    -- Blocked
    insert into dbo.IgnoreDupKeysDeadlock(CICol,NCICol) values(2,2);
commit

If we checked the locks held by the both sessions now, we would see the picture shown in Figure 3. You can see that session 2 successfully acquired the first range lock but the second range lock request is blocked due to incompatible range lock on the same key interval held by the session 1.

03. Locks Held by Both Sessions

03. Locks Held by Both Sessions

Finally, if we run another INSERT in the session 1 into the range locked by the session 2, it would be also blocked with the typical deadlock condition.

    insert into dbo.IgnoreDupKeysDeadlock(CICol,NCICol) values(11,11);
commit

Figure 4 shows you the deadlock graph.

04. Deadlock Graph

04. Deadlock Graph

Unfortunately, there is very little you can do about that. The only way to address the problem is removing IGNORE_DUP_KEY option from the index handling duplicates in the different ways. It may or may not work for you.

Finally, it is worth mentioning, that SQL Server does not use range locks in case of clustered indexes with IGNORE_DUP_KEY=ON option. Clustered indexes are modified first and SQL Server could detect duplicated keys at this stage without any extra range locking required.

Source code is available for download.

Table of Content.

LOB and Row-Overflow Storage in In-Memory OLTP in SQL Server 2016

I think many of us felt quite excited and the same time disappointed with In-Memory OLTP in SQL Server 2014. It was the great and promising technology but it had way too many limitations. The biggest one for me, perhaps, was inability to use data types that required off-row storage. It was possible to address that by changing database schema, implementing data partitioning or performing other tricks; however, it required complex development efforts and often made In-Memory OLTP migration cost ineffective.

SQL Server 2016 removes this and many other limitations. Now we can create tables with (max) columns and with data rows that exceed 8,060 bytes. There is the catch, however. Off-row storage in In-Memory OLTP works very differently comparing to on-disk tables and incorrect design decisions could and would affect performance of the system. Today we will look at that in details.

As all of us know, In-Memory OLTP does not use the data pages. Well, it uses data pages in nonclustered range indexes but the data rows are stored as the separate in-memory objects. They are linked into the row chains through the regular memory pointers. Every index in In-Memory OLTP adds another pointer and creates another chain of the rows.

There are two types of indexes supported in In-Memory OLTP – hash and nonclustered (range) indexes. I do not want to dive into all the details but hash index, in the nutshell, consists of in-memory hash table where each element stores the pointer to the data row chain with the same hash value of the key. You can see the simplified version in Figure 1, which shows the table with two hash indexes on Name and City columns and assumes that hash function generates the hash based on the first letter of the string.

01. Hash Indexes

01. Hash Indexes

Each data row has two timestamps that indicate its lifetime. They store the Global Transaction Timestamp of the transactions that inserted and deleted them. For example, you can see two “Ann” rows in the diagram. The first one, with City = “New York” has been created by a transaction with timestamp of 10. The city was updated to Cincinnati by transaction with timestamp of 50, which deleted the old and created the new versions of the row.

The second In-Memory OLTP index type – range index is very similar to B-Tree index defined on on-disk table. The range index consists of the data pages on root, intermediate and leaf levels. On root and intermediate levels, every index row points to the data page on the next level. On the leaf level, index rows point to the actual data rows with the same value of index key. The data pages in the index reference each other through another array-life structure called the Mapping Table as it illustrated in Figure 2. For example, the index row Kevin on the root page references the first element (PID = 1) in the mapping table, which, in turn, stores the pointer to intermediate data page with the highest key value of Kevin.

02. Nonclustered (range) indexes

02. Nonclustered (Range) Indexes

One of very key elements in this schema is that every index, in the nutshell, is covering. Every memory pointer references the actual data row structure regardless of how many columns were included to the index keys.

Every In-Memory OLTP object uses memory and is called a memory consumer. Memory Consumers allocate memory from the structures called varheaps – one varheap per In-Memory OLTP object. You can think about varheaps as the data structures that respond to and track memory allocation requests and can grow and shrink in size when needed. You can track detail memory-allocation information per-memory consumer using sys.dm_db_xtp_memory_consumers view.

Let’s look at the example and create the table with one hash and one nonclustered indexes and look at memory consumers as shown below.

create table dbo.MemoryConsumers
(
    ID int not null
        constraint PK_MemoryConsumers
        primary key nonclustered hash with (bucket_count=1024),
    Name varchar(256) not null,
    index IDX_Name nonclustered(Name)
)
with (memory_optimized=on, durability=schema_only);

select 
    i.name as [Index], i.index_id, a.xtp_object_id, a.type_desc, a.minor_id
    ,c.memory_consumer_id, c.memory_consumer_type_desc as [mc type]
    ,c.memory_consumer_desc as [description], c.allocation_count as [allocs]
    ,c.allocated_bytes, c.used_bytes
from 
    sys.dm_db_xtp_memory_consumers c join
        sys.memory_optimized_tables_internal_attributes a on
            a.object_id = c.object_id and a.xtp_object_id = c.xtp_object_id
    left outer join sys.indexes i on
            c.object_id = i.object_id and 
            c.index_id = i.index_id and
            a.minor_id = 0 
where
    c.object_id = object_id('dbo.MemoryConsumers');

You can see the output of the query in Figure 3. The xtp_object_id column represents internal In-Memory OLTP object_id, which is different than SQL Server object_id.

03. Memory Consumers (In-Row Storage Only)

03. Memory Consumers (In-Row Storage Only)

As you can see, the table has three memory consumers. The Range Index Heap stores internal and leaf pages of nonclustered index. The Hash Index Heap stores the hash table of the index. Finally, the Table Heap stores actual table rows. Figure 4 illustrates that.

04. Memory Consumers

04. Memory Consumers

Now let’s see what will happen if we alter our table and add row-overflow and LOB columns using ALTER TABLE statement shown below. This alteration is offline operation, which rebuilds the table in the background.

alter table dbo.MemoryConsumers add
    RowOverflowCol varchar(8000),
    LOBCol varchar(max);

Now, if you get the list of memory consumers using the query from the first listing again, you would see the output as shown in Figure 5.

05. Memory Consumers with Off-Row Storage

05. Memory Consumers with Off-Row Storage

As you can see, both off-row columns introduce their own Range Index Heap and Table Heap memory consumers. In addition, LOB column adds LOB Page Allocator memory consumer (more about it later). The minor_id column indicates the column_id in the table to which memory consumers belong.

As you can guess from the output, SQL Server 2016 stores both, row-overflow and LOB columns in the separate internal tables. Those tables consist of 8-byte artificial primary key implemented as nonclustered index and off-row column value. The main row references off-row column through that artificial key, which is generated when row is created. It is worth repeating that this reference is done though the artificial value rather than the memory pointer.

This approach allows In-Memory OLTP to decouple off-row columns from the main row using different lifetime for them. For example, if you update the main row data without touching off-row columns, SQL Server would not generate new versions of off-row column rows avoiding large memory allocations. Vice versa, when only off-row data is modified, the main row stays intact.

In-Memory OLTP stores LOB data in the memory provided by LOB Page Allocator. That consumer is not limited to 8,060-byte row allocations and can allocate large amount of memory to store the data. The rows in the Table Heap of LOB columns contains pointers to the row data in LOB Page Allocator.

Let’s assume that we run several DML statements with imaginary Global Transaction Timestamp values as shown below.

-- Global Transaction Timestamp: 100
insert into dbo.MemoryConsumers(ID, Name, RowOverflowCol, LobCol)
values
(1,'Ann','A1',replicate(convert(varchar(max),'1'),100000))
(2,'Bob','B1',replicate(convert(varchar(max),'2'),100000));

-- Global Transaction Timestamp: 110
update dbo.MemoryConsumers set RowOverflowCol = 'B2' where ID = 2;

-- Global Transaction Timestamp: 120
update dbo.MemoryConsumers set Name= 'Greg' where ID = 2;

-- Global Transaction Timestamp: 130
update dbo.MemoryConsumers set LobCol = replicate(convert(varchar(max),'3'),100000) where ID = 1;

-- Global Transaction Timestamp: 140
delete from dbo.MemoryConsumers where ID = 1;

Figure 6 illustrates the state of the data and links between the rows. It is omitting hash table and nonclustered index structures in the main table along with internal pages of nonclustered indexes for off-row columns for simplicity sake.

06. In-Row and Off-Row Rows - Decoupled

06. In-Row and Off-Row Rows – Decoupled

Decoupling of in-row and off-row data reduces the overhead of creating extra row versions during data modifications. However, it will add additional overhead when you insert and delete the data. SQL Server should create several row objects on insert stage and update end timestamp of multiple rows during deletion. It also needs to maintain nonclustered indexes for off-row column tables.

There is also considerable overhead in terms of memory usage. Every non-empty off-row value adds 50+ bytes of the overhead regardless of its size. Those 50+ bytes consist of three artificial ID values (in-row, off-row in data row and leaf-level of the range index) and off-row data row structure. It is even larger in case of LOB columns where data is stored in LOB Page Allocator.

One of the key points to remember that decision which columns go off-row is made based on the table schema. This is very different from on-disk tables, where such decision is made on per-row basis and depends on the data row size. With on-disk tables, data is stored in row when it fits on the data page.

In-Memory OLTP works in the different way. (Max) columns are always stored off-row. For other columns, if the data row size in the table definition can exceed 8,060 bytes, SQL Server pushes largest variable-length column(s) off-row. Again, it does not depend on amount of the data you store there.

Let’s look at the example and create two tables of the similar schema. One of the tables has twenty varchar(3) columns while another uses twenty varchar(max) columns. Let’s populate those tables with 100,000 rows with 1-character value in each column. The code is shown in listing below.

create table dbo.DataInRow
(
    ID int not null
        constraint PK_DataInRow
        primary key nonclustered hash(ID)
        with (bucket_count = 262144)
    ,Col1 varchar(3) not null
    ,Col2 varchar(3) not null
    ,Col3 varchar(3) not null
    ,Col4 varchar(3) not null
    ,Col5 varchar(3) not null
    ,Col6 varchar(3) not null
    ,Col7 varchar(3) not null
    ,Col8 varchar(3) not null
    ,Col9 varchar(3) not null
    ,Col10 varchar(3) not null
    ,Col11 varchar(3) not null
    ,Col12 varchar(3) not null
    ,Col13 varchar(3) not null
    ,Col14 varchar(3) not null
    ,Col15 varchar(3) not null
    ,Col16 varchar(3) not null
    ,Col17 varchar(3) not null
    ,Col18 varchar(3) not null
    ,Col19 varchar(3) not null
    ,Col20 varchar(3) not null
)
with (memory_optimized = on, durability = schema_only);

create table dbo.DataOffRow
(
    ID int not null
        constraint PK_DataOffRow
        primary key nonclustered hash(ID)
        with (bucket_count = 262144)
    ,Col1 varchar(max) not null
    ,Col2 varchar(max) not null
    ,Col3 varchar(max) not null
    ,Col4 varchar(max) not null
    ,Col5 varchar(max) not null
    ,Col6 varchar(max) not null
    ,Col7 varchar(max) not null
    ,Col8 varchar(max) not null
    ,Col9 varchar(max) not null
    ,Col10 varchar(max) not null
    ,Col11 varchar(max) not null
    ,Col12 varchar(max) not null
    ,Col13 varchar(max) not null
    ,Col14 varchar(max) not null
    ,Col15 varchar(max) not null
    ,Col16 varchar(max) not null
    ,Col17 varchar(max) not null
    ,Col18 varchar(max) not null
    ,Col19 varchar(max) not null
    ,Col20 varchar(max) not null
)
with (memory_optimized = on, durability = schema_only);


set statistics time on
insert into dbo.DataInRow(ID,Col1,Col2,Col3,Col4,Col5,Col6,Col7,Col8,Col9,Col10,Col11,Col12,Col13,Col14,Col15,Col16,Col17,Col18,Col19,Col20)
    select Num,'0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0'
    from dbo.Numbers
    where Num <= 100000;

insert into dbo.DataOffRow(ID,Col1,Col2,Col3,Col4,Col5,Col6,Col7,Col8,Col9,Col10,Col11,Col12,Col13,Col14,Col15,Col16,Col17,Col18,Col19,Col20)
    select Num,'0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0'
    from dbo.Numbers
    where Num <= 100000;
set statistics time off

Figure 7 illustrates memory consumers in this scenario (in-row at top and off-row at the bottom). As you can see, every varchar(max) column adds another internal table to the mix.

07. Test Table Memory Consumers

07. Test Table Memory Consumers

The execution times of INSERT statements n my environment are 153 and 7,722 milliseconds respectively. With off-row storage, In-Memory OLTP needs to add data to twenty other internal tables, which is 40 times slower comparing to in-row storage.

Let’s look at the total memory usage of both tables using the queries below.

select 
   sum(c.allocated_bytes) / 1024 as [Allocated KB]
    ,sum(c.used_bytes) / 1024 as [Used KB]	
from 
    sys.dm_db_xtp_memory_consumers c join
        sys.memory_optimized_tables_internal_attributes a on
            a.object_id = c.object_id and a.xtp_object_id = c.xtp_object_id
    left outer join sys.indexes i on
            c.object_id = i.object_id and c.index_id = i.index_id
where
    c.object_id = object_id('dbo.DataInRow');

select 
   sum(c.allocated_bytes) / 1024 as [Allocated KB]
    ,sum(c.used_bytes) / 1024 as [Used KB]	
from 
    sys.dm_db_xtp_memory_consumers c join
        sys.memory_optimized_tables_internal_attributes a on
            a.object_id = c.object_id and a.xtp_object_id = c.xtp_object_id
    left outer join sys.indexes i on
            c.object_id = i.object_id and c.index_id = i.index_id
where
    c.object_id = object_id('dbo.DataOffRow');

As you can see in Figure 8, off-row storage uses about 252MB of RAM comparing to 12MB of RAM with in-row storage.

08. Test Tables Memory Usage

08. Test Tables Memory Usage

There is another important implication. Indexes defined on the table are not covering  off-row data. SQL Server needs to traverse nonclustered indexes on off-row column tables to obtain their values. Conceptually, it looks very similar to Key Lookup operations in on-disk tables done in reverse direction – from clustered to nonclustered indexes. Even though the overhead is significantly smaller comparing to on-disk tables, it is still the overhead you’d like to avoid.

You can see this overhead by running SELECT statements shown below. In case of off-row data, SQL Server will have to go through every internal table for each row.

select count(*)
from dbo.DataInRow
where Col1='0' and Col2='0' and Col3='0' and Col4='0' and Col5='0' and Col6='0' and Col7='0' and Col8='0' and Col9='0' and Col10='0' and Col11='0' and Col12='0' and Col13='0' and Col14='0' and Col15='0' and Col16='0' and Col17='0' and Col18='0' and Col19='0' and Col20='0';

select count(*)
from dbo.DataOffRow
where Col1='0' and Col2='0' and Col3='0' and Col4='0' and Col5='0' and Col6='0' and Col7='0' and Col8='0' and Col9='0' and Col10='0' and Col11='0' and Col12='0' and Col13='0' and Col14='0' and Col15='0' and Col16='0' and Col17='0' and Col18='0' and Col19='0' and Col20='0';

Execution time in my environment is 73 milliseconds for in-row data and 1,662 milliseconds for off-row data.

Finally, let’s look what happen when we delete the data from the tables by using the code below.

delete from dbo.DataInRow;
delete from dbo.DataOffRow;

As before, in-row storage is significantly faster – 28 milliseconds vs. 1,290 milliseconds. As you can guess In-Memory OLTP had to update end timestamp in every row in off-row tables.

You should remember about this behavior and avoid off-row storage unless you have legitimate reasons to use such columns. It is clearly the bad idea to define text columns as (n)varchar(max) just in case – when you do not store large amount of data there. As you see, off-row storage comes at very high cost.

Source code is available for download.

Slide deck on the topic is also available.

How To Move Data and Log Files To Different Drives Keeping the Database Online in Non-Enterprise Editions of SQL Server

We, SQL Server professionals, like Enterprise Edition. It has many bells and whistles that make our life easier and less stressful. We wish to have Enterprise Edition installed on every server. Unfortunately, customers do not always share our opinions – they want to save money. More often than not, they choose to go with the Standard Edition, which is significantly less expensive.

From performance standpoint, Standard Edition would suffice in many cases. Even though it lacks several nice features, it would work just fine even in large and busy systems. I dealt with many multi-TB installations that handled thousands transactions per second using Standard Edition of SQL Server.

Nevertheless, Standard edition lacks many of availability features offered in Enterprise Edition. Most important is index management. You cannot rebuild indexes keeping the table online. There are some tricks that can help reducing index rebuild time; however, it would not help much with the large tables.

This limitation has another interesting implication. In Standard Edition you cannot rebuild the indexes moving data to another filegroup transparently to the users. One of the cases when such an ability is very important is changing the database disk layout when you are upgrading disk subsystem. Obviously, it is very easy to do offline – this is just the matter of copying database files. However, even with the fast disk subsystem, that can take hours in multi-TB databases, which could violate your availability SLA.

This is especially critical with the Cloud installations where I/O subsystem is usually the biggest bottleneck due to the bad I/O performance. The situation, however, is starting to change. Both, Microsoft Azure and Amazon AWS now offer fast SSD-based I/O solutions for very reasonable price. Unfortunately, the old installations were usually deployed to the old and slow disks and upgrading to the new drives will often lead to the hours of the downtime.

Fortunately, you can move data to the different disk arrays almost transparently to the users even in non-Enterprise Editions of SQL Servers. There are two ways how to accomplish it. The first one is very simple and can be done if system uses database mirroring. It requires failovers and secondary server downtime, which could lead to the data loss in case of disaster.

The second approach works without the mirroring. It is slow, it generates large amount of transaction log records, it introduces huge index fragmentation; however, it keeps database online most of the time. There is still the downtime involved; although, it could be limited to just a few minutes. It will work in any SQL Server version and edition – well, to be frank, I have not tried it in SQL Server 2000 yet.

Let’s look at both of those approaches in details.

Moving database files with mirroring Involved

Database mirroring and, as matter of fact, Always On Availability Groups rely on the stream of transaction log records. Secondary servers apply the changes in the data files using file and page IDs as the reference. With exception of database file-related operations, for example file creation, primary and secondary servers do not need to store database files in the same location – it is possible to use different disk and folder structure on the servers.

You can rely on this behavior if you need to move database files to the different drives. You can run ALTER DATABASE MODIFY FILE(FILENAME=..) command on the secondary server, which will allow you to update data and log files paths in the system catalogs. Everything will continue run normally – those changes would not take place until the next database restart.

Unfortunately, you cannot take database that participate in the mirroring session offline and you need to shut down entire instance of SQL Server. After that, you can physically move database files to the new location. On the primary server, the database mirroring will switch to the DISCONNECTED state. The database will continue to be available to the clients; however, it remains unprotected – all changes will be lost in case of disaster. You need to remember that file copy operation can take hours and you need to evaluate if you can take such a risk. It is also worth to mention that transaction log on the primary would not truncate and continue to grow even after log backups – SQL Server needs to retain the log records until they sent to the secondary server.

After the file copy operation is completed, you can start the instance (the primary database will switch to SYNCHRONIZING state) and wait until all log records have been transmitted to the secondary (SYNCHRONIZED state). Then, you can failover and wash, rinse and repeat the process on the former primary server.

To summarize, this process is very simple and transparent to the client applications. It is the good choice as long as you can afford the instance downtime and possibility of  data loss in case of disaster. If this is not the case, you will have to use much more complicated approach.

When mirroring is not an option..

.. our life is much more complicated and the process will require multiple steps to complete.

  1. We need to create the new data files in the secondary filegroups and shrink existing files by using DBCC SHRINKFILE(EMPTYFILE) command. This will move data from old to the new data files.
  2. Next, we need to repeat the same process with the primary filegroup. You cannot remove primary MDF file from the database; although, you can make it very small and move all data from there.
  3. Next, we need to shrink the transaction log .
  4. Finally, we need to copy MDF and LDF files to the new location. This is offline operation; however, both, MDF and LDF data files are small at this point and downtime is minimal.

Let’s look at the process in details. As the first step, let’s create the test database with two filegroups and populate it with some data. For the demo purposes, I am assuming that C:\OldDrive folder represents old and C:\NewDrive – new disk arrays respectively.

create database DataMovementDemo
on primary
( name = N'DataMovementDemo', filename = N'C:\OldDrive\DataMovementDemo.mdf', size = 100MB, filegrowth = 50MB),
filegroup [Secondary]
( name = N'DataMovementDemo_Secondary1', filename = N'C:\OldDrive\DataMovementDemo_Secondary1.ndf', size = 100MB, filegrowth = 50MB),
( name = N'DataMovementDemo_Secondary2', filename = N'C:\OldDrive\DataMovementDemo_Secondary2.ndf', size = 100MB, filegrowth = 50MB)
log on
( name = N'DataMovementDemo_log', filename = N'C:\OldDrive\DataMovementDemo_log.ldf', size = 500MB, filegrowth = 500MB)
Go

alter database DataMovementDemo set recovery full
go

use DataMovementDemo
go

create table dbo.DataOnPrimary
(
    ID int not null,
    Placeholder char(8000),
    constraint PK_DataOnPrimary
    primary key clustered(ID)
    on [Primary]
);

create table dbo.DataOnSecondary
(
    ID int not null,
    Placeholder char(8000),
    constraint PK_DataOnSecondary
    primary key clustered(ID)
    on [Secondary]
);

;with N1(C) as (select 0 union all select 0) -- 2 rows
,N2(C) as (select 0 from N1 as T1 cross join N1 as T2) -- 4 rows
,N3(C) as (select 0 from N2 as T1 cross join N2 as T2) -- 16 rows
,N4(C) as (select 0 from N3 as T1 cross join N3 as T2) -- 256 rows
,N5(C) as (select 0 from N4 as T1 cross join N4 as T2 ) -- 65,536 rows
,Nums(Num) as (select row_number() over (order by (select null)) from N5)
    insert into dbo.DataOnPrimary(ID)
        select Num from Nums;

insert into dbo.DataOnSecondary(ID)
    select ID from dbo.DataOnPrimary;

We can check the size of the data and log files along with their free space with the code below.

select 
    f.name as [FileName]
    ,fg.name as [FileGroup]
    ,f.physical_name as [Path]
    ,f.size / 128.0 as [CurrentSizeMB]
    ,convert(int,fileproperty(f.name,'SpaceUsed')) / 
        128.0 as [UsedSpaceMB]
    ,f.size / 128.0 - convert(int,fileproperty(f.name,'SpaceUsed')) / 
        128.0 as [FreeSpaceMb]
from 
    sys.database_files f left join sys.filegroups fg on 
        f.data_space_id = fg.data_space_id;

Figure 1 shows the output of the statement.

01. Database file stats after database creation

Moving data files from secondary filegroups

As the first step, you need to create new data files on the target drive. You can keep the same number of files as before, or use this as the opportunity to change the filegroup layout. In general, the number of files in the fielgroup greatly depends on the volatility of the data. Every data file has its own set of allocation map pages, which reduces the contention during page and extent allocations. It is impossible to give the general advice on how many files to create – I usually start with four files per filegroup unless the data is extremely volatile and the filegroup handles hundreds or even thousands of inserts per second. You can monitor and analyze PAGELATCH waits to see if there is the contention and adjust the number of the files accordingly.

In our example, let’s create two data files on C:\NewDrive folder as shown below. Make sure that both files have exactly the same initial size and autogrowth parameters specified in MB. This will help SQL Server to evenly distribute data between them.

alter database DataMovementDemo add file 
( name = N'DataMovementDemo_NewSecondary1', filename = N'C:\NewDrive\DataMovementDemo_NewSecondary1.ndf', size = 250MB, filegrowth = 50MB )
to filegroup [Secondary];

alter database DataMovementDemo add file 
( name = N'DataMovementDemo_NewSecondary2', filename = N'C:\NewDrive\DataMovementDemo_NewSecondary2.ndf', size = 250MB, filegrowth = 50MB )
to filegroup [Secondary];

Figure 2 shows the output from the Listing 2 after new files were created.

02. File stats after new files has been created

The DBCC SHIRINKFILE command works in a very simple manner. It reads allocated extents from the end of the file and moves them to the other files in the filegroup. In case, if  filegroup has multiple files, SQL Server uses proportional fill algorithm choosing to which file those extents need to be moved. The choice depends on amount of free space in the file – more space file has, more data would be copied there.

In case, when the filegroup originally has more than one file, you would like to avoid an overhead of moving data to the file, which yet to be moved. In our case, you want the data from DataMovementDemo_Secondary1 file to be distributed only between DataMovementDemo_NewSecondary1 and DataMovementDemo_NewSecondary2 files avoiding any inserts into DataMovementDemo_Secondary2 file.

Usually, data files in production databases do not have excessive amount of free space. When this is the case, you can simply prevent unnecessary data movements by restricting auto-growth of the old files. However, if those files have large amount of free space, you can also consider to shrink them and release this space first. There is the catch though. If free space is located in the beginning of the data file, shrink operation would start data movement and introduce the overhead. You need to make decision how to proceed on case by case basis.

The next listing shows how you can restrict the auto-growth for the file.

declare
    @MaxFileSizeMB int
    ,@SQL nvarchar(max)
	
-- Obtaining current file size 	
select @MaxFileSizeMB = size / 128 + 1
from sys.database_files
where name = 'DataMovementDemo_Secondary2';

set @SQL = N'alter database DataMovementDemo 
modify file(name=N''DataMovementDemo_Secondary2'',maxsize=' + 
    convert(nvarchar(32),@MaxFileSizeMB) + N'MB);';

exec sp_executesql @SQL;

Now we are ready to process the first data file. Listing below shows the code that performs data movement and removes an empty file from the filegroup afterwards. Both operations are transparent to the users and client applications. It is worth mentioning that you can use the code from the second listing above to monitor the progress of the operation. You can also look at percent_complete column in sys.dm_exec_requests view.

dbcc shrinkfile(DataMovementDemo_Secondary1, emptyfile);
alter database DataMovementDemo remove file DataMovementDemo_Secondary1;

If you checked the status of the files after operation is completed, you would see the results as shown in Figure 3. As you see, the data from the data file has been distributed between other files in the filegroup.

03. File stats after the first file has been processed

You can use exactly the same approach to move data from the DataMovementDemo_Secondary2 file. Listing shows the code and Figure 4 shows the database files after the process is completed. As you see, the secondary filegroup now resides on the new drive.

dbcc shrinkfile(DataMovementDemo_Secondary2, emptyfile);
alter database DataMovementDemo remove file DataMovementDemo_Secondary2;

04. File stats after the second file movement

The word of caution. As I already mentioned, DBCC SHRINKFILE  generates enormous amount of transaction log records. Make sure that transaction log is truncating especially if the database uses FULL recovery model.

Moving primary data file

Even though many of us know about the best practice of keeping PRIMARY filegroup empty, it rarely followed. Majority of production databases keep the data in PRIMARY filegroup, which usually consist of the single MDF file.

Unfortunately, you cannot remove nor change the primary data file in the database. Moreover, you cannot shrink the file below the size of the data currently stored in the file, even if a filegroup has the other data files.

Fortunately, you can still use DBCC SHRINKFILE(EMPTYFILE) command on MDF data file. It would move data to the other files in the filegroup and failing on the final stage of the execution with the error message shown in Figure 5. Nevertheless, the majority of the data from the MDF data file would be moved to the other files.

05. DBCC SHRINKFILE(EMPTYFILE) error on the primary data file

Listing below shows the code that performs this action. It creates the second data file in PRIMARY filegroup and moves the data from MDF file there. Figure 6 shows the file stats after it is completed – after DBCC SHRINKFILE(EMPTYFILE) error.

alter database DataMovementDemo add file 
( name = N'DataMovementDemo_Primary2', filename = N'C:\NewDrive\DataMovementDemo_Primary2.ndf', size = 550MB, filegrowth = 50MB )
to filegroup [Primary];
go

-- It will error in the end
dbcc shrinkfile(DataMovementDemo, emptyfile);

06.File stats after DBCC SHRINKFILE(EMPTYFILE) error

As you see, MDF data file is pretty much empty. You can release the empty space from the file using DBCC SHRINKFILE(DataMovementDemo, 10) command. Figure 7 illustrates the situation after it is completed and MDF file became very small.

07. File stats after removal free space from MDF file

Unfortunately, this approach introduces two or more unevenly sized data files in the PRIMARY filegroup, which makes proportional fill algorithm less efficient. It may or may not be a problem in your system, depending on how volatile is the data. You can address it after you move MDF file to the new drive by creating other data files in PRIMARY filegroup and shrinking and emptying the file you just created. This will distribute the data in all files in the filegroup evenly.

Finally, it is worth mentioning that in some cases, especially when MDF file is very large, DBCC SHRINKFILE(EMPTYFILE) command can error in the middle of the execution stating that it cannot move some of the data pages that belong to the system objects. You can address it by re-running DBCC SHRINKFILE using the current data size as the target (e.g. releasing the empty space from the file). This will move those data pages within the file and you can re-run DBCC SHRINKFILE(EMPTYFILE) command afterwards.

Shrinking transaction log

The decision how to handle transaction log depends on its size, and backup and high availability strategies you have in place. Transaction log size affects time, which file copy operation will require and, therefore, the system downtime. Obviously, the simplest solution is avoid shrinking transaction log if the log file is not very large and downtime is acceptable.

In case, if you need to reduce the downtime, there are no options but shrinking the log file. It is usually not a problem in case if database uses the SIMPLE recovery model. However, with FULL recovery model situation is a bit more complicated.

As the first step in this process, you need to truncate the log by performing the log backup. This operation does not decrease the size of the log file but rather reduce the size of the active/used portion of the log. Keep in mind that open transactions, backlogs in high availability log record queues and a few others factors can prevent transaction log from being truncated.

Next, you can shrink the log file using DBCC SHRINKFILE command with the very small size- 50MB, for example- as the target. Your results may vary. Internally, SQL Server splits the log to the multiple blocks called Virtual Log Files and re-uses them in the circular matter. Shrink operation would release the empty space from the tail of the log; however, the resulting file size depends on the active VLF offsets in the file.  It is entirely possible that shrink command would not reduce the file size if active VLFs are close to the end of the file.

Let’s illustrate that with the example and run DBCC LOGINFO command, which shows the status of the VLFs in the log file. Figure 8 illustrates the partial output from our test database. Status value of 2 indicates that VLF is active and cannot be truncated. As you can see, it is in the middle of the file.

08. DBCC LOGINFO output

As you saw in Figure 7, the log file is using just 61MB out of 1.5GB in the file. However, if you run DBCC SHRINKFILE(DataMovementDemo_Log, 65) command, you’d see that it did not shrink beyond 1000MB as shown in Figure 9.

09. File stats after shrinking transaction log

If you run DBCC LOGINFO again, you would see that SQL Server removed the empty VLFs from the end of the file and stopped when it reached the active VLF there. Figure 10 illustrates that. It is also worth mentioning that the first VLF in the file also became active during the shrink.

10. DBCC LOGINFO output after shrinking transaction log

At this point you have the two options, assuming that size of the log file is still unacceptable. You can wait until SQL Server truncates the log making last VLF inactive and repeat the shrink operation afterwards. This will eventually happen with the regular workflow. You can even force this by generating transaction log records by creating the table with one CHAR(8000) column and inserting multiple rows there in the separate transactions and batches. Do not forget to force log truncation with BACKUP LOG operations and use DBCC LOGINFO to monitor the progress.

Alternative option is switching database to the SIMPLE recovery model using ALTER DATABASE SET RECOVERY SIMPLE command. This will perform log truncation and will allow you to shrink the log to the minimal size immediately. Obviously, this approach will require you to disable transaction log-based high availability technologies and recreate backup chain afterwards.

While, on the surface, switching database to SIMPLE recovery model introduces unnecessary complications, it could be the good opportunity to rebuild transaction log file. Large number of VLFs negatively affect system performance and can slow down database recovery time. Unfortunately, default settings in New Database dialog in Management Studio leads to that situation. At least in SQL Server prior 2016.

You can rebuild transaction log after you moved the file to the new drive by manually growing it in 4000MB chunks – do not use 4GB chunks due to the bug in some of SQL Server versions. Every chunk will generate 16 250MB VLF files, which works well for the most configurations.  After that, change log auto-growth to be in MB – I found that 1000MB chunks are good for majority of the cases.

Moving MDF and LOG files to the new drive

Finally, it is the time to move MDF and LDF files to the new drive. Unfortunately, it is offline operation. Fortunately, at this point, both files should be very small and downtime should be minimal.

As the first step, you need to change location of the files using ALTER DATABASE MODIFY FILE command. This will change location of the files in the system catalogs, and will take an effect after the database restart.

Next, you can take database offline using ALTER DATABASE .. SET OFFLINE WITH ROLLBACK IMMEDIATE command. This will disconnect all users from the database rolling back the active transactions. You can copy the files and take database back online using ALTER DATABASE .. SET ONLINE command as shown below.

use master
go

alter database DataMovementDemo modify file
( name = N'DataMovementDemo', filename = N'C:\NewDrive\DataMovementDemo.mdf');
alter database DataMovementDemo modify file
( name = N'DataMovementDemo_Log', filename = N'C:\NewDrive\DataMovementDemo_Log.ldf');
go

alter database DataMovementDemo set offline with rollback immediate
go

-- COPY FILES 

alter database DataMovementDemo set online;

As you see in Figure 11 – our database is now residing on the new drive.

11. Final file layout

Almost done! Again, do not forget to rebuild the log file and switch database to FULL recovery model if needed.

Dealing with index fragmentation

There is one final step though. As you already know, DBCC SHRINKFILE command works on the extent level. It moves allocated extents from the end of the file to the new place without any considerations to which objects those extents belong. As you can guess, this leads to the huge index fragmentation, which you need to address at the final stage of the process.

Obviously, you do not want to acquire Schema Modification (SCH-M) locks blocking access to the tables during index rebuild operations. It makes index reorg the better choice for this scenario – it is online in any edition of SQL Server.

Listing below shows the script you can use to generate ALTER INDEX REORG commands for every clustered and nonclustered indexes from specific filegroup. It returns result as XML string – you can simply cut and paste it and run as another batch.

;with FGObjects(SchemaName, TableName, IndexName, RowNum, Cnt)
as
(
    select 
        s.Name, t.Name, i.Name
        ,ROW_NUMBER() over(order by t.object_id, i.index_id) as RowNum
        ,COUNT(*) over() as Cnt
    from
        sys.indexes i join sys.filegroups f on
            i.data_space_id = f.data_space_id
        join sys.all_objects t on
            i.object_id = t.object_id
        join sys.schemas s on
            t.schema_id = s.schema_id
    where
        i.index_id >= 1 and
        t.type = 'U' and -- User Created Tables
        i.data_space_id = f.data_space_id and
        f.name = 'PRIMARY' -- Filegroup
)
select 
    'alter index ' as [text()]
    ,[IndexName] as [text()]
    ,' on ' + SchemaName + '.' as [text()]
    ,[TableName] as [text()]
    ,' reorganize;' + CHAR(13) + CHAR(10) as [text()]
    ,'raiserror(''' as [text()]
    ,RowNum as [text()]
    ,'/' as [text()]
    ,Cnt as [text()]
    ,' is done'',0,1) with nowait;' + CHAR(13) + CHAR(10) as [text()]
    ,'go' + CHAR(13) + CHAR(10) as [text()]
from FGObjects
for xml path('');

As you see, the process of moving database files between different drives could lead to significant amount of work if you want to minimize the downtime. However, it is often the only choice, especially in the Cloud environment where you can get significant performance benefits by utilizing new SSD-based drives. Go for it! 🙂

Source code is available for download.

Thinking Outside of In-Memory Box: Utilizing In-Memory OLTP as the Session- or Object-Store

Modern software systems have become extremely complex. They consist of a large number of components and services responsible for various tasks. They must be scalable and redundant and need to be able to handle load growth and survive hardware failures and crashes.

The common approach to solving scalability and redundancy issues is to design the systems in a way that permits to deploy and run multiple instances of individual services. It allows adding more servers and instances as the load grows and helps you survive hardware failures by distributing the load across other active servers. The services are usually implemented in stateless way, and they don’t store or rely on any local data.

Most systems, however, have data that needs to be shared across the instances. For example, front-end web servers often need to maintain web session states. Back-end processing services often need to have shared cache with some data.

Historically, there were two approaches to address this issue. The first one was to use dedicated storage/cache and host it somewhere in the system. Remember the old ASP.Net model that used either a SQL Server database or a separate web server to store session data? The problem with this approach is limited scalability and redundancy. Storing session data in web server memory is fast but it is not redundant. A SQL Server database, on the other hand, can be protected but it does not scale well under the load due to page latch contention and other issues.

Another approach was to replicate content of the cache across multiple servers. Each instance worked with the local copy of the cache while another background process distributed the changesto the other servers. Several solutions on the market provide such capability; however, they are usually expensive. In some cases, the license cost for such software could be in the same order of magnitude as SQL Server licenses.

Fortunately, you can use In-Memory OLTP as the solution. In the nutshell, it looks similar to the ASP.Net SQL Server session-store model; however, In-Memory OLTP throughput and performance improvements address the scalability issues of the old on-disk solution. You can improve performance even further by using non-durable memory-optimized tables. Even though the data will be lost in case of failover, this is acceptable in most cases.

However, the 8,060-byte maximum row size limit introduces challenges to the implementation. It is entirely possible that a serialized object will exceed 8,060 bytes. You can address this by splitting the data into multiple chunks and storing them in multiple rows in memory-optimized table.

You saw an example of a T-SQL implementation in my previous blog post. However, using T-SQL code and an interop engine will significantly decrease the throughput of the solution. It is better to manage serialization and split/merge functional on the client side.

Let’s look at the oversimplified example and see how we can handle that in the client code. The first listing below creates the table that we will use to store the data along with three stored procedures to load and save data to/from the table.

create table dbo.SessionStore
(
     ObjectKey uniqueidentifier not null,
     ExpirationTime datetime2(2) not null,
     ChunkNum smallint not null,
     Data varbinary(8000) not null,
 
     constraint PK_ObjStore 
     primary key nonclustered hash (ObjectKey, ChunkNum)
     with (bucket_count=1048576),

     index IDX_ObjectKey
     nonclustered hash(ObjectKey)
     with (bucket_count=1048576)
)
with (memory_optimized = on, durability = schema_only);
go 

create type dbo.tvpObjData as table
(
     ChunkNum smallint not null
          primary key nonclustered hash
          with (bucket_count = 128),
     Data varbinary(8000) not null
)
with(memory_optimized=on)
go 

create proc dbo.SaveObjectToStore
(
     @ObjectKey uniqueidentifier
     ,@ExpirationTime datetime2(2)
     ,@ObjData dbo.tvpObjData readonly 
)
with native_compilation, schemabinding, exec as owner
as
begin atomic with
(
     transaction isolation level = snapshot
     ,language = N'English'
)
     delete dbo.SessionStore
     where ObjectKey = @ObjectKey

     insert into dbo.SessionStore(ObjectKey, ExpirationTime, ChunkNum, Data)
          select @ObjectKey, @ExpirationTime, ChunkNum, Data
          from @ObjData
end
go

create proc dbo.SaveObjectToStore_Row
(
     @ObjectKey uniqueidentifier
     ,@ExpirationTime datetime2(2)
     ,@ObjData varbinary(8000) 
)
with native_compilation, schemabinding, exec as owner
as
begin atomic with
(
     transaction isolation level = snapshot
     ,language = N'English'
)
     delete dbo.SessionStore
     where ObjectKey = @ObjectKey

     insert into dbo.SessionStore(ObjectKey, ExpirationTime, ChunkNum, Data)
     values(@ObjectKey, @ExpirationTime, 1, @ObjData)
end
go

create proc dbo.LoadObjectFromStore
(
     @ObjectKey uniqueidentifier not null
)
with native_compilation, schemabinding, exec as owner
as
begin atomic
with
(
     transaction isolation level = snapshot
     ,language = N'English'
)
     select t.Data
     from dbo.SessionStore t
     where t.ObjectKey = @ObjectKey and ExpirationTime >= sysutcdatetime()
     order by t.ChunkNum 
end

As you can see, there are two different stored procedures that save data to the table. The first one – dbo.SaveObjectToStore – uses memory-optimized table-valued parameter and can be used in the case, when serialized object data is greater than 8,000 bytes. The second stored procedure – – dbo.SaveObjectToStore_Row – accepts varbinary(8000) parameter and can be used if serialized object is within 8,000-byte range. This is strictly for optimization purposes. Even though memory-optimized table-valued parameters are very fast, they are still slower compating to the regular parameter.

The client code would contain several static classes. The first ObjStoreUtils class provides four methods to serialize and deserialize objects into the byte arrays, and split and merge those arrays to/from 8,000-byte chunks. You can see the code below.

public static class ObjStoreUtils
{
     // Serialize object of type T to the byte array
     public static byte[] Serialize(T obj)
     {
          using (var ms = new MemoryStream())
          {
               var formatter = new BinaryFormatter();
               formatter.Serialize(ms, obj);

               return ms.ToArray();
          }
     }

     // Deserialize byte array to the object 
     public static T Deserialize(byte[] data)
     {
          using (var output = new MemoryStream(data))
          {
               var binForm = new BinaryFormatter();
               return (T)binForm.Deserialize(output);
          }
     }

     /// Split byte array to the multiple chunks
     public static List<byte[]> Split(byte[] data, int chunkSize)
     {
          var result = new List<byte[]>();

          for (int i = 0; i < data.Length; i += chunkSize) { int currentChunkSize = chunkSize; if (i + chunkSize > data.Length)
                    currentChunkSize = data.Length - i;

               var buffer = new byte[currentChunkSize];
               Array.Copy(data, i, buffer, 0, currentChunkSize);

               result.Add(buffer);
          }
          return result;
     }

     // Combine multiple chunks into the byte array
     public static byte[] Merge(List<byte[]> arrays)
     {
          var rv = new byte[arrays.Sum(a => a.Length)];
          int offset = 0;
          foreach (byte[] array in arrays)
          {
               Buffer.BlockCopy(array, 0, rv, offset, array.Length);
               offset += array.Length;
          }
          return rv;
     }
}(

The ObjStoreDataAccess class shown in the next listing, loads and saves binary data to and from the database. It utilizes another static class – DBConnManager, which returns the SqlConnection object to the target database. This class is not shown there.

public static class ObjStoreDataAccess
{
    // Saves data to the database
    public static void SaveObjectData(Guid key,
                DateTime expirationTime, List<byte[]> chunks)
    {
        if (chunks == null || chunks.Count == 0)
            return;

        using (var cnn = DBConnManager.GetConnection())
        {
            using (var cmd = cnn.CreateCommand())
            {
                cmd.CommandType = CommandType.StoredProcedure;
                cmd.Parameters.Add("@ObjectKey",
                    SqlDbType.UniqueIdentifier).Value = key;
                cmd.Parameters.Add("@ExpirationTime",
                    SqlDbType.DateTime2).Value = expirationTime;

                if (chunks.Count == 1)
                {
                    cmd.CommandText = "dbo.SaveObjectToStore_Row";
                    cmd.Parameters.Add("@ObjData", 
                        SqlDbType.VarBinary, 8000).Value = chunks[0];
                }
                else
                {
                    cmd.CommandText = "dbo.SaveObjectToStore";
                    var tvp = new DataTable();
                    tvp.Columns.Add("ChunkNum", typeof(short));
                    tvp.Columns.Add("ChunkData", typeof(byte[]));

                    for (int i = 0; i < chunks.Count; i++)
                        tvp.Rows.Add(i, chunks[i]);

                    var tvpParam = new SqlParameter("@ObjData",
                         SqlDbType.Structured)
                    {
                        TypeName = "dbo.tvpObjData",
                        Value = tvp
                    };

                    cmd.Parameters.Add(tvpParam);

                }
                cmd.ExecuteNonQuery();
            }
        }
    }

    // Load data from the database
    public List<byte[]> LoadObjectData(Guid key)
    {
        using (var cnn = DBConnManager.GetConnection())
        {
            using (var cmd = cnn.CreateCommand())
            {
                cmd.CommandText = "dbo.LoadObjectFromStore";
                cmd.CommandType = CommandType.StoredProcedure;
                cmd.Parameters.Add("ObjectKey",
                    SqlDbType.UniqueIdentifier).Value = key;

                var result = new List<byte[]>();
                using (var reader = cmd.ExecuteReader())
                {
                    while (reader.Read())
                        result.Add((byte[])reader["Data"]);
                }
                return result;
            }
        }
    }
}

Finally, the ObjStoreService class shown below puts everything together and manages the entire process. It implements two simple methods,Load and Save, calling the helper classes defined above.

public static class ObjStoreService
{
    private const int MaxChunkSize = 8000;

    // Saves object in the object store
    public static void Save(Guid key, 
                DateTime expirationTime, object obj)
    {
        var objectBytes = ObjStoreUtils.Serialize(obj);
        var chunks = ObjStoreUtils.Split(objectBytes, MaxChunkSize);

        ObjStoreDataAccess.SaveObjectData(key, expirationTime, chunks);
    }

    // Loads object from the object store
    public static T Load(Guid key) where T: class 
    {
        var chunks = ObjStoreDataAccess.LoadObjectData(key);
        if (chunks.Count == 0) 
            return null;
        var objectBytes = ObjStoreUtils.Merge(chunks);

        return ObjStoreUtils.Deserialize(objectBytes);
    }
}

Obviously, this is oversimplified example, which I used just to illustrate the concept. Production implementation could be significantly more complex, especially if there is the possibility that multiple sessions can update the same object simultaneously. You can implement retry logic using the similar approach with what we did enforcing uniqueness/referential integrity or create some sort of object locking management in the system if this is the case.

It is also worth mentioning that you can compress binary data before saving it into the database. The compression will introduce unnecessary overhead in the case of small objects; however, it could provide significant space savings and performance improvements if the objects are large. I did not include compression code in the example, although you can easily implement it with the GZipStream or DeflateStream classes.

You can download the demo application from “Expert SQL Server In-Memory OLTP” Companion materials. It has slightly different implementation – I denormalized classes a little bit to reduce C# code overhead during the demos when it is running on the same box with SQL Server. However, it is very similar to what you saw in this post.

P.S. I want to thank Vladimir Zatuliveter (zatuliveter at gmail dot com) for his help with the code.

“Expert SQL Server In-Memory OLTP” Has Been Published

It has been very eventful week. On Thursday, September 17th, I’ve presented at 24 Hours of Pass – gave the sneak peek of my PASS Summit 2015 pre-con.  By the way, slide deck and demos are available for download from my Presentations page.

However, the biggest news for me is the release of my second book – “Expert SQL Server In-Memory OLTP”. It is a bit late – we are all waiting for In-Memory OLTP improvements in SQL Server 2016 but still.. I hope some people will find it useful.

I think that Microsoft’ implementation of In-Memory OLTP as quite unique due to the level of integration with the classic SQL Server Engine and its simplicity. As all of us know, it is possible to move data into memory with just a handful of the mouse clicks. However, this simplicity is two-edged sword – it is very easy to make incorrect implementation decisions and hurt  system performance rather than improve it. My goal was to explain how the technology works under the hood and show when and how to develop, deploy and administer the solutions that utilize In-Memory OLTP.

In the nutshell, I’d consider this book as the follow-up on In-Memory OLTP part from my Pro SQL Server Internals book. You would find some familiar content if you read it; however, it is much deeper dive into the technology. I’ve also covered the large number of practical questions – for example, how to benefit from the technology in case, if full in-memory migration is cost ineffective.

You can look at the Table of Content and download companion materials from my Publications page.

Finally, I would like to thank my technical reviewer Sergey Olontsev (MVP, MCM) who is working with the very large In-Memory OLTP implementation on the daily basis. His help was invaluable!

And, of course, it would be impossible to do without all of you! Thank you very much for all your help, feedback and support!

Thinking Outside of In-Memory Box: Addressing 8,060-byte Maximum Row Size Limitation

The 8,060-byte maximum row size limit is, perhaps, one of the biggest roadblocks in widespread In-Memory OLTP adoption. This limitation essentially prevents you from using (max) data types along with CLR and system data types that require off-row storage, such as XML, geometry, geography and a few others. Even though you can address this by changing the database schema and T-SQL code, these changes are often expensive and time consuming.

When you encounter such a situation, you should analyze if LOB data types are required in the first place. It is not uncommon to see a column that never stores more than a few hundred characters defined as (n)varchar(max). Consider an Order Entry system and DeliveryInstruction column in the Orders table. You can safely limit the size of the column to 500-1,000 characters without compromising the business requirements of the system.

Another example is a system that collects some semistructured sensor data from the devices and stores it in the XML column. If the amount of semistructured data is relatively small, you can store it in varbinary(N) column, which will allow you to move the table into memory.

Unfortunately, sometimes it is impossible to change the data types and you have to keep LOB columns in the tables. Nevertheless, you have a couple options to proceed.

The first approach is to split data between two tables, storing the key attributes in memory-optimized and rarely-accessed LOB attributes in on-disk tables. Again, consider the situation where you have an Order Entry system with the Products table defined as shown in Listing below

create table dbo.Products
(
    ProductId int not null identity(1,1),
    ProductName nvarchar(64) not null,
    ShortDescription nvarchar(256) not null,
    Description nvarchar(max) not null,
    Picture varbinary(max) null,

    constraint PK_Products
    primary key clustered(ProductId)
)

As you can guess, in this scenario, it is impossible to change the data types of the Picture and Description columns, which prevents you from making the Products table memory-optimized. However, you can split that table into two, as shown below. The Picture and Description columns are stored in an on-disk table while all other columns are stored in the memory-optimized table. This approach will improve performance for the queries against the ProductsInMem table and will allow you to access it from natively compiled stored procedures in the system.

create table dbo.ProductsInMem
(
    ProductId int not null identity(1,1)
        constraint PK_ProductsInMem
        primary key nonclustered hash
        with (bucket_count = 65536),
    ProductName nvarchar(64) 
        collate Latin1_General_100_BIN2 not null,
    ShortDescription nvarchar(256) not null,

    index IDX_ProductsInMem_ProductName 
    nonclustered(ProductName)
)
with (memory_optimized = on, durability = schema_and_data);

create table dbo.ProductAttributes
(
    ProductId int not null,
    Description nvarchar(max) not null,
    Picture varbinary(max) null,
	
    constraint PK_ProductAttributes
    primary key clustered(ProductId)
);

Unfortunately, it is impossible to define a foreign key constraint referencing a memory-optimized table, and you should support referential integrity in your code. We have already looked at one of the possible approaches in my previous blog post.

You can hide some of the implementation details from the SELECT queries by defining a view as shown below. You can also define INSTEAD OF triggers on the view and use it as the target for data modifications; however, it is more efficient to update data in the tables directly.

create view dbo.Products(ProductId, ProductName, 
    ShortDescription, Description, Picture)
as
    select 
        p.ProductId, p.ProductName, p.ShortDescription
        ,pa.Description, pa.Picture
    from 
        dbo.ProductsInMem p left outer join 
            dbo.ProductAttributes pa on
                p.ProductId = pa.ProductId

As you should notice, the view is using an outer join. This allows SQL Server to perform join elimination when the client application does not reference any columns from the ProductAttributes table when querying the view. For example, if you ran SELECT ProductId, ProductName from dbo.Products, you would see the execution plan as shown in Figure 1. As you can see, there are no joins in the plan and the ProductAttributes table is not accessed.

1. Execution Plan with Join Elimination

You can use a different approach and store LOB data in memory-optimized tables, splitting it into multiple 8,000-byte chunks. You can use the table similar to what is defined below.

create table dbo.LobData
(
    ObjectId int not null,
    PartNo smallint not null,
    Data varbinary(8000) not null,

    constraint PK_LobData
    primary key nonclustered hash(ObjectID, PartNo)
    with (bucket_count=1048576),

    index IDX_ObjectID
    nonclustered hash(ObjectID)
    with (bucket_count=1048576)
)
with (memory_optimized = on, durability = schema_and_data)

Listing below demonstrates how to insert XML data into the table using T-SQL code in interop mode. It uses an inline table-valued function called dbo.SplitData that accepts the varbinary(max) parameter and splits it into multiple 8,000-byte chunks.

create function dbo.SplitData
(
    @LobData varbinary(max)
)
returns table
as
return
(
    with Parts(Start, Data)
    as
    (
        select 1, substring(@LobData,1,8000) 
        where @LobData is not null
		
        union all
		
        select 
            Start + 8000
            ,substring(@LobData,Start + 8000,8000)
        from Parts
        where len(substring(@LobData,Start + 8000,8000)) > 0
    )
    select 
        row_number() over(order by Start) as PartNo
        ,Data
    from
        Parts
)
go

-- Test Data
declare
    @X xml

select @X = 
    (select * from master.sys.objects for xml raw)

insert into dbo.LobData(ObjectId, PartNo, Data)
    select 1, PartNo, Data
    from dbo.SplitData(convert(varbinary(max),@X))

On the side note, dbo.SplitData function uses recursive CTE to split the data. Do not forget that SQL Server limits the CTE recursion level to 100 by default. You need to specify OPTION (MAXRECURSION 0) in the statement that uses the function in case of very large inputs.

Figure 2 shows the contents of the LobData table after the insert.

2. Content of LobData table after insert

You can construct original data using FOR XML PATH method as shown below. Alternatively, you can develop a CLR aggregate and concatenate binary data there.

;with ConcatData(BinaryData)
as
(
    select 
        convert(varbinary(max),
            (
                select convert(varchar(max),Data,2) as [text()]
                from dbo.LobData
                where ObjectId = 1
                order by PartNo
                for xml path('')
            ),2)
)
select convert(xml,BinaryData) 
from ConcatData

The biggest downside of this approach is the inability to split and merge large objects in natively compiled stored procedures due to the missing (max) parameters and variables support. You should use the interop engine for this purpose. However, it is still possible to achieve performance improvements by moving data into memory even when the interop engine is in use.

This approach is also beneficial when memory-optimized tables are used just for the data storage, and all split and merge logic is done inside the client applications. I will show you such an example in my next blog post.

“Pro SQL Server Internals” eBook is on sale until July 22nd

I just want to share that APress is running Independence Day eBook sale now. You can buy the copy of “Pro SQL Server Internals” eBook for $17.76 from their web site. The sale ends on July 22nd, 2015.

You can read the reviews at Amazon.

I would also appreciate if you post a short review at Amazon. It would help me to convince APress to publish the 2nd edition with SQL Server 2016 content.

PASS Summit 2015: I am Speaking!

SQL Server professionals are very strange. They live in their own world. They even have a calendar, which does not make any sense to the regular people.

They call Spring as Abstract Time and spend days in the dark rooms writing abstracts for upcoming PASS Summit. Summer becomes either Time of Work and Happiness or Time of Sadness depending on the Call for Speakers results. Autumn is, obviously, Gathering Time – time when they meet their friends and #sqlfamily in Seattle. And, finally, Winter is Time of Memories and Stories –  the season of neverending tales from the Summit – the season when they are boring other, non-SQL Server folks, to death.

This year is Time of Work and Happiness for me. I just found that two of my abstracts have been selected, including the pre-con session. It is hard to express how proud and honored I feel and I want to thank Program Committee, PASS and all of you for your trust and support. Thank you, it means a lot!

I am delivering the well known pre-con: SQL Server Internals: The Practical Angle. This pre-con explains how SQL Server components work under the hood and how their behavior affect your systems. It is based on my book Pro SQL Server Internals or, to be exact, that pre-con inspired me to write that book in the first place.

The second session, Thinking Outside of the In-Memory Box, talks about In-Memory OLTP and focusing how to address the surface area limitations of the technology in SQL Server 2014 and how to benefit from it even when migrations of existing systems are cost ineffective. There is also the book behind it.

I really hope, I will see some of you in Seattle. I urge you to attend the Summit – especially, if you have never been there. Trust me, it is life changing experience – the one, which is impossible to forget. You will see our community in action, you will make new friends and, of course, you will learn a lot.

Looking forward to see you there!

Thinking Outside of In-Memory Box: Supporting Uniqueness and Referential Integrity in In-Memory OLTP

As with any new technology, adoption of In-Memory OLTP comes at a cost. You need to acquire and/or upgrade to SQL Server 2014, spend time learning the technology and, if you are migrating an existing system, refactor code and test the changes.

Unfortunately, system refactoring can be complex and time consuming. SQL Server 2014 In-Memory OLTP has several important limitations, which can dramatically increase the cost of migration. To name just  few – it does not support off-row storage limiting you to 8,060-byte rows nor support CHECK, UNIQUE, FOREIGN KEY constraints and triggers. All those limitations can be addressed by schema and code refactoring; however, in some cases, that refactoring can require significant amount of efforts.

Today, I would like to start the series of the blog posts discussing how we can address specific limitations that exist in the first release of In-Memory OLTP. I will start with supporting uniqueness and referential integrity in the system. Unfortunately, In-Memory OLTP does not allow you to define foreign keys nor unique indexes and constraints besides a primary key. Such limitation rarely prevents us from adoption of the new technology. Nevertheless, those constraints help to keep the data clean and allow to detect data quality issues and bugs in the code at early stages of development.

To make matter worse, lock-free nature of In-Memory OLTP makes the code approach tricky. In-Memory OLTP transactions do not see any uncommitted changes done by the other transactions. For example, if you ran the code from Figure 1 in default SNAPSHOT isolation level, both transactions would successfully commit without seeing each other changes violating ProductName uniqueness.

Figure 1. Uniqueness Violation in SNAPSHOT Isolation Level

Fortunately, that situation can be addressed by using SERIALIZABLE transaction isolation level. As you remember, In-Memory OLTP validates serializable consistency rules by maintaining transaction scan set. As part of serializable rules validation, In-Memory OLTP checks for the phantom rows making sure that other sessions did not insert any rows that were previously invisible for the active transactions.

Code below creates memory-optimized table and natively compiled stored procedure that inserts data there running in SERIALIZABLE isolation level. Any inserts done through this stored procedure guarantee uniqueness of the ProductName even in multi-user concurrent environment.

The SELECT query builds transaction scan set, which will be used for serializable rule validation. That validation would fail if any other sessions inserted a row with the same ProductName while transaction is still active. Unfortunately, the first release of In-Memory OLTP does not support subqueries and it is impossible to write the code using IF EXISTS construct.

create table dbo.ProductsInMem
(
    ProductId int not null identity(1,1)
        constraint PK_ProductsInMem
        primary key nonclustered hash
        with (bucket_count = 65536),
    ProductName nvarchar(64) 
        collate Latin1_General_100_BIN2 not null,
    ShortDescription nvarchar(256) not null,

    index IDX_ProductsInMem_ProductName nonclustered(ProductName)
)
with (memory_optimized = on, durability = schema_and_data);
create procedure dbo.InsertProduct
(
    @ProductName nvarchar(64) not null
    ,@ShortDescription nvarchar(256) not null
    ,@ProductId int output
)
with native_compilation, schemabinding, execute as owner
as
begin atomic with
(
    transaction isolation level = serializable
    ,language = N'English'
)
    declare
        @Exists bit = 0

    -- Building scan set and checking existense of the product
    select @Exists = 1
    from dbo.ProductsInMem
    where ProductName = @ProductName

    if @Exists = 1
    begin
	;throw 50000, 'Product Already Exists', 1;
	return
    end

    insert into dbo.ProductsInMem(ProductName, ShortDescription)
    values(@ProductName, @ShortDescription);

    select @ProductID = scope_identity()
end

You can validate behavior of the stored procedure by running it in two parallel sessions as shown in Figure 2 below. Session 2 successfully inserts a row and commits the transaction. Session 1, on the other hand, would fail on commit stage with Error 41325.

Figure 2. dbo.InsertProduct Call from Two Parallel Sessions

Obviously, this approach would work and enforce the uniqueness only when you have full control over data access tier and have all INSERT and UPDATE operations performed through the specific set of stored procedures and/or code. INSERT and UPDATE statements executed directly against a table could easily violate uniqueness rules. However, you can reduce the risk by revoking INSERT and UPDATE permissions from the users giving them the EXECUTE permission on the stored procedures instead.

You can use the same technique to enforce referential integrity rules. Code below creates Orders and OrderLineItems tables and two stored procedures InsertOrderLineItems and DeleteOrders enforcing referential integrity between those tables. I am omitting OrderId update scenario, which is very uncommon in the real life.

create table dbo.Orders
(
    OrderId int not null identity(1,1)
        constraint PK_Orders
        primary key nonclustered hash 
        with (bucket_count=1049008),
    OrderNum varchar(32) 
        collate Latin1_General_100_BIN2 not null,
    OrderDate datetime2(0) not null
        constraint DEF_Orders_OrderDate
        default GetUtcDate(),
    /* Other Columns */
    index IDX_Orders_OrderNum
    nonclustered(OrderNum)
)
with (memory_optimized = on, durability = schema_and_data);

create table dbo.OrderLineItems
(
    OrderId int not null,
    OrderLineItemId int not null identity(1,1)
        constraint PK_OrderLineItems
        primary key nonclustered hash 
        with (bucket_count=4196032),
    ArticleId int not null,
    Quantity decimal(8,2) not null,
    Price money not null,
    /* Other Columns */

    index IDX_OrderLineItems_OrderId
    nonclustered hash(OrderId)
    with (bucket_count=1049008)
)
with (memory_optimized = on, durability = schema_and_data);
go

create type dbo.tvpOrderLineItems as table
(
    ArticleId int not null
        primary key nonclustered hash
        with (bucket_count = 1024),
    Quantity decimal(8,2) not null,
    Price money not null
    /* Other Columns */
)
with (memory_optimized = on);
go

create proc dbo.DeleteOrder
(
    @OrderId int not null
)
with native_compilation, schemabinding, execute as owner
as
begin atomic
with 
(
    transaction isolation level = serializable
    ,language=N'English'
)
    -- This stored procedure emulates ON DELETE NO ACTION 
    -- foreign key constraint behavior
    declare
        @Exists bit = 0

    select @Exists = 1
    from dbo.OrderLineItems
    where OrderId = @OrderId

    if @Exists = 1
    begin
        ;throw 60000, 'Referential Integrity Violation', 1;
        return
    end
    
    delete from dbo.Orders where OrderId = @OrderId
end
go

create proc dbo.InsertOrderLineItems
(
    @OrderId int not null
    ,@OrderLineItems dbo.tvpOrderLineItems readonly 
)
with native_compilation, schemabinding, execute as owner
as
begin atomic
with 
(
    transaction isolation level = repeatable read
    ,language=N'English'
)
    declare
        @Exists bit = 0

    select @Exists = 1
    from dbo.Orders
    where OrderId = @OrderId

    if @Exists = 0
    begin
        ;throw 60001, 'Referential Integrity Violation', 1;
        return
    end
    
    insert into dbo.OrderLineItems(OrderId, ArticleId, Quantity, Price)
        select @OrderId, ArticleId, Quantity, Price
        from @OrderLineItems
end

It is worth noting that InsertOrderLineItems procedure is using REPEATABLE READ isolation level. In this scenario, we need to make sure that referenced Order row has not been deleted during the execution and REPEATABLE READ enforces that introducing less overhead than SERIALIZABLE isolation level.

That technique comes with another small benefit. It can demonstrate advantage of having dedicated data access tier to application developers helping to convince them to build one. All of us, database professionals, would like it, don’t we? 🙂

Source code is available for download.

Next: Addressing 8,060-byte Maximum Row Size Limitation

Compressing LOB (XML) Data in the Database

Some time ago we have discussed the several techniques that can help reducing the database size. I have received quite a few responses on that post and several people asked if I could provide more details and examples. Today I will try to follow up on one of the methods, such as reducing the size of LOB data (in particular XML) in the database.

As you know, SQL Server stores the data in regular B-Tree indexes in three different sets of the data pages called allocation units. The main data row structure and fixed-length data are stored in IN-ROW data pages. Variable-length data greater than 8,000 bytes in size is stored in LOB (large object) pages. Such data includes (max) columns, XML, CLR UDT and a few other data types. Finally, variable-length data, which does not exceed 8,000 bytes, is stored either in IN-ROW data pages when it fits into the page, or in ROW-OVERFLOW data pages. You can read more about it in the previous post and here.

Enterprise Edition of SQL Server allows you to reduce the size of the data by implementing data compression. However, data compression is applied to IN-ROW data only and it does not compress ROW-OVERFLOW and LOB data. Any large objects that do not fit into IN-ROW data pages remain uncompressed.

Unfortunately, there is very little we can do to reduce the size of the LOB data. SQL Server does not provide any build-in functional to address it. The only remaining option is compressing it manually – either on the client or with CLR routines. Obviously, compression and decompression adds overhead and, from SQL Server load standpoint, it is preferable to do it on the client side. However, in the large number of cases, you will need to access compressed data from T-SQL, and CLR integration is the only choice. Ideal implementation in that case would combine compression and decompression code in both tiers and use CLR only when it is necessary.

Fortunately, .Net implementation of the compression code is very simple and can be done with DeflateStream or GZipStream classes. Below you can see the code of CLR functions that perform compression and decompression. You can also download entire project with the link at the end of the post.

/// <summary>
/// Compressing the data
/// </summary>
[Microsoft.SqlServer.Server.SqlFunction(IsDeterministic = true, IsPrecise = true,
        DataAccess = DataAccessKind.None)]
public static SqlBytes BinaryCompress(SqlBytes input)
{
    if (input.IsNull)
        return SqlBytes.Null;

    using (MemoryStream result = new MemoryStream())
    {
        using (DeflateStream deflateStream = 
            new DeflateStream(result, CompressionMode.Compress, true))
        {
            deflateStream.Write(input.Buffer, 0, input.Buffer.Length);
            deflateStream.Flush();
            deflateStream.Close();
        }
        return new SqlBytes(result.ToArray());
    } 
}

/// <summary>
/// Decompressing the data
/// </summary>
[Microsoft.SqlServer.Server.SqlFunction(IsDeterministic = true, IsPrecise = true,
        DataAccess = DataAccessKind.None)]
public static SqlBytes BinaryDecompress(SqlBytes input)
{
    if (input.IsNull)
        return SqlBytes.Null;
    
    int batchSize = 32768;
    byte[] buf = new byte[batchSize];

    using (MemoryStream result = new MemoryStream())
    {
        using (DeflateStream deflateStream = 
            new DeflateStream(input.Stream, CompressionMode.Decompress, true))
        {
            int bytesRead;
            while ((bytesRead = deflateStream.Read(buf, 0, batchSize)) > 0)
                result.Write(buf, 0, bytesRead);
        }
        return new SqlBytes(result.ToArray());
    } 
}

You can define the functions in the database with the following code (you either need to get byte sequence of the compiled assembly from the demo script or compile CLR project).

create assembly LOBCompress
authorization dbo
from /*..*/
go

create function dbo.BinaryCompress(@input varbinary (max))
returns varbinary (max)
as external name [LOBCompress].[Compress].[BinaryCompress];
go

create function dbo.BinaryDecompress(@input varbinary (max))
returns varbinary (max)
as external name [LOBCompress].[Compress].[BinaryDecompress];
go

Now let’s see the process in action, create the test table and populate it with some data.

create table dbo.DataWithXML
(
    ID int not null,
    Data xml not null,

    constraint PK_DataWithXML
    primary key clustered(ID)
)
go

declare
    	@X xml

select @X = 
    (
        select *
        from master.sys.objects
        for xml raw, root('Data')
    )

;with n1(c) as (select 0 union all select 0) -- 2 rows
,n2(c) as (select 0 from n1 as t1 cross join n1 as t2) -- 4 rows
,n3(c) as (select 0 from n2 as t1 cross join n2 as t2) -- 16 rows
,n4(c) as (select 0 from n3 as t1 cross join n3 as t2) -- 256 rows
,n5(c) as (select 0 from n4 as t1 cross join n3 as t2) -- 4,096 rows
,ids(id) as (select row_number() over (order by (select null)) from n5)
insert into dbo.DataWithXML(ID,Data)
    select id, @X
    from Ids;

update dbo.DataWithXML
set Data.modify('replace value of (/Data/row/@object_id)[1]
with sql:column("ID")');

Let’s examine the size of the table and average size of XML there using the following queries:

select 
    index_id, partition_number, alloc_unit_type_desc
    ,index_level
    ,page_count
    ,page_count * 8 / 1024 as [Size MB]
from 
    sys.dm_db_index_physical_stats
    (
        db_id() /*Database */
        ,object_id(N'dbo.DataWithXML') /* Table (Object_ID) */
        ,1 /* Index ID */
        ,null /* Partition ID – NULL – all partitions */
        ,'detailed' /* Mode */
    )
go

select avg(datalength(Data)) as [Avg XML Size]
from dbo.DataWithXML;

As you can see in Figure 1, LOB allocation units are using 96MB of space to store the data. The average size of the XML data is 20,897 bytes per row. It is also worth mentioning that actual storage size for XML is 3 data pages per row, which is 24,576 bytes.

1. Uncompressed data: Storage space and XML Size

Let’s see what we can achieve by using compression. The code below creates another table and copies the data there, compressing it on the fly. As you see, the data is stored in varbinary(max) column.

create table dbo.DataWithCompressedXML
(
    ID int not null,
    Data varbinary(max) not null,

    constraint PK_DataWithCompressedXML
    primary key clustered(ID)
)
go

insert into dbo.DataWithCompressedXML(ID,Data)
    select ID, dbo.BinaryCompress(convert(varbinary(max),Data))
    from dbo.DataWithXML

Let’s check the size of the data in compressed table:

select avg(datalength(Data)) as [Uncompressed] 
from dbo.DataWithXML;

select avg(datalength(Data)) as [Compressed] 
from dbo.DataWithCompressedXML
go

select 
    index_id, partition_number, alloc_unit_type_desc
    ,index_level
	,page_count
	,page_count * 8 / 1024 as [Size MB]
from 
    sys.dm_db_index_physical_stats
    (
        db_id() /*Database */
        ,object_id(N'dbo.DataWithCompressedXML') /* Table (Object_ID) */
        ,1 /* Index ID */
        ,null /* Partition ID – NULL – all partitions */
        ,'detailed' /* Mode */
    )

As you can see in Figure 2, we were able to reduce the size of the table from 96 to 10MB decreasing the size of the XML in every row from 20,897 to 2,674 bytes.

2. Compressed data: Storage size and compressed XML size

It is also worth noting that in our case, the size of compressed data is less than 8,000 bytes and SQL Server was able to accommodate all the data using IN-ROW allocation units. Even though the new table is almost 10 times smaller than the old one, it has significantly more IN-ROW data pages in the index. This could introduce some performance side effects in some cases. For example, when system has poorly optimized queries that perform clustered index scans. Again, it could become the issue only if compressed data is less than 8,000 bytes. Otherwise, SQL Server will still store it using the LOB data pages.

Obviously, this implementation requires schema and code changes. We can mitigate it a little bit by abstracting it with the views as it is shown below.

create view dbo.vDataWithXML(ID, Data)
as
    select ID, convert(xml,dbo.BinaryDecompress(Data))
    from dbo.DataWithCompressedXML

We can even create INSTEAD OF trigger (yuck!) on the view to minimize the changes if absolutely needed.

Speaking of the overhead, compressing and decompressing are CPU intensive and there is the performance penalty of calling CLR functions. It is not that noticeable when you need to decompress the single or very few rows; however, it could be very significant on the large data sets. For example, when you need to decompress and shred XML and use some of its elements in a where clause of the query. The code below shows such an example.

set statistics time on
select count(*) 
from dbo.DataWithXML 
where Data.value('(/Data/row/@object_id)[1]','int') = 3;

select count(*) 
from dbo.vDataWithXML 
where Data.value('(/Data/row/@object_id)[1]','int') = 3;
set statistics time off

The second SQL that accesses the compressed table has to decompress XML for every row in the table. The execution times of the statements on my laptop are 116ms and 6,899ms respectively. As you see, the compression definitely comes at cost.

One of the ways to reduce such an overhead, is storing XML elements that are used in the queries in the separate table columns. Unfortunately, by-the-book approach with persisted calculated columns would not always works. It is possible to create and persist such a column using user-defined functions; however, SQL Server would ignore it in some cases. The code below shows the example that creates calculated column that contains the attribute we are using in our queries.

create function dbo.fnGetCompressedObjectId(@Compressed varbinary(max))
returns int
with schemabinding
as
begin
    return (convert(xml,dbo.BinaryDecompress(@Compressed))
                .value('(/Data/row/@object_id)[1]','int'))
end
go	 

alter table dbo.DataWithCompressedXML
add
    ObjId as dbo.fnGetCompressedObjectId(Data)
        persisted
go

-- It is a good practice to rebuild index after alteration
alter index PK_DataWithCompressedXML 
on dbo.DataWithCompressedXML rebuild
go

alter view dbo.vDataWithXML(ID, Data, ObjId)
as
    select ID, convert(xml,dbo.BinaryDecompress(Data)), ObjId
    from dbo.DataWithCompressedXML
go

Unfortunately, if you ran the following query: select count(*) from dbo.vDataWithXML where ObjId = 3, you’d notice that SQL Server recalculates the value of the calculated column even though it is persisted and functions are defined as deterministic and precise. This is just the limitation of the Query Optimizer. Figure 3 shows that ObjId is recalculated.

3. Execution plan of the query

There is still the possibility of using persisted calculated columns. For example, you can define CLR function, which decompress and parse XML and return ObjID as the integer. Something like that:

.Net CLR:
[Microsoft.SqlServer.Server.SqlFunction(IsDeterministic = true, IsPrecise = true,
            DataAccess = DataAccessKind.None)]
public static SqlInt32 GetObjId(SqlBytes input)
{
    if (input.IsNull)
        return SqlInt32.Null;
    /* Parsing XML with XmlReader and return ObjId attribute */
}

SQL: 
create function dbo.GetObjId(@input varbinary (max))
returns int
as external name [LOBCompress].[Compress].[GetObjId]
go

alter table dbo.DataWithCompressedXML drop column ObjId
go

alter table dbo.DataWithCompressedXML
add
    ObjId as dbo.GetObjId(Data)
        persisted
go

alter index PK_DataWithCompressedXML 
on dbo.DataWithCompressedXML rebuild
go

Now, if you run the previous query: select count(*) from dbo.vDataWithXML where ObjId = 3, it would work just fine and does not recalculate the column value. Figure 4 shows the execution plan in this case.

4. Execution plan that utilizes calculated column

It is worth mentioning that execution time of this query on my laptop is just 3 milliseconds comparing to 116 milliseconds of the query against dbo.DataWithXML table. The query is significantly faster because it does not need to shred XML to obtain ObjID value.

Obviously, creating separate CLR methods for each calculated column can lead to some coding overhead. You can consider using regular columns instead and populate them in the code or even in the triggers. Each approach has the own set of benefits and downsides based on the use-cases implemented in the system.

Compressing LOB data in the database could help you to significantly reduce the database size in the large number of cases. However, it adds an overhead of compressing and decompressing data. In some cases, such overhead would be easily offset by the smaller data size, less I/O and buffer pool usage but in any case, you should be careful and take all other factors into the consideration.

Source code is available for download.