Category Archives: Client Development

Thinking Outside of In-Memory Box: Utilizing In-Memory OLTP as the Session- or Object-Store

Modern software systems have become extremely complex. They consist of a large number of components and services responsible for various tasks. They must be scalable and redundant and need to be able to handle load growth and survive hardware failures and crashes.

The common approach to solving scalability and redundancy issues is to design the systems in a way that permits to deploy and run multiple instances of individual services. It allows adding more servers and instances as the load grows and helps you survive hardware failures by distributing the load across other active servers. The services are usually implemented in stateless way, and they don’t store or rely on any local data.

Most systems, however, have data that needs to be shared across the instances. For example, front-end web servers often need to maintain web session states. Back-end processing services often need to have shared cache with some data.

Historically, there were two approaches to address this issue. The first one was to use dedicated storage/cache and host it somewhere in the system. Remember the old ASP.Net model that used either a SQL Server database or a separate web server to store session data? The problem with this approach is limited scalability and redundancy. Storing session data in web server memory is fast but it is not redundant. A SQL Server database, on the other hand, can be protected but it does not scale well under the load due to page latch contention and other issues.

Another approach was to replicate content of the cache across multiple servers. Each instance worked with the local copy of the cache while another background process distributed the changesto the other servers. Several solutions on the market provide such capability; however, they are usually expensive. In some cases, the license cost for such software could be in the same order of magnitude as SQL Server licenses.

Fortunately, you can use In-Memory OLTP as the solution. In the nutshell, it looks similar to the ASP.Net SQL Server session-store model; however, In-Memory OLTP throughput and performance improvements address the scalability issues of the old on-disk solution. You can improve performance even further by using non-durable memory-optimized tables. Even though the data will be lost in case of failover, this is acceptable in most cases.

However, the 8,060-byte maximum row size limit introduces challenges to the implementation. It is entirely possible that a serialized object will exceed 8,060 bytes. You can address this by splitting the data into multiple chunks and storing them in multiple rows in memory-optimized table.

You saw an example of a T-SQL implementation in my previous blog post. However, using T-SQL code and an interop engine will significantly decrease the throughput of the solution. It is better to manage serialization and split/merge functional on the client side.

Let’s look at the oversimplified example and see how we can handle that in the client code. The first listing below creates the table that we will use to store the data along with three stored procedures to load and save data to/from the table.

create table dbo.SessionStore
(
     ObjectKey uniqueidentifier not null,
     ExpirationTime datetime2(2) not null,
     ChunkNum smallint not null,
     Data varbinary(8000) not null,
 
     constraint PK_ObjStore 
     primary key nonclustered hash (ObjectKey, ChunkNum)
     with (bucket_count=1048576),

     index IDX_ObjectKey
     nonclustered hash(ObjectKey)
     with (bucket_count=1048576)
)
with (memory_optimized = on, durability = schema_only);
go 

create type dbo.tvpObjData as table
(
     ChunkNum smallint not null
          primary key nonclustered hash
          with (bucket_count = 128),
     Data varbinary(8000) not null
)
with(memory_optimized=on)
go 

create proc dbo.SaveObjectToStore
(
     @ObjectKey uniqueidentifier
     ,@ExpirationTime datetime2(2)
     ,@ObjData dbo.tvpObjData readonly 
)
with native_compilation, schemabinding, exec as owner
as
begin atomic with
(
     transaction isolation level = snapshot
     ,language = N'English'
)
     delete dbo.SessionStore
     where ObjectKey = @ObjectKey

     insert into dbo.SessionStore(ObjectKey, ExpirationTime, ChunkNum, Data)
          select @ObjectKey, @ExpirationTime, ChunkNum, Data
          from @ObjData
end
go

create proc dbo.SaveObjectToStore_Row
(
     @ObjectKey uniqueidentifier
     ,@ExpirationTime datetime2(2)
     ,@ObjData varbinary(8000) 
)
with native_compilation, schemabinding, exec as owner
as
begin atomic with
(
     transaction isolation level = snapshot
     ,language = N'English'
)
     delete dbo.SessionStore
     where ObjectKey = @ObjectKey

     insert into dbo.SessionStore(ObjectKey, ExpirationTime, ChunkNum, Data)
     values(@ObjectKey, @ExpirationTime, 1, @ObjData)
end
go

create proc dbo.LoadObjectFromStore
(
     @ObjectKey uniqueidentifier not null
)
with native_compilation, schemabinding, exec as owner
as
begin atomic
with
(
     transaction isolation level = snapshot
     ,language = N'English'
)
     select t.Data
     from dbo.SessionStore t
     where t.ObjectKey = @ObjectKey and ExpirationTime >= sysutcdatetime()
     order by t.ChunkNum 
end

As you can see, there are two different stored procedures that save data to the table. The first one – dbo.SaveObjectToStore – uses memory-optimized table-valued parameter and can be used in the case, when serialized object data is greater than 8,000 bytes. The second stored procedure – – dbo.SaveObjectToStore_Row – accepts varbinary(8000) parameter and can be used if serialized object is within 8,000-byte range. This is strictly for optimization purposes. Even though memory-optimized table-valued parameters are very fast, they are still slower compating to the regular parameter.

The client code would contain several static classes. The first ObjStoreUtils class provides four methods to serialize and deserialize objects into the byte arrays, and split and merge those arrays to/from 8,000-byte chunks. You can see the code below.

public static class ObjStoreUtils
{
     // Serialize object of type T to the byte array
     public static byte[] Serialize(T obj)
     {
          using (var ms = new MemoryStream())
          {
               var formatter = new BinaryFormatter();
               formatter.Serialize(ms, obj);

               return ms.ToArray();
          }
     }

     // Deserialize byte array to the object 
     public static T Deserialize(byte[] data)
     {
          using (var output = new MemoryStream(data))
          {
               var binForm = new BinaryFormatter();
               return (T)binForm.Deserialize(output);
          }
     }

     /// Split byte array to the multiple chunks
     public static List<byte[]> Split(byte[] data, int chunkSize)
     {
          var result = new List<byte[]>();

          for (int i = 0; i < data.Length; i += chunkSize) { int currentChunkSize = chunkSize; if (i + chunkSize > data.Length)
                    currentChunkSize = data.Length - i;

               var buffer = new byte[currentChunkSize];
               Array.Copy(data, i, buffer, 0, currentChunkSize);

               result.Add(buffer);
          }
          return result;
     }

     // Combine multiple chunks into the byte array
     public static byte[] Merge(List<byte[]> arrays)
     {
          var rv = new byte[arrays.Sum(a => a.Length)];
          int offset = 0;
          foreach (byte[] array in arrays)
          {
               Buffer.BlockCopy(array, 0, rv, offset, array.Length);
               offset += array.Length;
          }
          return rv;
     }
}(

The ObjStoreDataAccess class shown in the next listing, loads and saves binary data to and from the database. It utilizes another static class – DBConnManager, which returns the SqlConnection object to the target database. This class is not shown there.

public static class ObjStoreDataAccess
{
    // Saves data to the database
    public static void SaveObjectData(Guid key,
                DateTime expirationTime, List<byte[]> chunks)
    {
        if (chunks == null || chunks.Count == 0)
            return;

        using (var cnn = DBConnManager.GetConnection())
        {
            using (var cmd = cnn.CreateCommand())
            {
                cmd.CommandType = CommandType.StoredProcedure;
                cmd.Parameters.Add("@ObjectKey",
                    SqlDbType.UniqueIdentifier).Value = key;
                cmd.Parameters.Add("@ExpirationTime",
                    SqlDbType.DateTime2).Value = expirationTime;

                if (chunks.Count == 1)
                {
                    cmd.CommandText = "dbo.SaveObjectToStore_Row";
                    cmd.Parameters.Add("@ObjData", 
                        SqlDbType.VarBinary, 8000).Value = chunks[0];
                }
                else
                {
                    cmd.CommandText = "dbo.SaveObjectToStore";
                    var tvp = new DataTable();
                    tvp.Columns.Add("ChunkNum", typeof(short));
                    tvp.Columns.Add("ChunkData", typeof(byte[]));

                    for (int i = 0; i < chunks.Count; i++)
                        tvp.Rows.Add(i, chunks[i]);

                    var tvpParam = new SqlParameter("@ObjData",
                         SqlDbType.Structured)
                    {
                        TypeName = "dbo.tvpObjData",
                        Value = tvp
                    };

                    cmd.Parameters.Add(tvpParam);

                }
                cmd.ExecuteNonQuery();
            }
        }
    }

    // Load data from the database
    public List<byte[]> LoadObjectData(Guid key)
    {
        using (var cnn = DBConnManager.GetConnection())
        {
            using (var cmd = cnn.CreateCommand())
            {
                cmd.CommandText = "dbo.LoadObjectFromStore";
                cmd.CommandType = CommandType.StoredProcedure;
                cmd.Parameters.Add("ObjectKey",
                    SqlDbType.UniqueIdentifier).Value = key;

                var result = new List<byte[]>();
                using (var reader = cmd.ExecuteReader())
                {
                    while (reader.Read())
                        result.Add((byte[])reader["Data"]);
                }
                return result;
            }
        }
    }
}

Finally, the ObjStoreService class shown below puts everything together and manages the entire process. It implements two simple methods,Load and Save, calling the helper classes defined above.

public static class ObjStoreService
{
    private const int MaxChunkSize = 8000;

    // Saves object in the object store
    public static void Save(Guid key, 
                DateTime expirationTime, object obj)
    {
        var objectBytes = ObjStoreUtils.Serialize(obj);
        var chunks = ObjStoreUtils.Split(objectBytes, MaxChunkSize);

        ObjStoreDataAccess.SaveObjectData(key, expirationTime, chunks);
    }

    // Loads object from the object store
    public static T Load(Guid key) where T: class 
    {
        var chunks = ObjStoreDataAccess.LoadObjectData(key);
        if (chunks.Count == 0) 
            return null;
        var objectBytes = ObjStoreUtils.Merge(chunks);

        return ObjStoreUtils.Deserialize(objectBytes);
    }
}

Obviously, this is oversimplified example, which I used just to illustrate the concept. Production implementation could be significantly more complex, especially if there is the possibility that multiple sessions can update the same object simultaneously. You can implement retry logic using the similar approach with what we did enforcing uniqueness/referential integrity or create some sort of object locking management in the system if this is the case.

It is also worth mentioning that you can compress binary data before saving it into the database. The compression will introduce unnecessary overhead in the case of small objects; however, it could provide significant space savings and performance improvements if the objects are large. I did not include compression code in the example, although you can easily implement it with the GZipStream or DeflateStream classes.

You can download the demo application from “Expert SQL Server In-Memory OLTP” Companion materials. It has slightly different implementation – I denormalized classes a little bit to reduce C# code overhead during the demos when it is running on the same box with SQL Server. However, it is very similar to what you saw in this post.

P.S. I want to thank Vladimir Zatuliveter (zatuliveter at gmail dot com) for his help with the code.

Thinking Outside of In-Memory Box: Addressing 8,060-byte Maximum Row Size Limitation

The 8,060-byte maximum row size limit is, perhaps, one of the biggest roadblocks in widespread In-Memory OLTP adoption. This limitation essentially prevents you from using (max) data types along with CLR and system data types that require off-row storage, such as XML, geometry, geography and a few others. Even though you can address this by changing the database schema and T-SQL code, these changes are often expensive and time consuming.

When you encounter such a situation, you should analyze if LOB data types are required in the first place. It is not uncommon to see a column that never stores more than a few hundred characters defined as (n)varchar(max). Consider an Order Entry system and DeliveryInstruction column in the Orders table. You can safely limit the size of the column to 500-1,000 characters without compromising the business requirements of the system.

Another example is a system that collects some semistructured sensor data from the devices and stores it in the XML column. If the amount of semistructured data is relatively small, you can store it in varbinary(N) column, which will allow you to move the table into memory.

Unfortunately, sometimes it is impossible to change the data types and you have to keep LOB columns in the tables. Nevertheless, you have a couple options to proceed.

The first approach is to split data between two tables, storing the key attributes in memory-optimized and rarely-accessed LOB attributes in on-disk tables. Again, consider the situation where you have an Order Entry system with the Products table defined as shown in Listing below

create table dbo.Products
(
    ProductId int not null identity(1,1),
    ProductName nvarchar(64) not null,
    ShortDescription nvarchar(256) not null,
    Description nvarchar(max) not null,
    Picture varbinary(max) null,

    constraint PK_Products
    primary key clustered(ProductId)
)

As you can guess, in this scenario, it is impossible to change the data types of the Picture and Description columns, which prevents you from making the Products table memory-optimized. However, you can split that table into two, as shown below. The Picture and Description columns are stored in an on-disk table while all other columns are stored in the memory-optimized table. This approach will improve performance for the queries against the ProductsInMem table and will allow you to access it from natively compiled stored procedures in the system.

create table dbo.ProductsInMem
(
    ProductId int not null identity(1,1)
        constraint PK_ProductsInMem
        primary key nonclustered hash
        with (bucket_count = 65536),
    ProductName nvarchar(64) 
        collate Latin1_General_100_BIN2 not null,
    ShortDescription nvarchar(256) not null,

    index IDX_ProductsInMem_ProductName 
    nonclustered(ProductName)
)
with (memory_optimized = on, durability = schema_and_data);

create table dbo.ProductAttributes
(
    ProductId int not null,
    Description nvarchar(max) not null,
    Picture varbinary(max) null,
	
    constraint PK_ProductAttributes
    primary key clustered(ProductId)
);

Unfortunately, it is impossible to define a foreign key constraint referencing a memory-optimized table, and you should support referential integrity in your code. We have already looked at one of the possible approaches in my previous blog post.

You can hide some of the implementation details from the SELECT queries by defining a view as shown below. You can also define INSTEAD OF triggers on the view and use it as the target for data modifications; however, it is more efficient to update data in the tables directly.

create view dbo.Products(ProductId, ProductName, 
    ShortDescription, Description, Picture)
as
    select 
        p.ProductId, p.ProductName, p.ShortDescription
        ,pa.Description, pa.Picture
    from 
        dbo.ProductsInMem p left outer join 
            dbo.ProductAttributes pa on
                p.ProductId = pa.ProductId

As you should notice, the view is using an outer join. This allows SQL Server to perform join elimination when the client application does not reference any columns from the ProductAttributes table when querying the view. For example, if you ran SELECT ProductId, ProductName from dbo.Products, you would see the execution plan as shown in Figure 1. As you can see, there are no joins in the plan and the ProductAttributes table is not accessed.

1. Execution Plan with Join Elimination

You can use a different approach and store LOB data in memory-optimized tables, splitting it into multiple 8,000-byte chunks. You can use the table similar to what is defined below.

create table dbo.LobData
(
    ObjectId int not null,
    PartNo smallint not null,
    Data varbinary(8000) not null,

    constraint PK_LobData
    primary key nonclustered hash(ObjectID, PartNo)
    with (bucket_count=1048576),

    index IDX_ObjectID
    nonclustered hash(ObjectID)
    with (bucket_count=1048576)
)
with (memory_optimized = on, durability = schema_and_data)

Listing below demonstrates how to insert XML data into the table using T-SQL code in interop mode. It uses an inline table-valued function called dbo.SplitData that accepts the varbinary(max) parameter and splits it into multiple 8,000-byte chunks.

create function dbo.SplitData
(
    @LobData varbinary(max)
)
returns table
as
return
(
    with Parts(Start, Data)
    as
    (
        select 1, substring(@LobData,1,8000) 
        where @LobData is not null
		
        union all
		
        select 
            Start + 8000
            ,substring(@LobData,Start + 8000,8000)
        from Parts
        where len(substring(@LobData,Start + 8000,8000)) > 0
    )
    select 
        row_number() over(order by Start) as PartNo
        ,Data
    from
        Parts
)
go

-- Test Data
declare
    @X xml

select @X = 
    (select * from master.sys.objects for xml raw)

insert into dbo.LobData(ObjectId, PartNo, Data)
    select 1, PartNo, Data
    from dbo.SplitData(convert(varbinary(max),@X))

On the side note, dbo.SplitData function uses recursive CTE to split the data. Do not forget that SQL Server limits the CTE recursion level to 100 by default. You need to specify OPTION (MAXRECURSION 0) in the statement that uses the function in case of very large inputs.

Figure 2 shows the contents of the LobData table after the insert.

2. Content of LobData table after insert

You can construct original data using FOR XML PATH method as shown below. Alternatively, you can develop a CLR aggregate and concatenate binary data there.

;with ConcatData(BinaryData)
as
(
    select 
        convert(varbinary(max),
            (
                select convert(varchar(max),Data,2) as [text()]
                from dbo.LobData
                where ObjectId = 1
                order by PartNo
                for xml path('')
            ),2)
)
select convert(xml,BinaryData) 
from ConcatData

The biggest downside of this approach is the inability to split and merge large objects in natively compiled stored procedures due to the missing (max) parameters and variables support. You should use the interop engine for this purpose. However, it is still possible to achieve performance improvements by moving data into memory even when the interop engine is in use.

This approach is also beneficial when memory-optimized tables are used just for the data storage, and all split and merge logic is done inside the client applications. I will show you such an example in my next blog post.

Thinking Outside of In-Memory Box: Supporting Uniqueness and Referential Integrity in In-Memory OLTP

As with any new technology, adoption of In-Memory OLTP comes at a cost. You need to acquire and/or upgrade to SQL Server 2014, spend time learning the technology and, if you are migrating an existing system, refactor code and test the changes.

Unfortunately, system refactoring can be complex and time consuming. SQL Server 2014 In-Memory OLTP has several important limitations, which can dramatically increase the cost of migration. To name just  few – it does not support off-row storage limiting you to 8,060-byte rows nor support CHECK, UNIQUE, FOREIGN KEY constraints and triggers. All those limitations can be addressed by schema and code refactoring; however, in some cases, that refactoring can require significant amount of efforts.

Today, I would like to start the series of the blog posts discussing how we can address specific limitations that exist in the first release of In-Memory OLTP. I will start with supporting uniqueness and referential integrity in the system. Unfortunately, In-Memory OLTP does not allow you to define foreign keys nor unique indexes and constraints besides a primary key. Such limitation rarely prevents us from adoption of the new technology. Nevertheless, those constraints help to keep the data clean and allow to detect data quality issues and bugs in the code at early stages of development.

To make matter worse, lock-free nature of In-Memory OLTP makes the code approach tricky. In-Memory OLTP transactions do not see any uncommitted changes done by the other transactions. For example, if you ran the code from Figure 1 in default SNAPSHOT isolation level, both transactions would successfully commit without seeing each other changes violating ProductName uniqueness.

Figure 1. Uniqueness Violation in SNAPSHOT Isolation Level

Fortunately, that situation can be addressed by using SERIALIZABLE transaction isolation level. As you remember, In-Memory OLTP validates serializable consistency rules by maintaining transaction scan set. As part of serializable rules validation, In-Memory OLTP checks for the phantom rows making sure that other sessions did not insert any rows that were previously invisible for the active transactions.

Code below creates memory-optimized table and natively compiled stored procedure that inserts data there running in SERIALIZABLE isolation level. Any inserts done through this stored procedure guarantee uniqueness of the ProductName even in multi-user concurrent environment.

The SELECT query builds transaction scan set, which will be used for serializable rule validation. That validation would fail if any other sessions inserted a row with the same ProductName while transaction is still active. Unfortunately, the first release of In-Memory OLTP does not support subqueries and it is impossible to write the code using IF EXISTS construct.

create table dbo.ProductsInMem
(
    ProductId int not null identity(1,1)
        constraint PK_ProductsInMem
        primary key nonclustered hash
        with (bucket_count = 65536),
    ProductName nvarchar(64) 
        collate Latin1_General_100_BIN2 not null,
    ShortDescription nvarchar(256) not null,

    index IDX_ProductsInMem_ProductName nonclustered(ProductName)
)
with (memory_optimized = on, durability = schema_and_data);
create procedure dbo.InsertProduct
(
    @ProductName nvarchar(64) not null
    ,@ShortDescription nvarchar(256) not null
    ,@ProductId int output
)
with native_compilation, schemabinding, execute as owner
as
begin atomic with
(
    transaction isolation level = serializable
    ,language = N'English'
)
    declare
        @Exists bit = 0

    -- Building scan set and checking existense of the product
    select @Exists = 1
    from dbo.ProductsInMem
    where ProductName = @ProductName

    if @Exists = 1
    begin
	;throw 50000, 'Product Already Exists', 1;
	return
    end

    insert into dbo.ProductsInMem(ProductName, ShortDescription)
    values(@ProductName, @ShortDescription);

    select @ProductID = scope_identity()
end

You can validate behavior of the stored procedure by running it in two parallel sessions as shown in Figure 2 below. Session 2 successfully inserts a row and commits the transaction. Session 1, on the other hand, would fail on commit stage with Error 41325.

Figure 2. dbo.InsertProduct Call from Two Parallel Sessions

Obviously, this approach would work and enforce the uniqueness only when you have full control over data access tier and have all INSERT and UPDATE operations performed through the specific set of stored procedures and/or code. INSERT and UPDATE statements executed directly against a table could easily violate uniqueness rules. However, you can reduce the risk by revoking INSERT and UPDATE permissions from the users giving them the EXECUTE permission on the stored procedures instead.

You can use the same technique to enforce referential integrity rules. Code below creates Orders and OrderLineItems tables and two stored procedures InsertOrderLineItems and DeleteOrders enforcing referential integrity between those tables. I am omitting OrderId update scenario, which is very uncommon in the real life.

create table dbo.Orders
(
    OrderId int not null identity(1,1)
        constraint PK_Orders
        primary key nonclustered hash 
        with (bucket_count=1049008),
    OrderNum varchar(32) 
        collate Latin1_General_100_BIN2 not null,
    OrderDate datetime2(0) not null
        constraint DEF_Orders_OrderDate
        default GetUtcDate(),
    /* Other Columns */
    index IDX_Orders_OrderNum
    nonclustered(OrderNum)
)
with (memory_optimized = on, durability = schema_and_data);

create table dbo.OrderLineItems
(
    OrderId int not null,
    OrderLineItemId int not null identity(1,1)
        constraint PK_OrderLineItems
        primary key nonclustered hash 
        with (bucket_count=4196032),
    ArticleId int not null,
    Quantity decimal(8,2) not null,
    Price money not null,
    /* Other Columns */

    index IDX_OrderLineItems_OrderId
    nonclustered hash(OrderId)
    with (bucket_count=1049008)
)
with (memory_optimized = on, durability = schema_and_data);
go

create type dbo.tvpOrderLineItems as table
(
    ArticleId int not null
        primary key nonclustered hash
        with (bucket_count = 1024),
    Quantity decimal(8,2) not null,
    Price money not null
    /* Other Columns */
)
with (memory_optimized = on);
go

create proc dbo.DeleteOrder
(
    @OrderId int not null
)
with native_compilation, schemabinding, execute as owner
as
begin atomic
with 
(
    transaction isolation level = serializable
    ,language=N'English'
)
    -- This stored procedure emulates ON DELETE NO ACTION 
    -- foreign key constraint behavior
    declare
        @Exists bit = 0

    select @Exists = 1
    from dbo.OrderLineItems
    where OrderId = @OrderId

    if @Exists = 1
    begin
        ;throw 60000, 'Referential Integrity Violation', 1;
        return
    end
    
    delete from dbo.Orders where OrderId = @OrderId
end
go

create proc dbo.InsertOrderLineItems
(
    @OrderId int not null
    ,@OrderLineItems dbo.tvpOrderLineItems readonly 
)
with native_compilation, schemabinding, execute as owner
as
begin atomic
with 
(
    transaction isolation level = repeatable read
    ,language=N'English'
)
    declare
        @Exists bit = 0

    select @Exists = 1
    from dbo.Orders
    where OrderId = @OrderId

    if @Exists = 0
    begin
        ;throw 60001, 'Referential Integrity Violation', 1;
        return
    end
    
    insert into dbo.OrderLineItems(OrderId, ArticleId, Quantity, Price)
        select @OrderId, ArticleId, Quantity, Price
        from @OrderLineItems
end

It is worth noting that InsertOrderLineItems procedure is using REPEATABLE READ isolation level. In this scenario, we need to make sure that referenced Order row has not been deleted during the execution and REPEATABLE READ enforces that introducing less overhead than SERIALIZABLE isolation level.

That technique comes with another small benefit. It can demonstrate advantage of having dedicated data access tier to application developers helping to convince them to build one. All of us, database professionals, would like it, don’t we? 🙂

Source code is available for download.

Next: Addressing 8,060-byte Maximum Row Size Limitation

Plan Cache: Parameter Sniffing

There are several reasons why it is beneficial to use parameters in the queries. It mitigates security risks by protecting from SQL Injection. Moreover, it helps with the performance. Query optimization is expensive and resource-intensive process. SQL Server reduces that cost by keeping execution plans in the special part of the memory called Plan Cache and reusing them on subsequent calls.

Even though, this approach benefits the system by reducing the number of query compilations, it could also introduce some issues. Cached plans can become inefficient due to the data distribution and/or parameter values change. SQL Server tracks changes in the data by counting number of changes of the index columns outdating statistics and recompiling queries when number of changes reach about 20% of number of rows in the table (trace flag 2371 can reduce that threshold). However, tracking parameter values change is more complicated.

When SQL Server compiles parametrized query or stored procedure, it analyzes parameters and generates execution plan, which is optimal for those specific values. This process is called Parameter Sniffing. Furthermore, generated plans are cached and reused regardless of parameter values at time of reuse. That behavior can lead to the situations when query is compiled and plan is generated based on atypical parameter set. Cached plan, in this case, would be suboptimal for the calls that reuse that plan.

All of us experienced the situations when query or stored procedure suddenly became very slow even when they ran fine before. Parameter sniffing could be the reason of why it happened. SQL Server decided to recompile the query due to statistics update and the query that triggered recompilation had atypical parameter values. Even though, generated execution plan was efficient for parameters at time of compilation; that cached plan is not necessarily efficient for the other queries that reuses it.

Historically, parameter sniffing has the negative connotation in SQL Server community. It is used to explain why queries have suboptimal execution plans, like I just did in the previous paragraph. This is vastly wrong, however. Parameter sniffing is just the name for SQL Server behavior of analyzing parameters during query compilation.

Let’s look at a few examples of parameter sniffing and discuss how to address the issues it can introduce. In this post, I am using the stored procedure; however, everything would work the same in case of parametrized queries.

As the first step, let’s create a table dbo.Employees. We will populate it distributing data in the way that most rows have Country column value of USA with a few employees with Country value of Canada. Listing below shows the code.

create table dbo.Employees
(
	ID int not null,
	Number varchar(32) not null,
	Name varchar(100) not null,
	Salary money not null,
	Country varchar(64) not null,

	constraint PK_Employees
	primary key clustered(ID)
);

;with N1(C) as (select 0 union all select 0) -- 2 rows
,N2(C) as (select 0 from N1 as T1 cross join N1 as T2) -- 4 rows
,N3(C) as (select 0 from N2 as T1 cross join N2 as T2) -- 16 rows
,N4(C) as (select 0 from N3 as T1 cross join N3 as T2) -- 256 rows
,N5(C) as (select 0 from N4 as T1 cross join N4 as T2 ) -- 65,536 rows
,Nums(Num) as (select row_number() over (order by (select null)) from N5)
insert into dbo.Employees(ID, Number, Name, Salary, Country)
	select 
		Num, 
		convert(varchar(5),Num), 
		'USA Employee: ' + convert(varchar(5),Num), 
		40000,
		'USA'
	from Nums;

;with N1(C) as (select 0 union all select 0) -- 2 rows
,N2(C) as (select 0 from N1 as T1 cross join N1 as T2) -- 4 rows
,N3(C) as (select 0 from N2 as T1 cross join N2 as T2) -- 16 rows
,Nums(Num) as (select row_number() over (order by (select null)) from N3)
insert into dbo.Employees(ID, Number, Name, Salary, Country)
	select 
		65536 + Num, 
		convert(varchar(5),65536 + Num), 
		'Canada Employee: ' + convert(varchar(5),Num), 
		40000,
		'Canada'
	from Nums;

create nonclustered index IDX_Employees_Country
on dbo.Employees(Country);

As a next step, let’s create a stored procedure that calculates the average salary for employees in a specific country.

create proc dbo.GetAverageSalary @Country varchar(64)
as
begin
	select Avg(Salary) as [Avg Salary]
	from dbo.Employees
	where Country = @Country
end

As you can guess, when we call this stored procedure with parameter @Country=’USA’, stored procedure needs to touch almost every row in the table and the most optimal execution plan is Clustered Index Scan. However, when we call the stored procedure with parameter @Country=’Canada’, the most efficient execution plan is Nonclustered Index Seek with Key Lookup operators.

Let’s call the stored procedure twice: the first time with @Country=’USA’ and the second time with @Country=’Canada’, as shown below.

exec dbo.GetAverageSalary @Country='USA';
exec dbo.GetAverageSalary @Country='Canada';

As you can see in Figure 1, SQL Server compiles the stored procedure and caches the plan with the first call and reuses it later. Even though such a plan is less efficient with the @Country=’Canada’ parameter value, it may be acceptable when those calls are rare, which is expected with such a data distribution.

01. Execution plans when the stored procedure is compiled with @Country=’USA’

Now let’s take a look what happens if we swap those calls when the plan is not cached. We will use the DBCC FREEPROCCACHE command, which clears plan cache (DO NOT RUN IT ON PRODUCTION SERVER). As I already mentioned, another instance when this might happen is with a statistics update that forces query to recompile.

dbcc freeproccache
go

exec dbo.GetAverageSalary @Country='Canada';
exec dbo.GetAverageSalary @Country='USA';

As you can see in Figure 2, SQL Server now caches the plan based on the @Country=’Canada’ parameter value. Even though this plan is more efficient when the stored procedure is called with @Country=’Canada’, it is highly inefficient for @Country=’USA’ calls.

02. Execution plans when the stored procedure is compiled with @Country=’Canada’

There are a few ways to address the issue. You can force the recompilation of either stored procedure using EXECUTE WITH RECOMPILE or a statement-level recompile with OPTION (RECOMPILE) clauses. Obviously, a statement-level recompile is better because it performs the recompilation on a smaller scope. SQL Server sniffs the parameter values at the time of the recompilation, generating the optimal execution plan for each parameter value. It is also worth mentioning, that statement-level recompile allows SQL Server to obtain number of rows stored in table variables, which can help to generate more efficient execution plans in some cases.

alter proc dbo.GetAverageSalary @Country varchar(64)
as
begin
	select Avg(Salary) as [Avg Salary]
	from dbo.Employees
	where Country = @Country
	option (recompile)
end
go

exec dbo.GetAverageSalary @Country='Canada';
exec dbo.GetAverageSalary @Country='USA';

As you can see in Figure 3, SQL Server recompiles the statement on every call, and it generates the most efficient execution plan for every parameter value. It is also worth noting that SQL Server does not cache plans for the statements when statement-level recompile is used.

03. Execution plans with OPTION (RECOMPILE) hint

The statement-level recompile may be a good option when the queries do not execute very often or in the case of complex queries, when the compilation time is just a fraction of the total execution time. However, it is hardly the best approach for OLTP queries, which are constantly running in the system due to the extra CPU load that recompilation introduces.

Another option is using an OPTIMIZE FOR hint, which forces SQL Server to optimize a query for the specific parameter values provided in the hint. Listing below illustrates such an approach.

alter proc dbo.GetAverageSalary @Country varchar(64)
as
begin
	select Avg(Salary) as [Avg Salary]
	from dbo.Employees
	where Country = @Country
	option (optimize for(@Country='USA'))
end
go

exec dbo.GetAverageSalary @Country='Canada';
exec dbo.GetAverageSalary @Country='USA';

As you can see in Figure 4, SQL Server ignores the parameter value during the compilation and optimizes the query for the @Country=’USA’ value.

04. Execution plans with OPTIMIZE FOR hint

Unfortunately, the OPTIMIZE FOR hint introduces supportability issues, and it can lead to suboptimal execution plans in cases where the data distribution has changed. Let’s consider the situation, albeit unrealistic, when a company and all of its employees moved from the United States to Germany.

update dbo.Employees set Country='Germany' where Country='USA';

exec dbo.GetAverageSalary @Country='Germany';

Statistics are outdated at the time of the update, which forces SQL Server to recompile the statement in the stored procedure. At this point, there are no rows in the table with Country=’USA’, and the recompilation produces a suboptimal execution plan, as shown in Figure 5. As a side note, the query uses more reads than before due to the index fragmentation introduced by the update.

05. Inefficient execution plan after data distribution change

SQL Server 2008 introduced another optimization hint, OPTIMIZE FOR UNKNOWN, which helps to address such situations. With this hint, SQL Server performs an optimization based on the most statistically common value in the table. Listing below shows the code involved in doing this.

alter proc dbo.GetAverageSalary @Country varchar(64)
as
begin
	select Avg(Salary) as [Avg Salary]
	from dbo.Employees
	where Country = @Country
	option (optimize for(@Country UNKNOWN))
end
go

exec dbo.GetAverageSalary @Country='Canada';

Figure 6 illustrates the execution plan. Country=’Germany’ is the most statistically common value in the table, and therefore SQL Server generates an execution plan that is optimal for such a parameter value.

06. Execution plan with OPTIMIZE FOR UNKNOWN hint

You can achieve the same results with an OPTIMIZE FOR UNKNOWN hint by using local variables instead of parameters. This method also works with SQL Server 2005, where the OPTIMIZE FOR UNKNOWN hint is not supported. Listing below illustrates that approach. It introduces the same execution plan with a Clustered Index Scan, as shown in Figure 6 above.

alter proc dbo.GetAverageSalary @Country varchar(64)
as
begin
	declare
		@CountryTmp varchar(64)
	set @CountryTmp = @Country

	select Avg(Salary) as [Avg Salary]
	from dbo.Employees
	where Country = @CountryTmp
end

Finally, in case when you are dealing with 3rd party systems and cannot add optimization hints to the code, you could use plan guides to achieve the same goal. We will discuss plan guides in one of the future blog posts.

Source code is available for download.

Data Partitioning – Why do we want to partition the data?

Even if we spoke about table partitioning a long time ago, I’d like to circle back to this question from the different, architectural standpoint. In the next a few posts we will discuss why we want to partition data and what options do we have to do that.

Data Partitioning is the complex and time consuming process. So as the first step I’d like to explain why we want to go through all the efforts to do that.

Reason #1 – Without partitioning everything is in the same place.

Quite obvious, is not it? And it’s not necessarily bad. One of advantages of when data is in the same place – it makes development and optimization very easy. Execution plans are predictable but that’s all about it. The biggest problem is that when everything is in the same table (or better say partition) it
stored on the same filegroup, same set of files and same disk array. Well, technically speaking we can separate our clustered and non-clustered indexes between different filegroups but at the end it introduces its own set of the issues especially in disaster recovery situations.

The problem here is that in every system we have data that belongs to the different “groups” / types – operational and historical. Duration of the operational period (which is basically customer facing OLTP portion of the system) varies – could be a day, week, month, maybe even year(s) but in the large system
there is always additional historical data and often there is a lot of historical data.

Obviously we care more about operational (customer facing) performance and would like to put operational data to the fastest disk array possible. But when everything is in the same place we don’t have a lot of choices – we either need to buy disk array big enough to store historical data, which could be
terribly expensive and generally speaking is waste of money, or compromise and go with slower disks due budget constraints.

Reason #2 – The same schema, indexes and compression

Operational and historical data usually have different access patterns. As I mentioned above, operational data is usually OLTP-related data. So we have a lot of short optimized queries. Historical data is often used for analysis and reporting – e.g. we are talking about Data Warehouse type access. Different
access patterns require different index structures. Again, when we have everything in the one table we don’t have a lot of choices. We either create additional indexes to support queries against historical data and hurt performance of OLTP part of the system or, alternatively, do not have those indexes and, as result, have bad performance of the queries against historical portion of the data.

Another thing is that in some cases we want to have slightly different schemas for those portions of the data. One of the examples – operational data can have some additional columns used for the processing. There is the good chance that historical data does not necessarily need them and when we have a lot of
records every byte counts.

Lastly, we cannot partially compress our data. Compression could help a lot with historical data which is relatively static – we are increasing performance by reducing number of pages and, as result, amount of IO operations. On the other hand, for operational data compression usually hurts performance because
data is changing quite often.

Reason #3 – Index maintenance

Same thing as above. Generally speaking we don’t need to rebuild or reorganize the indexes for historical data which is static. But there is no way we can do rebuild/reorg on the part of the data.

Reason #4 – Backup strategy

Same thing again. All data is in the same filegroup. We cannot exclude historical portion from the backups. As result, it increases backup time, size of backup file and introduces additional overhead during the process

Reason #5 – Disaster recovery

One of the biggest benefits of Enterprise Edition of SQL Server is piecemeal restore. It allows us to bring system partially online on filegroup by filegroup basis. If we had operational and historical data separated to the different filegroups we could restore operational part first and make system available
to the customers while we are working on historical part. With everything in the same non-partitioned table it’s impossible. We will need to restore everything first before system becomes available and online.

Reason #6 – Statistics

SQL Server needs to estimate number of rows on the every step of the execution plan to make it efficient. In order to do so SQL Server uses statistics and histogram in
particular. Histogram contains some values from the key as well as the information about data distributions in the intervals of the values. The problem is that histogram contains
at most 200 steps/values. So more data we have in the table, bigger intervals are. Approximations are done on each interval and as result our estimations are less accurate.

Another thing that’s worth to mention is that by default SQL Server updates statistics only after 20% of the key values are updated. So if we have 1 million rows in the table, we can insert 200,000 new rows, delete 200,000 rows or update them before statistics update is triggered. Obviously, more rows we have,
less often statistics would be updated.

That list is not completed by any means. But each of those reasons is big enough by itself to start thinking about data partitioning.

Next: When to partition the data

Locking in Microsoft SQL Server (Part 14) – Deadlocks during DDL operations (alteration, partition switch, etc)

Today I’d like us to talk about one practical problem related with the locking – how to deal with deadlocks during DDL operations. For example, table and partition functions alteration, partition switch and others. This problem is very common in OLTP environment when table is constantly querying by the multiple sessions. One of the common examples there is sliding window pattern when the table has been partitioned based on some time interval (daily, weekly, monthly, etc) and data has been purged on the regular basis by switching partition to the temporary table. Let’s try to understand why we have deadlocks here and what we can do to solve the problem.

First of all, as the pre-requirement, let’s assume that we implemented our partitioning correctly. Most important factor is that our operation does not require any data movement nor scans. For example, we don’t split ranges in partition function in the way that requires  physically move the data to another partition (e.g. we are pre-allocating empty partition rather than splitting the range with the value at the middle of the partition that has some data). In that case partition operations are purely metadata operations and only schema locks are involved.

Another thing to avoid is mixing DML and DDL statements in one transaction. Let’s think about following example (let’s ignore page locks and schema stability (SCH-S) locks for simplicity sake):

  1. Session 1 starts transaction and updates 1 row in the table. E.g. at this stage we will have (X) lock on the row and (IX) lock on the table
  2. Session 2 is trying to select data from the table and being blocked by Session 1. So now we have Session 2 waiting for (S) lock on the row and also have (IS) lock acquired on the table
  3. Now session 1 is trying to implement partition switch. For that operation (SCH-M) lock needs to be acquired on the table and it would be blocked by (IS) lock from the Session 2.

Classic deadlock. But let’s assume that this is not the case. What happens when we have multiple locks on the same object? Generally speaking, lock requests would be serialized. In some cases existing locks could be even re-used. Let’s see that. First, let’s create a table and add a few records there.  Next, let’s have session 1 to read one row in repeatable read mode. As we remember it would hold (S) lock on the resource. Again, for simplicity sake let’s look at the row (key) level locks only

Now let’s try to update the row in another session. This session will try to acquire (X) lock on the resource and would be blocked because (X) lock is not compatible with (S) lock.

Now let’s try to read the same row in read committed mode from the 3rd session. As we can see everything works without blocking.

Let’s try to look at what locks were acquired using SQL Profiler.

As you see session acquired intent (IS) locks but did not acquire shared (S) lock. There is (S) lock from the session 1 already which guarantees that row has not been modified by uncommitted transactions. No reasons to acquire another (S) lock because in read committed isolation level (S) locks on the rows have been released immediately.

But what will happen if we try to read the row in repeatable read mode again? In that isolation level session needs to keep the (S) lock until end of transaction and it’s a reason why it’s trying to acquire the lock.

As you can see – it’s being blocked because there is (X) lock in the queue. So in order to be granted lock needs to be compatible with the all locks on that resource – does not matter granted or now.

It’s worth to mention that the first scenario, when session 3 ran in read committed mode and did not acquire the lock on the resource, can be considered as internal optimization. In some cases SQL Server still acquires another shared (S) lock there even if there is another shared (S) lock held. In such case the query would be blocked similarly to repeatable read transaction isolation level.

Keeping all this in mind we can ask – why would we have deadlocks if locks are serialized? Would not the session with schema modification (sch-m) lock wait till other locks are released and proceed from there? Yes and no. Let’s put one other thing to the mix – lock partitioning. This is quite confusing term and does not have anything to do with table partitioning. In nutshells when server has 16 or more CPUs (think about dual quad-core with hyperthreading enabled) SQL Server starts to partition intent (IS, IX, IU) and schema stability (SCH-S) object locks on per-cpu basis. Those (intent and schema-stability) locks would be acquired and stored within 1 (or a few) lock partitions only. And when another session needs to acquire S, X, U or SCH-M locks on the object, those locks would have to be acquired across all lock partitions. As I mentioned, this behavior is on when system has 16 or more CPUs and cannot be
disabled. So let’s take a look at one of the scenarios:

  1. Session 1 executes the complex query that acquired (IS) and/or (SCH-S) lock on the lock partition = 5. (just an example)
  2. Session 2 tries to alter the table and starts to acquire (SCH-M) locks on each individual lock partition. It successfully acquired locks on partitions 1-4 and blocked on partition 5 by (IS) or (SCH-S) lock held by Session 1
  3. Session 1 tries to acquire another (IS) and/or (SCH-S) lock on lock partition = 2. It’s blocked by (SCH-M) lock held by Session 2.

Classic deadlock again.

Unfortunately there is very little we can do about it. Lock partitioning cannot be disabled with documented approaches. There is the undocumented trace flag T1229 that does the trick although using undocumented trace flags are dangerous and not recommended in production.

In case if you have dedicated data access tier (which is always the great idea) the good option would be serializing access to the table via application locks. Main idea of this method is that any code that accessing the object is acquiring application locks. DDL code acquires exclusive lock and DML code acquires shared lock. Locks would not be partitioned and as result requests would be serialized. Let’s take a look. First – let’s define the stored procedure that alter the metadata.

This is just an example. Obviously you need to think about timeouts, exception handling, retry logic and other things – but hopefully it would give you an idea.

Now let’s try to create the stored procedure that reads the data from the table. But first, we need to define what should we do in case if we are blocked and have timeouts. Of course, we can return the error code to the client but another method that can be useful in some cases is to return empty result set. In order to do that let’s create another (empty) table of the same structure with original one.

Now let’s define the stored procedure.

As you can see, if SP cannot obtain application lock within 3 seconds, it reads data from the empty table and basically returns empty result set to the client. Otherwise it would read the main table. Obviously the biggest “downside” of this method – it would not work well in the case if there is no dedicated data access tier in place and object accessed from the various places. Well, in such case old classic approach with try/catch and retry logic is your friend. And, of course, we can use SET DEADLOCK_PRIORITY to reduce the chance that the session with DDL statement would be chosen as the deadlock victim.

Source code is available for download 

Next: When Transaction Start

Table of content

‘A’ = N’A’ or how to kill the index seek

Let’s start with the example. Create table, populate it and create an index on varchar column.


Next, let’s run 2 selects. First one with varchar parameter. Plan is perfect.

Second, let’s use unicode parameter.

Oops – index scan. It does not look like the big deal – who is going to use unicode for varchar field search. But..

We deal with the client applications. Unfortunately, modern development languages implement String as the unicode. And if you don’t specify the parameter type, it would be unicode.

Let’s take a look:

As you can see, .Net application defines @UntypedParam as nvarchar(5). And it produces 21 times more reads. Another thing worth to mention – the size of the parameter matches the string length. As result, if next statement uses “100” as the parameter value, it would be nvarchar(3) – recompilation, potentially additional plan in the cache and so on.

Keep that in mind – always specify parameter type and size at the client application.

Script is available for download here. Source code is available here

Sunday T-SQL tip: How to save large data packet in SQL Server 2005

Last time we saw that TVP is the clear winner in compare with separate insert calls. Obviously this is not an option with SQL 2005. So let’s see what can we do there.

 We’re trying to accomplish 2 things:

  1. Make save process faster
  2. Minimize transaction duration

XQuery

Let’s start with element-centric xml. Let’s create the following procedure:

Element-centric insert

Now let’s run our tests again. For 5,000 records avg. time is about 8 seconds. For 50,000 records avg time is about 82-84 seconds. Quite surprising results. Let’s take a look at the execution plan.

As you can see, SQL Server basically does the join for every element/column in the XML packet. Obviously – more elements/columns you have – more joins it would produce and slower performance would be.

 Let’s check attribute-centric XML. Procedure would look almost the same:

 

 For 5,000 records avg. time is about 2,7 seconds. For 50,000 records avg time is about 27-28 seconds. Better, but still slower than classical inserts. Plan also looks slightly better, but still, it uses the same approach with joins.

 

So as we can see, XQUERY performance greatly depends on the number of columns/joins.

OpenXML

 Well. Only other option left with XML is OpenXML. Let’s modify our procedure and run our tests

 

For 5,000 records avg. time is about 1,4 seconds. For 50,000 records avg time is about 14-15 seconds. As you can see, results are much better – even better than classical approach. Only minor problem with this approach – memory. MSXML parser uses one-eights of the total memory available for SQL Server. Huh?

Temporary table approach

So looks like we’re out of luck. Only option which is faster than classical inserts is OpenXML and we cannot use it unless memory is completely not an issue.

Although let’s think what can be done if our main problem is not the performance but concurrency. With classical approach and multiple inserts in the one transaction, we will place first exclusive lock on the new row with the first insert. In some cases we cannot simply afford to keep locks on the rows for a few seconds.

If this is the main issue, we can insert data to the temporary table first and just move it to the main table in one short transaction. If we update our attribute-centric insert stored procedure and insert data to the temporary table variable first, it would increase execution time for 200-300 milliseconds (for 5000 records) although transaction duration would be just 30-40 milliseconds. I’m not posting the code here but it’s available for download.

If we have a lot of columns and xquery is not an option, we can use temporary table created on the connection level. Temporary tables are available through the execution stack – so if you create temporary table on connection level (After SqlConnection.Open() call, for example), you should have access to the table from any stored procedure called in the same connection context. As result, implementation could include 2 steps:

  1. Save data to the temporary table with separate insert statements
  2. Move data from the temporary to permanent tables in one transaction

Script and sources are available for download – “Refactoring for Performance” presentation from SQL Saturday #62

Sunday T-SQL tip: A few words about Table-Valued Parameters (TVP)

During my presentation on Sql Saturday #40 event I mentioned that one of the biggest improvements in SQL Server 2008 from development standpoint is table-valued parameters support (TVP). It’s not only simplifying development – think about passing temporary table variable as parameter from within t-sql code as well as from the client applications. It also gives great performance improvement especially if you need to save large data packet. Think about data processing tasks where client application loads large record set, processes it and saves it back in one transaction.

Let’s do some tests. Let’s create the table:

Table creationg statement

First, let’s use classical  approach – use SqlCommand and insert records in the loop

Classical approach

Now let’s use TVP. First of all we need to create a type:

TVP

Next step is creating stored procedure which we will use to insert new records

Insert procedure

Next, let’s modify the code to use TVP and stored procedure

TVP approach

It worth to mention that client code needs to create and populate DataTable object to assign it to the parameter. Obviously it requires some time, on other hand, this is client code – transaction has not been started yet.

Finally, let’s run a few tests:

For the package of 5000 records, avg execution time of the classical method  on my laptop is in around 2,600-2,700 milliseconds. TVP approach takes only about 180 milliseconds where 60 milliseconds is DataTable creation. So actual transaction time takes in around 130 milliseconds only – in about 20 times faster than classical method.

For the package of 50,000 records, avg execution times respectively are 20,500 ms and 1,300 ms with 300 ms as DataTable creation.

As you can see, it’s not only the speed, the transaction duration is about 20 times shorter. Huge difference.

Obviously, if you’re on SQL Server 2005, TVP is not an option. Although there are a couple methods which could help you to improve the classical method. I will show it to you in a few days.