Category Archives: SQL Server 2005

Size does matter: 10 ways to reduce the database size and improve performance in SQL Server

As the database professional, I like multi-terabyte databases. They are fun to deal with; they give you priceless experience and look cool in your resume. Last but not least, customers with multi-terabyte databases do not have problems with multi-kilodollar invoices. Customers usually understand amount of work involved in such projects.

The problem, however, is that the large databases are not good for the customers. Those databases are more expensive to support and maintain. You need to have highly skilled professionals in the team. Professionals, who can design nontrivial solutions in all areas – architecture, availability, maintenance, performance tuning, to name just a few.

Large databases usually require powerful hardware to run. There is, of course, very subtle difference between the size of the database and size of the active (hot) data. It is entirely possible that applications deal only with the fraction of the data stored in the database and, therefore, even mediocre server can handle the load. However, on the bare minimum, there is always the storage cost.

The projects when you have to reduce the size of the databases are very common. Even though size reduction is rarely the primary objectives of such projects, reducing the size often helps to achieve other goals. Think about designing Disaster Recovery (DR) strategy. Plenty of things that can help to meet strict RTO requirements and smaller database size definitely helps.

Today, I am going to discuss several methods that can help in reducing database size. Some of them are fully transparent to the client applications; others require regression testing and/or code refactoring. I would also focus on the data files only – troubleshooting and reducing transaction log size is the different topic.

0. Find the worst offenders

In the nutshell, the database files are just the containers for the data. Some space in the data files is allocated and used by the database objects; however, there is usually some unallocated space. For example, if you created the database with one 1GB data file, you would have 1 GB file on disk. However, immediately after creation, it would have just a handful of the data pages allocated inside the file and most part of the file would be free (unallocated).

It is completely normal to have free space in the data files, especially if amount of data is constantly growing. However, excessive amount of the free space could consume unnecessary space on the disks. The script below helps you to analyze amount of allocated and unallocated space on per-database file basis.

select 
    f.type_desc as [Type]
    ,f.name as [FileName]
    ,fg.name as [FileGroup]
    ,f.physical_name as [Path]
    ,f.size / 128.0 as [CurrentSizeMB]
    ,f.size / 128.0 - convert(int,fileproperty(f.name,'SpaceUsed')) / 
        128.0 as [FreeSpaceMb]
from 
    sys.database_files f with (nolock) left outer join 
        sys.filegroups fg with (nolock) on
            f.data_space_id = fg.data_space_id
option (recompile)

Figure 1 illustrates the output of the script from one of the production servers. Even though the output shows significant amount of free space in the data files, it may or may not be a problem. The decision if that space needs to be deallocated depends on the other factors, such as data growth expectations, disk system configuration and a few others.

01. Allocated and Free space in the database files

We will talk how to handle the situations when free space needs to be deallocated later, for now, let’s focus on the data, and discuss what we can do to reduce its size.

All of us are familiar with Pareto principle, which is also known as 80/20 rule. To simplify, in most projects, you can achieve 80% of improvements by spending 20% of time or resources. That ratio is even more severe when we search for the most space consuming objects in the database. Even with very large databases, usually the most part of the space is consumed by just a handful of tables. Obviously, we would like to focus on them – at least at the beginning of the process.

Let’s do one step backwards, however, and remember how SQL Server stores the data. Every on-disk table has the main copy of the data, which stored either in the clustered index or in heap. In addition, every table can have the set of nonclustered indexes that store the copy of the data for some of columns and reference the main copy of the rows (in the clustered index or heap). For the purpose of this discussion, let’s talk about generic indexes without any further differentiation between their types.

Every index consists of the data pages that can store the data that belong to the different allocation units. IN_ROW allocation unit contains main data rows, which includes internal row structures, fixed-length column data and variable-length column data that fits in IN_ROW data page. LOB allocation unit contains data for variable-length column data (including data types such as XML, CLR UDT, etc)that is greater than 8,000 bytes in size. Finally, ROW_OVERFLOW allocation unit contains data for variable-length column data that does not exceed 8,000 bytes but does not fit IN_ROW.

For example, if you created the table of the following structure and inserted one row there as it is shown below, you would have data for column C5 stored in LOB allocation units, data for one of either C3 or C4 columns stored in ROW_OVERFLOW allocation unit and data for all other columns stored in IN_ROW allocation unit. It is also worth mentioning that main row data in IN_ROW data would have the pointers to the data stored in the other allocation unit.

create table dbo.T1
(
    C1 int,
    C2 datetime,
    C3 varchar(5000),
    C4 varchar(5000),
    C5 varchar(max)
);

insert into dbo.T1(C1, C2, C3, C4, C5)
values
(
    1 /* C1 */
    ,GetUtcDate() /* C2 */
    ,Replicate('A',5000) /* C3 */
    ,Replicate('B',5000) /* C4 */
    ,Replicate(convert(varchar(max),'A'),10000) /* C5 */
)

You can read more about row structure and allocation units here: http://aboutsqlserver.com/storage-engine/

The script below provides you the information about space allocation on per-index basis in the database. You can easily modify it to aggregate on per-table basis or drill-down on per-allocation unit basis; however, at that stage I usually prefer to work on the index level.

;with SpaceInfo(ObjectId, IndexId, TableName, IndexName
    ,Rows, TotalSpaceMB, UsedSpaceMB)
as
( 
    select  
        t.object_id as [ObjectId]
        ,i.index_id as [IndexId]
        ,s.name + '.' + t.Name as [TableName]
        ,i.name as [Index Name]
        ,sum(p.[Rows]) as [Rows]
        ,sum(au.total_pages) * 8 / 1024 as [Total Space MB]
        ,sum(au.used_pages) * 8 / 1024 as [Used Space MB]
    from    
        sys.tables t with (nolock) join 
            sys.schemas s with (nolock) on 
                s.schema_id = t.schema_id
            join sys.indexes i with (nolock) on 
                t.object_id = i.object_id
            join sys.partitions p with (nolock) on 
                i.object_id = p.object_id and 
                i.index_id = p.index_id
            cross apply
            (
                select 
                    sum(a.total_pages) as total_pages
                    ,sum(a.used_pages) as used_pages
                from sys.allocation_units a with (nolock)
                where p.partition_id = a.container_id 
            ) au
    where   
        i.object_id > 255
    group by
        t.object_id, i.index_id, s.name, t.name, i.name
)
select 
    ObjectId, IndexId, TableName, IndexName
    ,Rows, TotalSpaceMB, UsedSpaceMB
    ,TotalSpaceMB - UsedSpaceMB as [ReservedSpaceMB]
from 
    SpaceInfo		
order by
    TotalSpaceMB desc
option (recompile)

Figure 2 shows the output of this script (table and index names are obscured). As you can see, this could quickly pinpoint the indexes that consume most part of the space in the database.

02. Indexes that consume the most space in the database

Now, let’s see what we can do to reduce their size.

1. Reducing Index Fragmentation

As you know, SQL Server stores on-disk table data on the 8KB data pages. Each data page contains data for one or multiple rows. With the exception of index creation or rebuild, SQL Server tries to populate pages in full during normal data modification operations. When data does not fit, for example, when the data pages does not have enough space to accommodate the new row, SQL Server performs the page split operation. In the nutshell, SQL Server allocates another data page and moves about half of the data from original to the new page, which frees up some space to accommodate the new row on the original data page.

Page split operations lead to the index fragmentation, which exists in two kinds. External fragmentation means that the logical order of the pages does not match their physical order, and/or logically subsequent pages are not located in the same or adjacent extents (extent is the group of 8 pages). Such fragmentation forces SQL Server to jump around reading the data from the disk, which makes read-ahead less efficient and increases the number of physical reads required. Moreover, it increases random disk I/O, which is far less efficient when compared to sequential I/O in the case of magnetic hard drives.

Internal fragmentation, on the other hand, means that data pages in the index have free space. As a result, the index uses more data pages to store data. It also increases the number of reads during query execution and amount of memory in buffer pool to cache index pages.

A small degree of internal fragmentation is not necessarily bad. It reduces page splits during insert and update operations when data is inserted into or updated from different parts of the index. Nonetheless, a large degree of internal fragmentation wastes index space and reduces the performance of the system. Moreover, for indexes with ever-increasing keys, for example on identity columns, internal fragmentation is not desirable because the data is always inserted at the end of the index.

You can monitor both, internal and external fragmentation with sys.dm_db_index_physical_stats DMV. Internal fragmentation can be monitored with avg_page_space_used_in_percent column. Lower value in the column indicates higher degree of internal fragmentation.

Let’s take a look at the example and analyze internal fragmentation of one of the indexes with the script below. For simplicity sake, I am using relatively small table; however, you would obviously like to focus on the largest indexes during the tuning process.

select 
    index_id, partition_number, alloc_unit_type_desc
    ,index_level, page_count, avg_page_space_used_in_percent
from 
    sys.dm_db_index_physical_stats
    (
        db_id() /*Database */
        ,object_id(N'dbo.MyTable) /* Table (Object_ID) */
        ,1 /* Index ID */
        ,null /* Partition ID – NULL – all partitions */
        ,'detailed' /* Mode */
    )

Figure 3 illustrates partial output of the script. The table is partitioned and, as result, you will see separate rows in the result – one per partition per allocation unit.

03. Internal Fragmentation in the Index

You can remove internal fragmentation by rebuilding the index. Figure 4 illustrates the output of sys.dm_db_index_physical_stats after the index rebuild with FILLFACTOR=100 (more on it later)

04. Internal Fragmentation after Index Rebuild

Figure 5 illustrates amount of space index used before (on the left side) and after (on the right side) rebuild. As you can see, index rebuild was able to free up more than 40% of the space index consumed before the rebuild.

05. Index Size before and after rebuild

FILLFACTOR parameter controls amount of free space SQL Server reserves on the pages during index creation and rebuild operation. For example, FILLFACTOR=80 forces SQL Server to make data pages about 80% full keeping 20% of the page space reserved. This could reduce the number of page splits and internal fragmentation when new rows are inserted to the middle of the index or updated in the way that increase their size. It is very important to remember that FILLFACTOR is applied only during index creation or rebuild stages. After index is created, SQL Server continue populates pages in full up to 100% performing page splits as needed.

As you can guess, the optimal solution would require fine-tuning FILLFACTOR and designing index maintenance strategy in the way that keeps internal fragmentation at minimum most part of the time. Unfortunately, there is no “one size fits all” advice in terms of FILLFACTOR. You should try to figure out the most optimal value by using different FILLFACTOR values and monitor how it would affect your fragmentation. You can start with FILLFACTOR value close to 100 and gradually decrease it by 5 until you find the sweet spot. Alternatively, you can monitor page splits in real time using transaction_log extended event tracking LOP_DELETE_SPLIT operation changing value based on amount of splits (you can see more on it at Jonathan Kehayias’ blog),

Lastly, you should remember than index rebuild creates another copy of the index during the process. In fact, it could increase the size of the data file on disk during the operation. Moreover, it generates large amount of transaction log records that could also affect transaction log size, network load and size of send and redo queues if database mirroring or AlwaysOn AG are in use.

2. Implementing Data Compression

If you are the lucky enough to have Enterprise Edition of SQL Server, you can reduce the size of the data by implementing Data Compression. There are two types of Data Compression supported in SQL Server – Row and Page.

Row compression addresses the storage inefficiency introduced by fixed-length data types. By default, in non-compressed row, size of the fixed-length data is based on the data type size. For example, INT column would always use 4 bytes, regardless of the value – even when it is NULL. Row compression addresses that and removes such an overhead. For example, INT value of 255 would use just 1 byte rather than 4 bytes.

Page compression goes one-step further and implements dictionary-based compression removing repetitive sequences of bytes on the page. I am not going to dive deep in the storage format of the compressed rows here and will follow up with additional blog post at some point. Alternatively, you can read about it in my book.

The actual results would greatly depend on the data and the schema. As you can guess, row compression would be beneficial when table has fixed-length data columns. More fixed-length columns you have, better the space savings are. Results of the page compression, on the other hand, depend on how repetitive is the data on the page. You can use sp_estimate_data_compression_savings system stored procedure to estimate compression results for your data. That procedure works by copying and compressing the sample of your data in tempdb measuring compression results.

It is also very important to remember that data compression works with IN_ROW allocation units only. It does not compress LOB nor ROW_OVERFLOW data.

Obviously, there is an overhead. Compression and decompression adds additional CPU load to the system. That overhead is relatively small for the ROW compression, especially when you read the data; however, for PAGE compression that overhead is more significant. There is the catch, though. While compression adds the load to CPU, it reduces I/O load in the system – SQL Server needs to issue less I/O operations due to the smaller data size. In the end, the queries could execute even faster especially on the systems that are not heavily CPU bound.

There is also the overhead during the batch operations. Batch inserts and updates could take more time when data is compressed. The same applies to the index maintenance. Just to give you some numbers, I ran a few tests at time when I worked on the book. I was using the data from one of the production tables with a decent number of fixed- and variable-length columns. Obviously, different table schema and data distribution will lead to slightly different results. However, in most cases, you would see similar patterns.

At the beginning of the tests, I have created three different heap tables and inserted one million rows into each of them. After that, I created clustered indexes with different compression settings and FILLFACTOR=100. This workflow led to zero index fragmentation and fully populated data pages.

During the first test, I ran SELECT statements to scan all of the clustered indexes accessing some row data. The second test updated every row in the tables, changing the value of the fixed-length column in a way that did not increase the row size. The third test inserted another batch of a million rows in the tables. Finally, I rebuilt all of the clustered indexes. You can see the execution statistics in Figure 5 below. All tests ran with warm cache with the data pages cached in the buffer pool. Cold cache could reduce the difference in execution times for the queries against compressed and non-compressed data, because queries against compressed data perform less physical I/O.

All statements were forced to run on a single CPU by using a MAXDOP 1 query hint. Using parallelism would decrease the query execution times, however it would add the overhead of parallelism management during query execution, which I wanted to avoid during the tests.

Figure 6 demonstrates the results of the tests.

6. Data Compression – Storage Size and Performance

Obviously, it is impossible to provide generic advice how and when to compress the data. However, in case, if system is not heavily CPU bound, I would usually suggest implementing ROW compression on the indexes with volatile data. PAGE compression, on the other hand, could be the good choice for the old static data, especially when that data accessed infrequently.

It is also impossible to avoid mentioning data partitioning when we are discussing compression. It is very common to have the situation when just small subset of the data in the table is volatile. Unfortunately, you cannot apply different compression methods to hot and cold data unless the data is partitioned (either with table partitioning or with multiple tables utilizing partitioned views). Such partitioning helps you to implement different compression schemas to different table partitions (or tables) and will allow you to reduce index maintenance overhead by rebuilding the index on partition scope.

A word of caution, however. Partitioning is the great tool that can help you to address multiple challenges especially in database administration area. Even though, table partitioning can be implemented transparently to the client applications, it could introduce plan regressions. One of such examples is described here: http://aboutsqlserver.com/2012/07/10/cautionary-tale-about-triggers-version-store-and-fragmentation/. Make sure to carefully test your application if you decided to implement table partitioning.

Finally, if you are interested in data partitioning, I would like to reference my book again. The chapter on data partitioning is the largest one there and it discusses various examples and approaches of partitioning in various editions of SQL Server.

3. Removing unused indexes

It is often possible to reduce the size of the data during the index tuning process. The main goal of the index tuning is creating the right set of indexes, which also requires you to drop existing unused and redundant indexes. Keep in mind that you always need to carefully test your system when you change the indexes making sure that there is no plan regressions after the tuning.

There are two data management views that can help you to detect non-efficient indexes. The first one, sys.dm_db_index_usage_stats, shows you the statistics on the various index operations, such as index seek and scan, index updates and a few others, along with the time of the last operation. The second DMV, sys.dm_db_index_operational_stats, dives deeper and provides an information on I/O, access methods and locking statistics on the index.

The key difference between two DMOs is how they collect data. Sys.dm_db_index_usage_stats tracks how many times an operation appeared in the execution plan. Alternatively, sys.dm_db_index_operation_stats tracks operations at the row level. For example, if query execution plan includes Key Lookup operation and SQL Server ran it twice during query execution, sys.dm_db_index_usage_stats would track the single lookup operation, while sys.dm_index_operation_stats would track two of them.

You can obtain the information about index usage with sys.dm_db_index_usage_stats by running the statement below. You can use similar approach with sys.dm_db_index_operation_stats if you need more detailed analysis.

select 
    s.Name + N'.' + t.name as [Table]
    ,i.name as [Index] 
    ,i.is_unique as [IsUnique]
    ,ius.user_seeks as [Seeks], ius.user_scans as [Scans]
    ,ius.user_lookups as [Lookups]
    ,ius.user_seeks + ius.user_scans + ius.user_lookups as [Reads]
    ,ius.user_updates as [Updates], ius.last_user_seek as [Last Seek]
    ,ius.last_user_scan as [Last Scan], ius.last_user_lookup as [Last Lookup]
    ,ius.last_user_update as [Last Update]
from 
    sys.tables t with (nolock) join sys.indexes i with (nolock) on
        t.object_id = i.object_id
    join sys.schemas s with (nolock) on 
        t.schema_id = s.schema_id
    left outer join sys.dm_db_index_usage_stats ius on
        ius.database_id = db_id() and
        ius.object_id = i.object_id and 
        ius.index_id = i.index_id
order by
    s.name, t.name, i.index_id
option (recompile)

As you can see in the output in Figure 7, you can easily pinpoint the problematic indexes. Indexes in red ovals were not used in any queries for the duration of statistics collection. Those indexes consume space in the database. Moreover, they introduce update overhead in the system. As the side note, it is also beneficial to look at the indexes where update overhead exceeds their usefulness – for example, index in green oval is used only for scans even though it is constantly updated.

07. Index Usage Statistics

Obviously, you need to be careful making sure that you captured valid usage statistics. SQL Server does not persist statistics and it resets on restarts. Statistics is also cleared whenever the database is detached or shut down when the AUTO_CLOSE database property is on. Moreover, in SQL Server 2012 and 2014, statistics resets when the index is rebuilt.

You must keep this behavior in mind during index analysis. It is not uncommon to have indexes to support queries that execute on a given schedule. As an example, you can think about an index that supports a payroll process running on a bi-weekly or monthly basis. Index statistics information could indicate that the index has not been used for reads if SQL Server was recently restarted or, in the case of SQL Server 2012 and 2014, if index was recently rebuilt.

One of the ways to address statistics reset is collecting usage statistics based on some schedule and persists results in one of the tables in the database. This could help to catch the situations when index is required to support some of the rarely executed processes. As the side note, you can consider to recreate such indexes as part of the process dropping them when processes are completed.

You should also be careful with unique indexes. It is entirely possible that such indexes are created to support uniqueness constraints and removal of such indexes would violate business requirements for the system.

4. Removing Redundant Indexes

As you know, SQL Server can use composite index for an Index Seek operation as long as a query has a SARGable predicate on the leftmost query column. Ok, I know, it is confusing so let’s look at the example and create a table with clustered and two nonclustered indexes.

create table dbo.Employee
(
    EmployeeId int not null,
    LastName nvarchar(64) not null,
    FirstName nvarchar(64) not null,
    DateOfBirth date not null,
    Phone varchar(20) null,
    Picture varbinary(max) null
);
 
create unique clustered index IDX_Employee_EmployeeId 
on dbo.Employee(EmployeeId);
 
create nonclustered index IDX_Employee_LastName_FirstName
on dbo.Employee(LastName, FirstName);
 
create nonclustered index IDX_Employee_LastName
on dbo.Employee(LastName);

SQL Server can utilize IDX_Employe_LastName_FirstName index if query has the predicate on the LastName regardless of existence of the predicate on the FirstName. For example, both of the queries below would be able to use that index:

select EmployeeId, LastName, FirstName, DateOfBirth
from dbo.Employee
where LastName = @LastName and FirstName = @FirstName

select EmployeeId, LastName, FirstName, DateOfBirth
from dbo.Employee
where LastName = @LastName

Thus, the index IDX_Employee_LastName is redundant and can be dropped. There are, of course, always exceptions from the rule. IDX_Employee_LastName index stores less data and, therefore, it is more compact. If you have a process that constantly scan the index, smaller index size could be beneficial. However, those situations are very rare and usually update overhead of the extra index is not worth such small performance improvement on SELECT queries.

The script below shows you potentially redundant indexes by checking the indexes that have the same leftmost columns.

select
    s.Name + N'.' + t.name as [Table]
    ,i1.index_id as [Index1 ID], i1.name as [Index1 Name]
    ,dupIdx.index_id as [Index2 ID], dupIdx.name as [Index2 Name] 
    ,c.name as [Column]
from 
    sys.tables t join sys.indexes i1 on
        t.object_id = i1.object_id
    join sys.index_columns ic1 on
        ic1.object_id = i1.object_id and
        ic1.index_id = i1.index_id and 
        ic1.index_column_id = 1  
    join sys.columns c on
        c.object_id = ic1.object_id and
        c.column_id = ic1.column_id      
    join sys.schemas s on 
        t.schema_id = s.schema_id
    cross apply
    (
        select i2.index_id, i2.name
        from
            sys.indexes i2 join sys.index_columns ic2 on       
                ic2.object_id = i2.object_id and
                ic2.index_id = i2.index_id and 
                ic2.index_column_id = 1  
        where	
            i2.object_id = i1.object_id and 
            i2.index_id > i1.index_id and 
            ic2.column_id = ic1.column_id
    ) dupIdx     
order by
    s.name, t.name, i1.index_id

For example, for dbo.Employee table, script would provide the output shown in Figure 8.

08. Potentially Redundant Indexes

You can use such information for further analysis performing further index consolidation. In some cases, consolidation is trivial. For example, if a system has two indexes: IDX1(LastName, FirstName) include (Phone) and IDX2(LastName) include(DateOfBirth), you can consolidate them as IDX3(LastName, FirstName) include(DateOfBirth, Phone).

In the other cases, consolidation requires further analysis. For example, if a system has two indexes: IDX1(OrderDate, WarehouseId) and IDX2(OrderDate, OrderStatus), you have three options. You can consolidate it as IDX3(OrderDate, WarehouseId) include(OrderStatus) or as IDX4(OrderDate, OrderStatus) include(WarehouseId). Finally, you can leave both indexes in place. The decision primarily depends on the selectivity of the leftmost column and index usage statistics.

5. Implementing Filtered Indexes

Filtered indexes, introduced in SQL Server 2008, allowed you to index only a subset of the data and, therefore, reduce the index size. Consider a table with some data that needs to be processed as an example. This table can have a Processed bit column, which indicates the processing status as it is shown below.

create table dbo.Data
(
    RecId int not null,
    Processed bit not null,
    /* Other Columns */
)

Let’s assume that you have a backend process that loads unprocessed data based on the following query.

select top 1000 RecId, /* Other Columns */
from dbo.Data
where Processed = 0
order by RecId;

This query can benefit from the following index: CREATE NONCLUSTERED INDEX IDX_Data_Processed_RecId ON dbo.Data(Processed, RecId). Even though SQL Server rarely uses indexes on bit columns due to their low selectivity, such a scenario might be an exception if there are just a handful of unprocessed rows. SQL Server can use that index to select them; however, the index will never be used for selection of processed rows if a Key Lookup is required.

As a result, all index rows with a key value of Processed=1 would be useless. They will increase the index’s size, waste storage space, and introduce additional overhead during index maintenance.

Filtered indexes solve that problem by allowing you to index just unprocessed rows, making the index small and efficient as it is shown below.

create nonclustered index IDX_Data_Unprocessed_Filtered
on dbo.Data(RecId)
include(Processed)
where Processed = 0;

Obviously, there are several catches to remember. First, Query Optimizer has a design limitation, which can lead to suboptimal execution plans when columns from the filter are not present in leaf-level index rows. Always add all columns from the filter to the index, either as key or included columns. In our example, Processed column is added as an included column.

Second problem is filtered index statistics. SQL Server does not count the changes in the filtered columns towards statistics update threshold, which can lead to very inaccurate statistics. You should factor this behavior to statistics maintenance and, perhaps, update statistics manually on the regular basis.

Finally, SQL Server is very conservative when to use filtered indexes in case of plan caching. SQL Server would not generate and cache the plan with filtered index if there is the possibility that this plan would be invalid for some parameter values. For example, our filtered index would not be used for the case below:

select top 1000 RecId, /* Other Columns */
from dbo.Data
where Processed = @Processed
order by RecId;

As you can guess, auto-parameterization would make the matter worse. The bottom line, you should carefully test your system after implementing filtered indexes making sure that there is no plan regressions.

Finally, if you are using XML indexes in SQL Server 2012 and above, you can reduce their size (which, by the way, could be gigantic) by implementing Selective XML Indexes , which index just subset of the data. Pretty much the same approach with as filtered indexes.

6. Using Appropriate Data Types

All approaches we have already discussed could be implemented transparently to the client applications. Obviously, transparency here is very misleading – index tuning and partitioning require careful regression testing to be performed. However, all those changes are located in the database tier and you do not need to change any application code.

Now, it is the time to talk about several approaches that require such code changes. I would like to start with the general design principle that is often ignored during database design stage. The principle is simple – you should choose appropriate data types for the job.

Let’s consider the system that collects GPS location information from the multiple devices as the example. The main transaction entity in such system is Positions. One of the approaches to define such a table is the following (just a few columns from there):

create table dbo.Positions
(
    ATime datetime not null, -- 8 bytes 
    Latitude float not null, -- 8 bytes 
    Longitude float not null, -- 8 bytes 
    IsPrecise int not null,	-- 4 bytes
    IsAssistanceUsed int not null, -- 4 bytes
    -- Total: 32 bytes
    ...
)

Alternatively, you can define the same table a little bit differently:

(
    ATime datetime2(0) not null, -- 6 bytes 
    Latitude decimal(9,6) not null, -- 5 bytes 
    Longitude decimal(9,6) not null, -- 5 bytes 
    IsPrecise bit not null,	-- 1 bytes
    IsAssistanceUsed bit not null, -- 0 bytes
    -- Total: 17 bytes
    ...
)

As you see, even in the scope of those 6 columns you can save 15 bytes per row. It does not sound as significant saving; however, it quickly adds up as amount of data growth. For example, if such system collects 1M rows per day and stores it for a year, 15 bytes per row would become ~5.4GB of data on the leaf level of the index without counting any fragmentation overhead. And trust me, 1M positions per day is very small number for such systems.

While row compression can help to address some overhead, it would not help much when data types store excessive information. For example, row compression would cut an extra space from the boolean data stored in int columns; however, it would not help much with datetime in case if it has more precision that needed. Moreover, compression is the Enterprise Edition feature, which would not help you with the other editions.

One of the questions you should answer during database design stage is how precise the information should be. This could help you to choose correct data type for the column. As the example, consider the OrderDate column in Order Entry/Shopping cart system. Do you really need to store the time when order was placed with up to 3-millisecond precision provided by datetime column (8 bytes)? If this was not the case, you could use 1-second precision of datetime2(0) type (6 bytes). Or, for 1 minute precision, you can end up with 4-byte smalldatetime data.

You should also remember that smaller data rows help with the performance during the scans. The table with smaller rows would have more rows stored on the data page and, therefore, would have less data pages stored in the index. Queries that need to perform scans (including range scans) will be faster due to the less I/O operations involved. Last but not least, such indexes will use less memory in the buffer pool, which allows to cache more data and reduces the number of physical I/O in the system.

Finally, you should remember that table alteration never ever decreases size of the data row. You should rebuild the indexes that reference altered columns in order to see the space saving.

7. Storing LOB data outside of the database

So far, we have discussed how to reduce the size of IN_ROW data. Let’s talk a bit about LOB data. First, let’s discuss the situation with the binary data, which does not require any in-database processing. For example, images, binary documents and other similar entities. With such entities, you always have the question of how to store them. Either within the database or externally, keeping just a reference (perhaps file name) in the database.

There was the rule of thumb introduced by Microsoft at time of FILESTREAM release. The binary data greater than 1MB would benefit from external storage. The data smaller than 200KB should live within the database. Well, everything in between is in the grey area. While I do not want to charge the numbers, there are usually more factors involved that just a size. Obviously, I am not talking about huge binary objects when the choice is obvious, but in general, you should make the decision on case-by-case basis.

In-database storage of binary data is usually the simplest solution to implement. The obvious downside of this approach is the increase of the database size. However, you can mitigate it up to degree with proper architecture. For example, you can put binary data to the separate filegroup(s) that reside on the slower disk arrays. You can also implement partial backup and exclude static binary data from the dayly backup files. In Enterprise Edition of SQL Server, you can utilize piecemeal restore and achieve strict RTO requirements even with the binary data in the database; however, in non-Enterprise Edition, RTO requirements could become the deal breaker. Binary data could significantly slow down restore time (due to the database and backup size), which can prevent you from meeting RTO requirements.

In case, if you decided to store binary data outside of the database, there are several questions to answer. The first, and perhaps the most important one, is how to handle redundancy and high availability of external data. For example, if you decided to store binary data as the files and reference them in the database, you need to make sure that such schema is compatible with SQL Server High Availability solution and file storage itself is redundant. Redundancy question mainly relies on storage administrators; however, High Availability aspect could be tricky in this scenario. Especially, if you have geo-redundancy and/or hybrid solutions in place. Obviously, you can implement something based on SAN replication moving files across multiple data centers; however, it requires significant investments into the hardware and software as well as incur the implementation cost.

Consistency of the data is another important question. If binary data needs to be transactionally consistent, you have a little choice but using FILESTREAM. While it is technically possible to implement the consistency in the code without FILESTREAM – for example, if transaction modifies the data, application generates another file and replace the reference to this file in the database; it would be extremely hard to support disaster recovery in this scenario. As you can guess, you can easily have “out of sync” situation when restoring data from the backup.

FILESTREAM could help you here; however, it has a few caveats. It is incompatible with some of SQL Server features, for example with database mirroring and, in some cases, it is complicated to implement. Performance-wise, you should use Streaming API on the client side to get the most from it.

As I already said, there is no right or wrong solutions. You should consider pros and cons of all approaches and consider other requirements in the system. With Enterprise Edition, I personally prefer to store relatively small (up to a few MB) data in the database carefully architecting filegroup layout and backup/restore strategy in case if I am using Enterprise Edition of SQL Server. With Standard Edition, the choices are much more limited.

8. Compressing LOB data in the database

As you already know, data compression is Enterprise Edition feature that compresses IN_ROW data only. However, it is entirely possible, that large amount of space in the database is consumed by LOB data. Do not forget, that there are plenty of data types that are, in the nutshell, LOBs. Think about XML as the example – it is not uncommon to see that XML-centric systems with XML data that consume large amount of space in the system.

One of the approaches to address such an overhead is manually compress LOB data in the code. It is very easy to create the methods to compress and decompress data utilizing one of the classes from System.IO.Compression namespace, for example using GZipStream class. Moreover, that method could be implemented in CLR stored procedures and used directly in T-SQL code.

I am not going to provide the examples of how to code that; you should be able to find quite a few of them searching in Internet. I would like, however, to discuss a couple implementation-related questions.

First, compression is CPU intensive. It is better to run such code on the client whenever it is possible. I would still, however, suggest implementing CLR routines in the database and have them available to T-SQL code. This could help to address some of the use-cases, when client needs to work with uncompressed data. Consider, for example, some external analytical or reporting tools that query the database directly. You can create the view that call CLR function and decompress the data on the fly providing it to the clients.

You should be careful with the version management in such scenario making sure that the code is the same on both, client side and in the database and that algorithms remain the same and data can be decompressed on either side.

The second important consideration is performance. Obviously, decompression adds an overhead, which you would like to avoid on the large scope. For example, it is the bad idea to have a query that scans large amount of data and performs decompression on every row to evaluate the predicate against one of compressed attributes. For example, query shown below would be highly inefficient.

select 
from T
where convert(xml,dbo.DecompressData(CompressedXML)).value('..') = 1

One of the ways to address such an issue is creating persisted calculated columns for the attributes that are used in where clauses of the queries. The downside of this approach is that SQL Server would not be able to use parallel execution plans in such queries – this is one of the limitations of Query Optimizer when you are using columns calculated with scalar UDFs. However, it is often the small price to pay comparing to constant decompression overhead.

With all being said, compressing LOB data manually is definitely the option, which is worth considering. You can use sys.dm_index_physical_stats view to evaluate amount of such data on per-index basis. Obviously, Pareto 80/20 principle still applies – do not add extra complexity if benefits are not worth it.

9. Storing Data in Clustered Columnstore Indexes

In case of Enterprise Edition of SQL Server 2014, you have another option to consider. You can store some of the data in columnstore format utilizing Clustered Columnstore Indexes. This format can provide significant space saving comparing to the regular B-Tree row-based storage. Moreover, you can also utilize Archival Columnstore Compression that applies gzip-like compression on columnstore data and reduces the size even further at cost of extra CPU load.

Just to give you an example, Figure 9 below shows you the difference in the storage space between row-based and column-based storage. The table in this example was generated based on FactSalesBig table from AdventureWorks2012DW database. Obviously, different data leads to the different results; however, in the most cases, clustered columnstore indexes would give you order of magnitude decrease on the storage space. It is also worth mentioning that nonclustered indexes on B-Tree tables would contribute to additional storage space, which is not the case with clustered columnstore indexes that are the single copy of the data in the table.

09. Storage Size based on different compression methods and storage formats

Obviously, clustered columnstore indexes are not for everyone. They are very beneficial for Data Warehouse workload that requires to scan and aggregate large amount of data. The same time they are the very bad choice for OLTP workload – they do not support point-lookups nor small range scans.

Same time, it is not uncommon to have different use-cases for the old and new data in OLTP systems. Customers can generate OLTP workload to support day-to-day operations with the new data; however, the old data can be used for analysis and reporting, which is mainly Data Warehouse workload. In such scenario, you can consider to partition your data into the multiple tables using columnstore format for the tables with the old data. You can abstract all those changes via partitioned views making the differences in the schema and storage format transparent to the clients.

This is by no means not the simplest thing to implement. However, such design could lead to significant performance improvements and storage space saving for the certain kind of workloads.

10. Reducing amount of free space in the database

Finally, let’s discuss what we can do when data files have large amount of free space.

As strange as it sounds, one of the best possible options in that case is leaving everything as is. Consider the situation when system implements sliding-window pattern keeping 1-year worth of data in the system. Typically, such systems purge the data based on some schedule. For example, it is possible that every 1st day of the month system purges the 13th month of data – the one with the data older than 1 year.

Let’s assume that system collects 500GB of data per month. In this scenario, if you measured amount of free space in the data files right after the purge, you would notice that files have more than 500GB of free space available. Obviously, you can shrink the files and release such space to OS; however, the database would reclaim it as data growth.

However, for the purpose of this discussion, let’s assume that we have legitimate case to decrease the size of the files. Unfortunately, it is not very easy to do. DBCC SHRINKFILE command is the terrible way to reduce the size of the database. That command works in the very simple matter – it starts to move the allocated extents from the end of the file to unallocated space in the beginning of the file. As you can guess, this leads to the terrible index fragmentation. Moreover, it generates excessive amount of log records, which can affect the system in the multiple ways.

Obviously, you can perform index maintenance after you are done with the shrink. However, there is the catch – index rebuild will grow the file again (it needs space to accommodate the new version of the index). Index reorg could be the better choice in this scenario even though it does not provide results in par with the index rebuild. In the end, everything depends on amount of the space you are clearing and the size of your data. For example, if you had terabytes of free space and your biggest index is just a couple hundred gigabytes, you could consider to shrink and run index rebuild afterwards. Files would grow; however, such growth is much smaller comparing to space saving after the shrink.

Another, and often the better way to accomplish the task is moving all the data to another filegroup dropping original filegroup afterwards. The actual implementation would vary based on the version and edition of SQL Server. In Enterprise Edition, you can perform online index rebuild to the different filegroup, which will keep system available during the process. In Standard Edition, you can rebuild indexes only offline.

There is another catch though – index rebuild does not move LOB data between filegroups by default. The only way to workaround it is by rebuilding index to the new partition schema instead of the filegroup. However, it requires Enterprise Edition, which supports partitioning. Unfortunately, in Standard Edition you are out of luck.

I am not providing the examples here; however, I would like to reference my book again where I have discussed it in details. Alternatively, you can download book demo scripts and see how data movement works in action.

Lastly, there is always the option of creating another table on another filegroup, copying data there and dropping original table and renaming the new table afterwards. This approach would work in either edition; however, in the most part of the cases it needs be done offline. Online implementation is, of course, possible but it is usually complicated if table has volatile data.

Wrapping Up

This blog post ended up being much bigger than I expected. Unfortunately, even with such size it was impossible to cover all the details for some of the methods. Anyway, I hope you found this information useful, at least as the high-level overview.

Please, do not take the order in which I outlined approaches as the guideline. Every system is unique and you need to design the solution targeted to particular system taking hardware, software and business requirements into consideration.

Reducing Offline Index Rebuild and Table Locking Time in SQL Server

Enterprise Edition of SQL Server is all about availability features. It allows you to build robust High Availability and Database Maintenance strategies, which are the must for the mission critical systems. Unfortunately, all those benefits come at cost. At very high cost. Enterprise Edition of SQL Server is anything but cheap. Enterprise Edition licenses usually several times more expensive comparing to the Standard Edition licenses. It applies to both, on-premises and Cloud installations. For example, in Microsoft Azure, VMs with Enterprise Edition are about 4 times more expensive comparing to Standard Edition VMs.

Not every business can afford Enterprise Edition. And, to be honest, not every business even need Enterprise edition. It is entirely possible to build relatively big and complex solutions with Standard Edition of SQL Server especially with SQL Server 2014, which can utilize 128GB of RAM. I have personally worked with several multi-TB databases using Standard Edition of SQL Server.

One of the common challenges with non-Enterprise Editions, however, is Index Maintenance. Enterprise Edition allows you to rebuild indexes keeping table online. It uses row-versioning similar to optimistic transaction levels under the hood. With exception of short-time shared (S) at the beginning and schema-modification (Sch-M) locks at the end of the execution, there are no full table locks held. Other users can access the table, and read, and modify data from there.

Unfortunately, this is not the case with the Standard Edition where only offline index rebuild is supported. Offline index rebuild acquires schema modification (Sch-M) lock for the duration of the execution, which blocks entire access to the table even in read uncommitted transaction isolation level. Based on the index size, rebuild can be very time consuming operation. Obviously, you can use index reorganize instead of rebuild., which is online operation. However, it does not work as good as index rebuild on heavily fragmented tables.

Today, I am going to show you very simple technique that can help to decrease offline index rebuild time, and, therefore, time when table is inaccessible for the other sessions. It is not the magic silver bullet that helps everywhere. In fact, in some of the cases, it could even hurt rather than help. We will discuss that later in the post.

I am going to use a table from one of my production systems. That table has a couple dozen columns including one nvarchar(max) column, which is hosting some LOB data. I have rebuilt the index before the tests making sure that initial fragmentation does not affect execution time on the first test run.

As you see in Figure 1 below, leaf level of the clustered index for IN-ROW data consist of 20,374 pages, which is about 159MB of data. LOB data consists of 76,708 pages, which is about 600MB of data.

01. Index Physical Stats

As the first step, let’s run ALTER INDEX REBUILD statement and measure execution time for the operation. I am clearing buffer pool emulating a cold cache before the test (DO NOT RUN IT IN PRODUCTION!). This is, obviously, the worst case scenario – in the real life, at least some of the data pages would be cached at time of the index rebuild operation.

I am also running index rebuild without parallelism involved just to make sure that results are consistent and simplify the analysis. As the side note, my test server has pretty slow I/O subsystem, and execution time without MAXDOP=1 hint is just about the same as with the hint.

-- Clearing buffer pool. DO NOT RUN IN PRODUCTION
checkpoint
go

dbcc dropcleanbuffers
go

-- Rebuilding Index
alter index IDX_CI on dbo.MyTable rebuild with (maxdop = 1)

As you can see in Figure 2, index rebuild time in my system is 1 minute and 24 seconds. As you know, other sessions would be unable to access the table during this time due to schema modification (Sch-M) lock held on the table.

02. Execution Time of Index Rebuild with the Cold Cache

Let’s repeat our test (again, with cold cache) and analyze session waits during the index rebuild. We can do it by creating Extended Event session with the code shown below. This code will work in SQL Server 2012 and 2014 and will require slight modifications in SQL Server 2008/2008 R2.

create event session [Waits] on server
add event sqlos.wait_info
(
	action (sqlserver.session_id)
	where
	(
		sqlserver.session_id = 141 -- @@SPID of the session with index rebuild)
		and opcode = 1 -- END of wait
	)
)
add target package0.event_file(SET filename=N'C:\ExtEvents\Waits.xel',max_rollover_files=(0))
with
(
	max_memory=4096 KB
	,event_retention_mode=allow_single_event_loss
	,max_dispatch_latency=10 SECONDS
)
go

alter event session [Waits] on server state = start

After index rebuild command, let’s stop the session and look at the waits that occur during index rebuild.

alter event session [Waits] on server
state = stop
go

;with EventData(event_data)
as
(
	select cast(event_data AS XML) AS event_data
	from sys.fn_xe_file_target_read_file
		('C:\ExtEvents\Waits*.xel',null,null,null)
)
,Waits
as
(
	select
        event_data.value ('(/event/@timestamp)[1]'
			, 'datetime') as [Time],
        event_data.value ('(/event/data[@name=''wait_type'']/text)[1]'
			,'varchar(255)') AS [Wait],
        event_data.value ('(/event/data[@name=''duration'']/value)[1]'
			,'bigint') AS [Duration],
        event_data.value ('(/event/data[@name=''signal_duration'']/value)[1]'
			,'bigint') AS [Signal]
	from EventData
)
select 
	w.[Wait]
	,count(*) as [Count]
	,sum(w.Duration) as [Total Duration (ms)]
	,sum(w.Duration) - sum(w.Signal) as [Resource Waits (ms)]
	,sum(w.Signal) as [Signal Waits (ms)]
from Waits w
group by w.[Wait]

As you can see in the Figure 3 below, the vast majority of the waits are PAGELATCHIO*, which occur when SQL Server is waiting for the data page to be read from the disk.

03. Waits during Index Rebuild with the Cold Cache

That discovery can point us to the right direction. If large portion of index rebuild was waiting for the data to be read from disk, we could speed up the process by pre-fetching and caching data in the memory. SQL Server would not need to perform physical read operations, which should reduce rebuild time.

Let’s try that and run the following select below (again, making sure that we are using cold cache for test sake):

checkpoint
go

dbcc dropcleanbuffers
go


declare
	@T1 bigint, @T2 bigint

select 
	@T1 = avg(RecId) -- Making sure that IN_ROW pages are read
	,@T2 = avg(len(LOB_Column)) -- Making sure that LOB pages are read
from dbo.MyTable with (nolock, index=IDX_CI) 
option (maxdop 1)

I am forcing the scan of the clustered index – the one, which we are going to rebuild. I am using read uncommitted transaction isolation level with NOLOCK hint. As you can guess, in this situation I do not worry about any data consistency issues that can arise.

However, there is very important part – we need to make sure that SQL Server reads LOB data pages in addition to IN_ROW data pages. We can force it by accessing our LOB column in the select list. You can use the same technique dealing with ROW_OVERFLOW pages in case, if index has such allocation units.

Finally, I am also referencing one of columns from IN_ROW data to make sure that those pages are also read into the memory. In some cases, with the large tables and read uncommitted transaction isolation level, SQL Server can choose to use allocation unit scans instead of index scan. I seriously doubt, that SQL Server can perform such operation and scan only LOB data pages in our select, especially with the index hint in use, but it is better to be safe than sorry.

The execution time of the select in my system is 1 minute and 3 seconds as it is shown in Figure 4 below.

04. Execution Time of Select Statement

Now, let’s run index rebuild again. As you see, with index pages cached in memory, index rebuild time is just 10 seconds as it is shown in Figure 5. Even though, total time of both operations (select and index rebuild) are comparable with original index rebuild over cold cache, the time when table was inaccessible due to schema modification (Sch-M) lock is about 8 times less than before.

05. Execution Time of Index Rebuild when Data is Cached

If you looked at the waits during the last index rebuild, you would see that PAGELATCHIO* waits practically disappeared as it is shown in Figure 6.

06. Waits During Index Rebuild with the Warm Cache

While this technique can significantly reduce blocking during offline index rebuild, it is not for everyone. First, and foremost, you should be careful in case of the large indexes especially when system has limited amount of memory and/or low page life expectancy. Caching large indexes will require significant amount of memory and, therefore, it could lead to the situations when you flush the buffer pool possibly flushing the index pages you just read. This is especially important if you expect data growth in the future. Small indexes today can become the large ones tomorrow.

Second, it is beneficial to analyze how many pages in the index are typically cached in memory. Obviously, if most part of the pages have been already cached, you would not get much gain from running the select and pre-fetching remaining pages to the memory. On the other hand, logical reads are fast anyway so select over warm cache would not take much time.

You can use the script below to analyze number of pages from the index that have been cached. You can compare those numbers with results of sys.dm_db_index_physical_stats to estimate the number of cached pages per allocation unit from the index.

select i.name, au.type_desc, count(*) as [Cached Pages]
from 
	sys.dm_os_buffer_descriptors bd with (nolock) 
        join sys.allocation_units au with (nolock) on 
		    bd.allocation_unit_id = au.allocation_unit_id
	join sys.partitions p with (nolock) on
		(au.type in (1,3) and au.container_id = p.hobt_id) or
		(au.type = 2 and au.container_id = p.partition_id)
	join sys.indexes i with (nolock) on
		p.object_id = i.object_id and 
		p.index_id = i.index_id
where
	bd.database_id = db_id() and
	p.object_id = object_id (N'dbo.MyTable') and 
	p.index_id = 1 -- ID of the index
group by
	i.name, au.type_desc

Finally, that technique is useful when the biggest bottleneck in the system is I/O performance as it happened in my case. This often happens in the Cloud-based systems (SQL Server in VMs) where I/O latency is usually high and Standard Edition of SQL Server is commonly used due to its cost. In that scenario, pre-fetching data to the cache could significantly reduce the locking time for offline index rebuild when you have enough memory for the index data.

Plan Cache: Plan Reuse

Last time we have discussed how parameter sniffing can affect the quality of generated execution plans. Today, I would like to talk about another aspect of plan caching, which is plan reuse. Plans, cached by SQL Server, must be valid for any combination of parameters during future calls that reuse the plan. In some cases, this can lead to situations where a cached plan is suboptimal for a specific set of parameter values.

One of the common code patterns that leads to such situations is the implementation of stored procedure that search for the data based on a set of the optional parameters. Let’s look at the typical implementation of such stored procedure as shown in the code below. That code uses dbo.Employees table from the “parameter sniffing” post and it also creates two nonclustered indexes on that table.

create proc dbo.SearchEmployee
(
	@Number varchar(32) = null
	,@Name varchar(100) = null
)
as
begin
	select Id, Number, Name, Salary, Country
	from dbo.Employees
	where 
		((@Number is null) or (Number=@Number)) and
		((@Name is null) or (Name=@Name))
end
go

create unique nonclustered index IDX_Employees_Number
on dbo.Employees(Number);

create nonclustered index IDX_Employees_Name
on dbo.Employees(Name);

There are several different approaches how SQL Server can execute the query from the stored procedure based on what parameters were provided. In the large number of cases, the most efficient approach would be using Nonclustered Index Seek and Key Lookup operators. Let’s run the stored procedure several times with different parameter combinations and check the execution plans:

exec dbo.SearchEmployee @Number = '10000';
exec dbo.SearchEmployee @Name = 'Canada Employee: 1';
exec dbo.SearchEmployee @Number = '10000', @Name = 'Canada Employee: 1';
exec dbo.SearchEmployee @Number = NULL, @Name = NULL;

01.Execution Plans When Plans Are Cached

SQL Server compiles stored procedure at time of the first call when only @Number parameter was provided. Even though, the most efficient execution plan for this case is IDX_Employees_Number Index Seek operation, SQL Server cannot cache such execution plan because it would not be valid for the case, when @Number parameter is NULL. Therefore, SQL Server generated and cached execution plan that utilizes Index Scan operation, which is highly inefficient in case, when @Number parameter is not provided. In that case, SQL Server performs Key Lookup operation for every row in the nonclustered index optionally evaluating predicate on @Name parameter afterwards.

Similar to parameter sniffing issues, you can address this problem with statement-level recompilation as it is shown below. SQL Server recompiles the query on every call, and therefore it can choose the most beneficial execution plan for every parameter set.

It is also worth mentioning that the plans are not cached in cases where statement-level recompile is used.

alter proc dbo.SearchEmployee
(
	@Number varchar(32) = null
	,@Name varchar(100) = null
)
as
begin
	select Id, Number, Name, Salary, Country
	from dbo.Employees
	where 
		((@Number is null) or (Number=@Number)) and
		((@Name is null) or (Name=@Name))
	option (recompile)
end

02.Execution Plans With Statement-Level Recompile

As we already discussed, query recompilation adds CPU overhead. That overhead can be acceptable in case of large and complex queries when compilation time is just the fraction of the query execution time and system is not CPU-bound. In that case, recompilation can even help producing the better execution plans, which would be generated based on the current parameter values, especially if table variables are involved. However, this recompilation overhead is usually not acceptable in case of OLTP queries that are called very often.

One of the options you can use to address the issue is writing multiple queries using IF statements covering all possible combinations of parameters. SQL Server would cache the plan for each statement in the procedure. Listing below shows such an approach, however it quickly becomes unmanageable with a large number of parameters. The number of combinations to cover is equal to the number of parameters squared.

alter proc dbo.SearchEmployee
(
	@Number varchar(32) = null
	,@Name varchar(100) = null
)
as
begin
	if @Number is null and @Name is null
		select Id, Number, Name, Salary, Country
		from dbo.Employees
	else if @Number is not null and @Name is null
		select Id, Number, Name, Salary, Country
		from dbo.Employees
		where Number=@Number
	else if @Number is null and @Name is not null
		select Id, Number, Name, Salary, Country
		from dbo.Employees
		where Name=@Name
	else 
		select Id, Number, Name, Salary, Country
		from dbo.Employees
		where Number=@Number and Name=@Name
end

In the case of a large number of parameters, dynamic SQL becomes the only option. SQL Server will cache the execution plans for each dynamically generated SQL statement. Remember that using dynamic SQL breaks ownership chaining, and it always executes in the security context of CALLER. You should also always use parameters with dynamic SQL to avoid SQL Injection.

alter proc dbo.SearchEmployee
(
	@Number varchar(32) = null
	,@Name varchar(100) = null
)
as
begin
	declare
		@SQL nvarchar(max) = N'	
select Id, Number, Name, Salary, Country
from dbo.Employees
where 1=1'
	
	if @Number is not null
		select @Sql = @SQL + N' and Number=@Number'
	if @Name is not null
		select @Sql = @SQL + N' and Name=@Name'
	exec sp_executesql @Sql, N'@Number varchar(32), @Name varchar(100)'
		,@Number=@Number, @Name=@Name
end

While most of us are aware about danger of using optional parameters and OR predicates in the queries, there is another less known issue associated with filtered indexes. SQL Server will not generate and cache a plan that uses a filtered index, in cases when that index cannot be used with some combination of parameter values.

Listing below shows an example. SQL Server will not generate the plan, which is using the IDX_RawData_UnprocessedData index, even when the @Processed parameter is set to zero because this plan would not be valid for a non-zero @Processed parameter value.

create unique nonclustered index IDX_RawData_UnprocessedData
on dbo.RawData(RecID)
include(Processed)
where Processed = 0;

-- Compiled Plan for the query would not use filtered index
select top 100 RecId, /* Other columns */
from dbo.RawData
where RecID > @RecID and Processed = @Processed
order by RecID;

In that particular case, rewriting the query using IF statement would be, perhaps, the best option.

if @Processed = 0
	select top 1000 RecId, /* Other Columns */
	from dbo.RawData
	where RecId > @RecId and Processed = 0
	order by RecId;
else 
	select top 1000 RecId, /* Other Columns */
	from dbo.Data
	where RecId > @RecId and Processed = 1
	order by RecId;

Unfortunately, IF statement does not always help. In some cases, SQL Server can auto-parametrize the queries and replace some of the constants with auto-generated parameters. That behavior allows SQL Server to reduce the size of the plan cache; however, it could lead to all plan reuse issues we have already discussed.

By default, SQL Server uses SIMPLE parametrization and parametrize only the simple queries. However, if database or query are using FORCED parametrization, that behavior can become the issue. Let’s look at a particular example and create a database with a table with a filtered index and populate it with some data, as shown below.

use master
go

create database ParameterizationTest
go

use ParameterizationTest
go

create table dbo.RawData
(
	RecId int not null identity(1,1), 
	Processed bit not null, 
	Placeholder char(100),
	constraint PK_RawData
	primary key clustered(RecId)
);

/* Inserting:
	Processed = 1: 65,536 rows
	Processed = 0: 16 rows */
;WITH N1(C) AS (SELECT 0 UNION ALL SELECT 0) -- 2 rows
,N2(C) AS (SELECT 0 FROM N1 AS T1 CROSS JOIN N1 AS T2) -- 4 rows
,N3(C) AS (SELECT 0 FROM N2 AS T1 CROSS JOIN N2 AS T2) -- 16 rows
,N4(C) AS (SELECT 0 FROM N3 AS T1 CROSS JOIN N3 AS T2) -- 256 rows
,N5(C) AS (SELECT 0 FROM N4 AS T1 CROSS JOIN N4 AS T2 ) -- 65,536 rows
,IDs(ID) AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM N5)
insert into dbo.RawData(Processed)
	select 1
	from Ids;

insert into dbo.RawData(Processed)
	select 0
	from dbo.RawData
	where RecId <= 16;

create unique nonclustered index IDX_RawData_Processed_Filtered 
on dbo.RawData(RecId)
include(Processed)
where Processed = 0;

For the next step, let’s run the queries that count a number of unprocessed rows in both SIMPLE and FORCED parametrization modes.

select count(*)
from dbo.RawData
where Processed = 0			
go

alter database ParameterizationTest set parameterization forced
go

select count(*)
from dbo.RawData
where Processed = 0

If you examine the execution plans shown in Figure 3, you will notice that SQL Server utilized a filtered index in the case of a SIMPLE parametrization. SQL Server can cache this plan because of the constant in the Processed=0 predicate. Alternatively, with FORCED parametrization, SQL Server parametrizes the query using the parameter in the Processed=@0 predicate. Therefore, it cannot cache the plan with the filtered index because it would not be valid for the case when a query selects processed (Processed=1) rows. SQL Server generated the execution plan with a Clustered Index Scan, which is far less efficient in this case.

03.Execution Plans and Parametrization

The workaround in this case is using SIMPLE parametrization for the database, or forcing it on query level with plan guide (more on it later). In some cases, you would also need to rewrite the query and use one of the constructs that prevent parametrization in SIMPLE parametrization mode, such as IN, TOP, DISTINCT, JOIN, UNION, subqueries and quite a few others.

Plan caching and plan reuse are the great features that help to reduce CPU load on the server. However, they introduce several side effects you need to be aware of and keep them in mind when you write queries and stored procedures.

Source code is available for download

Plan Cache: Parameter Sniffing

There are several reasons why it is beneficial to use parameters in the queries. It mitigates security risks by protecting from SQL Injection. Moreover, it helps with the performance. Query optimization is expensive and resource-intensive process. SQL Server reduces that cost by keeping execution plans in the special part of the memory called Plan Cache and reusing them on subsequent calls.

Even though, this approach benefits the system by reducing the number of query compilations, it could also introduce some issues. Cached plans can become inefficient due to the data distribution and/or parameter values change. SQL Server tracks changes in the data by counting number of changes of the index columns outdating statistics and recompiling queries when number of changes reach about 20% of number of rows in the table (trace flag 2371 can reduce that threshold). However, tracking parameter values change is more complicated.

When SQL Server compiles parametrized query or stored procedure, it analyzes parameters and generates execution plan, which is optimal for those specific values. This process is called Parameter Sniffing. Furthermore, generated plans are cached and reused regardless of parameter values at time of reuse. That behavior can lead to the situations when query is compiled and plan is generated based on atypical parameter set. Cached plan, in this case, would be suboptimal for the calls that reuse that plan.

All of us experienced the situations when query or stored procedure suddenly became very slow even when they ran fine before. Parameter sniffing could be the reason of why it happened. SQL Server decided to recompile the query due to statistics update and the query that triggered recompilation had atypical parameter values. Even though, generated execution plan was efficient for parameters at time of compilation; that cached plan is not necessarily efficient for the other queries that reuses it.

Historically, parameter sniffing has the negative connotation in SQL Server community. It is used to explain why queries have suboptimal execution plans, like I just did in the previous paragraph. This is vastly wrong, however. Parameter sniffing is just the name for SQL Server behavior of analyzing parameters during query compilation.

Let’s look at a few examples of parameter sniffing and discuss how to address the issues it can introduce. In this post, I am using the stored procedure; however, everything would work the same in case of parametrized queries.

As the first step, let’s create a table dbo.Employees. We will populate it distributing data in the way that most rows have Country column value of USA with a few employees with Country value of Canada. Listing below shows the code.

create table dbo.Employees
(
	ID int not null,
	Number varchar(32) not null,
	Name varchar(100) not null,
	Salary money not null,
	Country varchar(64) not null,

	constraint PK_Employees
	primary key clustered(ID)
);

;with N1(C) as (select 0 union all select 0) -- 2 rows
,N2(C) as (select 0 from N1 as T1 cross join N1 as T2) -- 4 rows
,N3(C) as (select 0 from N2 as T1 cross join N2 as T2) -- 16 rows
,N4(C) as (select 0 from N3 as T1 cross join N3 as T2) -- 256 rows
,N5(C) as (select 0 from N4 as T1 cross join N4 as T2 ) -- 65,536 rows
,Nums(Num) as (select row_number() over (order by (select null)) from N5)
insert into dbo.Employees(ID, Number, Name, Salary, Country)
	select 
		Num, 
		convert(varchar(5),Num), 
		'USA Employee: ' + convert(varchar(5),Num), 
		40000,
		'USA'
	from Nums;

;with N1(C) as (select 0 union all select 0) -- 2 rows
,N2(C) as (select 0 from N1 as T1 cross join N1 as T2) -- 4 rows
,N3(C) as (select 0 from N2 as T1 cross join N2 as T2) -- 16 rows
,Nums(Num) as (select row_number() over (order by (select null)) from N3)
insert into dbo.Employees(ID, Number, Name, Salary, Country)
	select 
		65536 + Num, 
		convert(varchar(5),65536 + Num), 
		'Canada Employee: ' + convert(varchar(5),Num), 
		40000,
		'Canada'
	from Nums;

create nonclustered index IDX_Employees_Country
on dbo.Employees(Country);

As a next step, let’s create a stored procedure that calculates the average salary for employees in a specific country.

create proc dbo.GetAverageSalary @Country varchar(64)
as
begin
	select Avg(Salary) as [Avg Salary]
	from dbo.Employees
	where Country = @Country
end

As you can guess, when we call this stored procedure with parameter @Country=’USA’, stored procedure needs to touch almost every row in the table and the most optimal execution plan is Clustered Index Scan. However, when we call the stored procedure with parameter @Country=’Canada’, the most efficient execution plan is Nonclustered Index Seek with Key Lookup operators.

Let’s call the stored procedure twice: the first time with @Country=’USA’ and the second time with @Country=’Canada’, as shown below.

exec dbo.GetAverageSalary @Country='USA';
exec dbo.GetAverageSalary @Country='Canada';

As you can see in Figure 1, SQL Server compiles the stored procedure and caches the plan with the first call and reuses it later. Even though such a plan is less efficient with the @Country=’Canada’ parameter value, it may be acceptable when those calls are rare, which is expected with such a data distribution.

01. Execution plans when the stored procedure is compiled with @Country=’USA’

Now let’s take a look what happens if we swap those calls when the plan is not cached. We will use the DBCC FREEPROCCACHE command, which clears plan cache (DO NOT RUN IT ON PRODUCTION SERVER). As I already mentioned, another instance when this might happen is with a statistics update that forces query to recompile.

dbcc freeproccache
go

exec dbo.GetAverageSalary @Country='Canada';
exec dbo.GetAverageSalary @Country='USA';

As you can see in Figure 2, SQL Server now caches the plan based on the @Country=’Canada’ parameter value. Even though this plan is more efficient when the stored procedure is called with @Country=’Canada’, it is highly inefficient for @Country=’USA’ calls.

02. Execution plans when the stored procedure is compiled with @Country=’Canada’

There are a few ways to address the issue. You can force the recompilation of either stored procedure using EXECUTE WITH RECOMPILE or a statement-level recompile with OPTION (RECOMPILE) clauses. Obviously, a statement-level recompile is better because it performs the recompilation on a smaller scope. SQL Server sniffs the parameter values at the time of the recompilation, generating the optimal execution plan for each parameter value. It is also worth mentioning, that statement-level recompile allows SQL Server to obtain number of rows stored in table variables, which can help to generate more efficient execution plans in some cases.

alter proc dbo.GetAverageSalary @Country varchar(64)
as
begin
	select Avg(Salary) as [Avg Salary]
	from dbo.Employees
	where Country = @Country
	option (recompile)
end
go

exec dbo.GetAverageSalary @Country='Canada';
exec dbo.GetAverageSalary @Country='USA';

As you can see in Figure 3, SQL Server recompiles the statement on every call, and it generates the most efficient execution plan for every parameter value. It is also worth noting that SQL Server does not cache plans for the statements when statement-level recompile is used.

03. Execution plans with OPTION (RECOMPILE) hint

The statement-level recompile may be a good option when the queries do not execute very often or in the case of complex queries, when the compilation time is just a fraction of the total execution time. However, it is hardly the best approach for OLTP queries, which are constantly running in the system due to the extra CPU load that recompilation introduces.

Another option is using an OPTIMIZE FOR hint, which forces SQL Server to optimize a query for the specific parameter values provided in the hint. Listing below illustrates such an approach.

alter proc dbo.GetAverageSalary @Country varchar(64)
as
begin
	select Avg(Salary) as [Avg Salary]
	from dbo.Employees
	where Country = @Country
	option (optimize for(@Country='USA'))
end
go

exec dbo.GetAverageSalary @Country='Canada';
exec dbo.GetAverageSalary @Country='USA';

As you can see in Figure 4, SQL Server ignores the parameter value during the compilation and optimizes the query for the @Country=’USA’ value.

04. Execution plans with OPTIMIZE FOR hint

Unfortunately, the OPTIMIZE FOR hint introduces supportability issues, and it can lead to suboptimal execution plans in cases where the data distribution has changed. Let’s consider the situation, albeit unrealistic, when a company and all of its employees moved from the United States to Germany.

update dbo.Employees set Country='Germany' where Country='USA';

exec dbo.GetAverageSalary @Country='Germany';

Statistics are outdated at the time of the update, which forces SQL Server to recompile the statement in the stored procedure. At this point, there are no rows in the table with Country=’USA’, and the recompilation produces a suboptimal execution plan, as shown in Figure 5. As a side note, the query uses more reads than before due to the index fragmentation introduced by the update.

05. Inefficient execution plan after data distribution change

SQL Server 2008 introduced another optimization hint, OPTIMIZE FOR UNKNOWN, which helps to address such situations. With this hint, SQL Server performs an optimization based on the most statistically common value in the table. Listing below shows the code involved in doing this.

alter proc dbo.GetAverageSalary @Country varchar(64)
as
begin
	select Avg(Salary) as [Avg Salary]
	from dbo.Employees
	where Country = @Country
	option (optimize for(@Country UNKNOWN))
end
go

exec dbo.GetAverageSalary @Country='Canada';

Figure 6 illustrates the execution plan. Country=’Germany’ is the most statistically common value in the table, and therefore SQL Server generates an execution plan that is optimal for such a parameter value.

06. Execution plan with OPTIMIZE FOR UNKNOWN hint

You can achieve the same results with an OPTIMIZE FOR UNKNOWN hint by using local variables instead of parameters. This method also works with SQL Server 2005, where the OPTIMIZE FOR UNKNOWN hint is not supported. Listing below illustrates that approach. It introduces the same execution plan with a Clustered Index Scan, as shown in Figure 6 above.

alter proc dbo.GetAverageSalary @Country varchar(64)
as
begin
	declare
		@CountryTmp varchar(64)
	set @CountryTmp = @Country

	select Avg(Salary) as [Avg Salary]
	from dbo.Employees
	where Country = @CountryTmp
end

Finally, in case when you are dealing with 3rd party systems and cannot add optimization hints to the code, you could use plan guides to achieve the same goal. We will discuss plan guides in one of the future blog posts.

Source code is available for download.

Partial Database Backup and Piecemeal Restore in Microsoft SQL Server

Unfortunately, disasters happen. Regardless of how good is High Availability strategy that you have implemented, there is always the chance that one day you will need to move the database to another server and quickly bring system online. And the chance is – you will have to do it under stress with your phone ringing every few minutes and your manager asking about ETA. It does not help that such process can be very time consuming in the case of the large databases.

Enterprise Edition of SQL Server supports concept of piecemeal restore and allows you to restore database on filegroup-by-filegroup basis keeping database online during the process. Queries that access data from online filegroups would work just fine. This is the great technique that can dramatically reduce system downtime.

In the large number of cases, the size of the operational data that is required for system to be operational is relatively small. Historical data, on the other hand, is often kept due to regulation/compliance reasons and rarely accessed by the applications. As you can guess, it allows you to quickly bring part of the database that supports operational activity online and work with remaining historical data afterwards while customers can connect and use the system.

Obviously, you should design data placement in the way that supports piecemeal restore. In the nutshell, it means the separation of the operational and historical data across different filegroups. As the example, let’s consider Order Entry/Shopping Cart system that stores data for several years. One of the data layout designs could be the following:

  • Empty Primary FG. Primary Filegroup should be online in order for database to be online. It is good idea to keep primary filegroup empty and do not place any objects there.
  • Entities FG. This filegroup could store catalog tables, such as Customers, Articles and others.
  • One or more filegroups for the operational data. For example, if operational period is the current year, this filegroup can store Orders, OrderLineItems and related entities that stores current-year data.
  • One or more filegroups for the historical data. Those filegroups store data that is not required to support operational activity in the system.

Piecemeal restore strategy will require you to bring online Primary, Entities and Operation data filegroups first. System will be available to the customers at this point. After that, you can work on restoring historical data filegroups, which in most part of the cases, will be significantly larger than operational data and, therefore, will take longer time to restore.

It is also worth noting, that in case of SQL Server 2014 In-memory OLTP, you should also have Hekaton filegroup online before database becomes available to the users. Usually, In-memory tables keep operational data anyway, so it should not be a problem in most part of the cases.

Let’s look at the example and create a database with the structure outlined above. For simplicity sake, every filegroup has only one data file. However, in the real-life you should consider creating multiple files to reduce allocation contention in the filegroups with volatile data.

create database [MyBigOrderDb]
on primary
(name = N'MyBigOrderDb', filename = N'c:\db\MyBigOrderDb.mdf'), 
filegroup [Entities] 
(name = N'MyBigOrderDB_Entities', filename = N'c:\db\MyBigOrderDB_Entities.ndf'), 
filegroup [FG2013] 
(name = N'MyBigOrderDB_FG2013', filename = N'c:\db\MyBigOrderDB_FG2013.ndf'), 
filegroup [FG2014] 
(name = N'MyBigOrderDB_FG2014', filename = N'c:\db\MyBigOrderDB_FG2014.ndf')
log on
(name = N'MyBigOrderDb_log', filename = N'c:\db\MyBigOrderDb_log.ldf')

As the next step, let’s create a few tables including partitioned table Orders.

create table dbo.Customers
(
    CustomerId int not null,
    CustomerName nvarchar(64) not null,
)
on [Entities];

create table dbo.Articless
(
 ArticlesId int not null,
 ArticleName nvarchar(64) not null,
)
on [Entities];

create partition function pfOrders(smalldatetime)
as range right
for values('2014-01-01');

create partition scheme psOrders
as partition pfOrders
to (FG2013,FG2014)
go

create table dbo.Orders
(
    OrderId int not null,
    OrderDate smalldatetime not null,
    OrderNum varchar(32) not null,
    constraint PK_Orders
    primary key clustered(OrderDate, OrderId)
    on psOrders(OrderDate)
)
go

insert into dbo.Customers(CustomerId, CustomerName) values(1,'Customer 1');
insert into dbo.Orders(OrderDate, OrderId, OrderNum)
values
    ('2013-01-01',1,'Order 1'),
    ('2013-02-02',2,'Order 2'),
    ('2014-01-01',3,'Order 3'),
    ('2014-02-02',4,'Order 4')

Next, let’s create the backup chain and perform FULL, DIFFIRENTIAL and LOG backups.

-- Full backup
backup database [MyBigOrderDb] 
to disk = N'c:\db\MyBigOrderDb_Full.bak' 
with noformat, init, name = N'MyBigOrderDb-Full Database Backup', 
    compression, stats = 2
go

-- Differential backup
backup database [MyBigOrderDb] 
to disk = N'c:\db\MyBigOrderDb_Diff.bak' 
with differential, noformat, init, 
    name = N'MyBigOrderDb-Differential Database Backup', 
    compression, stats = 2
go

-- Transaction log
backup log [MyBigOrderDb] 
to disk = N'c:\db\MyBigOrderDb_Log.trn' 
with noformat, init, name = N'MyBigOrderDb-Tran Log', 
    compression, stats = 2
go

And at this point, let’s assume that disaster happens and we need to move database to another server. In this example, I would assume that we still have access to transaction log of the original database and we will perform tail-log backup to avoid any data loss.

backup log [MyBigOrderDb] 
to disk = N'c:\db\MyBigOrderDb_TailLog.trn' 
with no_truncate, noformat, init, name = N'MyBigOrderDb-Tail Log', 
 compression, norecovery, stats = 2

At this point, we will need to copy all files from the backup chain to another server and start restore process. We will perform piecemeal restore of Primary, Entities and FG2014 filegroups to support operational activity of the system without bringing historical FG2013 data online. The first operation is performing restore of the FULL database backup specifying just the filegroups we need to restore.

-- Restoring on another server (Same folder structure for demo sake)
-- Full Backup
restore database [MyBigOrderDb] 
FILEGROUP = 'primary', FILEGROUP = 'Entities', FILEGROUP = 'FG2014'
from disk = N'C:\DB\MyBigOrderDb_Full.bak' with file = 1,
move N'MyBigOrderDB' to N'c:\db\MyBigOrderDb.mdf', 
move N'MyBigOrderDB_Entities' to N'c:\db\MyBigOrderDb_Entities.ndf', 
move N'MyBigOrderDB_FG2014' to N'c:\db\MyBigOrderDb_2014.ndf', 
move N'MyBigOrderDb_log' to N'c:\db\MyBigOrderDb.ldf', 
NORECOVERY, partial, stats = 2;

it is worth mentioning, that I am using WITH NORECOVERY clause in all RESTORE commands including tail-log backup restore. In the end, I am recovering database with the separate RESTORE statement. This is just a good practice and safety measurement. By default, Management Studio uses WITH RECOVERY option with restore, which can lead to the situation that you accidentally recovered database at intermediate restore stage. You would not be able to restore further backups after that and would be forced to start from scratch.

As the next steps, we need to restore DIFFERENTIAL, LOG and tail-log backups as shown below. You do not need to specify filegroups as part of restore statement anymore – SQL Server would perform restore only in scope of the filegroups we are working with and which are in RESTORING state.

-- Diff Backup
restore database [MyBigOrderDb] 
from disk = N'C:\DB\MyBigOrderDb_Diff.bak' with file = 1,
NORECOVERY, stats = 2;

-- Tran Log
restore database [MyBigOrderDb] 
from disk = N'C:\DB\MyBigOrderDb_Log.trn' with file = 1,
NORECOVERY, stats = 2;

-- Tail-log
restore database [MyBigOrderDb] 
from disk = N'C:\DB\MyBigOrderDb_TailLog.trn' with file = 1,
NORECOVERY, stats = 2;

-- Recovery
restore database [MyBigOrderDb] with RECOVERY;

At this point, our database is partially online. We can query operational data as it shown below in Figure 1.

select * from MyBigOrderDb.dbo.Customers
select * from MyBigOrderDb.dbo.Orders where OrderDate >= '2014-01-01'

01. Querying Data from Operational Filegroups

However, if we try to query historical data, we will get an error, as shown in Figure 2

select * from MyBigOrderDb.dbo.Orders where OrderDate < '2014-01-01'

02. Querying Data from Historical Filegroup

You can check the status of the database filegroups with the following query. Figure 3 shows that three filegroups are online while FG2013 filegroup is still in RECOVERY_PENDING stage.

select file_id, name, state_desc, physical_name
from MyBigOrderDb.sys.database_files

03. Filegroup Status Before Historical Filegroup Restore

As you see, customers can use the system while we are working on restoring of FG2013 filegroup. We can perform this with the following script:

-- Full Backup (restoring individual filegroup)
restore database [MyBigOrderDb] 
FILEGROUP = 'FG2013'
from disk = N'C:\DB\MyBigOrderDb_Full.bak' with file = 1,
move N'MyBigOrderDB_FG2013' to N'c:\db\MyBigOrderDb_2013.ndf',  
stats = 2;

-- Diff Backup
restore database [MyBigOrderDb] 
from disk = N'C:\DB\MyBigOrderDb_Diff.bak' with file = 1,
stats = 2;

-- Tran Log
restore database [MyBigOrderDb] 
from disk = N'C:\DB\MyBigOrderDb_Log.trn' with file = 1,
stats = 2;

-- Tail-log
restore database [MyBigOrderDb] 
from disk = N'C:\DB\MyBigOrderDb_TailLog.trn' with file = 1,
stats = 2;

Now database is online as it is shown in Figure 4.

04. Filegroups Status After Historical Filegroup Restore

Piecemeal restore can significantly decrease database restore time. However, there is still time-consuming part in this process. Regardless how many filegroups you are restoring, you should copy/move backup file that contains initial FULL database backup to the new server. Time of this operation depends on network throughput and slow network and/or large backup files can lead to very long delays.

Obviously, you can take several approaches. You can always “be prepared” and copy files to reserved server after each backup. With such strategy, you would already have backup files in place when they need to be restored if/when disaster occurs.

However, you can use another approach if historical data is read-only. This approach is called “partial database backup”. Even though, Microsoft stated that it is designed for SIMPLE recovery models, it would work just fine with FULL recovery model, as long as you implemented it correctly.

Let’s look how it works and as the first step, let’s mark our FG2013 filegroup as read-only:

alter database MyBigOrderDb modify filegroup FG2013 readonly

At this point, you should start the new backup chain, which, in the nutshell, are two different sets of backup files. One set includes backup files for read-write filegroups (FULL, DIFFERENTIAL and LOG backups using READ_WRITE_FILEGROUPS option. In addition, you need to perform backup of read-only filegroup. You can do it with the script shown below:

-- Backing Up Read-Only Filegroup
backup database [MyBigOrderDb] 
FILEGROUP = N'FG2013'
to disk = N'c:\db\MyBigOrderFG2013.bak' 
with noformat, init, 
	name = N'MyBigOrderDb-FG2013 FG backup', 
	compression, stats = 2
go

-- Full backup of read_write filegroups
backup database [MyBigOrderDb] READ_WRITE_FILEGROUPS
to disk = N'c:\db\MyBigOrderDbRW_Full.bak' 
with noformat, init, name = N'MyBigOrderDb-Full Database Backup (R/W FG)', 
	compression, stats = 2
go

-- Differential backup
backup database [MyBigOrderDb] 
to disk = N'c:\db\MyBigOrderDbRW_Diff.bak' 
with differential, noformat, init, 
	name = N'MyBigOrderDb-Differential Database Backup (R/W FG)', 
	compression, stats = 2
go

-- Transaction log
backup log [MyBigOrderDb] 
to disk = N'c:\db\MyBigOrderDbRW_Log.trn' 
with noformat, init, name = N'MyBigOrderDb-Tran Log', 
	compression, stats = 2

The beauty of this situation that now you have data backups separated. You would have relatively small backup chain file(s) for operational data as well as large static backup file for read-only historical data. This will dramatically reduce time required to move operational data backup files over network.  It also helps with day-to-day backup strategy and reduces the time of backup operation and, server and network load, and storage space required to store the files. You can create new backup chains of operation data without taking new backup for historical data as long as those filegroups stay read-only.

If disaster occurs and you need to move database to another server, you can start with operation data backup chain as shown below. For simplicity sake, I am omitting tail-log backup – however, in real life you should always obtain it to avoid data loss.

-- Restoring on another server (Same folder structure for demo sake)
-- Full Backup (R/W filegroups only)
restore database [MyBigOrderDb] 
filegroup = 'primary', filegroup = 'Entities', filegroup = 'FG2014'
from disk = N'C:\DB\MyBigOrderDbRW_Full.bak' with file = 1,
move N'MyBigOrderDB' to N'c:\db\MyBigOrderDb.mdf',  
move N'MyBigOrderDB_Entities' to N'c:\db\MyBigOrderDb_Entities.ndf',  
move N'MyBigOrderDB_FG2014' to N'c:\db\MyBigOrderDb_2014.ndf',  
move N'MyBigOrderDb_log' to N'c:\db\MyBigOrderDb.ldf',  
norecovery, partial, stats = 2;

-- Diff Backup
restore database [MyBigOrderDb] 
from disk = N'C:\DB\MyBigOrderDbRW_Diff.bak' with file = 1,
norecovery, stats = 2;

-- Tran Log
restore database [MyBigOrderDb] 
from disk = N'C:\DB\MyBigOrderDbRW_Log.trn' with file = 1,
norecovery, stats = 2;

-- Recovery
restore database [MyBigOrderDb] with recovery;

If you queried the status of the database filegroups after restore, you would see very similar picture – all operational filegroups are online and historical filegroup is in RECOVERY_PENDING state. Figure 5 illustrates that.

05. Filegroup Status During Partial Restore

Again, at this point system is available to users and you can continue working on historical data while system is operational with the following script.

-- Restoring Read/Only Filegroup
restore database [MyBigOrderDb] 
filegroup = 'FG2013'
from disk = N'C:\DB\MyBigOrderFG2013.bak' with file = 1,
move N'MyBigOrderDB_FG2013' to N'c:\db\MyBigOrderDb_2013.ndf',  
recovery, stats = 2;

Be careful when you change read-only status of the filegroups when partial backup is used. One of examples of such scenario is when operational period changed and you need to move some former-operational data that becomes historical to another filegroup. One of the approaches to accomplish it is making historical filegroup as read-write, copying data there and making it read-only again. You can still restore the database using old read-only filegroup backup as long as you have backup chain with LOG backups that cover data movement operations. However, you would not be able to recover historical data if you start new backup chain for updateable filegroups after the fact and did not take new backup of the read-only filegroup.

As the general recommendation, it is safer to start new backup chain together with backup of historical filegroup after you made historical filegroup read-only again. And, most importantly, regardless what solution you are using, test your backup and restore strategies. This would help you to avoid unpleasant surprises when things went south.

Writing Triggers in the Right Way

The only correct way of writing triggers is not writing them. I would repeat – the only correct way of writing triggers is not writing them. Triggers negatively affect performance. AFTER triggers introduce fragmentation and increase tempdb load due to version store usage. Triggers are running in context of transaction, increase the time in which locks are held and contribute to locking and blocking in the system. However, in some cases, we have to use triggers and it is important to develop them correctly.

SQL Server fires triggers on statement rather than row level. For example, if update statement modified 10 rows, AFTER UPDATE and/or INSTEAD OF UPDATE trigger would fire once rather than 10 times. Therefore, trigger must handle situations when inserted and/or deleted tables have multiple rows.  Let’s look at the example and create a table with AFTER INSERT trigger as shown below:

create table dbo.Data 
( 
	ID int not null, 
	Value varchar(32) 
) 
go 

create trigger trgData_AI on dbo.Data 
after insert 
as 
	/* Some code */ 
	declare 
		@ID int 
	set @ID = (select ID from inserted) 
	/* Some code */

Everything is fine when you inserted a single row. However, multi-row insert would fail with exception shown below.

Msg 512, Level 16, State 1, Procedure trgData_AI, Line 9
Subquery returned more than 1 value. This is not permitted when the subquery follows =, !=, <, <= , >, >= or when the subquery is used as an expression.
The statement has been terminated.

And there is another consequence of statement-level nature of triggers. They fire even if DML statement did not modify any rows. Would it introduce any issues in the system or not depends on implementation. However, it would always lead to unnecessary performance overhead.

Let’s look at example. The code below creates another table and changes trigger in a way, that mimics behavior of audit functional, which is frequently implemented based on triggers.

create table dbo.Audit 
( 
	ID int not null identity(1,1), 
	OnDate datetime2(0) not null 
		constraint DEF_Audit_OnDate  
		default getutcdate(), 
	Message varchar(64) not null, 
)
go

alter trigger trgData_AI on dbo.Data 
after insert 
as 
begin
	declare 
		@Msg varchar(64) = 'Triggers are bad. And @@ROWCOUNT=' + convert(varchar(10),@@rowcount) 

	insert into dbo.Audit(Message) values(@Msg); 
end

Now let’s run insert statement, which does not insert any rows to the table.

insert into dbo.Data(ID, Value) 
	select 1, 'ABC'  
	where 1 = 0

If you checked content of Audit table, you would see that trigger was fired:

01. Content of Audit table

So the first important conclusion is that every trigger must checks @@ROWCOUNT variable as the very first statement in implementation.

SET NOCOUNT ON should be the second action trigger does. Without that SQL Server returns affected number of rows for each operator in the trigger in addition to original DML statement. Some client libraries rely on the single message in the output and would not work correctly in case of multiple messages. Therefore, the first two statements in the trigger should look like it is shown below.

alter trigger trgData_AI on dbo.Data 
after insert 
as 
begin
	if @@ROWCOUNT = 0
		return
	set nocount on

	/* Some Code Here */
end

Finally, there is another caveat. While implementation above works for insert, update and delete operators, it is not the case with merge@@ROWCOUNT in this case represents total number of rows affected by merge statement rather than by individual insert, update or delete action in the trigger. Let’s prove it with the example.

create table dbo.Data2(Col int not null)
go

create trigger trg_Data_AI on dbo.Data2
after insert 
as
	select 
		'After Insert' as [Trigger]
		,@@RowCount as [RowCount]
		,(select count(*) from inserted) as [Inserted Cnt]
		,(select count(*) from deleted) as [Deleted Cnt]
go

create trigger trg_Data_AU on dbo.Data2
after update
as
	select 
		'After Update' as [Trigger]
		,@@RowCount as [RowCount]
		,(select count(*) from inserted) as [Inserted Cnt]
		,(select count(*) from deleted) as [Deleted Cnt]
go

create trigger trg_Data_AD on dbo.Data2
after delete
as
	select 
		'After Delete' as [Trigger]
		,@@RowCount as [RowCount]
		,(select count(*) from inserted) as [Inserted Cnt]
		,(select count(*) from deleted) as [Deleted Cnt]
go

Now let’s run MERGE statement as shown below:

merge into dbo.Data2 as Target
using (select 1 as [Value]) as Source
on Target.Col = Source.Value
when not matched by target then
	insert(Col) values(Source.Value)
when not matched by source then
	delete
when matched then
	update set Col = Source.Value;

The table is empty, therefore only one insert statement would be executed. However, as you should see below, all three triggers were fired, It is worth mentioning that previous code example demonstrated very bad practice of returning result sets from trigger. You should avoid doing it in production code.

02. Triggers and MERGE statement

The right way to deal with such situation is checking content of inserted and deleted tables as shown below.

alter trigger trg_Data_AI on dbo.Data2
after insert 
as
	if @@rowcount = 0 
		return
	set nocount on
	if exists(select * from inserted)
		/* Some Code Here */      
go

alter trigger trg_Data_AU on dbo.Data2
after update
as
	if @@rowcount = 0 
		return
	set nocount on
	if	exists(select * from inserted) and 
		exists(select * from deleted)
		/* Some Code Here */      
go

alter trigger trg_Data_AD on dbo.Data2
after delete
as
	if @@rowcount = 0 
		return
	set nocount on
	if	exists(select * from deleted)
		/* Some Code Here */

I hope that those tips will help you to write trigger in the most optimal way. However, I would repeat – the best way of writing triggers is not writing them at all.

Unwinding Table Spools

One of the questions I have often been asked during various SQL Server events is what exactly Table Spool operator does in execution plan. I would try to answer it today.

Spool operators, in the nutshell, are internal in-memory or on-disk caches/temporary tables. SQL Server often uses spools for performance reasons to cache results of complex subexpressions that needs to be used multiple times during query execution.

Let’s look at the example and create the table, which stores some sales information as shown below:

create table dbo.Orders
(
    OrderID int not null,
    CustomerId int not null,
    Total money not null,
    constraint PK_Orders
    primary key clustered(OrderID)
);

;with N1(C) as (select 0 union all select 0) -- 2 rows
,N2(C) as (select 0 from N1 as T1 cross join N1 as T2) -- 4 rows
,N3(C) as (select 0 from N2 as T1 cross join N2 as T2) -- 16 rows
,N4(C) as (select 0 from N3 as T1 cross join N3 as T2) -- 256 rows
,Nums(Num) as (select row_number() over (order by (select null)) from N4)
    insert into dbo.Orders(OrderId, CustomerId, Total)
        select Num, Num % 10 + 1, Num
        from Nums;

Now let’s run the query that returns the list of orders with the information about total amount of sales on per-customer basis.

select OrderId, CustomerID, Total
	,Sum(Total) over(partition by CustomerID) as [Total Customer Sales] 
from dbo.Orders

As you see, in the execution plan below, SQL Server scans the table, sorts the data based on CustomerID order and uses Table Spool operator to cache the results. It allows SQL Server to access the cached data and avoid expensive sorting operation later.

01. Execution Plan with Spool operators

Even though Table Spool operator is shown in the execution plan multiple times, it is essential the same spool/cache. SQL Server builds it the first time and using its data later.

Technically speaking, there are two different logical spool operators – Eager Spool and Lazy Spool. The only difference between them are how data is populated. With Eager Spool, SQL Server fetches all rows as soon as spool is called. With Lazy Spool, SQL Server fetches rows on demand – when they are needed.

SQL Server also uses spools for Halloween Protection when modifying the data. Halloween Protection helps to avoid situations when data modifications affect what data need to be updated. The classic example of such situation is shown below. Without Halloween Protection, insert statement would fall into infinitive loop, reading rows it has been inserting.

create table dbo.HalloweenProtection
(
	Id int not null identity(1,1),
	Data int not null
);

insert into dbo.HalloweenProtection(Data)
	select Data from dbo.HalloweenProtection;

As you can see in the execution plan of insert statement, SQL Server uses Table Spool operator to cache the data from table as of before insert starts to avoid infinitive loop during execution.

02. Halloween Protection and Table Spools.

Halloween Protection has very interesting side-effect when we are talking about multi-statement user-defined functions (both, scalars and table-valued). Using multi-statement functions is bad practice by itself, however creating them without SCHEMABINDING option is even worse. That option forces SQL Server to analyze if user-defined function performs data access and avoid extra Halloween Protection-related Spool operators in the execution plan.

Let’s see the example and create two user-defined functions and using them in where clause of update statements.

create function dbo.ShouldUpdateData(@Id int)
returns bit
as
begin
	return (1)
end
go

create function dbo.ShouldUpdateDataSchemaBound(@Id int)
returns bit
with schemabinding
as
begin
	return (1)
end
go

update dbo.HalloweenProtection
set Data = 0
where dbo.ShouldUpdateData(ID) = 1;

update dbo.HalloweenProtection
set Data = 0
where dbo.ShouldUpdateDataSchemaBound(ID) = 1;

Neither of functions access the data and therefore can introduce Halloween effect. However, SQL Server does not know that in case of non-schema bound function and add Spool operator to execution plan as shown below.

03. Execution Plans for user-defined functions

Bottom line – you should always make functions schema-bound when you create them.

Spool temporary tables are usually referenced as worktables in I/O statistics for the queries. You should analyze table spool-related reads during query performance tuning. While spools can improve performance of the queries, there is the overhead introduced by unnecessary spools. You can often remove them by creating appropriate indexes on the tables.

SQL Server Storage Engine: Heap Tables

Heap tables are tables without a clustered index. The data in heap tables is unsorted.  SQL Server does not guarantee nor maintain any sorting order of the data in the heap tables.
When we insert data into heap tables, SQL Server tries to fill pages as much as possible, although it does not analyze the actual free space available on a page. It uses the Page Free Space (PFS) allocation map instead. SQL Server errs on the side of caution, and it uses the low value from the PFS free space percentage tier during the estimation.

For example, if a data page stores 4,100 bytes of data, and, as result, has 3,960 bytes of free space available, PFS would indicate that the page is 51-80 percent full. SQL Server would not put a new row to the page if its size exceeds 20 percent (8,060 bytes * 0.2 = 1,612 bytes) of the page size.

Let’s look at the example and create the table with the code shown below.

create table dbo.Heap(Val varchar(8000) not null);

;with CTE(ID,Val) as 
(
    select 1, replicate('0',4089)
    union all
    select ID + 1, Val from CTE where ID < 20 
) 
insert into dbo.Heap(Val)
   select Val from CTE;

select page_count, avg_record_size_in_bytes, avg_page_space_used_in_percent
from sys.dm_db_index_physical_stats(db_id(),object_id(N'dbo.Heap'),0,null,'DETAILED');

01. Page Count after initial insert

At this point, the table stores 20 rows of 4,100 bytes each. SQL Server allocates 20 data pages—one page per row—with 3,960 bytes available. PFS would indicate that pages are 51-80 percent full.

As the next step, let’s inserts the small 111-byte row, which is about 1.4 percent of the page size. As a result, SQL Server knows that the row would fit into one of the existing pages (they all have at least 20 percent of free space available), and a new page should not be allocated.

insert into dbo.Heap(Val) values(replicate('1',100));

select page_count, avg_record_size_in_bytes, avg_page_space_used_in_percent 
from sys.dm_db_index_physical_stats(db_id(),object_id(N'dbo.Heap'),0,null,'DETAILED');

02. Page Count after insertion of small row

Lastly, the third insert statement needs 2,011 bytes for the row, which is about 25 percent of the page size. SQL Server does not know if any of the existing pages have enough free space to accommodate the row and, as a result, allocates the new page. You can see that SQL Server does not access existing pages by checking the actual free space and uses PFS data for the estimation.

insert into dbo.Heap(Val) values(replicate('2',2000));

select page_count, avg_record_size_in_bytes, avg_page_space_used_in_percent 
from sys.dm_db_index_physical_stats(db_id(),object_id(N'dbo.Heap'),0,null,'DETAILED');

03. Page Count after insertion of large row

That behavior leads to the situation where SQL Server unnecessarily allocates new data pages, leaving large amount of free space unused. It is not always a problem when the size of rows vary—in those cases, SQL Server eventually fills empty spaces with the smaller rows. However, especially in cases when all rows are relatively large, you can end up with large amounts of wasted space.

When selecting data from the heap table, SQL Server uses as Index Allocation Map (IAM) to find the pages and extents that need to be scanned. It analyzes what extents belong to the table and processes them based on their allocation order rather than on the order in which the data was inserted. You can see it in figure below.

04. IAM Scan

When you update the row in the heap table, SQL Server tries to accommodate it on the same page. If there is no free space available, SQL Server moves the new version of the row to another page and replaces the old row with a special 16-byte row called a forwarding pointer. The new version of the row is called forwarded row.

05. Forwarding Pointers

There are two main reasons why forwarding pointers are used. First, they prevent updates of nonclustered index keys, which referencing the row.

In addition, forwarding pointers helps minimize the number of duplicated reads – the situation when a single row is read multiple times during the table scan. As the example, let’s look at figure above and assume that SQL Server scans the pages in left-to-right order. Next, let’s assume that the row in page 3 was modified after the page was read at the time when SQL Server reads page 4. The new version of the row would be moved to page 5, which has yet to be processed. Without forwarding pointers, SQL Server would not know that the old version of the row had already been read, and it would read it again during the page 5 scan. With forwarding pointers, SQL Server would ignore the forwarded rows.

Forwarding pointers help minimize duplicated reads at cost of additional read operations. SQL Server follows the forwarding pointers and reads the new versions of the rows at the time it encounters them. That behavior can introduce an excessive number of I/O operations.

Let’s create the table and insert three rows there.

create table dbo.ForwardingPointers
(
   ID int not null,
   Val varchar(8000) null
);

insert into dbo.ForwardingPointers(ID,Val)
values
   (1,null),
   (2,replicate('2',7800)),
   (3,null);

select page_count, avg_record_size_in_bytes, avg_page_space_used_in_percent, forwarded_record_count
from sys.dm_db_index_physical_stats(db_id(),object_id(N'dbo.ForwardingPointers'),0,null,'DETAILED');

set statistics io on
select count(*) as [RowCnt] from dbo.ForwardingPointers
set statistics io off

You can see results below

06. Forwarding Pointers: I/O without forwarding pointers

All three rows fit into the single page, and SQL Server needs to read just that page when it scans the table.

07. Page Layout without Forwarding Pointers

Let’s update two of the table rows and increase their size. The new version of the rows would not fit into the page anymore, which introduces the allocation of the two new pages and two forwarding pointers.

update dbo.ForwardingPointers set Val = replicate('1',5000) where ID = 1;
update dbo.ForwardingPointers set Val = replicate('3',5000) where ID = 3;

select page_count, avg_record_size_in_bytes, avg_page_space_used_in_percent, forwarded_record_count
from sys.dm_db_index_physical_stats(db_id(),object_id(N'dbo.ForwardingPointers'),0,null,'DETAILED');

set statistics io on
select count(*) as [RowCnt] from dbo.ForwardingPointers
set statistics io off

08. Forwarding Pointers: I/O with Forwarding Pointers

When SQL Server reads the forwarding pointer rows from page 1, it follows them and reads pages 2 and 3 immediately thereafter. After that, SQL Server reads those pages one more time during the regular IAM scan process. As a result, we have five read operations, even though our table has just three data pages.

09. Page Layout and I/O with Forwarding Pointers

It does not look as bad in case of the small table. Let’s look at the same issue in case, when table has more rows. Let’s insert 65,536 rows to our table.

truncate table dbo.ForwardingPointers
go

;with N1(C) as (select 0 UNION ALL select 0) -- 2 rows
,N2(C) as (select 0 from N1 as T1 cross join N1 as T2) -- 4 rows
,N3(C) as (select 0 from N2 as T1 cross join N2 as T2) -- 16 rows
,N4(C) as (select 0 from N3 as T1 cross join N3 as T2) -- 256 rows
,N5(C) as (select 0 from N4 as T1 cross join N4 as T2) -- 65,536 rows
,IDs(ID) as (select row_number() over (order by (select NULL)) from N5)
insert into dbo.ForwardingPointers(ID)
	select ID from IDs;

select page_count, avg_record_size_in_bytes, avg_page_space_used_in_percent, forwarded_record_count
from sys.dm_db_index_physical_stats(db_id(),object_id(N'dbo.ForwardingPointers'),0,null,'DETAILED');

set statistics io on
select count(*) as [RowCnt] from dbo.ForwardingPointers
set statistics io off

10. Large table: I/O without Forwarding Pointers

As you see, there are 106 pages in the table and as result, SQL Server performs 106 reads during IAM scan. Let’s update our table and introduce forwarding pointers.

update dbo.ForwardingPointers set Val = replicate('a',500);

select page_count, avg_record_size_in_bytes, avg_page_space_used_in_percent, forwarded_record_count
from sys.dm_db_index_physical_stats(db_id(),object_id(N'dbo.ForwardingPointers'),0,null,'DETAILED');

set statistics io on
select count(*) as [RowCnt] from dbo.ForwardingPointers
set statistics io off

11. Large Table: I/O with Forwarding Pointers

Now our table has 4,461 pages however it requires almost 70 thousand reads to perform a scan. As you see, the large number of the forwarding pointers leads to extra I/O operations and significantly reduces the performance of the queries accessing the data.

When the size of the forwarded row is reduced by another update and the data page with forwarding pointer has enough space to accommodate updated version of the row, SQL Server might move it back to original data page and remove the forwarding pointer. Nevertheless, the only reliable way to get rid of the all forwarding pointers is by rebuilding the heap table. You can do that by using an ALTER TABLE REBUILD statement or by creating and dropping a clustered index on the table.

Heap tables can be useful in staging environment where you want to import a large amount of data into the system as fast as possible. Inserting data into heap tables can often be faster than inserting it into tables with clustered indexes. Nevertheless, during a regular workload, tables with clustered indexes usually outperform heap tables due to their suboptimal space control and forwarding pointers.

Table of content

SQL Server Storage Engine: Allocation Maps

SQL Server logically groups eight pages into 64KB units called extents. There are two types of extents available: Mixed extents store data that belongs to different objects. Uniform extents store the data for the same object.

When a new object is created, SQL Server stores first eight object pages in mixed extents. After that, all subsequent space allocation for that object is done with uniform extents.

SQL Server uses special kind of pages, called Allocation Maps, to track extent and page usage in a file. There are several different types of allocation maps pages in SQL Server.

Global Allocation Map (GAM) pages track if extents have been allocated by any objects. The data is represented as bitmaps where each bit indicates the allocation status of an extent. Zero bits indicate that the corresponding extents are in use. The bits with a value of one indicate that the corresponding extents are free. Every GAM page covers about 64,000 extents, or almost 4GB of data. This means that every database file has one GAM page for about 4GB of file size.

Shared Global Allocation Map (SGAM) pages track information about mixed extents. Similar to GAM pages, it is a bitmap with one bit per extent. The bit has a value of one if the corresponding extent is a mixed extent and has at least one free page available. Otherwise, the bit is set to zero. Like a GAM page, SGAM page tracks about 64,000 extents, or almost 4GB of data.

SQL Server can determine the allocation status of the extent by looking at the corresponding bits in GAM and SGAM pages. Figure below shows the possible combinations of the bits.

01. GAM and SGAM bit statuses

When SQL Server needs to allocate a new uniform extent, it can use any extent where a bit in the GAM page has the value of one. When SQL Server needs to find a page in a mixed extent, it searches both allocation maps looking for the extent with a bit value of one in a SGAM page and the corresponding zero bit in a GAM page. If there are no such extents available, SQL Server allocates the new free extent based on the GAM page, and it sets the corresponding bit to one in the SGAM page.

Every database file has its own chain of GAM and SGAM pages. The first GAM page is always the third page in the data file (page number 2). The first SGAM page is always the fourth page in the data file (page number 3). The next GAM and SGAM pages appear every 511,230 pages in the data files which allows SQL Server to navigate through them quickly when needed.

SQL Server tracks the pages and extents used by the different types of pages (in-row, row-overflow, and LOB pages), that belong to the object with another set of the allocation map pages, called Index Allocation Map (IAM). Every table/index has its own set of IAM pages, which are combined into separate linked lists called IAM chains. Each IAM chain covers its own allocation unit – IN_ROW_DATA, ROW_OVERFLOW_DATA, and LOB_DATA.

Each IAM page in the chain covers a particular GAM interval and represents the bitmap where each bit indicates if a corresponding extent stores the data that belongs to a particular allocation unit for a particular object. In addition, the first IAM page for the object stores the actual page addresses for the first eight object pages, which are stored in mixed extents.

The figure below shows a simplified version of the allocation map pages bitmaps.

02. Allocation maps

Partitioned tables and indexes have separate IAM chains for every partition. In fact, you can consider non-partitioned table as the partitioned with just a single partition.

There is another type of allocation map page called Page Free Space (PFS). Despite the name, PFS pages track a few different things. We can call PFS as a byte-mask, where every byte stores information about a specific page, as shown below.

03. PFS byte structure

The first three bits in the byte indicate the percent of used space on the page. SQL Server tracks the used space for row-overflow and LOB data, as well as for in-row data in the heap tables. These are the only cases when amount of free space on the page matters.

When you delete a data row from the table, SQL Server does not remove it from the data page but rather marks the row as deleted. Bit 4 indicates if the page has logically deleted (ghosted) rows.

Bit 5 indicates if the page is an IAM page. Bit 6 indicates whether or not the page is in the mixed extent. Finally, bit 7 indicates if the page is allocated.

Every PFS page tracks 8,088 pages or about 64MB of data space. It is always the second page (page 1) in the file and every 8,088 pages thereafter.

There are two more types of allocation map pages. The seventh page (page 6) in the file is called a Differential Changed Map (DCM). These pages keep track of extents that have been modified since the last FULL database backup. SQL Server uses DCM pages when it performs DIFFERENTIAL backups.

The last allocation map is called Bulk Changed Map (BCM). It is eighth page (page 7) in the file, and it indicates what extents have been modified in minimally-logged operations since the last transaction log backup. BCM pages are used only with a BULK_LOGGED database recovery model.

Both, DCM and BCM pages are the bitmasks that cover 511,230 pages in the data file.

Next: HEAP tables

Table of Content

SQL Server Storage Engine: LOB Storage

As you already know, the fixed-length data and the internal attributes of a row must fit into a single page. Fortunately, SQL Server can store the variable-length data on different data pages. There are two different ways to store the data, depending on the data type and length.

ROW_OVERFLOW storage

SQL Server stores variable-length column data, which does not exceed 8,000 bytes, on special pages called ROW_OVERFLOW pages. Let’s create a table and populate it with the data shown in listing below.

create table dbo.RowOverflow 
( 
    ID int not null, 
    Col1 varchar(8000) null, 
    Col2 varchar(8000) null 
);

insert into dbo.RowOverflow(ID, Col1, Col2) 
values (1,replicate('a',8000),replicate('b',8000));

SQL Server creates the table and inserts the data row without any errors, even though the data row size exceeds 8,060 bytes. Let’s look at the table page allocation using the DBCC IND command.

DBCC IND('SqlServerInternals','dbo.RowOverflow',-1)

 

01. ROW_OVERFLOW data: DBCC IND results

Now you can see two different sets of IAM and data pages. The data page with PageType=3 represents the data page that stores ROW_OVERFLOW data.

Let’s look at data page 214647, which is the in-row data page that stores main row data. The partial output of the DBCC PAGE command for the page (1:214647) is shown below.

Slot 0 Offset 0x60 Length 8041

Record Type = PRIMARY_RECORD Record Attributes = NULL_BITMAP VARIABLE_COLUMNS
Record Size = 8041 
Memory Dump @0x000000000FB7A060
0000000000000000:30000800 01000000 03000002 00511f69 9f616161 0............Q.iŸaaa
0000000000000014:61616161 61616161 61616161 61616161 61616161 aaaaaaaaaaaaaaaaaaaa
0000000000000028:61616161 61616161 61616161 61616161 61616161 aaaaaaaaaaaaaaaaaaaa
000000000000003C:61616161 61616161 61616161 61616161 61616161 aaaaaaaaaaaaaaaaaaaa
0000000000000050:61616161 61616161 61616161 61616161 61616161 aaaaaaaaaaaaaaaaaaaa
<Skipped>
0000000000001F2C:61616161 61616161 61616161 61616161 61616161 aaaaaaaaaaaaaaaaaaaa
0000000000001F40:61616161 61616161 61616161 61616161 61020000 aaaaaaaaaaaaaaaaa...
0000000000001F54:00010000 00290000 00401f00 00754603 00010000 .....)...@...uF.....
0000000000001F68:00

As you see, SQL Server stores Col1 data in-row. Col2 data, however, has been replaced with a 24-byte value. The first 16 bytes are used to store off-row storage metadata attributes, such as type, length of the data, and a few other attributes. The last 8 bytes is the actual pointer to the row on the row-overflow page, which is the file, page, and slot number. Figure below shows this in detail. Remember that all information is stored in byte-swapped order.

02. ROW_OVERFLOW page pointer

As you see, the slot number is 0, file number is 1, and page number is the hexadecimal value 0x00034675, which is decimal 214645. The page number matches the DBCC IND results shown earlier in the post.

The partial output of the DBCC PAGE command for the page (1:214645) is shown below.

Blob row at: Page (1:214645) Slot 0 Length: 8014 Type: 3 (DATA)
Blob Id:2686976

0000000008E0A06E: 62626262 62626262 62626262 62626262 bbbbbbbbbbbbbbbb
0000000008E0A07E: 62626262 62626262 62626262 62626262 bbbbbbbbbbbbbbbb
0000000008E0A08E: 62626262 62626262 62626262 62626262 bbbbbbbbbbbbbbbb

Col2 data is stored in the first slot on the page.

LOB Storage

For the text, ntext, or image columns, SQL Server stores the data off-row by default. It uses another kind of page called LOB data pages. You can control this behavior by using the “text in row” table option. For example, exec sp_table_option dbo.MyTable, ‘text in row’, 200 forces SQL Server to store LOB data less or equal to 200 bytes in-row. LOB data greater than 200 bytes would be stored in LOB pages.

The logical LOB data structure is shown below.

03: LOB data structure

Like ROW_OVERFLOW data, there is a pointer to another piece of information called the LOB root structure, which contains a set of the pointers to other data pages/rows. When LOB data is less than 32 KB and can fit into five data pages, the LOB root structure contains the pointers to the actual chunks of LOB data. Otherwise, the LOB tree starts to include an additional, intermediate level of pointers, similar to the index B-Tree.

Let’s create the table and insert one row of data there.

create table dbo.TextData
(
    ID int not null,
    Col1 text null
);

insert into dbo.TextData(ID, Col1) 
values (1, replicate(convert(varchar(max),'a'),16000));

The page allocation for the table is shown below.

04. LOB data: DBCC IND results

As you see, the table has one data page for in-row data and three data pages for LOB data. I am not going to examine the structure of the data row for in-row allocation; it is similar to the ROW_OVERFLOW allocation. However, with the LOB allocation, it stores less metadata information in the pointer and uses 16 bytes rather than the 24 bytes required by the ROW_OVERFLOW pointer.

The result of DBCC PAGE command for the page that stores the LOB root structure is shown below.

Blob row at: Page (1:3046835) Slot 0 Length: 84 Type: 5 (LARGE_ROOT_YUKON)
Blob Id: 131661824 Level: 0 MaxLinks: 5 CurLinks: 2
Child 0 at Page (1:3046834) Slot 0 Size: 8040 Offset: 8040 
Child 1 at Page (1:3046832) Slot 0 Size: 7960 Offset: 16000

As you see, there are two pointers to the other pages with LOB data blocks, which are similar to the blob data stored in ROW_OVERFLOW pages.

The format, in which SQL Server stores the data from the (MAX) columns, such as varchar(max), nvarchar(max), and varbinary(max), depends on the actual data size. SQL Server stores it in-row when possible. When in-row allocation is impossible, and data size is less or equal to 8,000 bytes, it stored as ROW_OVERFLOW data. The data that exceeds 8,000 bytes is stored as LOB data.

It is also worth mentioning that SQL Server always stores rows that fit into a single page using in-row allocations. When a page does not have enough free space to accommodate a row, SQL Server allocates a new page and places the row there rather than placing it on the half-full page and moving some of the data to ROW_OVERFLOW pages.

SELECT * and I/O

There are plenty of reasons why selecting all columns from a table with the select * operator is not a good idea. It increases network traffic by transmitting columns that the client application does not need. It also makes query performance tuning more complicated, and it introduces side effects when the table schema changes.

It is recommended that you avoid such a pattern and explicitly specify the list of columns needed by the client application. This is especially important with ROW_OVERFLOW and LOB storage, when one row can have data stored in multiple data pages. SQL Server needs to read all of those pages, which can significantly decrease the performance of queries.

As an example, let’s assume that we have table dbo.Employees with one column storing employee pictures.

create table dbo.Employees
(
    EmployeeId int not null,
    Name varchar(128) not null,
    Picture varbinary(max) null
);

;WITH N1(C) AS (SELECT 0 UNION ALL SELECT 0) -- 2 rows
,N2(C) AS (SELECT 0 FROM N1 AS T1 CROSS JOIN N1 AS T2) -- 4 rows
,N3(C) AS (SELECT 0 FROM N2 AS T1 CROSS JOIN N2 AS T2) -- 16 rows
,N4(C) AS (SELECT 0 FROM N3 AS T1 CROSS JOIN N3 AS T2) -- 256 rows
,N5(C) AS (SELECT 0 FROM N4 AS T1 CROSS JOIN N2 AS T2) -- 1,024 rows
,IDs(ID) AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM N5)
insert into dbo.Employees(EmployeeId, Name, Picture)
    select   
        ID
        ,'Employee ' + convert(varchar(5),ID)
        ,convert(varbinary(max),replicate(convert(varchar(max),'a'),120000))
    from Ids;

The table has 1,024 rows with binary data of 120,000 bytes. Let’s assume that we have code in the client application that needs the EmployeeId and Name to populate a drop-down box. If a developer is not careful, he can write a select statement using the select * pattern, even though a picture is not needed for this particular use-case.

Let’s compare the performance of two selects; one selecting all data columns and another that selects only EmployeeId and Name

set statistics io on
set statistics time on

select * from dbo.Employees;
select EmployeeId, Name from dbo.Employees;

set statistics io off
set statistics time off
Results:
select EmployeeId, Name from dbo.Employee: 
Number of reads: 7;  Execution time (ms): 2

select * from dbo.Employee
Number of reads: 90,895; Execution time (ms): 343

As you see, the first select, which reads the LOB data and transmits it to the client, is a few orders of magnitude slower than the second select.

One case where this becomes extremely important is with client applications, which use Object Relational Mapping (ORM) frameworks. Developers tend to reuse the same entity objects in different parts of an application. As a result, an application may load all attributes/columns even though it does not need all of them in many cases.

It is better to define different entities with a minimum set of required attributes on an individual use-case basis. In our example, it would work best to create separate entities/classes, such as EmployeeList and EmployeeProperties. An EmployeeList entity would have two attributes: EmployeeId and Name. EmployeeProperties would include a Picture attribute in addition to the two mentioned.

This approach can significantly improve the performance of systems.

Next: Allocation Maps

Table of Content