I am OLTP guy. I cannot grasp concept of the columnstore indexes – indexes that do not care about an order of columns in the definition. It was a reason why Data Warehouses and I lived happily ever after. Just apart from each other.
It was not hard, after all. Even though you can find some use cases for nonclustered columnstore indexes in OLTP environment, inability to modify data after index was created makes those use cases quite rare. Technically, you can use data partitioning and partitioned views and benefit from them in some scenarios; however, such implementation requires large amount of work.
The situation changed after SQL Server 2014 release. Now you can define clustered columnstore indexes, which are updateable. It does not make them suited for OLTP environment – you should remember that they are optimized for large SCAN workloads. Moreover, as the opposite of nonclustered columnstore indexes, they are the only indexes you can define on the table. It is impossible to define B-Tree indexes on the same table and support both environments.
Anyway, I decided to spend some time and explore how clustered columnstore indexes handle data modifications. It was very easy to find some references online; however, neither of the references talks about internal implementation of the indexes. Today, we will try to close this gap.
As the first step, let’s talk about high level structure of clustered columnstore indexes. They use the same storage format as nonclustered columnstore indexes storing columnstore data in row groups. Each row group stores data for up to 1,048,576 rows in column-based format. Data from each column stored separately in highly compressed fashion.
Clustered columnstore indexes have two additional elements to support data modifications. The first is delete bitmap that indicates what rows were deleted from a table. The second structure is delta store that includes newly inserted rows. Both, delta store and delete bitmap use B-Tree format to store data.
SQL Server works with delete bitmap and delta stores transparently to users, which makes terminology confusing. You can often see delta stores being referenced as another row group in the documentation and technical articles. Moreover, delete bitmap is often considered as a part of delta store and/or row groups. I will use the following terminology today to avoid confusion. A term row group references data stored in column-based storage format. I will explicitly reference delta stores and delete bitmap as two separate set of internal objects whenever needed.
You can see example of the structure of clustered columnstore index in a table that has two partitions in Figure 1 below. Each partition can have a single delete bitmap and multiple delta stores. It is worth mentioning that delete bitmap and delta stores are created on-demand, for example, delete bitmap would not be created unless some of the rows in the row groups were deleted.
Every time when you delete a row that is stored in a row group (not in a delta store), SQL Server adds information about deleted row to delete bitmap. Nothing happens to original row. It is still stored in a row group; however, SQL Server checks delete bitmap during query execution excluding deleted rows from the processing. When you insert data into columnstore index, it goes into a delta store. Updating a row that is stored in a row group do not change row data either. Such update triggers deletion of a row, which is, in fact, insertion to delete bitmap, and insertion of a new version of a row to a delta store. However, any data modifications of the rows in delta store are done the same way as in the regular B-Tree indexes by updating and deleting actual rows there. You will see one of such examples later.
Each delta store can be either in open or closed state. Open delta stores accept new rows and allow modifications and deletions of the data. SQL Server closes a delta store when it reaches 1,048,576 rows, which is the maximum number of rows that can be stored in a row group. Another SQL Server process, called tuple mover, runs every five minutes and converts closed delta stores to row groups that store data in column-based storage format.
You can examine the state of row groups and delta store with sys.column_store_row_groups view. Figure 2 illustrates an output of this view, which returns combined information about all columnstore index objects. Rows in OPEN or CLOSED state correspond to delta stores. Rows in COMPRESSED state correspond to row groups with data in column-based storage format. Finally, deleted_rows column provide statistics about deleted rows stored in delete bitmap.
As you see, the second row in a view output shows closed delta store that have yet to be picked up by tuple mover process. The situation would change after tuple mover process converted closed delta store to a row group. Figure 3 illustrates the output from a view after it happened. It is worth mentioning that row_group_id of converted row group changed. Tuple mover created new row group dropping closed delta store afterwards.
Let’s look at the structure of delta store and delete bitmap rows. Listing below creates a table and populates it with the data creating clustered columnstore index afterwards. I am using MAXDOP=1 option to reduce the number of partially populated row groups.
create table dbo.CCI ( Col1 int not null, Col2 varchar(4000) not null ); ;with N1(C) as (select 0 union all select 0) -- 2 rows ,N2(C) as (select 0 from N1 as T1 cross join N1 as T2) -- 4 rows ,N3(C) as (select 0 from N2 as T1 cross join N2 as T2) -- 16 rows ,N4(C) as (select 0 from N3 as T1 cross join N3 as T2) -- 256 rows ,N5(C) as (select 0 from N4 as T1 cross join N4 as T2) -- 65,536 rows ,N6(C) as -- 1,048,592 rows ( select 0 from N5 as T1 cross join N3 as T2 union all select 0 from N3 ) ,IDs(ID) as (select ROW_NUMBER() over (order by (select NULL)) from N6) insert into dbo.CCI(Col1,Col2) select ID, 'aaa' from IDS go create clustered columnstore index IDX_CS_CLUST on dbo.CCI with (maxdop=1) go select g.state_description, g.row_group_id, s.column_id ,s.row_count, s.min_data_id, s.max_data_id, g.deleted_rows from sys.column_store_segments s join sys.partitions p on s.partition_id = p.partition_id join sys.column_store_row_groups g on p.object_id = g.object_id and s.segment_id = g.row_group_id where p.object_id = object_id(N'dbo.CCI') order by g.row_group_id, s.column_id;
As you see in Figure 4, columnstore index has two row groups and does not have delta store nor delete bitmap. You can see Col1 values that are stored in both row groups in min_data_id and max_data_id columns for the rows that have column_id=1.
As the next step, let’s perform some data modifications in the table. First statement inserts two new rows into the table. Second statement deletes three rows, including one row we just inserted. Finally, we will update another, newly inserted, row.
insert into dbo.CCI(Col1,Col2) values (2000000,REPLICATE('c',4000)), (2000001,REPLICATE('d',4000)); delete from dbo.CCI where Col1 in ( 100 -- Row group 0 ,16150 -- Row group 1 ,2000000 -- Newly inserted row (Delta Store) ); update dbo.CCI set Col2 = REPLICATE('z',4000) where Col1 = 2000001; -- Newly inserted row (Delta Store)
Now it is a time to find data pages that used by delta store and delete bitmap. We will use undocumented sys.dm_db_database_page_allocations system function as shown below.
select object_id, index_id, partition_id ,allocation_unit_type_desc as [Type] ,is_allocated,is_iam_page,page_type,page_type_desc ,allocated_page_file_id as [FileId] ,allocated_page_page_id as [PageId] from sys.dm_db_database_page_allocations (db_id(), object_id('dbo.CCI'),NULL, NULL, 'DETAILED')
You can see an output of the query in Figure 5. SQL Server stores columnstore segments in LOB_DATA allocation units. Delta store and delete bitmap are using IN_ROW_DATA allocation.
Let’s look at the data pages using another undocumented DBCC PAGE command with the code shown below. Obviously, in your environment, database, file and page IDs would be different.
-- Redirecting output to console dbcc traceon(3604) -- Analyzing content of a page dbcc page ( 9 -- Database Id ,1 -- FileId ,306 -- PageId ,3 -- Output style )
Figure 6 shows partial content of a data page, which is a delta store page. As you can see, SQL Server stores data in regular row-based storage. There is one internal column CSILOCATOR in addition to two table columns. CSILOCATOR is used as internal unique identifier of the row in delta store. Finally, it is worth mentioning that a row with Col1=2000000, which we have inserted and deleted after clustered columnstore index was created, is not present in delta store. SQL Server deletes (and updates) rows in B-Tree delta store the same way as with regular B-Tree tables.
You can use the same approach to examine content of a deleted bitmap data page. In my case, the page id is 308. Figure 7 shows the partial output of DBCC PAGE command. As you see, delete bitmap includes two columns, which are uniquely identifying a row. The first column is a row group id and the second column is offset of the row in the segment. Do not be confused by the fact that column names match table columns. DBCC PAGE uses table metadata to prepare an output.
As you see, both delta store and deleted bitmap pages were using row compression in our example, which means delta store and delete bitmap either row- or page-compressed. As you know, in case of page compression, SQL Server performs page compression only when page is full and retain it only if it provides significant space savings. Otherwise, data is kept in row-compressed format even when index is defined with page compression.
Let’s run a test that inserts large batch of rows that can benefit from page compression using code shown in Listing below.
;with N1(C) as (select 0 union all select 0) -- 2 rows ,N2(C) as (select 0 from N1 as T1 cross join N1 as T2) -- 4 rows ,N3(C) as (select 0 from N2 as T1 cross join N2 as T2) -- 16 rows ,N4(C) as (select 0 from N3 as T1 cross join N3 as T2) -- 256 rows ,N5(C) as (select 0 from N4 as T1 cross join N4 as T2) -- 65,536 rows ,IDs(ID) as (select ROW_NUMBER() over (order by (select NULL)) from N5) insert into dbo.CCI(Col1,Col2) select ID, REPLICATE('a',255) from IDS
Figure 8 illustrates content of the data page from delta store after insert. The presence of compression info record indicates that delta store is using page compression
Let’s examine what happens with delete bitmap and delete all rows from compressed row groups with code shown in Listing below.
delete from dbo.CCI
As you can see in Figure 9 below, page is still uses row compression even though now it is fully populated. Obviously, we cannot guarantee that delete bitmap is not defined with page compression – after all it is not documented – however, it could be logical to use row compression in this case when we have two small integer values. Row compression would perform perfectly here.
Hope, that information can shed some light on clustered columnstore index internal structure.