Monthly Archives: September 2010

When SQL Server uses non-clustered indexes

Last time we took a look at the structure of indexes. Now let’s talk about index usage in general.

Assume you run a query that requires a clustered index scan. Obviously it requires SQL Server to read every data page of the clustered index. At the same time, the number of physical reads would be much lower than the number of logical reads. SQL Server uses a technique called “read-ahead”: in a nutshell, it reads up to 64 data pages in a single read operation, so by the time SQL Server needs the next data page, it is most likely already in the buffer cache. The access still counts as a logical read but does not produce a physical read.

Now let’s think about an operation that uses a non-clustered index. As we know, SQL Server needs to traverse the clustered index B-tree for every key/bookmark lookup operation. Even if the read-ahead technique works for the non-clustered index itself, the actual table (clustered index) data could reside anywhere in the data files, so the lookups are essentially random IO operations.

Let’s see the example. First of all, let’s create the following table:
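The original script is not reproduced here, so here is a minimal sketch of a table that fits the description (the table name, the Placeholder filler column and its width are assumptions):

CREATE TABLE dbo.Data
(
    ID INT NOT NULL IDENTITY(1,1),
    Name CHAR(3) NOT NULL,
    Placeholder CHAR(400) NULL, -- filler to make the rows reasonably wide
    CONSTRAINT PK_Data PRIMARY KEY CLUSTERED(ID)
);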

Let’s evenly distribute the data in the Name column using the combinations AAA, AAB, …, ZZZ, adding 5 rows per combination, and create a non-clustered index on that field.
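Something along these lines would produce that distribution (26 * 26 * 26 * 5 = 87,880 rows); this is just a sketch, the original code differs:

;WITH Letters(C)
AS
(
    SELECT CHAR(number)
    FROM master.dbo.spt_values
    WHERE type = 'P' AND number BETWEEN 65 AND 90 -- A..Z
)
INSERT INTO dbo.Data(Name, Placeholder)
SELECT L1.C + L2.C + L3.C, 'a'
FROM Letters L1 CROSS JOIN Letters L2 CROSS JOIN Letters L3
    CROSS JOIN (VALUES (1),(2),(3),(4),(5)) AS R(N); -- 5 rows per combination

CREATE NONCLUSTERED INDEX IDX_Data_Name ON dbo.Data(Name);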

Let’s see some statistics.
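The numbers below can be obtained, for example, from sys.dm_db_index_physical_stats (the exact query used in the original post may differ):

SELECT index_id, index_level, page_count, record_count
FROM sys.dm_db_index_physical_stats
(
    DB_ID(), OBJECT_ID(N'dbo.Data'), NULL, NULL, 'DETAILED'
)
WHERE index_level = 0; -- leaf level of the clustered and non-clustered indexes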

As you can see, the table has 87,880 rows. The clustered index leaf level has 4,626 pages and the non-clustered index leaf level has 296 pages. Let’s run the tests.

First test. Let’s select rows with Name = ‘ABC’
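A sketch of the test, with IO statistics enabled (table and column names as assumed above):

SET STATISTICS IO ON;

SELECT * FROM dbo.Data WHERE Name = 'ABC';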

As you can see, it returns only 5 rows and produces 18 logical reads. SQL Server uses the non-clustered index in this case.

Second test. Let’s select rows with name starting with ‘AB’
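Again a sketch of the test:

SELECT * FROM dbo.Data WHERE Name LIKE 'AB%';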

As you can see, it returns 130 rows and produces 410 logical reads. SQL Server still uses the non-clustered index. Another thing worth mentioning is that 99% of the query cost is the key lookup.

Next test. Let’s select rows with names starting with ‘A’. This returns 1/26th of the rows, which is about 3.8%.
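The sketch of this test:

SELECT * FROM dbo.Data WHERE Name LIKE 'A%';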

As you can see, SQL Server decided to do a clustered index scan, which leads to 4,644 logical reads.

Let’s run the same select forcing SQL Server to use non-clustered index
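Forcing the index can be done with a table hint (the index name is the one assumed above):

SELECT * FROM dbo.Data WITH (INDEX (IDX_Data_Name))
WHERE Name LIKE 'A%';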

As you can see, it produces 10,375 logical reads – more than twice as many as the clustered index scan, and without any help from read-ahead.

So you can see that the threshold at which using the non-clustered index with key lookups becomes more expensive than a scan is really low. SQL Server will not use a non-clustered index with key/bookmark lookups if it expects the iterator to return more than a few percent of the total rows in the table.

Where does that leave us? Make your indexes highly selective.

Code can be downloaded from here

Sunday T-SQL Tip: Application Locks

There is a set of tasks where you need to serialize access to some T-SQL code. Assume you have multiple instances of a data processing application running simultaneously and do not want them to load the same data for processing. Or you want to perform some bulk operation and need to stop clients from inserting new data during that period.

Obviously, one of the options is to use transactions in the serializable isolation level. The problem with that is it could be more restrictive than you want. Suppose you don’t want to block access to the data but rather introduce something similar to the critical section you have in client development.

Fortunately, SQL Server has a pair of stored procedures you can use for such purposes: sp_getapplock and sp_releaseapplock. sp_getapplock allows you to obtain a shared or exclusive “user” lock in either transaction or session scope. If you acquire the lock in transaction scope, it is released at the end of the transaction; otherwise it is released when the session ends.

Let’s see it in action. Let’s run the following statement from 2 different sessions
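A minimal sketch of what each session could run (the resource name and the WAITFOR delay are arbitrary choices for the demo):

DECLARE @ResCode INT;

BEGIN TRAN
    -- try to acquire an exclusive application lock, waiting up to 15 seconds
    EXEC @ResCode = sp_getapplock
        @Resource = 'MyCriticalSection',
        @LockMode = 'Exclusive',
        @LockOwner = 'Transaction',
        @LockTimeout = 15000;

    IF @ResCode >= 0 -- 0 = granted immediately, 1 = granted after waiting
    BEGIN
        -- the serialized part of the processing goes here
        WAITFOR DELAY '00:00:10'; -- hold the lock so the second session has to wait
    END
COMMIT -- the transaction-owned lock is released here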

And here are the results from both sessions:

As you can see it does the trick.

Update (2013-05-08): This post provides more details about the different serialization methods available in SQL Server.

Indexes structure

OK, enough about tables for now. Let’s talk about indexes. As we all know, there are 2 types of indexes in SQL Server:

1st – clustered indexes. There is at most one per table, and it basically specifies the order in which data is stored in the table. For example, if the table has a clustered index on an integer field, the data is physically sorted by that integer field. Please don’t be confused – there is still no such thing as a default sort order for queries – the order of the rows SQL Server returns depends on the execution plan, which could use something other than a clustered index scan.

SQL Server does not require you to create a clustered index – tables without one are called heap tables. We will talk about them later.

The 2nd type is non-clustered indexes. SQL Server allows up to 249 non-clustered indexes per table in SQL 2005 and 999 in SQL 2008 (thanks to Chirag for pointing this out). This is “just an index”.

If you think of a book, the page number is the clustered index. The alphabetical index at the end of the book is the non-clustered index.

So what exactly is an index? It is a B-tree. Let’s see what that means.

The image above shows a small table with ID as the primary key (and clustered index). As a side note, SQL Server creates a clustered index on the primary key by default.

The leaf level (the bottom one) contains the actual table data sorted by ID. As you can see, the data pages are linked into a doubly-linked list so SQL Server can scan the index in both directions.

The levels above the leaf level are called “intermediate levels”. Every index row on those levels points to a separate data page in the level below. There could be several intermediate levels depending on the table size, but the top (root) level of the index always has exactly 1 page.

So let’s see how it actually works. Assume you want to select the record with ID = 50. SQL Server starts from the root level and finds that the first row contains ID=1 and the second row contains ID=57. That means the row with ID=50 is located under the page that starts with ID=1 on the next level of the index. So the next step is to analyze the first page on the intermediate level, which covers IDs from 1 to 50. There SQL Server finds the row with ID=50 and jumps to the leaf-level page with the actual data.

Now let’s look at the non-clustered index. Assume we have an index on the Name field.

The structure of the index is exactly the same, with the exception that the leaf level does not contain table data but the clustered index key values. It does not matter whether you specify ID in the index definition – it will be there anyway. For heap tables, the leaf level contains the actual RID (row identifier), which consists of FileId:PageNumber:RowNumber. The index at the end of a book is a good analogy: it does not include the actual paragraph from the book, only the page number (the clustered index key).

Let’s see what SQL Server does when it uses a non-clustered index for lookups on a table with a clustered index. As you can see, it first needs to find the ID of the row(s) and then perform a clustered index lookup to obtain the actual table data. This operation is called a “key lookup” (a “bookmark lookup” in previous editions of SQL Server).

2 things I would like us to remember:
1. A non-clustered index stores the clustered index key values on its leaf level.
2. As a result, when SQL Server uses a non-clustered index for a lookup, it needs to traverse the clustered index to get the actual data row.

This introduces interesting performance implications we will talk about next time.

Sunday T-SQL Tip: Context_Info

One of the interesting but not widely known features in SQL Server is Context_Info. This is basically connection/session-specific 128-byte storage.

You can set it with the SET CONTEXT_INFO statement. As for getting the value, there are a few methods available:
1. The Context_Info() function
2. The context_info column in the sys.dm_exec_requests and sys.sysprocesses views. These values are updated immediately within the batch
3. The context_info column in the sys.dm_exec_sessions view. This value is not updated until the end of the batch
Let’s see that in action:
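A minimal sketch of setting and reading the value:

DECLARE @Ctx VARBINARY(128);
SET @Ctx = CAST('Some value' AS VARBINARY(128));
SET CONTEXT_INFO @Ctx;

-- reading it back within the same session
SELECT CONTEXT_INFO() AS ContextInfoValue;

SELECT context_info
FROM sys.dm_exec_sessions
WHERE session_id = @@SPID;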


What can you do with that? Funny things, like recursion, for example:

As for a more practical example – we have a 3rd party system we have to integrate with our database and applications. The integration is done with triggers – when a table is updated, a trigger modifies the 3rd party database and vice versa. In a few specific cases we should not propagate the changes, and context_info is ideal for that. We set the value before the update and check it in the trigger. If the flag is set, the propagation does not run.
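A simplified illustration of that pattern (the table, column, trigger name and the 0x01 flag convention are all made up for this sketch):

-- trigger on the local table; it skips propagation when the flag is set
CREATE TRIGGER trgLocalTable_Upd ON dbo.LocalTable
AFTER UPDATE
AS
BEGIN
    IF SUBSTRING(ISNULL(CONTEXT_INFO(), 0x00), 1, 1) = 0x01
        RETURN; -- the change came from the integration code - do not propagate
    -- ... propagate the change to the 3rd party database here ...
END
GO

-- integration code sets the flag around the update it does not want propagated
SET CONTEXT_INFO 0x01;
UPDATE dbo.LocalTable SET SomeColumn = 1 WHERE ID = 42;
SET CONTEXT_INFO 0x00;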

Vertical Partitioning as the way to reduce IO

Let’s think about the following scenario. Assume we have a table with a lot of data. Some of the large fields in this table are rarely accessed, while at the same time a subset of the fields is constantly scanned.

A good example is a large Articles catalog in a system where customers can search by various columns/attributes. Let’s assume the Article has a description field which holds 1,000-2,000 bytes on average. Customers search on part of the article name, item number, price and so on, and 98% of the customer activity is searching and browsing the results. Let’s say that only 2% of requests open the article properties and look at the description.

Assume all those searches lead to clustered index scans. As we already know, we want to minimize the number of IO operations, so we want to make the row as small as possible. Let’s see an example:
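A table fitting the description might look like this (the column list and data types are assumptions based on the text above):

CREATE TABLE dbo.Articles
(
    ArticleId INT NOT NULL IDENTITY(1,1),
    Name NVARCHAR(128) NOT NULL,
    ItemNumber VARCHAR(32) NOT NULL,
    Price MONEY NOT NULL,
    OtherFieldsShownInBrowse CHAR(200) NULL,
    Description VARCHAR(2000) NULL, -- 1,000-2,000 bytes on average
    CONSTRAINT PK_Articles PRIMARY KEY CLUSTERED(ArticleId)
);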

Now let’s run a select based on part of the item number:
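Something like this (the search value is arbitrary; the leading wildcard prevents an index seek):

SET STATISTICS IO ON;

SELECT ArticleId, Name, ItemNumber, Price, OtherFieldsShownInBrowse
FROM dbo.Articles
WHERE ItemNumber LIKE '%123%';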

Obviously, one way to improve the situation is to create a non-clustered index on all fields which could appear in the predicate list. The problem with this approach is that in most cases SQL Server would not be able to do an index seek. It can still do an index scan, but the price of this option really depends on the number of records in the result set – key/bookmark lookups can make it really expensive. Alternatively, you can create a covering index and include every field you need for browsing – in our case Name, ItemNumber, Price and OtherFieldsShownInBrowse – but that would make the index leaf rows very big. In some cases, though, it is still acceptable and better than the solution I will show below.

Let’s change the schema and split Articles table vertically:
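A sketch of the split, under the same naming assumptions:

CREATE TABLE dbo.ArticleMain
(
    ArticleId INT NOT NULL,
    Name NVARCHAR(128) NOT NULL,
    ItemNumber VARCHAR(32) NOT NULL,
    Price MONEY NOT NULL,
    OtherFieldsShownInBrowse CHAR(200) NULL,
    CONSTRAINT PK_ArticleMain PRIMARY KEY CLUSTERED(ArticleId)
);

CREATE TABLE dbo.ArticleDescriptions
(
    ArticleId INT NOT NULL,
    Description VARCHAR(2000) NULL,
    CONSTRAINT PK_ArticleDescriptions PRIMARY KEY CLUSTERED(ArticleId)
);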

To simplify the situation – let’s create the view and join both tables.
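For example (the view name is an assumption):

CREATE VIEW dbo.vArticles
AS
    SELECT m.ArticleId, m.Name, m.ItemNumber, m.Price,
        m.OtherFieldsShownInBrowse, d.Description
    FROM dbo.ArticleMain m
        JOIN dbo.ArticleDescriptions d ON d.ArticleId = m.ArticleId;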

Now, let’s run the previous select against the view:

As we can see, the situation is even worse – even though we don’t need the Description attribute, SQL Server joins both tables. It does that because of the inner join: SQL Server needs to filter out all records from ArticleMain that do not have corresponding records in ArticleDescriptions.

Let’s change it to the outer join:
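Sticking with the same sketch:

ALTER VIEW dbo.vArticles
AS
    SELECT m.ArticleId, m.Name, m.ItemNumber, m.Price,
        m.OtherFieldsShownInBrowse, d.Description
    FROM dbo.ArticleMain m
        LEFT OUTER JOIN dbo.ArticleDescriptions d ON d.ArticleId = m.ArticleId;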

Now let’s run the select again:

Now when we run the select, we see that it scans the ArticleMain table only. Results are much better. If we need to select fields from both tables (for example, when the user wants to see the properties), it would do the join.

We can even create an INSTEAD OF trigger on the view to cover insert and update operations and simplify client development if needed.

Again, I don’t want to say that this method is right in every case. At a bare minimum you should have:
1. A lot of data in the table
2. Scans with predicates which do not support INDEX SEEK operations and return a lot of data
3. Done the research to confirm that a regular (covering) index would not be more beneficial than this solution

But in some cases vertical partitioning would help.

Sunday T-SQL tip: Uniqueness of nullable field

Assume you have a table with a nullable field. What if you need to enforce uniqueness on non-null values only? For example, you can have a customers table with optional SSNs – not all customers provide one, but if an SSN is provided, you want to make sure it’s unique.

A regular unique index is not an option – SQL Server would not allow you to have more than 1 null value.

SQL Server 2008 provides a great feature – the filtered index – which you can create on not-null values only.
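A sketch against a hypothetical dbo.Customers table:

CREATE UNIQUE NONCLUSTERED INDEX IDX_Customers_SSN
ON dbo.Customers(SSN)
WHERE SSN IS NOT NULL;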

Now when you insert a duplicate value, it fails:

With SQL 2005, that’s not an option, although you can use an indexed view:
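Roughly like this, under the same hypothetical table (view and index names are assumptions):

CREATE VIEW dbo.vCustomersSSN
WITH SCHEMABINDING
AS
    SELECT SSN
    FROM dbo.Customers
    WHERE SSN IS NOT NULL;
GO

CREATE UNIQUE CLUSTERED INDEX IDX_vCustomersSSN_SSN ON dbo.vCustomersSSN(SSN);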

Now when you insert a duplicate, it violates the unique clustered index on the view and fails.

LOB Storage

Let’s talk about how SQL Server stores LOB data. Let’s start with restricted-length large objects – objects which are less than or equal to 8,000 bytes. This includes varchar, nvarchar, varbinary, CLR data types and sql_variant.

Let’s try to create the table with large varchar columns and insert some data:
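A sketch that matches the numbers quoted below (the table name and exact column sizes are assumptions; Field1-Field4 come from the text):

CREATE TABLE dbo.LargeRows
(
    ID INT NOT NULL,
    Field1 VARCHAR(2000) NULL,
    Field2 VARCHAR(2000) NULL,
    Field3 VARCHAR(2000) NULL,
    Field4 VARCHAR(8000) NULL
);

INSERT INTO dbo.LargeRows(ID, Field1, Field2, Field3, Field4)
VALUES (1, REPLICATE('a', 2000), REPLICATE('b', 2000),
    REPLICATE('c', 2000), REPLICATE('d', 2800)); -- ~8,800 bytes of data in total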

As you can see, the table was created successfully and a row slightly bigger than 8,800 bytes has been inserted. Let’s check what data pages we have for this table.
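One way to look at the allocation is through sys.allocation_units (the query in the original post may differ):

SELECT p.index_id, u.type_desc, u.total_pages, u.used_pages, u.data_pages
FROM sys.partitions p
    JOIN sys.allocation_units u ON u.container_id = p.partition_id
WHERE p.object_id = OBJECT_ID(N'dbo.LargeRows');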

As you can see, there are 2 types of pages – IN_ROW_DATA and ROW_OVERFLOW_DATA. You need to deduct 1 from the total page count of each allocation unit (the IAM page), so in this case you see 1 page for IN_ROW_DATA (populated with the Field1, Field2 and Field3 column data) and 1 page for ROW_OVERFLOW_DATA (for the Field4 column data).

So restricted-length LOBs are stored either completely in-row or, if they do not fit, in row-overflow pages. In the second case, a 24-byte pointer plus 2 bytes in the offset array remain in the row.

Let’s see what happened if you update the row:
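For example (continuing the sketch above):

UPDATE dbo.LargeRows
SET Field4 = REPLICATE('d', 1900) -- total data size is now slightly below 8,000 bytes
WHERE ID = 1;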

Now the row size is slightly less than 8,000 bytes, so the row “should” fit on a single page. It still uses the row-overflow page, though – SQL Server does not bother to check. In real life the threshold is about 1,000 bytes. Let’s see that.
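Continuing the sketch:

UPDATE dbo.LargeRows
SET Field3 = REPLICATE('c', 500) -- shrinks the row by another 1,500 bytes
WHERE ID = 1;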

As you can see, we updated another field, reducing the row by more than 1,000 bytes, and the row-overflow page is gone.

For unrestricted-length data (text, ntext, image), the situation is different. SQL Server also stores it on its own set of pages, with a 16-byte (by default) pointer in the row. The data itself is organized in a B-tree manner, similar to regular indexes, with pointers to the actual blocks of data. I don’t show examples here, but you get the idea. The data page type would be TEXT_MIXED if the page shares data from multiple rows, or TEXT_DATA if the entire chunk of data on the page belongs to a single value.

Also worth mentioning is the “text in row” table option. This option controls whether part of the LOB data can be stored in the row.

And finally, the (max) types. This is quite simple: if the size of the data is <= 8,000 bytes, it is stored like restricted-length objects (in-row or on row-overflow pages). If it is bigger, it is stored as unrestricted-length data.

You can create quite big rows with a lot of data on row-overflow and text_mixed/text_data pages. But don’t forget that SQL Server will need to read those additional pages when accessing the row – not really good in terms of performance.

Sunday T-SQL tip: Temporary table variables and transaction context

Recently I’ve been asked if it’s possible to partially commit a transaction. Interestingly enough, the answer is not simple. SQL Server does not support nested transactions, although there are a few tricks you can use.

The first is savepoints. They allow you to mark the transaction state at some point, roll back to that point and commit everything you did prior to it. Obviously, this only helps you commit the first (beginning) part of the transaction.

Let’s see it in action. Let’s create the small table.
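Something like this will do for the demo (the table and column names are assumptions):

CREATE TABLE dbo.TranTest
(
    ID INT NOT NULL,
    Value VARCHAR(32) NOT NULL
);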

Next, let’s see savepoints in action.
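A sketch using the table above:

BEGIN TRAN
    INSERT INTO dbo.TranTest(ID, Value) VALUES (1, 'Committed');

    SAVE TRANSACTION FirstPartDone;

    INSERT INTO dbo.TranTest(ID, Value) VALUES (2, 'Rolled back');

    -- rolls back only the work done after the savepoint
    ROLLBACK TRANSACTION FirstPartDone;
COMMIT

SELECT * FROM dbo.TranTest; -- only the first row survives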


Good. Now assume we would like to do the opposite – commit the “tail” actions after a specific point. Unfortunately, you cannot do it with regular T-SQL statements. Fortunately, there is a trick – table variables. They do not participate in the transaction context. Let’s see it in action:
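A sketch, continuing with the same demo table:

DECLARE @Audit TABLE (ID INT NOT NULL, Value VARCHAR(32) NOT NULL);

BEGIN TRAN
    INSERT INTO dbo.TranTest(ID, Value) VALUES (3, 'Attempted change');

    -- capture the data; the table variable is not affected by the rollback
    INSERT INTO @Audit(ID, Value)
    SELECT ID, Value FROM dbo.TranTest WHERE ID = 3;
ROLLBACK

-- the captured rows survive and can be saved in a separate transaction,
-- for example into an audit table
SELECT * FROM @Audit;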

As you can see, you can create a table variable, populate it with data, roll back the transaction and then save the data in a separate transaction. It is worth mentioning that regular temporary tables do not work that way – they fully support the transaction context.

One practical usage of this scenario is auditing on rollback. You can save the required data into a table variable, roll back the data modification attempt and then write the audit information to the audit table.

P.S. Just wonder if anybody reads it 🙂

Hidden facts about table alteration

Last time we saw how row size affects performance during scan operations. I hope you have not altered columns in your tables already. Please don’t blame me if you did – most likely it made the situation even worse. So let’s talk about table alteration today.

There are 3 types of alterations:

1. Metadata only. This type of alteration can be done based on metadata alone. For example, it happens when you add a new nullable column, increase the size of a variable-width column, change a not null column to nullable, etc.

2. Metadata change with data check. For example, if you need to change a nullable column to not null, SQL Server needs to check that there are no rows with null values in this column before it updates the metadata. Decreasing the size of a variable-width column is another example – SQL Server needs to check that there are no values larger than the new size.

3. Every row needs to be rebuilt. Examples are adding a new not null column, some data type changes and so on.

Unfortunately, one thing is not commonly known – altering a table never (repeat: never) decreases the row size. When you drop a column, SQL Server removes the column from the metadata but does not reclaim the space or rebuild the rows. When you change a column type from int to tinyint, for example, the actual storage size remains intact – SQL Server just checks the domain of the value on insert/update. When you increase the size of a field (for example, change int to bigint), SQL Server creates another, bigint-sized column and keeps the old int column’s space intact.

Let’s see that in detail. Let’s create the table:
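A sketch of such a table, plus a metadata query that shows the physical column offsets (sys.system_internals_partition_columns is an undocumented view, the query here is an approximation, and the original table contains more columns, so the offsets quoted below will not match this sketch exactly):

CREATE TABLE dbo.AlterDemo
(
    IntColumn INT NOT NULL,
    Int2Column INT NOT NULL,
    BigIntColumn BIGINT NOT NULL
);

-- physical column offsets and in-row sizes
SELECT c.name, pc.partition_column_id, pc.max_inrow_length, pc.leaf_offset
FROM sys.system_internals_partition_columns pc
    JOIN sys.partitions p ON p.partition_id = pc.partition_id
    LEFT JOIN sys.columns c ON c.object_id = p.object_id
        AND c.column_id = pc.partition_column_id
WHERE p.object_id = OBJECT_ID(N'dbo.AlterDemo');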

First, let’s drop Int2Column
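Using the sketch table above:

ALTER TABLE dbo.AlterDemo DROP COLUMN Int2Column;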

As you can see, the column definition is gone although all offsets remain the same. 4 bytes are simply wasted.

Next, let’s alter the bigint column to tinyint.
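Again with the sketch table (the bigint column name is an assumption):

ALTER TABLE dbo.AlterDemo ALTER COLUMN BigIntColumn TINYINT NOT NULL;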

As you can see, the column now accepts only 1 byte (max_inrow_length) although it still uses 8 bytes in the offsets. So another 7 bytes are wasted.

And finally let’s alter IntColumn from int to bigint.
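Again with the sketch table:

ALTER TABLE dbo.AlterDemo ALTER COLUMN IntColumn BIGINT NOT NULL;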

As you can see, even though IntColumn’s column_id still equals 1, SQL Server “moves” the column to the end of the row – the new offset is 30. The data row now has 15 bytes wasted (offsets 4-11 and 13-19).

How do you “fix” that? Well, you need to rebuild the clustered index. SQL Server reclaims all the space and rebuilds the rows when you do that. With heap tables you’re out of luck – the only workaround is to build a temporary clustered index (which rebuilds the table) and drop it afterwards.
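For a table that has a clustered index, the rebuild itself is a one-liner (table and index names here are placeholders):

ALTER INDEX PK_SomeTable ON dbo.SomeTable REBUILD;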

The biggest problem with that is that a clustered index rebuild is a time-consuming operation which locks the table. If you have a huge transactional table, it could be nearly impossible to accomplish. So be careful when you design your table structure.