Category Archives: General

LOB Storage

Let’s talk about how SQL Server stores LOB data. Let’s start with restricted-length large objects – objects that are less than or equal to 8,000 bytes. This includes varchar, nvarchar, varbinary, CLR data types and sql_variant.

Let’s try to create a table with large varchar columns and insert some data:
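Something like this would do – the table definition and values below are just an illustration of restricted-length columns pushing the row size over a single 8K page:

create table dbo.RowOverflow
(
    ID int not null,
    Field1 varchar(2000) not null,
    Field2 varchar(2000) not null,
    Field3 varchar(2000) not null,
    Field4 varchar(8000) not null
)
go

-- this row is roughly 8,800 bytes - bigger than a single data page can hold
insert into dbo.RowOverflow(ID, Field1, Field2, Field3, Field4)
values(1, replicate('1',2000), replicate('2',2000), replicate('3',2000), replicate('4',2800))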

As you can see, the table was created successfully and a row with a size slightly bigger than 8,800 bytes has been inserted. Let’s check what data pages we have for this table:
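One way to check is to query the allocation metadata; for example, assuming the illustration table dbo.RowOverflow from above (total_pages includes the IAM pages):

select p.index_id, au.type_desc, au.total_pages, au.used_pages, au.data_pages
from sys.allocation_units au join sys.partitions p on
    au.container_id = p.partition_id
where p.object_id = object_id(N'dbo.RowOverflow')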

As you can see, there are two types of pages – IN_ROW_DATA and ROW_OVERFLOW_DATA. You need to deduct 1 from the total page count (IAM pages), so in this case you see 1 page for IN_ROW_DATA (populated with Field1, Field2 and Field3 column data) and 1 page for ROW_OVERFLOW_DATA (for Field4 column data).

So restricted-length LOBs are stored either completely in-row or, if they do not fit, on row-overflow pages. In the second case, the row holds a 24-byte pointer plus 2 bytes in the offset array.

Let’s see what happens if you update the row:
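For example, shrinking the overflowing column so the total row size drops just below 8,000 bytes (again, using the illustration table from above):

update dbo.RowOverflow
set Field4 = replicate('4',1900) -- total row size is now slightly below 8,000 bytes
where ID = 1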

Now the row size is slightly less than 8,000 bytes, so the row “should” fit on a single page. It still uses the row-overflow page, though – SQL Server does not bother to move the data back for such a small decrease. In real life the threshold is about 1,000 bytes. Let’s see that.
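A sketch of such an update – shrinking one of the fields by well over 1,000 bytes (the exact sizes here are just for illustration):

update dbo.RowOverflow
set Field3 = replicate('3',500) -- this field is now 1,500 bytes smaller
where ID = 1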

As you can see, we shrank another field by more than 1,000 bytes and the row-overflow page is gone.

For unrestricted-length data (text, ntext, image), the situation is different. SQL Server also stores it on its own set of pages, with a 16-byte (by default) pointer in the row. The data itself is organized in a B-Tree manner similar to regular indexes, with pointers to the actual blocks of data. I don’t show the examples here – but you get the idea. The data page type would be TEXT_MIXED if the page shares data from multiple rows, or TEXT_DATA if the entire chunk of data on the page belongs to a single value.

Also worth mentioning is the “text in row” table option. This option controls whether some part of the LOB data is stored in the row.
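The option is set with sp_tableoption; a quick sketch (dbo.TextData is a hypothetical table with a text column, and 256 is just an example limit in bytes):

-- store up to 256 bytes of the text/ntext/image value directly in the row
exec sp_tableoption 'dbo.TextData', 'text in row', 256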

And finally, the (max) types. This is quite simple. If the size of the data is <= 8,000 bytes, it is stored like a restricted-length object on row-overflow pages. If it is larger, it is stored as unrestricted-length data.

You can create quite big rows with a lot of data on row-overflow and text_mixed/text_data pages. But don’t forget that SQL Server will need to read those other page(s) when accessing the row. Not really good in terms of performance.

Hidden facts about table alteration

Last time we saw how row size affects performance during scan operations. I hope you have not already altered the columns in your tables. Please don’t blame me if you did – most likely it made the situation even worse. So let’s talk about table alteration today.

There are three types of alterations:

1. Metadata only. This type of alteration can be done based on metadata only. For example, it happens when you add a new nullable column, increase the size of a variable-width column, change a not null column to nullable, etc.

2. Metadata change with data check. For example, if you need to change a nullable column to not null, SQL Server needs to check that there are no rows with null values for this column and after that update the metadata. Decreasing the size of a variable-width column is another example – SQL Server needs to check that there are no values larger than the new size.

3. Every row needs to be rebuilt. Examples are: adding a new not null column, some data type changes and so on.

Unfortunately, one thing is not commonly known – altering a table never (repeat: never) decreases the row size. When you drop a column, SQL Server removes the column from the metadata but does not reclaim the space or rebuild the row. When you change a column type from int to tinyint, for example, the actual storage size remains intact – SQL Server just checks the domain value on insert/update. When you increase the size of a field (for example, change int to bigint), SQL Server creates another bigint column and keeps the old int column space intact.

Let’s see that in detail. Let’s create the table:
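Below is a hypothetical table that matches the offsets discussed further down (IntColumn at offset 4, Int2Column at 8, BigIntColumn at 12 and a char(10) filler ending at byte 29), together with a query against the undocumented sys.system_internals_partition_columns view, which exposes leaf_offset and max_inrow_length:

create table dbo.AlterDemo
(
    IntColumn int not null,        -- leaf_offset 4
    Int2Column int not null,       -- leaf_offset 8
    BigIntColumn bigint not null,  -- leaf_offset 12
    CharColumn char(10) not null   -- leaf_offset 20
)
go

-- undocumented view: shows the physical column layout (offsets and in-row lengths)
select c.name, ipc.partition_column_id, ipc.max_inrow_length, ipc.leaf_offset
from sys.system_internals_partition_columns ipc
    join sys.partitions p on
        ipc.partition_id = p.partition_id
    left join sys.columns c on
        c.object_id = p.object_id and c.column_id = ipc.partition_column_id
where p.object_id = object_id(N'dbo.AlterDemo')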

First, let’s drop Int2Column
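Using the hypothetical dbo.AlterDemo table from above:

alter table dbo.AlterDemo drop column Int2Column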

As you can see, the column definition is gone although all offsets remain the same. 4 bytes are simply wasted.

Next, let’s alter the bigint column to tinyint:
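Again with the hypothetical table:

alter table dbo.AlterDemo alter column BigIntColumn tinyint not null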

As you can see, the column now accepts only 1 byte (max_inrow_length) although it still uses 8 bytes in the offsets. So another 7 bytes are wasted.

And finally let’s alter IntColumn from int to bigint.
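And the last alteration:

alter table dbo.AlterDemo alter column IntColumn bigint not null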

As you can see, even though IntColumn’s column_id still equals 1, SQL Server “moves” the column to the end of the row – the new offset is 30. As you can see, the data row now has 15 bytes wasted (bytes 4–11 and 13–19).

How do you “fix” that? Well, you need to rebuild the clustered index. SQL Server will reclaim all the space and rebuild the rows when you do that. With heap tables you’re out of luck – the only workaround is to build a temporary clustered index (which rebuilds the table) and drop it afterwards.
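A sketch of both options (the index and table names are placeholders):

-- table with a clustered index: rebuild it to reclaim the space
alter index PK_MyTable on dbo.MyTable rebuild
go

-- heap: create a temporary clustered index and drop it afterwards
create clustered index IDX_AlterDemo_Temp on dbo.AlterDemo(IntColumn)
drop index IDX_AlterDemo_Temp on dbo.AlterDemo
go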

The biggest problem with that is that a clustered index rebuild is a time-consuming operation which locks the table. If you have a huge transactional table, it would be nearly impossible to accomplish. So be careful when you design the table structure.

Why Row Size Matters

We already saw that correct data types can decrease the row size. Why is it important? Well, obviously, one of the reasons is the database and backup file sizes. Although this is not the only and, frankly, not the main reason.

When SQL Server reads or writes data, the data page needs to be in memory. If the data page is not present in the buffer pool, SQL Server needs to read the page from disk. Alternatively, when a data page has been modified, SQL Server needs to write this page to disk. Physical I/O is one of the most expensive operations. So it leads to a simple conclusion:

Bigger Row Size = Fewer Rows Per Page = More data pages per table = More I/O operations = Performance degradation

Unfortunately, I/O is not the only thing affected. RAM and the Buffer Pool size are limited. The more data pages that need to be processed/read from disk, the more “old” pages are flushed out of the Buffer Pool.

As a side note – this is one of the biggest problems with non-optimized queries. Those queries are not only slow by themselves, they also flush the Buffer Pool with unnecessary data and compromise system performance in general. There are two reliable performance counters you can see in Performance Monitor (perfmon.exe) under the “Buffer Manager” group for your SQL Server instance: Page life expectancy – indicates how long a data page stays in the cache (the production value should be > 300 seconds), and Buffer cache hit ratio – indicates the percentage of cases when the requested data page is already in the cache (the production value should be > 96-97%). If you monitor those counters and see low values there, most likely the problem is non-optimized queries (assuming, of course, that you have enough RAM available for SQL Server).
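The same counters can also be read from T-SQL through sys.dm_os_performance_counters, for example:

-- Buffer cache hit ratio has to be divided by its base counter to get the percentage
select object_name, counter_name, cntr_value
from sys.dm_os_performance_counters
where object_name like '%Buffer Manager%'
    and counter_name in
        ('Page life expectancy', 'Buffer cache hit ratio', 'Buffer cache hit ratio base')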

Let’s forget about the Buffer Pool for a minute and just focus on the I/O cost. Let’s create two tables and populate them with 50,000 records each:
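A sketch of such a test setup – the table definitions below are hypothetical, chosen to produce roughly those row sizes:

create table dbo.LargeRows
(
    ID int not null,
    Data char(2000) not null
)
create table dbo.SmallRows
(
    ID int not null,
    Data char(10) not null
)
go

declare
    @I int = 0

begin tran
    while @I < 50000
    begin
        insert into dbo.LargeRows(ID, Data) values(@I, 'Large row')
        insert into dbo.SmallRows(ID, Data) values(@I, 'Small row')

        select @I += 1
    end
commit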

The LargeRows table row size is a little bit more than 2,015 bytes, so 4 rows fit per page. The SmallRows table row size is just 22 bytes. Let’s see how many data pages each table has:
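For example, with sys.dm_db_index_physical_stats (the DETAILED mode also reports the average record size):

select object_name(ps.object_id) as table_name, ps.index_id,
    ps.page_count, ps.avg_record_size_in_bytes
from sys.dm_db_index_physical_stats(db_id(), object_id(N'dbo.LargeRows'), null, null, 'DETAILED') ps
union all
select object_name(ps.object_id), ps.index_id, ps.page_count, ps.avg_record_size_in_bytes
from sys.dm_db_index_physical_stats(db_id(), object_id(N'dbo.SmallRows'), null, null, 'DETAILED') ps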

As we can see, the LargeRows table uses 12,500 data pages while SmallRows uses 149.

Now let’s run some tests and see how fast a table scan is:
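A sketch of the test, clearing the buffer pool first so both scans read from disk (DBCC DROPCLEANBUFFERS – test servers only):

set statistics io on
set statistics time on

checkpoint
dbcc dropcleanbuffers -- never do this on a production server

select count(*) from dbo.LargeRows
select count(*) from dbo.SmallRows

set statistics io off
set statistics time off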

As you can see, a scan of the SmallRows table is about 45 times faster than a scan of LargeRows.

Obviously, we rarely scan an entire table in real life, although the above is true for partial scans as well. So always use appropriate data types.

What is the optimal size for variable width columns?

Yesterday we had a discussion during the Tampa Bay SQL Server User Group meeting regarding the size of variable-width columns. The question was whether it matters how such columns are defined – for example, if a column holds 100 characters, should it be defined as varchar(100), varchar(1000) or varchar(max)?

The answer is not so simple. From the storage perspective it does not really matter. The row stores the actual data size plus 2 bytes in the offset array. The question, though, is how it affects performance.

When the Query Optimizer generates the query plan, it needs to estimate how much memory is needed for the query execution. It estimates the number of rows returned by each specific iterator (valid statistics are the key) as well as the row size. And this is where it comes into play. SQL Server does not know if a varchar(100) column holds on average 1 character or 100 characters. So it uses a simple assumption – 50% of the declared column size, and 4,000 bytes for (max) columns.

A correct estimate is especially critical for sort and hash operations – if the operation cannot be done in memory, SQL Server flushes the data to tempdb and does the operation there. And this is a major performance hit. Let’s see this in action.

Let’s create a table with two varchar fields – 100 and 210 characters – and populate it with 1M rows, putting 100 characters into both varchar fields:
create table dbo.Varchars
(
    ID int not null,
    Field1 varchar(100) not null,
    Field2 varchar(210) not null,
    primary key (ID)
)
go

declare
    @I int = 0

begin tran
    while @I < 1000000
    begin
        insert into dbo.Varchars(ID, Field1, Field2)
        values(@I, REPLICATE(CHAR(65 + @I % 26), 100), REPLICATE(CHAR(65 + @I % 26), 100))

        select @I += 1
    end
commit

Now let’s run a few tests. Let’s start SQL Profiler and trace the “Sort Warnings” event. That event basically tells us when data is flushed to tempdb for sorting purposes. Let’s run a simple query:
declare @A varchar(1000)
select @A = Field1
from dbo.Varchars
where ID < 42000
order by Field1

Let’s look at the query plan – as you can see, the row size is estimated incorrectly.

We can also see a Sort Warning in SQL Profiler.

Finally, we can see that the query uses 6,656KB of memory.

Let’s run the second query:
declare @A varchar(1000)
select @A = Field2
from dbo.Varchars
where ID < 42000
order by Field2

Same query plan, but a different row size.

No Sort Warnings in SQL Profiler. As for memory, this query uses 8,088KB.

Finally, let’s run each query 100 times in a loop and compare the execution statistics:
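A sketch of the measurement loop for the first query (the second one is identical except for the column name):

declare
    @I int = 0,
    @A varchar(1000),
    @Start datetime = getdate()

while @I < 100
begin
    select @A = Field1
    from dbo.Varchars
    where ID < 42000
    order by Field1

    select @I += 1
end

select datediff(millisecond, @Start, getdate()) as [Elapsed, ms]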

As you can see, the first query is 25% slower than the second one. This is quite a simple example – with complex queries the difference could be dramatic.

So on one hand it looks like a good idea to always define variable-width columns really wide. On the other hand, it introduces a different problem – you don’t want to overestimate the row size either. That leads to bigger memory consumption, and waits for large memory grants can be longer than for small ones. I would suggest defining those columns about two times bigger than the actual average data size, so that the row size estimate (50% of the declared size) is as accurate as possible.

How SQL Server stores data (Extents, Data Pages, Data Row for In-Row data)

UPDATE (2013/10/16): It is time to refresh the content. I am writing a set of posts about the SQL Server Storage Engine – how it stores data and what happens in database files under the hood. Please check it here.

This is still a work in progress and the old content is available below.

I know, I spent too much time on the boring talk about “key decisions”, “performance problems”, etc. Let’s finally start to talk about practical and useful things. I know, we’re ready to write the first “create table” statement. But before we do that, let’s take a look at how SQL Server stores data. All information below applies to both SQL Server 2005 and SQL Server 2008.

Everything is stored on 8K data pages (8,060 bytes are technically available). 8 data pages (64K) combine into an extent. There are two types of extents – mixed extents, which store data pages belonging to different objects, and uniform extents, which store data pages belonging to one object. The first 8 pages for an object are stored in mixed extents; after that only uniform extents are used. All space allocation is done in extents – 64K blocks – regardless of the type (mixed or uniform).

There are a few special data page types SQL Server uses to track extent allocation. Those pages are basically bitmaps – every bit handles one extent. So one page can cover about 64,000 extents, or almost 4Gb of data. Let’s take a quick look at them:

GAM – Global Allocation Map – tracks whether extents are available for allocation or already in use.
SGAM – Shared Global Allocation Map – tracks whether extents are mixed extents with at least one data page available for use.
IAM – Index Allocation Map – tracks which extents are used by a specific table/index.

There is also another special page type, PFS – Page Free Space – which tracks the approximate amount of free space in each page, as well as a few other things. One PFS page covers 8,088 pages, or about 64Mb of data.
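If you are curious, you can look at those pages with the undocumented DBCC PAGE command; for example (in every data file, page 1 is the first PFS page, page 2 is the GAM and page 3 is the SGAM):

dbcc traceon(3604)           -- send DBCC PAGE output to the client
dbcc page('tempdb', 1, 2, 3) -- database, file 1, page 2 (GAM), dump style 3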

Let’s dive one level deeper and take a look at the structure of a data page.

As you can see, the first 96 bytes are the header. After that, the page contains the set of data rows and ends with the offset array. Two things are worth mentioning. First, each row uses 2 extra bytes for offset storage. And second, the data on the page is not sorted.
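You can see this layout yourself with the undocumented DBCC IND and DBCC PAGE commands; a sketch (the database, table and page number below are placeholders – use a page number returned by DBCC IND):

dbcc traceon(3604)
dbcc ind('MyDatabase', 'dbo.MyTable', -1) -- lists all pages that belong to the table
dbcc page('MyDatabase', 1, 12345, 3)      -- dumps one of those pages (file 1, page 12345) with row details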

Let’s dive one more level deeper and take a look at the classical data row structure for in-row data.

The first 2 bytes contain header information. The next 2 bytes store the length of the fixed-width data, followed by the data itself. Next are 2 bytes for the number of columns, followed by the null bitmap (1 byte per 8 nullable columns). This is followed by 2 bytes storing the number of variable-width columns, the variable-width column offset array (2 bytes per variable column) and the variable-width data. And finally there is an optional 14-byte pointer to the version store. This one is used for optimistic isolation levels (snapshot, read committed snapshot), MARS, etc.

So what is important? First of all, fixed-width data always uses space, even when null. Variable-width data uses 2 extra bytes for offset storage for every value. Null values of variable-width data are not stored, although there are still 2 bytes in the offset array unless the null values are last in the row. Sounds confusing? A little bit 🙂

So where does it lead us? Let’s think about a table which accepts some transactional data:
create table dbo.TranData
(
    ...
    TranDate datetime not null,
    Amount float not null,
    IsApproved int not null,
    IsPending int not null,
    Created datetime not null constraint DEF_TranData_Created default (getDate())
    ...
)

The 5 fields we have in this table require 8 + 8 + 4 + 4 + 8 = 32 bytes. Now let’s think for a minute. Do we really need to store the transaction date/time with precision up to 3 milliseconds? Would 1 minute be OK? Same with the Created column – can we use 1 second precision? What about Amount? Can we use smallmoney or maybe decimal(9,3)?

Let’s modify the table a little bit:
create table dbo.TranData
(
    ...
    TranDate smalldatetime not null,
    Amount decimal(9,3) not null,
    IsApproved bit not null,
    IsPending bit not null,
    Created datetime2(0) not null constraint DEF_TranData_Created default (getDate())
    ...
)

Now it requires: 4 + 5 + 1 + 0 (up to 8 bit columns share 1 byte of storage space) + 6 = 16 bytes. We ended up with 16 bytes of savings. Not much. On the other hand, this is about 16K per 1,000 rows, or about 16Mb per 1M rows. And what if your system collects 1M rows per day? It would be ~5.8Gb per year. What if you have 50M rows per day..? Finally, it’s not only about the storage size. It greatly affects the performance of the system because of the extra I/O operations and other things (we will talk about that later).

So always use correct data types. But don’t be cheap – 65,000 customers may look like a lot when you start the project, but a year from now you will spend hundreds of hours altering your code and changing CustomerId from smallint to int. It is not worth it.

Sources of the performance problems

So all key decisions have been made. But before we go any further, let’s think about the common sources of performance problems.

SQL Server is bad

When you hear that from someone, it means one of two possible things: this person is either completely green or a seasoned professional with a lot of experience in specific areas. Yes, there are some areas where SQL Server (or, for that matter, any relational database) is not the best choice. For example, let’s assume you want to create a session store for a web portal. There is no critical data involved and if it crashes customers just need to log in to the system again. They would probably be upset, but in the long run nothing happened. Would you trade relational database features and transaction consistency for speed in such circumstances?

Oh, I forgot to mention – there is a 3rd reason. Oracle DBAs are typically better paid 🙂

Bad hardware

I would say it’s the most popular complaint I hear when speaking with customers. Yes, for sure, there are cases when the system outgrows the infrastructure. It would be silly to expect a single-CPU box with 4Gb of RAM to handle a database of hundreds of gigabytes. On the other hand, in most cases the server load is produced by bad system design and implementation. For example, the I/O subsystem can be overloaded because of missing indexes and table scans, which at the same time flush the buffer cache and cause memory pressure. CPU load can be driven up by extensive recompilations, and so on, and so on.

Yes, you can improve the situation by upgrading the server, but first you need to spend a ton of money, and you would achieve much better results by simply optimizing your system.

At the same time, let me play Devil’s advocate. Sometimes it’s cheaper to upgrade the hardware than to pay a team of developers for a system redesign. It always depends on the goals.

Bad design

This is a good one. It’s so broad and unspecific that everything fits in there. The system is working slowly – bad database design. You have concurrency problems – bad database design. You’re not the person who designed the system – granted, this system is badly designed.

In any case, what is bad database design? I’m not sure I can define it. There are some patterns and practices that need to be avoided. There are some architectural decisions which cannot scale well. But, generally, if people knew what “bad design” is, it would already be documented, taught in universities and nobody would do it.

Honestly, I would have the same problem if you asked me what good design is. Fortunately or unfortunately, there is no such thing as a silver bullet. But as a bare minimum you need to invest in it (hello, Agile!).

You obviously should not over-architect. If you design a system which will be used by a 50-person department on a LAN, you don’t need to have a goal of handling the load from all 50,000 corporate employees accessing it over a WAN. Unless, of course, you have unlimited time and budget. On the other hand, you should expect that one day the department could be two times bigger than it is now. Or even 10 times bigger.

Bad T-SQL Code

As an application developer, you spent your entire life telling the compiler how to do things and how to achieve the goal. This approach simply does not work with database development. You should tell the database server what you want to achieve instead.

This is a much more complicated task than it sounds. You need to shift your mind in this direction. And it takes time... A lot of time and practice.

Concurrency issues

Locking, blocking, deadlocks... Those make your life miserable. A system which works perfectly during development becomes completely unresponsive after deployment. Fortunately, it’s not as complicated as it sounds. In most cases, concurrency issues can be resolved by optimizing the queries and selecting the correct locking strategy and transaction isolation level. We will dive into it later.

Key decision made – time for key decisions

So the big decision has been made – your team decided to use SQL Server. But stop. There are a few other things that need to be finalized before the actual design process begins.

SQL Server Version

SQL Server 2005 or 2008? If you can use SQL 2008 – use it. There are quite a few things that will make you happy. I will mention only one of them – Table-Valued Parameters ( http://msdn.microsoft.com/en-us/library/bb510489.aspx ). They not only simplify your life, they also increase performance. We tested the pattern of inserting 5,000 records (about 250 bytes each) with a TVP versus individual inserts from the client side; the TVP approach was about 7 times faster.
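A minimal sketch of the pattern (the type, procedure and target table names here are hypothetical):

create type dbo.tvpOrderItems as table
(
    OrderItemId int not null,
    Amount decimal(9,3) not null
)
go

create procedure dbo.InsertOrderItems
(
    @Items dbo.tvpOrderItems readonly
)
as
begin
    insert into dbo.OrderItems(OrderItemId, Amount)
        select OrderItemId, Amount
        from @Items
end
go

On the client side the whole batch is passed as a single parameter (for example, a DataTable with SqlDbType.Structured in ADO.NET) instead of thousands of individual insert calls.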

SQL Server Edition

Obviously, in most cases the decision is not driven by developers. There is the project budget, which can include the cost of licenses; existing SQL Server instances; legacy issues and a dozen other factors. It also depends on whether you develop a hosted solution or a boxed product where you should provide the customer a clear upgrade path from SQL Express to Datacenter edition. At any rate, let’s assume that you have a choice.

There are a ton of useful features in the Enterprise edition which make DBAs happy, although from a development standpoint there are only two major benefits: Table Partitioning and Data Compression (SQL Server 2008 only). Their usefulness depends on the nature of the system. You need those features if you plan to have millions of records in transactional tables and plan to implement sliding-window-type functionality and/or data archiving. Of course, you can always do those things manually, although Enterprise edition greatly simplifies your life for those tasks.

Data Access Layer

Everybody agrees that a good system needs it. Surprisingly, a lot of teams do not implement it, especially if there are no dedicated database and/or middle-tier developers. It’s time consuming, especially at the initial stage. It introduces additional rules, restricts the “creativity” of the developers, requires better collaboration between team members and so on.

In any case, think about the price of database refactoring in the future. It’s a complicated process by itself. If you don’t have the data access code localized in one place, you will also spend a lot of time looking through thousands of lines of code.

Architect the DAL up front. It will save you a lot of time later.

SQL generators/ORM

A great thing. It really simplifies the life of middle-tier developers. Everything is simpler, development is faster, stakeholders are happier. The only minor downside – you often don’t have any control over the SQL, nor even understand what happens under the hood.

I don’t want to say that those frameworks are bad. At any rate, how many people use assembler or even plain C today? We’re living in a commercial development era. But for the future’s sake, do the research. Check how good the generated SQL is. And make sure that you can plug in custom, highly optimized code when needed. In practice you rarely need to do that, but when you do, you should have the ability. There are dozens of frameworks and solutions available on the market – you will be able to find the right one that suits your needs.

How Agile is your team?

Agile is popular. Agile is beneficial. Stakeholders love it. Developers love it. The only problem is that the major Agile principle – “Do not over-architect and spend time on features you don’t need now” – does not work very well with database design and development, especially if you deploy the system to production after each iteration/sprint. The price of refactoring at the database level is far higher than at the application level, especially if the system is already in production.

So if your team is using an Agile-based development process and you expect to have a complex database backend with a lot of data and concurrent users – reserve a good amount of time for database design. Create user stories like: “As a user I want to be able to use the system with 50 concurrent users querying the orders table”. Put those stories into the backlog with high priority. And convince the stakeholders that it is worth it.

At a bare minimum, create a data access layer. Otherwise you will spend entire iterations just to accomplish basic refactoring.

Intro

My first acquaintance with SQL Server was in 2002. Granted, I had a couple of projects where we utilized MSSQL 6.5, but I did not pay much attention to it there. So in 2002 we decided to use SQL Server 2000 as the backend database for our GPS tracking system. Why SQL Server? Frankly, I don’t know. We did not care much – we thought about Oracle, but it was more expensive, harder to maintain and, last but not least, none of us had any PL/SQL knowledge. So we made the decision and our journey began.

As I said, we did not know PL/SQL, but the problem was that we did not know T-SQL either. Granted, we knew how to write select/insert/update/delete statements and how to create tables, but I would say that’s almost everything we knew. Surprisingly, our initial design worked. Well, at least while we had fewer than 50 customers. On the day when every 5th customer called and asked why the system was so slow, we spent a couple of days in research and decided that we needed to change “a few things”. It was our first refactoring.

I think one of the biggest benefits and, at the same time, disadvantages of SQL Server is its simplicity. Everybody can install it and build a database which will work. It does not force you to think ahead. Everything looks simple, but you pay a great price and suffer huge pain later. You will have to refactor the system; then refactor it again, and again, and again...

We are in the same boat. Now we are handling 2,000 TPS with servers working at 1/10th of capacity, but in order to achieve that we accomplished 5 major refactorings in the last 8 years. And I believe it’s not the final number.

My goal is to share experience. I’m not going to talk about analysis or system architecture in general, nor tell you how to correctly map business objects to database tables. I will try to uncover some of SQL Server’s internals so you will be able to predict how SQL Server behaves. I’ll show you some tricks for designing databases and writing queries and T-SQL code in a way that makes your system more efficient. I will try to be practical, even if it’s not always possible.

If you find anything useful in this blog, I would be happy.