Category Archives: T-SQL

Sunday T-SQL Tip: Equality predicate on large varchar column

Sometimes you need to run a query with an equality predicate on a large text column. It does not happen often, but it happens. Unfortunately, sometimes it’s impossible to create an index on such a column. First, the size of the column could be greater than 900 bytes, so the index cannot be created. Another possibility: you have a lot of data and simply don’t want to spend a lot of space on such an index.

How can we optimize the query in such a case? Let’s take a look at an example. First, let’s create the table and add some data there. As you can see, LargeField has a size of 1080 bytes.

Now let’s run the select. It uses a clustered index scan: 9150 reads, which is quite bad.

The index cannot be created: 900 bytes is the limit.

The solution is an indexed, persisted computed column that uses the CHECKSUM() function to generate a hash of the string. Let’s take a look:

There are 2 things we need to keep in mind. First, the hash is not guaranteed to be unique, so the statement still needs to include the predicate on the field itself. Second, Unicode and regular strings return different hashes. This should not generally be a problem, but make sure that the client application specifies the correct parameter data type.
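A minimal sketch of the technique described above. The table, column and index names are assumptions made for illustration and are not taken from the downloadable script.

```sql
-- Hypothetical table; names are illustrative only.
create table dbo.LargeRows
(
    ID int not null identity(1,1),
    LargeField varchar(1080) not null,
    -- Persisted computed column that stores a 4-byte hash of the large value
    LargeFieldHash as checksum(LargeField) persisted,
    constraint PK_LargeRows primary key clustered(ID)
);

-- The hash is an int, so it is far below the index key size limit
create nonclustered index IDX_LargeRows_LargeFieldHash
on dbo.LargeRows(LargeFieldHash);
go

-- The parameter type matches the column type (varchar, not nvarchar),
-- so checksum() produces the same hash as the computed column.
declare @Value varchar(1080);
set @Value = replicate('a', 1080);

select ID, LargeField
from dbo.LargeRows
where
    LargeFieldHash = checksum(@Value)  -- seek on the narrow hash index
    and LargeField = @Value;           -- filters out hash collisions
```

The second predicate is what keeps the result correct: the hash only narrows the search, and the comparison on LargeField removes any rows that happen to share the same checksum.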

Source code is available for download

Sunday T-SQL Tip: Insert/Update pattern

Let’s think about a system that collects and processes data. Quite often processing happens after the data has been inserted into the system. Typically the rows have a set of columns inserted as NULL. The processing routine loads the rows, updates those columns and saves the rows back.

It looks simple, but this pattern introduces major page split and fragmentation issues. Think about it: SQL Server tries to fill pages up to 100%. So when you update the rows, the pages don’t have enough free space to keep the new data. Let’s look at an example:

First – let’s create the table. There are only 3 columns in this table – identity, char column as the placeholder for the columns populated during insert stage and post processing placeholder.

Second, let’s check the physical index statistics. Index is practically perfect.

Now let’s execute the update and populate the post-processing placeholder column. Now check the statistics: terrible fragmentation and a lot of wasted space.

Obviously the workaround is simple: reserve the space for the post-processing columns at the insert stage.
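One possible way to reserve the space is sketched below, assuming a variable-length post-processing column. The table and column names are hypothetical, and the placeholder value is just one option; a fixed-length char column would reserve the space as well.

```sql
-- Hypothetical table; names and sizes are illustrative only.
create table dbo.CollectedData
(
    ID int not null identity(1,1),
    InsertedData char(200) not null,      -- populated at the insert stage
    PostProcessing varchar(255) null,     -- populated later by processing
    constraint PK_CollectedData primary key clustered(ID)
);
go

-- Insert stage: store a placeholder of the expected final size instead of
-- NULL, so each row already occupies the space it will need later.
insert into dbo.CollectedData(InsertedData, PostProcessing)
values('Some data', replicate(' ', 255));

-- Processing stage: the new value is not longer than the placeholder,
-- so the row does not grow and the page does not have to split.
update dbo.CollectedData
set PostProcessing = replicate('b', 255);

-- Check fragmentation and page density of the clustered index
select index_level, page_count,
    avg_page_space_used_in_percent, avg_fragmentation_in_percent
from sys.dm_db_index_physical_stats
    (db_id(), object_id(N'dbo.CollectedData'), 1, null, 'DETAILED');
```

The trade-off is that the table takes the full space from the start, and the processing code has to treat the placeholder as "not processed yet" instead of checking for NULL.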

Source code is available here

Sunday T-SQL Tip: Union vs. Union All

What could be simpler than union? Probably “union all”.

“Union” removes all duplicates from the combined row set. “Union all” simply concatenates the row sets. Obviously, if the original row sets are unique and do not overlap, the results would be the same. But the plans are not.

Take a look here. Let’s create 2 tables with some data.

Now let’s do the union and union all and see the plans.

Union introduces another expensive “Distinct Sort” operator. As a result, the execution plan with union is almost 2 times more expensive than with union all. Of course, your mileage may vary: in large and complex queries the higher cost of the “Union” operator could be hidden, but in any case it’s more expensive. So don’t use it if the rowsets don’t have duplicates.

Speaking of duplicates, there is another thing you need to keep in mind. When union removes the duplicates from the combined rowsets, it does not really matter where the duplicates come from. So you don’t need to use distinct as part of the selects. And it affects the plan too. Look here:
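The three query shapes discussed in this tip can be compared with a sketch like the one below. The two tables are hypothetical stand-ins for the tables in the original script; run the queries with the actual execution plan enabled to compare the costs on your own data.

```sql
-- Hypothetical tables; names are illustrative only.
create table dbo.TableA(ID int not null primary key, Value varchar(32) not null);
create table dbo.TableB(ID int not null primary key, Value varchar(32) not null);
go

-- union: the combined rowset is de-duplicated
select ID, Value from dbo.TableA
union
select ID, Value from dbo.TableB;

-- union all: the rowsets are simply concatenated
select ID, Value from dbo.TableA
union all
select ID, Value from dbo.TableB;

-- distinct inside the selects is redundant here: union removes the
-- duplicates from the combined rowset anyway
select distinct ID, Value from dbo.TableA
union
select distinct ID, Value from dbo.TableB;
```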

Source code is available for download from here

Sunday T-SQL Tips: Inserted, Deleted tables and OUTPUT clause (Part 4 – insert of the master/detail data with SQL SERVER 2005)

Last Sunday we saw how to insert a packet of master/detail rows with multiple master rows using the MERGE statement. Obviously this approach does not work with SQL Server 2005. So what can be done there?

The problem is that SQL Server does not guarantee the order in which rows are inserted with an insert/select statement. It does not guarantee that with insert/select..order by either. However, according to a blog post from the SQL Server Engine Team, insert/select..order by guarantees the order in which identity values are generated. It says: “INSERT queries that use SELECT with ORDER BY to populate rows guarantees how identity values are computed but not the order in which the rows are inserted”.

This is something we can deal with. So let’s start with the same tables we had last Sunday:

The structure of the master and detail packets is slightly different:
1. The Ids table does not have the InternalID column anymore
2. InternalID column values in @MasterData and @DetailData should start with 1. This is important!

So we know the order in which identity values are generated. If the InternalIds in both tables are the same (starting with 1), we can use the approach below.
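One way this could look is sketched below. It is not the downloadable script: dbo.Masters, dbo.Details and their columns are hypothetical, and the sketch assumes that InternalId values in the packet are 1, 2, 3, ... with no gaps. It uses only features available in SQL Server 2005.

```sql
-- Hypothetical schema; names are illustrative only.
create table dbo.Masters
(
    ID int not null identity(1,1) primary key,
    Name varchar(64) not null
);
create table dbo.Details
(
    MasterId int not null references dbo.Masters(ID),
    Value varchar(64) not null
);
go

declare @MasterData table(InternalId int not null primary key, Name varchar(64) not null);
declare @DetailData table(InternalId int not null, Value varchar(64) not null);
declare @Ids table(ID int not null primary key);  -- no InternalId column

-- Assumption: InternalId values are consecutive and start with 1
insert into @MasterData(InternalId, Name)
select 1, 'Master 1' union all select 2, 'Master 2';

insert into @DetailData(InternalId, Value)
select 1, 'Detail 1.1' union all
select 1, 'Detail 1.2' union all
select 2, 'Detail 2.1';

-- ORDER BY controls the order in which identity values are computed
insert into dbo.Masters(Name)
output inserted.ID into @Ids(ID)
select Name
from @MasterData
order by InternalId;

-- Ranking the generated identities reproduces InternalId: the smallest
-- new ID belongs to InternalId = 1, the next one to InternalId = 2, etc.
;with IdMap(InternalId, ID) as
(
    select row_number() over (order by ID), ID
    from @Ids
)
insert into dbo.Details(MasterId, Value)
select m.ID, d.Value
from @DetailData d
    join IdMap m on m.InternalId = d.InternalId;
```

Because the mapping relies on ranking rather than on arithmetic over the identity values, it does not require the generated identities to be contiguous, only to be generated in InternalId order.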

This should do the work. Code can be downloaded here 

Sunday T-SQL Tips: Inserted, Deleted tables and OUTPUT clause (Part 3 – Merge and insert of the master/detail data)

Last Sunday we saw what we can do with the MERGE statement and OUTPUT clause. Today I want to show you the practical angle of this feature.

Let’s assume you want to insert a packet of master/detail rows where the Master table has ID as an identity field.

This is a trivial task if you have only one master row: you can accomplish it with an insert statement, get the identity with @@identity or SCOPE_IDENTITY() and then insert the detail rows. The situation is more complicated if you have multiple master rows with their details in the packet. In such a case MERGE and OUTPUT can help. First of all, you need an “internal” unique ID to link master and detail rows in the packet. Let’s see that:

Next, you need to insert the data into the master table with the MERGE operator and capture both the internal ID from the packet and the inserted (identity) ID. After that you can use a simple insert statement for the detail rows.
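A minimal sketch of these steps for SQL Server 2008 and later. The tables dbo.Masters and dbo.Details and all column names are hypothetical (skip the create statements if you already created them for another example); the `on 1 = 0` condition is a common idiom that never matches, so every source row goes to the insert branch.

```sql
-- Hypothetical schema; names are illustrative only.
create table dbo.Masters
(
    ID int not null identity(1,1) primary key,
    Name varchar(64) not null
);
create table dbo.Details
(
    MasterId int not null references dbo.Masters(ID),
    Value varchar(64) not null
);
go

declare @MasterData table(InternalId int not null primary key, Name varchar(64) not null);
declare @DetailData table(InternalId int not null, Value varchar(64) not null);
declare @Ids table(InternalId int not null primary key, ID int not null);

insert into @MasterData(InternalId, Name)
values (10, 'Master A'), (20, 'Master B');

insert into @DetailData(InternalId, Value)
values (10, 'Detail A.1'), (10, 'Detail A.2'), (20, 'Detail B.1');

-- MERGE can reference source columns in OUTPUT, which a plain INSERT cannot
merge into dbo.Masters as Target
using @MasterData as Source
on 1 = 0
when not matched by target then
    insert(Name) values(Source.Name)
output Source.InternalId, inserted.ID into @Ids(InternalId, ID);

-- Simple insert of the detail rows, translating InternalId to the identity
insert into dbo.Details(MasterId, Value)
select i.ID, d.Value
from @DetailData d
    join @Ids i on i.InternalId = d.InternalId;
```

With this approach the InternalId values only need to be unique within the packet; they do not need to start with 1 or be consecutive.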

Code can be downloaded from here

Sunday T-SQL Tips: Inserted, Deleted tables and OUTPUT clause (Part 2 – Merge statement)

Last Sunday we saw how to intercept new and old values from the row during insert/update/delete operations with the OUTPUT clause. Even though it’s a very powerful mechanism, there is one very important limitation: you can intercept only values from the rows of the table that gets updated.

So, for example, if you have:

you can intercept the values from DestTable rows but you cannot get any values from SourceTable rows.

In SQL Server 2005 you’re stuck. Fortunately SQL Server 2008 has a solution for that problem: the MERGE operator.

So what is MERGE? In short, it’s an insert/update/delete operator, all-in-one. Assume you have a source table and need to do a batch update of it based on another table. You need to delete the rows in the source table that do not exist in the second table, insert new rows from the second table and update modified rows. Let’s see that:

So the refresh process should delete the row with ID = 1 from the SourceTable, insert the row with ID = 7 and update the rows with ID = 3 and 6.

Let’s see that in action. The SQL 2005 approach is to use separate statements:

As you can see, it works. But there are 3 statements, which means longer locking, side effects that depend on the isolation level, and so on.

Now let’s do the same thing with the MERGE operator:

As you can see, one operator and that’s all. Very useful and very convenient. But that’s not all. Let’s see what can be done with the OUTPUT clause there:

It not only allows you to capture the old and new versions of the row, it also allows you to capture the ACTION and fields from the source (DataTable). The last one cannot be done with regular insert/update/delete statements.

We will see what benefits it gives us next Sunday.
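A sketch of such a refresh with MERGE and OUTPUT. The table names follow the text (SourceTable is refreshed from DataTable), but the columns and sample values are assumptions chosen so that ID = 1 is deleted, ID = 7 is inserted and IDs 3 and 6 are updated.

```sql
-- Column names and sample data are illustrative assumptions.
create table dbo.SourceTable(ID int not null primary key, Value varchar(32) not null);
create table dbo.DataTable(ID int not null primary key, Value varchar(32) not null);
go

insert into dbo.SourceTable(ID, Value)
values (1, 'v1'), (2, 'v2'), (3, 'v3'), (4, 'v4'), (5, 'v5'), (6, 'v6');

insert into dbo.DataTable(ID, Value)
values (2, 'v2'), (3, 'v3 new'), (4, 'v4'), (5, 'v5'), (6, 'v6 new'), (7, 'v7');

merge into dbo.SourceTable as Target
using dbo.DataTable as Source
on Target.ID = Source.ID
when matched and Target.Value <> Source.Value then
    update set Target.Value = Source.Value       -- modified rows
when not matched by target then
    insert(ID, Value) values(Source.ID, Source.Value)  -- new rows
when not matched by source then
    delete                                        -- rows missing in DataTable
output
    $action as [Action],                 -- 'INSERT', 'UPDATE' or 'DELETE'
    deleted.ID as OldID, deleted.Value as OldValue,
    inserted.ID as NewID, inserted.Value as NewValue,
    Source.Value as DataTableValue;      -- column from the source rowset
```

For deleted rows the inserted.* and Source.* columns come back as NULL, and for inserted rows the deleted.* columns are NULL.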

The code can be downloaded here

Sunday T-SQL Tips: Inserted, Deleted tables and OUTPUT clause (Part 1 – insert/delete/update statements)

Over the next few Sundays I’m planning to talk about one of the under-appreciated constructs in T-SQL: the OUTPUT clause. This clause can help you a lot when you need to develop set-based code or convert old cursor-based code into set operations.

The OUTPUT clause works together with the inserted and deleted system tables. Every developer who has written at least one DML trigger is familiar with those tables. These are 2 tables that SQL Server populates and manages automatically. The inserted table contains the new version of the row values. The deleted table contains the old version of the row values. As you would expect, the inserted table has data during insert and update operations, and the deleted table has data during update and delete operations.

Let’s see that in action. First let’s create a table:

Now let’s assume we want to implement an audit trigger. Below is a very simple and straightforward approach to doing that:

Now let’s test the solution.

Below are the XMLs:

Anyway, the good and very useful thing is that the inserted and deleted tables are available not only in triggers but also in insert/update/delete statements. Let’s take a look. First, insert:

As you can see, you have access to the generated identity values. It works the same way with updates and deletes.

One particularly interesting thing is what happens if the statement is rolled back. Let’s populate the table with data again and create a trigger that rolls back the transaction:
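A minimal sketch of OUTPUT with the three statement types. The table dbo.OutputDemo is hypothetical and has no triggers, which matters because OUTPUT without INTO is not allowed when the target table has an enabled trigger for that action.

```sql
-- Hypothetical table; names are illustrative only.
create table dbo.OutputDemo
(
    ID int not null identity(1,1) primary key,
    Value varchar(32) not null
);
go

-- insert: only the inserted table is populated
insert into dbo.OutputDemo(Value)
output inserted.ID, inserted.Value   -- returns the generated identity values
select 'A' union all select 'B';

-- update: deleted holds the old version, inserted holds the new one
update dbo.OutputDemo
set Value = Value + '!'
output deleted.Value as OldValue, inserted.Value as NewValue
where ID = 1;

-- delete: only the deleted table is populated
delete from dbo.OutputDemo
output deleted.ID, deleted.Value
where ID = 2;
```

Adding `into @SomeTableVariable(...)` after the OUTPUT column list stores the rows instead of returning them to the client, which is usually what set-based code needs.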

Now let’s run the update statement:

As you can see, the statement is rolled back: the data has not been changed in the original table, but the deleted/inserted tables were created and populated. The OUTPUT clause also worked. Such behavior opens the door to interesting possibilities, for example audit on rollback.

You can download the script from here. Next Sunday we will talk about the OUTPUT clause with the MERGE statement. This combination is much more powerful than OUTPUT with insert/update/delete.

Sunday T-SQL Tip: APPLY operator

One of the new features of SQL 2005 is the APPLY operator. According to Books Online, APPLY is:

The APPLY operator allows you to invoke a table-valued function for each row returned by an outer table expression of a query. The table-valued function acts as the right input and the outer table expression acts as the left input. The right input is evaluated for each row from the left input and the rows produced are combined for the final output. The list of columns produced by the APPLY operator is the set of columns in the left input followed by the list of columns returned by the right input.

It is a bit confusing if you read it for the first time. Let’s try to clarify it. Think about APPLY the same way as about JOIN. The difference is that the inputs of a JOIN are independent from each other, while the right input of an APPLY depends on the left source. There are 2 types of APPLY: CROSS APPLY (think about it as an inner join) and OUTER APPLY (think about it as an outer join).

Let’s see it in an example. This will use the orders table created earlier (code can be downloaded from here).

Let’s create the Customers table and populate it with data.

Assume we want to return a result set with the 2 most recent orders per customer. Let’s create an inline table-valued function that can do that.

Now let’s write the select with cross apply. Again, think about it as an inner join that joins customer data (the source) with the 2 rows per customer produced by the table-valued function.

Here it is. As you can see, it’s quite simple. What’s interesting about it is that you don’t really need to use the function: you can simply put the dependent select into the FROM clause. Look below:
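Both forms can be sketched as follows. The Customers and Orders tables, their columns and the function name are assumptions made for illustration; the Orders table in the original script may look different. The `go` lines are batch separators for SSMS or sqlcmd, needed because CREATE FUNCTION must be the first statement in its batch.

```sql
-- Hypothetical schema; names are illustrative only.
create table dbo.Customers
(
    CustomerId int not null primary key,
    Name varchar(64) not null
);
create table dbo.Orders
(
    OrderId int not null identity(1,1) primary key,
    CustomerId int not null,
    OrderDate datetime not null,
    Amount money not null
);
go

-- Inline table-valued function: the 2 most recent orders of one customer
create function dbo.GetRecentOrders(@CustomerId int)
returns table
as
return
(
    select top 2 o.OrderId, o.OrderDate, o.Amount
    from dbo.Orders o
    where o.CustomerId = @CustomerId
    order by o.OrderDate desc
);
go

-- cross apply: customers without orders are not returned (like inner join)
select c.CustomerId, c.Name, r.OrderId, r.OrderDate, r.Amount
from dbo.Customers c
    cross apply dbo.GetRecentOrders(c.CustomerId) r;

-- outer apply with the dependent select inlined: customers without orders
-- are returned with NULLs in the order columns (like outer join)
select c.CustomerId, c.Name, r.OrderId, r.OrderDate, r.Amount
from dbo.Customers c
    outer apply
    (
        select top 2 o.OrderId, o.OrderDate, o.Amount
        from dbo.Orders o
        where o.CustomerId = c.CustomerId
        order by o.OrderDate desc
    ) r;
```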

You can download the code from here

Sunday T-SQL Tip: How to generate “Pseudo-identity” values

There are some cases when you don’t want to create a physical identity column in the table and want to generate a similar value manually. There are a few ways to do that. Let’s look at 2 of them.

The first method is the Counters table. Basically you create a table like this:

When you need a new value, you simply get the current one and update it with the new one. It could be wrapped into a stored procedure like this:

The beauty of this method is the locking: the update statement places an exclusive lock on the row, so while the transaction is active, no other session is able to update the same row. Be careful with the SNAPSHOT isolation level, though: it would raise an update conflict error during simultaneous access rather than serialize the access.
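A minimal sketch of the Counters approach. The table layout, the procedure name and its parameters are hypothetical and not taken from the downloadable script.

```sql
-- Hypothetical table and procedure; names are illustrative only.
create table dbo.Counters
(
    CounterName varchar(64) not null primary key,
    Value int not null
);
insert into dbo.Counters(CounterName, Value) values('Orders', 0);
go

create procedure dbo.GetCounterValues
    @CounterName varchar(64),
    @ReserveCount int = 1,      -- how many values to reserve
    @FirstValue int output      -- first value of the reserved range
as
begin
    set nocount on;

    -- Single atomic statement: the variable is assigned from the pre-update
    -- column value, and the exclusive row lock serializes concurrent callers.
    update dbo.Counters
    set
        @FirstValue = Value + 1,
        Value = Value + @ReserveCount
    where CounterName = @CounterName;
end
go

-- Usage: reserve 10 values; the caller owns the range
-- @First .. @First + 9
declare @First int;
exec dbo.GetCounterValues
    @CounterName = 'Orders', @ReserveCount = 10, @FirstValue = @First output;
select @First as FirstValue;
```

If the call runs inside a longer transaction, the row lock is held until that transaction ends, so other sessions asking for the same counter wait for it. If the counter name does not exist, the output parameter stays NULL.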

Let’s test that:

The second method uses identity, but from another dummy table:

Let’s see how we can get the single value:

And next – the group of values:

Obviously this method would not protect from the gaps in the values.
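The second method could look like the sketch below. The dummy table and its columns are hypothetical; sys.all_objects is used only as a convenient source of rows when a group of values is needed.

```sql
-- Hypothetical dummy table; names are illustrative only.
create table dbo.IdentityValues
(
    ID int not null identity(1,1) primary key,
    Dummy tinyint null
);
go

-- Single value
declare @NewId int;
insert into dbo.IdentityValues(Dummy) values(null);
set @NewId = scope_identity();
select @NewId as NewId;

-- Group of values: capture all generated identities with OUTPUT
declare @Ids table(ID int not null primary key);

insert into dbo.IdentityValues(Dummy)
output inserted.ID into @Ids(ID)
select top (10) cast(null as tinyint)
from sys.all_objects;   -- any rowset with at least 10 rows would do

select ID from @Ids order by ID;

-- Optional cleanup: delete does not reset the identity seed,
-- so the next generated value continues from where it stopped
delete from dbo.IdentityValues;
```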

You can download the source code from here

Sunday T-SQL Tip: Application Locks

There is a set of tasks where you need to serialize access to some T-SQL code. Assume you have multiple instances of a data processing application running simultaneously and do not want them to load the same data for processing. Or you want to perform some bulk operation and want to stop the clients from inserting new data during this period.

Obviously one of the options is to use transactions in the serializable isolation level. The problem with that is that it could be more restrictive than you want. Suppose you don’t want to block access to the data but rather introduce something similar to the CriticalSection you have in client development.

Fortunately SQL Server has a set of stored procedures you can use for such purposes: sp_getapplock and sp_releaseapplock. sp_getapplock allows you to obtain a shared or exclusive “user” lock owned either by the transaction or by the session. With @LockOwner = 'Transaction' (the default), the lock is released when the transaction ends. With @LockOwner = 'Session', the lock is held until you call sp_releaseapplock or the session ends.

Let’s see it in action. Let’s run the following statement from 2 different sessions:

And here are the results from both sessions:

As you can see it does the trick.
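A sketch of a statement that could be run from two sessions to observe the serialization. The resource name 'LoadData', the 5 second timeout and the delay are arbitrary assumptions for illustration.

```sql
declare @Result int;

begin tran;

-- Transaction-owned exclusive lock on an arbitrary resource name
exec @Result = sp_getapplock
    @Resource = 'LoadData',        -- hypothetical name; any string agreed on by all callers
    @LockMode = 'Exclusive',
    @LockOwner = 'Transaction',
    @LockTimeout = 5000;           -- wait up to 5 seconds

if @Result >= 0    -- 0: granted immediately, 1: granted after waiting
begin
    -- Critical section: only one session at a time gets here
    select 'Lock acquired' as [Status], getdate() as AcquiredAt;
    waitfor delay '00:00:02';      -- simulates the protected work

    commit;                        -- releases the transaction-owned lock
end
else               -- negative: timeout, cancellation, deadlock or bad parameters
begin
    select 'Lock not acquired' as [Status], @Result as ResultCode;
    rollback;
end
```

The lock does not touch any data: it only blocks other callers that request the same resource name, which is what makes it behave like a critical section rather than like a stricter isolation level.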

Update (2013-05-08): This post provides more details about the different methods of serialization available in SQL Server