Monthly Archives: July 2011

Sunday T-SQL Tip (on Thursday): Set vs. Select

It’s been a while since I posted any Sunday T-SQL Tips. Well, let’s do it today, even if it’s Thursday :). I feel a little guilty about pausing the series on locking and blocking. I promise, I will blog about optimistic isolation levels in two weeks.

Today I want to talk about two very similar statements: SET and SELECT. If you ask a T-SQL developer which one to use, in most cases the answer is that it does not really matter. Well, that’s correct – up to a degree. These statements work the same way until they don’t. Let’s take a look.

First, let’s create a table and populate it with 50 rows.
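The original code download is no longer available; here is a minimal sketch of such a setup. The table name, columns, and values are my assumptions, not the author’s original script:

```sql
-- Hypothetical test table (names and shape are assumptions)
create table dbo.Data
(
    ID int not null,
    Value varchar(32) not null,
    constraint PK_Data primary key clustered(ID)
);

-- Populate with 50 rows using a recursive numbers CTE
;with Numbers(Num)
as
(
    select 1
    union all
    select Num + 1 from Numbers where Num < 50
)
insert into dbo.Data(ID, Value)
    select Num, 'Value ' + convert(varchar(8), Num)
    from Numbers;
```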

Now let’s look at the similarities and query a row. As you can see, both statements work the same way. Plain and perfect.
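A sketch of the equivalent assignments, assuming the hypothetical dbo.Data table from the snippet above:

```sql
declare @V1 varchar(32), @V2 varchar(32);

-- Both assign the same value when exactly one row matches
set @V1 = (select Value from dbo.Data where ID = 10);
select @V2 = Value from dbo.Data where ID = 10;

select @V1 as SetResult, @V2 as SelectResult;
```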

Now let’s change the query and try to select a row that does not exist in the table.
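Something along these lines, again assuming the hypothetical dbo.Data table (which holds only 50 rows):

```sql
declare @V1 varchar(32), @V2 varchar(32);
select @V1 = 'Old value', @V2 = 'Old value';

-- No row with ID = 101 exists
set @V1 = (select Value from dbo.Data where ID = 101);
select @V2 = Value from dbo.Data where ID = 101;

-- SET produces NULL; SELECT leaves the old value untouched
select @V1 as SetResult, @V2 as SelectResult;
```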

And here is the first major difference – SET sets the variable to NULL, while SELECT keeps the old value. It makes sense if you think about it – SET assigns the result of a subquery that did not return any data. An even more interesting behavior shows up when the query selects multiple rows. Let’s run SELECT first.
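A sketch of the multi-row SELECT assignment; the WHERE clause matching exactly 9 rows (IDs 42–50) is my guess, chosen to match the description that follows:

```sql
declare @V varchar(32);

-- 9 rows match; SELECT assigns the variable once per row processed
select @V = Value from dbo.Data where ID > 41;

-- The variable holds the value from the last row processed
select @V as Result;
```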

As you can see, it processes 9 rows, and the variable ends up with the value from the last one. Now let’s check SET:
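The SET version, which fails with error 512 (“Subquery returned more than 1 value…”) when the subquery returns multiple rows:

```sql
declare @V varchar(32);

-- Subquery returns 9 rows - this statement fails with Msg 512
set @V = (select Value from dbo.Data where ID > 41);
```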

As you can see – it fails. Again, that makes sense if you think about it from the subquery standpoint.

Let’s think about it from a practical angle. Suppose you need to query a row from a Users table based on Name. If today Name is unique (and has a unique constraint/index on it), it does not really matter whether you use SET or SELECT. But what happens to the code that uses SELECT if one day the name stops being unique? Obviously it introduces hard-to-find bugs/side effects – the system just picks one of the rows. Using SET in such a case helps – the code simply fails. It will still require troubleshooting, but in the end it will take much less time to pinpoint the problem. Of course, you can check @@RowCount (and that’s the only choice if you need to select/assign more than one variable as part of the query), but with one variable SET is more elegant (in my opinion).
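For illustration, the @@RowCount pattern might look like this; the Users table, its columns, and the variable names are all hypothetical:

```sql
declare @Name varchar(64) = 'John';
declare @UserId int, @Email varchar(64);

-- SELECT can assign multiple variables in one statement
select @UserId = UserId, @Email = Email
from dbo.Users
where Name = @Name;

-- Detect the duplicate-name case explicitly
if @@RowCount > 1
    raiserror('More than one user matched the name', 16, 1);
```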

At the same time, when I need to assign constants to multiple variables, I prefer to use SELECT – less typing and probably a couple of microseconds of performance improvement due to fewer statements.
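A sketch of that pattern:

```sql
declare @I int, @D datetime, @S varchar(32);

-- One SELECT instead of three separate SET statements
select @I = 1, @D = getdate(), @S = 'Initial';
```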

Code can be downloaded here

Ad-hoc SQL and plan cache

Let’s take a break from locking and blocking issues and talk about ad-hoc SQL. Everybody knows it’s bad. The most popular horror story is the security hole introduced by SQL injection – there are a lot of articles on that subject; just google it. But today I’d like to show another problem with this implementation pattern – recompilations and plan cache flooding. Let’s take a look.

First, let’s create our favorite Data table and populate it with some data.
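The original script is not available; a sketch of such a table, with assumed names and shape, might look like this:

```sql
create table dbo.Data
(
    ID int not null,
    Value varchar(32) not null,
    constraint PK_Data primary key clustered(ID)
);

-- Generate 65,536 rows via cross-joined CTEs
;with N1(C) as (select 0 union all select 0)            -- 2 rows
,N2(C) as (select 0 from N1 a cross join N1 b)          -- 4 rows
,N3(C) as (select 0 from N2 a cross join N2 b)          -- 16 rows
,N4(C) as (select 0 from N3 a cross join N3 b)          -- 256 rows
,N5(C) as (select 0 from N4 a cross join N4 b)          -- 65,536 rows
,IDs(ID) as (select row_number() over (order by (select null)) from N5)
insert into dbo.Data(ID, Value)
    select ID, 'Value ' + convert(varchar(8), ID)
    from IDs;
```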

Second, let’s clear the plan cache and make sure that there is nothing there that accesses the Data table.
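The check might look like this; the DMV query pattern is standard, the LIKE filter is an assumption:

```sql
dbcc freeproccache;

-- Look for any cached plan whose text references the Data table
select p.usecounts, p.cacheobjtype, p.objtype, t.text
from sys.dm_exec_cached_plans p
    cross apply sys.dm_exec_sql_text(p.plan_handle) t
where t.text like '%Data%';
```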

As you can see – clean and clear. Now let’s have some fun and introduce really bad client-side code. This code basically runs 1,000 ad-hoc selects against that table.
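The original client code was external and is not shown here; a T-SQL loop that produces the same cache-flooding effect might look like this:

```sql
declare @I int, @Sql nvarchar(256);
select @I = 1;

while @I <= 1000
begin
    -- Each statement text differs only by the embedded literal,
    -- so every execution compiles and caches a separate plan
    select @Sql = N'select * from dbo.Data where ID = '
        + convert(nvarchar(8), @I);
    exec (@Sql);
    select @I = @I + 1;
end;
```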

Let’s run it and query the plan cache again. As you can see, now we have 1000 cached plans.

So let’s think about it for a minute – we forced SQL Server to compile 1,000 different queries (which is quite expensive in terms of CPU usage), and besides that we grabbed about 25 MB of server memory. Huh? Of course, those one-time-use plans will be removed from the cache sooner rather than later, but think about what happens in a system that constantly runs such ad-hoc queries. Alternatively, if we use parameters, you can see that there will be only one plan in the cache, (re)used 1,000 times. Don’t forget to clear the cache before you run the second code snippet.
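A parameterized version of the same loop, using sp_executesql so that every execution shares a single cached plan:

```sql
declare @I int;
select @I = 1;

while @I <= 1000
begin
    -- Identical statement text + parameter => one plan, reused 1,000 times
    exec sp_executesql
        N'select * from dbo.Data where ID = @ID',
        N'@ID int',
        @ID = @I;
    select @I = @I + 1;
end;
```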

So what are the options if we cannot change the client code? The first, of course, is forced parameterization. Although this is a completely different can of worms and can introduce a lot of side effects related to parameter sniffing and bad plans. It deserves its own blog post sometime later.
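Enabling it is a one-liner; the database name here is a placeholder:

```sql
-- Force parameterization of ad-hoc statements in this database
alter database MyDb set parameterization forced;
```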

Another option is to enable the “Optimize for Ad hoc workloads” option. This is a server-level option available in SQL Server 2008/2008 R2 only. With this option enabled, SQL Server does not cache the complete query plan on the first run – it generates a plan stub (basically a small hash) instead. When the query runs a second time, SQL Server recompiles it and replaces the stub with the actual plan. Let’s see that. First, let’s enable the option either in the server properties window or via a script.
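Via script, it looks like this (the option is an advanced one, so it must be made visible first):

```sql
exec sp_configure 'show advanced options', 1;
reconfigure;

exec sp_configure 'optimize for ad hoc workloads', 1;
reconfigure;
```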

Next, let’s clear the cache and run the first code snippet again. As you can see, we still have 1,000 objects in the plan cache, but each object uses only 232 bytes. In the case of complex queries/plans, the difference in memory usage can be dramatic.
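To see the stubs and their sizes, a standard DMV query along these lines works; stubs show up with the cacheobjtype value “Compiled Plan Stub”:

```sql
-- Compare plan counts and memory footprint by cache object type
select cacheobjtype, count(*) as [Plans],
    sum(convert(bigint, size_in_bytes)) as [TotalBytes]
from sys.dm_exec_cached_plans
group by cacheobjtype;
```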

If you run it the second time, you’ll see that stubs were replaced with actual plans.

What are the downsides of this option? First of all, queries are compiled twice, which introduces additional CPU load. Another issue is that this is a server-level option, so you cannot control it at the database level. If you have OLTP and data warehouse databases hosted on the same server, that additional recompilation could be very expensive for complex data warehouse queries. Still, for OLTP with bad client code this option is usually useful.

Of course, the best advice is to get rid of ad-hoc SQL altogether!

Code is available for download