07 | July | 2011 | About Sql Server

Let’s take a break from locking and blocking issues and talk about ad-hoc SQL. Everybody knows that it’s bad. The most popular horror story is the security hole introduced by SQL injection. There are a lot of articles available on that subject; just google it. But today I’d like to show another problem with that implementation: constant query compilation and plan cache flooding. Let’s take a look.

First, let’s create our favorite Data table and populate it with some data.
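The original script is not reproduced here, so the following is only a sketch of what such a setup could look like. The column names (`ID`, `Value`, `Placeholder`), the constraint names and the row count are assumptions made for illustration.

```sql
-- Assumed schema: only the table name (Data) comes from the post.
create table dbo.Data
(
    ID int not null,
    Value int not null,
    Placeholder char(100) not null
        constraint DF_Data_Placeholder default 'Placeholder',
    constraint PK_Data primary key clustered (ID)
);
go

-- Populate the table with 65,536 rows using stacked cross joins.
;with N1(C) as (select 0 union all select 0)           -- 2 rows
,N2(C) as (select 0 from N1 a cross join N1 b)          -- 4 rows
,N3(C) as (select 0 from N2 a cross join N2 b)          -- 16 rows
,N4(C) as (select 0 from N3 a cross join N3 b)          -- 256 rows
,N5(C) as (select 0 from N4 a cross join N4 b)          -- 65,536 rows
,IDs(ID) as (select row_number() over (order by (select null)) from N5)
insert into dbo.Data(ID, Value)
    select ID, ID % 100
    from IDs;
go
```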

Second, let’s clear the plan cache and make sure that there is nothing in it that accesses the Data table.
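One possible way to do that is sketched below. `DBCC FREEPROCCACHE` clears the plan cache for the whole instance, so this is only reasonable on a test server. The `like` filters are an assumption about how to find the relevant plans.

```sql
-- Test servers only: this removes every cached plan on the instance.
dbcc freeproccache;
go

-- List cached plans whose text mentions the Data table.
-- The second filter excludes this monitoring query itself.
select
    cp.usecounts,
    cp.size_in_bytes,
    cp.cacheobjtype,
    cp.objtype,
    st.text
from
    sys.dm_exec_cached_plans cp
        cross apply sys.dm_exec_sql_text(cp.plan_handle) st
where
    st.text like '%Data%' and
    st.text not like '%dm_exec_cached_plans%';
```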

As you can see, it is clean and clear. Now let’s have some fun and introduce some really bad client-side code. This code basically runs 1000 ad-hoc selects against that table.
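The client code itself is not shown here. As a rough stand-in, the same pattern can be emulated in T-SQL with dynamic SQL, where every statement differs only by the literal embedded in its text. The `ID` column and the query shape are assumptions. `top 1` is included on the assumption that it keeps the statement from being auto-parameterized by simple parameterization; results on your server may differ.

```sql
set nocount on;

declare
    @I int = 1,
    @Sql nvarchar(200);

-- Build and run 1000 statements that differ only by the literal value.
while @I <= 1000
begin
    set @Sql =
        N'select top 1 ID from dbo.Data where ID = ' +
        convert(nvarchar(10), @I);

    exec (@Sql);  -- each distinct statement text is compiled on its own

    set @I += 1;
end;
```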

Let’s run it and query the plan cache again. As you can see, now we have 1000 cached plans.
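To put numbers on it, the plan cache query can be aggregated. This is a sketch; the text filters are assumptions, and the batch that issued the queries may be counted as well, so the total can be slightly above 1000.

```sql
-- Count the cached plans that touch dbo.Data and sum their memory usage.
select
    count(*) as PlanCount,
    sum(cast(cp.size_in_bytes as bigint)) / 1024 as TotalSizeKB
from
    sys.dm_exec_cached_plans cp
        cross apply sys.dm_exec_sql_text(cp.plan_handle) st
where
    st.text like '%dbo.Data%' and
    st.text not like '%dm_exec_cached_plans%';
```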

So let’s think about it for a minute. We forced SQL Server to compile 1000 different queries, which is quite expensive in terms of CPU usage, and on top of that we just grabbed about 25MB of the server memory. Of course, those one-time-use plans would be removed from the cache sooner rather than later, but think about what happens if you have a system that constantly runs those ad-hoc queries.

Alternatively, if we use parameters, there will be only 1 plan in the cache, and it will be (re)used 1000 times. Don’t forget to clear the cache before you run the second code snippet.
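A parameterized version of the same workload could look like the sketch below, which uses `sp_executesql` instead of client code. The query shape and the `ID` column are assumptions. Because the statement text never changes, the plan is compiled once and then reused.

```sql
set nocount on;

declare @I int = 1;

-- Same 1000 lookups, but the statement text stays constant
-- and the value is passed as a parameter.
while @I <= 1000
begin
    exec sp_executesql
        N'select top 1 ID from dbo.Data where ID = @ID',
        N'@ID int',
        @ID = @I;

    set @I += 1;
end;
go

-- Look for a single prepared plan with a high use count.
select cp.usecounts, cp.size_in_bytes, cp.objtype, st.text
from
    sys.dm_exec_cached_plans cp
        cross apply sys.dm_exec_sql_text(cp.plan_handle) st
where
    cp.objtype = 'Prepared' and
    st.text like '%dbo.Data%';
```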

So what are the options if we cannot change the client code? The first, of course, is forced parameterization. This is a completely different can of worms, though, and it can introduce a lot of side effects related to parameter sniffing and bad plans. It deserves its own blog post sometime later.
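For reference, forced parameterization is a database-level setting. The database name below (`MyDatabase`) is a placeholder, and the setting should be tested carefully before it goes anywhere near production.

```sql
-- MyDatabase is a hypothetical name; substitute your own database.
alter database MyDatabase set parameterization forced;
go

-- Revert to the default behavior.
alter database MyDatabase set parameterization simple;
go
```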

Another option is to enable the “Optimize for Ad hoc workloads” option. This is a server-side option that is available starting with SQL Server 2008. With this option enabled, SQL Server does not cache the complete query plan on the first run; it generates a plan stub (which is basically a small hash) instead. When the query runs a second time, SQL Server recompiles it and replaces the stub with the actual plan. Let’s see that. First, let’s enable this option, either in the server properties window or via a script.
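A script along these lines should do it. This is an advanced option, so `show advanced options` has to be turned on first.

```sql
-- Make advanced options visible to sp_configure.
exec sp_configure 'show advanced options', 1;
reconfigure;
go

-- Enable plan stubs for first-time ad-hoc queries (server-wide).
exec sp_configure 'optimize for ad hoc workloads', 1;
reconfigure;
go
```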

Next, let’s clear the cache and run the first code snippet again. As you can see, we still have 1000 objects in the plan cache, but each object uses only 232 bytes. For complex queries and plans, the difference in memory usage could be dramatic.
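One way to see the stubs is to group the cached objects by type, as sketched below. The text filters are assumptions. With the option enabled, first-time ad-hoc queries should show up with a `cacheobjtype` of `Compiled Plan Stub`.

```sql
-- Break down cached objects that touch dbo.Data by type and size.
select
    cp.cacheobjtype,
    cp.objtype,
    count(*) as ObjectCount,
    sum(cast(cp.size_in_bytes as bigint)) as TotalSizeBytes
from
    sys.dm_exec_cached_plans cp
        cross apply sys.dm_exec_sql_text(cp.plan_handle) st
where
    st.text like '%dbo.Data%' and
    st.text not like '%dm_exec_cached_plans%'
group by
    cp.cacheobjtype, cp.objtype;
```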

If you run it a second time, you’ll see that the stubs were replaced with the actual plans.

What are the downsides of this option? First of all, queries that run more than once are compiled twice, which introduces additional CPU load. Another issue is that this is a server-side option, so you cannot control it at the database level. If you have OLTP and data warehouse databases hosted on the same server, that additional recompilation could be very expensive for complex data warehouse queries. Still, for OLTP and bad client code, this option is usually useful.

Of course, the best advice is to get rid of ad-hoc SQL altogether!

Code is available for download