Daily Archives: August 5, 2014

Plan Cache: Parameter Sniffing

There are several reasons why it is beneficial to use parameters in the queries. It mitigates security risks by protecting from SQL Injection. Moreover, it helps with the performance. Query optimization is expensive and resource-intensive process. SQL Server reduces that cost by keeping execution plans in the special part of the memory called Plan Cache and reusing them on subsequent calls.

Even though, this approach benefits the system by reducing the number of query compilations, it could also introduce some issues. Cached plans can become inefficient due to the data distribution and/or parameter values change. SQL Server tracks changes in the data by counting number of changes of the index columns outdating statistics and recompiling queries when number of changes reach about 20% of number of rows in the table (trace flag 2371 can reduce that threshold). However, tracking parameter values change is more complicated.

When SQL Server compiles parametrized query or stored procedure, it analyzes parameters and generates execution plan, which is optimal for those specific values. This process is called Parameter Sniffing. Furthermore, generated plans are cached and reused regardless of parameter values at time of reuse. That behavior can lead to the situations when query is compiled and plan is generated based on atypical parameter set. Cached plan, in this case, would be suboptimal for the calls that reuse that plan.

All of us experienced the situations when query or stored procedure suddenly became very slow even when they ran fine before. Parameter sniffing could be the reason of why it happened. SQL Server decided to recompile the query due to statistics update and the query that triggered recompilation had atypical parameter values. Even though, generated execution plan was efficient for parameters at time of compilation; that cached plan is not necessarily efficient for the other queries that reuses it.

Historically, parameter sniffing has the negative connotation in SQL Server community. It is used to explain why queries have suboptimal execution plans, like I just did in the previous paragraph. This is vastly wrong, however. Parameter sniffing is just the name for SQL Server behavior of analyzing parameters during query compilation.

Let’s look at a few examples of parameter sniffing and discuss how to address the issues it can introduce. In this post, I am using the stored procedure; however, everything would work the same in case of parametrized queries.

As the first step, let’s create a table dbo.Employees. We will populate it distributing data in the way that most rows have Country column value of USA with a few employees with Country value of Canada. Listing below shows the code.

create table dbo.Employees
(
	ID int not null,
	Number varchar(32) not null,
	Name varchar(100) not null,
	Salary money not null,
	Country varchar(64) not null,

	constraint PK_Employees
	primary key clustered(ID)
);

;with N1(C) as (select 0 union all select 0) -- 2 rows
,N2(C) as (select 0 from N1 as T1 cross join N1 as T2) -- 4 rows
,N3(C) as (select 0 from N2 as T1 cross join N2 as T2) -- 16 rows
,N4(C) as (select 0 from N3 as T1 cross join N3 as T2) -- 256 rows
,N5(C) as (select 0 from N4 as T1 cross join N4 as T2 ) -- 65,536 rows
,Nums(Num) as (select row_number() over (order by (select null)) from N5)
insert into dbo.Employees(ID, Number, Name, Salary, Country)
	select 
		Num, 
		convert(varchar(5),Num), 
		'USA Employee: ' + convert(varchar(5),Num), 
		40000,
		'USA'
	from Nums;

;with N1(C) as (select 0 union all select 0) -- 2 rows
,N2(C) as (select 0 from N1 as T1 cross join N1 as T2) -- 4 rows
,N3(C) as (select 0 from N2 as T1 cross join N2 as T2) -- 16 rows
,Nums(Num) as (select row_number() over (order by (select null)) from N3)
insert into dbo.Employees(ID, Number, Name, Salary, Country)
	select 
		65536 + Num, 
		convert(varchar(5),65536 + Num), 
		'Canada Employee: ' + convert(varchar(5),Num), 
		40000,
		'Canada'
	from Nums;

create nonclustered index IDX_Employees_Country
on dbo.Employees(Country);

As a next step, let’s create a stored procedure that calculates the average salary for employees in a specific country.

create proc dbo.GetAverageSalary @Country varchar(64)
as
begin
	select Avg(Salary) as [Avg Salary]
	from dbo.Employees
	where Country = @Country
end

As you can guess, when we call this stored procedure with parameter @Country=’USA’, stored procedure needs to touch almost every row in the table and the most optimal execution plan is Clustered Index Scan. However, when we call the stored procedure with parameter @Country=’Canada’, the most efficient execution plan is Nonclustered Index Seek with Key Lookup operators.

Let’s call the stored procedure twice: the first time with @Country=’USA’ and the second time with @Country=’Canada’, as shown below.

exec dbo.GetAverageSalary @Country='USA';
exec dbo.GetAverageSalary @Country='Canada';

As you can see in Figure 1, SQL Server compiles the stored procedure and caches the plan with the first call and reuses it later. Even though such a plan is less efficient with the @Country=’Canada’ parameter value, it may be acceptable when those calls are rare, which is expected with such a data distribution.

01. Execution plans when the stored procedure is compiled with @Country=’USA’

Now let’s take a look what happens if we swap those calls when the plan is not cached. We will use the DBCC FREEPROCCACHE command, which clears plan cache (DO NOT RUN IT ON PRODUCTION SERVER). As I already mentioned, another instance when this might happen is with a statistics update that forces query to recompile.

dbcc freeproccache
go

exec dbo.GetAverageSalary @Country='Canada';
exec dbo.GetAverageSalary @Country='USA';

As you can see in Figure 2, SQL Server now caches the plan based on the @Country=’Canada’ parameter value. Even though this plan is more efficient when the stored procedure is called with @Country=’Canada’, it is highly inefficient for @Country=’USA’ calls.

02. Execution plans when the stored procedure is compiled with @Country=’Canada’

There are a few ways to address the issue. You can force the recompilation of either stored procedure using EXECUTE WITH RECOMPILE or a statement-level recompile with OPTION (RECOMPILE) clauses. Obviously, a statement-level recompile is better because it performs the recompilation on a smaller scope. SQL Server sniffs the parameter values at the time of the recompilation, generating the optimal execution plan for each parameter value. It is also worth mentioning, that statement-level recompile allows SQL Server to obtain number of rows stored in table variables, which can help to generate more efficient execution plans in some cases.

alter proc dbo.GetAverageSalary @Country varchar(64)
as
begin
	select Avg(Salary) as [Avg Salary]
	from dbo.Employees
	where Country = @Country
	option (recompile)
end
go

exec dbo.GetAverageSalary @Country='Canada';
exec dbo.GetAverageSalary @Country='USA';

As you can see in Figure 3, SQL Server recompiles the statement on every call, and it generates the most efficient execution plan for every parameter value. It is also worth noting that SQL Server does not cache plans for the statements when statement-level recompile is used.

03. Execution plans with OPTION (RECOMPILE) hint

The statement-level recompile may be a good option when the queries do not execute very often or in the case of complex queries, when the compilation time is just a fraction of the total execution time. However, it is hardly the best approach for OLTP queries, which are constantly running in the system due to the extra CPU load that recompilation introduces.

Another option is using an OPTIMIZE FOR hint, which forces SQL Server to optimize a query for the specific parameter values provided in the hint. Listing below illustrates such an approach.

alter proc dbo.GetAverageSalary @Country varchar(64)
as
begin
	select Avg(Salary) as [Avg Salary]
	from dbo.Employees
	where Country = @Country
	option (optimize for(@Country='USA'))
end
go

exec dbo.GetAverageSalary @Country='Canada';
exec dbo.GetAverageSalary @Country='USA';

As you can see in Figure 4, SQL Server ignores the parameter value during the compilation and optimizes the query for the @Country=’USA’ value.

04. Execution plans with OPTIMIZE FOR hint

Unfortunately, the OPTIMIZE FOR hint introduces supportability issues, and it can lead to suboptimal execution plans in cases where the data distribution has changed. Let’s consider the situation, albeit unrealistic, when a company and all of its employees moved from the United States to Germany.

update dbo.Employees set Country='Germany' where Country='USA';

exec dbo.GetAverageSalary @Country='Germany';

Statistics are outdated at the time of the update, which forces SQL Server to recompile the statement in the stored procedure. At this point, there are no rows in the table with Country=’USA’, and the recompilation produces a suboptimal execution plan, as shown in Figure 5. As a side note, the query uses more reads than before due to the index fragmentation introduced by the update.

05. Inefficient execution plan after data distribution change

SQL Server 2008 introduced another optimization hint, OPTIMIZE FOR UNKNOWN, which helps to address such situations. With this hint, SQL Server performs an optimization based on the most statistically common value in the table. Listing below shows the code involved in doing this.

alter proc dbo.GetAverageSalary @Country varchar(64)
as
begin
	select Avg(Salary) as [Avg Salary]
	from dbo.Employees
	where Country = @Country
	option (optimize for(@Country UNKNOWN))
end
go

exec dbo.GetAverageSalary @Country='Canada';

Figure 6 illustrates the execution plan. Country=’Germany’ is the most statistically common value in the table, and therefore SQL Server generates an execution plan that is optimal for such a parameter value.

06. Execution plan with OPTIMIZE FOR UNKNOWN hint

You can achieve the same results with an OPTIMIZE FOR UNKNOWN hint by using local variables instead of parameters. This method also works with SQL Server 2005, where the OPTIMIZE FOR UNKNOWN hint is not supported. Listing below illustrates that approach. It introduces the same execution plan with a Clustered Index Scan, as shown in Figure 6 above.

alter proc dbo.GetAverageSalary @Country varchar(64)
as
begin
	declare
		@CountryTmp varchar(64)
	set @CountryTmp = @Country

	select Avg(Salary) as [Avg Salary]
	from dbo.Employees
	where Country = @CountryTmp
end

Finally, in case when you are dealing with 3rd party systems and cannot add optimization hints to the code, you could use plan guides to achieve the same goal. We will discuss plan guides in one of the future blog posts.

Source code is available for download.