Blocking Monitoring Framework

Capture and Analyze SQL Server Blocking and Deadlock Information With Event Notifications

Troubleshooting of the blocking and concurrency issues is, in the nutshells, a simple process. You need to identify the processes involved in blocking conditions or deadlocks and analyze why those processes acquire the locks on the same resources. In majority of cases, you need to analyze queries and their execution plans identifying possible inefficiencies that led to excessive number of locks being acquired.

Collecting this information is not a trivial task. The information is exposed through DMVs (you can download the set of scripts here); however, it requires you to run the queries at time when blocking occurred. Fortunately, SQL Server allows you to capture blocking and deadlock conditions with the blocked process report and deadlock graph, analyzing them later.

There is the caveat though. Neither blocked process report nor deadlock graph provide you execution plans of the statements. Nor do they always include affected statements in the plain text. You may need to query plan cache and other DMVs to get this information and longer you wait lesser is the chance that the information is available. Moreover, SQL Server may generate enormous number of blocked process reports in cases of prolonged blocking and complex blocking chains, which complicates the analysis.

This page provides you the set of scripts that help to address those problems. The scripts create the infrastructure to capture blocked process reports and deadlock graphs using Event Notification – Service Broker-based technology. Blocked process reports and deadlock graphs are parsed at time when events occurred and all data is still available in the system. Multiple blocked process reports are aggregated into the single entry with cumulative wait time of the blocking. All information is persisted in the set of tables for the further analysis.

The framework also provides the set of scripts to analyze collected data. You can find the queries and execution plans that were involved in the blocking and deadlocks the most; identify potential cases of blocking due to the lock escalation; and do other analysis.

Overview

The first version of the framework consists of three main tables:

  • dbo.BlockedProcessInfo table contains the information about blocking cases. It includes the information about duration and sessions involved in the blocking conditions, resources and lock types, and statements and execution plans of the blocked and blocking sessions (when available). Figure 1 illustrates the sample data persisted in the table.

01. Information about blocking

  • dbo.Deadlocks table stores the information about deadlocks, such as time of the incident and the deadlock graph.
  • dbo.DeadlockProcesses table persists the data about processes involved in the deadlock including their statements and execution plans (when available). Figure 2 illustrates the data collected in those tables. At this time, I do not parse resources involved in the deadlock – they are available as part of the deadlock graphs. This will be improved in the future versions of the framework.

02.Information about Deadlocks

  • dbo.Versions table stores the version of the framework. This is required to support future upgrades.
  • dbo.BMFrameworkConfig table contains configuration settings:
    • CollectPlanFromBlockingReport (0/1) – controls if framework obtains execution plans of the statements from blocking process report.
    • CollectPlanFromDeadlockGraph (0/1) – controls if framework obtains execution plans of the statements from deadlock graphs.

Both of the settings enabled by default. Collecting query execution plans is expensive operation and you may consider disabling them if server has very high CPU load and suffers from large amount of blocking/deadlocks.

Error Handling

I am using this framework on many productions servers. However, I cannot guarantee that I saw and tested every possible shape of blocked process report and deadlock graph. Even with all extensive testing, it is possible that activation stored procedures would not be able to parse some of them.

If exception occurred, depending on its severity it may or may not make transaction uncommittable. The code should handle non-severe exceptions and persist deadlock graph or blocked process report in the tables without parsing them. In case of severe exceptions, the message and error information are persisted in dbo.PoisonMessages table. This allows the framework to skip poison messages when it is possible.

Activation procedures also call dbo.BMFrameworkErrorNotification stored procedure. This procedure is empty and it can be customized to send notifications to DBA team (for example with the Database Mail). The notification is done from the separate procedure to simplify framework version upgrades in the future.

There are still a few cases (like outer exceptions) that may lead to the condition when poison messages are not handled correctly and queues became disabled. You can monitor it with dbo.BMFrameworkQueuesCheck stored procedure running it from SQL Server Agent Job on schedule.

I would appreciate if you send me unprocessed blocked process reports and deadlock graphs. I will fix the code and publish a new version of the framework quickly.

Data Retention and Maintenance

In case, when partitioning is supported, (SQL Server 2016 SP1+ or Enterprise/Developer Edition), the tables are partitioned based on the date of event on weekly basis. Setup script also creates dbo.BlockingProcessInfoTmp, dbo.DeadlocksTmp, dbo.DeadlockProcessesTmp, dbo.PoisonMessagesTmp tables to support sliding window purge implementation.

Below is the list of maintenance stored procedures:

  • dbo.BMFrameworkPartitionMaintenance stored procedure performs table maintenance when data is partitioned. It purges the old data based on @RetentionWeeks parameter and creates new partitions for the future data. I would recommend running it from SQL Server Agent Job on the daily schedule
  • dbo.PurgeBlockingInfodbo.PurgeDeadlockInfo and dbo.PurgePoisonMessages stored procedures purge blocking, deadlock and poison messages data according to @RetentionDays parameter. Those procedures are recommended for non-partitioned implementation.
  • dbo.BMFrameworkQueuesCheck stored procedure checks if Service Broker queues are enabled. You need to customize the notification method and run it from SQL Server Agent Job on the regular basis making sure that framework is operational.

Setup

Make sure that you set Blocked Process Threshold in your system.

The Setup SSMS project contains the set of scripts you need to execute in order:

  • 01.Create DB.sql creates DBA database on the server. This database is used by default; however, you can use other databases if needed. Make sure that the target database has “Enable Broker” set (ALTER DATABASE SET ENABLE_BROKER) and security is properly configured (see step 5). If you decided to create and use DBA database, you may  change default data and log files configuration and adjust scripts below.
  • 02.Create Tables.sql creates tables for the framework. You need to change the filegroups and/or partition scheme if you want to place data to the different filegroup than PRIMARY.
  • 03.Create Procs.sql creates the functions and stored procedures.
  • 04.Event Notifications.sql creates Service Broker and Event Notification objects.
  • 05.1.Security (Trustworthy).sql and 05.2.Security(Certs).sql scripts provides you two ways how you can set up security that allows Service Broker activation procedures to operate. The first script marks the database as Trustworthy, which is the bad practice but still acceptable in many systems. Alternatively, the second script sets certificate-based security for the solution (please review and change the passwords in the script).
  • 06.Enabling Activation.sql enables activation for Service Broker queues.
  • 07.Helpers.sql creates helper stored procedures. Change the filegroup in dbo.BMFrameworkPartitionMaintenance stored procedure if you are storing data on different than PRIMARY filegroup.
  • 08.Set Version.sql sets the version for the framework.

Testing project includes the set of scripts to test the framework. It emulates the blocking and deadlock conditions in the system. Check that data is collected in the tables and check SQL Server log if this is not the case – you may see details of the errors there.

Analysis SSMS projects contains the set of scripts that allow you to perform analysis of the collected data outside of individual blocking cases.

Recommended setup checklist:

  1. Blocked process threshold is set on the server
  2. Target database has Service Broker enabled
  3. Proper security is set (step 5 from above)
  4. dbo.BMFrameworkErrorNotification stored procedure is customized to send notification messages
  5. SQL Server Agent has a job to purge the data from the tables
  6. SQL Server Agent has a job to call dbo.BMFrameworkQueuesCheck stored procedure and notify the team if queues are disabled. The stored procedure is customized with the proper notification method

Compatibility

The framework is tested with all versions of SQL Server starting with SQL Server 2012. It should also work with SQL Server 2008/2008R2; however, I have not tested it in those environments in production. Let me know if you have any issues there and I will address them.

Download

Full version of the framework. Currently v1.0.0. 

Change Log

  • V1.0.0 (2018-08-02). Initial Release

Additional Resources