SQL Server Troubleshooting and Performance Tuning – Observer Effect

I hope you are OK during those turbulent times!

People often ask me about my most curious experience in SQL Server consulting. Well, as many of my colleagues, I have a lot of stories to tell (though many of them are still under NDA). One of my favorite ones is the story about the case when I was able to solve all performance problems on the server in just a few minutes. 

We, Database Professionals, love monitoring tools. They allow us to be more proactive and reduce incident detection and recovery times. Unfortunately, the benefits do not come for free – every type of monitoring adds the overhead to the system. In some cases, this overhead may be negligible and acceptable, in others – it may have significant impact to the server performance. 

I have seen many cases of inefficient monitoring. For example, I saw the tool that provided the information about index fragmentation by calling sys.dm_db_index_physical_stats function in DETAILED mode for every index in the database doing it every 4 hours. I also saw the tool, that constantly polled sys.dm_exec_requests view joining it with sys.dm_exec_query_plan function. It did not work very well on the system that handled hundreds of concurrent requests all the time. 

It is relatively easy with the tools – you can detect those inefficiencies just by looking at the expensive queries in the system. This is not always the case, however, with other technologies, like with monitoring done by Extended Events. They can stay almost invisible in the system, especially if targets are keeping up and don’t generate waits.

Let me show you an example and create xEvent session that captures queries that were executed in the system. This is very common one, is not it? 

create event session CaptureQueries on server 
add event sqlserver.rpc_completed
  set collect_statement=(1)
add event sqlserver.sql_batch_completed(
add event sqlserver.sql_statement_completed
add target package0.event_file
(set filename=N'C:\PerfLogs\LongSql.xel',max_file_size=(200))
  max_memory=4096 KB
  ,max_dispatch_latency=5 seconds

As the next step, let’s deploy it to the server that operates under heavy load with large number of concurrent requests. We will measure the throughput in the system with and without xEvent session running. Obviously, be careful and do not run it on production servers! 

Figure 1 below illustrates CPU load and number of batch requests per second in both scenarios. As you see, enabling xEvent session led to more than 20% decrease of throughput in the system.

Figure 1. System throughput with and without xEvent session

And coming back to my story – this is exactly what happened. The customer had very similar session running. Amazingly, but not surprisingly, nobody used collected data. The team was unaware that this session even existed. 

After some investigation, they found that they created this session several months ago during investigation of some performance issues. After investigation had been completed, they stopped the session. Unfortunately, they overlooked that the session was created with “Start the event session at server startup” flag. At some point the server was patched, rebooted and voila!

I did not make much money from that case – after all, I am billing by the hour. But the look at the customer’s face was priceless!   

The morale of this story – be careful with the monitoring. Know what you are doing and do not keep unnecessary data collectors running. 

This is the link to the video version of this blog post. 🙂

The danger of HADR_SYNC_COMMIT wait and synchronous replication in AlwaysOn Availability Groups

The life of Database Engineer is never boring. We are constantly pushing the boundaries and dealing with more complex systems every year. Our challenges are also changing over time. Our current problems did not even exist 5 years ago. New and powerful hardware creates bottlenecks in unexpected areas.

One of such examples is the blocking introduced by AlwaysOn Availability Groups and synchronous commit. This condition may occur with any transaction log-intensive operations, especially on the modern servers with powerful CPUs and fast flash-based storage.

As you know, Availability Groups consist of one primary and one or more secondary nodes/servers. All data modifications are done on the primary node, which sends the stream of t-log records to secondaries. Those log records are saved (hardened) in transaction logs there and asynchronously re-applied to the data files by the set of REDO threads.

The secondary nodes may be configured using asynchronous or synchronous commit. With asynchronous commit, transaction considered to be committed and all locks were released when COMMIT log record is hardened on the primary node. SQL Server sends COMMIT record to secondary node; however, it does not wait for the confirmation that the record had been hardened in the log there.

This behavior changes when you use synchronous commit as shown in Figure 1. In this mode, SQL Server does not consider transaction to be committed until it receives the confirmation that COMMIT log record is hardened in the log on the secondary node. The transaction on primary will remain active with all locks held in place until this confirmation is received. The session on primary is suspended with HADR_SYNC_COMMIT wait type.

Figure 1. Synchronous Commit in AlwaysOn Availability Groups

You can analyze impact of commit latency by looking at HADR_SYNC_COMMIT wait in wait statistics. It is not uncommon nowadays to see this wait to be at the top of the list in busy OLTP systems. Figure 2 below illustrates the output from one of my production servers.

Figure 2. Wait Statistics from one of Production Servers

While we may argue that the Average Wait Time of 3ms would not introduce noticeable overhead, the situation may quickly change when you run t-log intensive processes that generate large amount of transaction log records. Those processes may quickly saturate the Send queue and dramatically increase HADR_SYNC_COMMIT wait/latency. This will affect all processes and transactions in the system – all of them share the same Send queue.

Unfortunately, index maintenance is one of such processes. Index reorgs and rebuilds may create enormous amount of transaction log records, especially when LOB columns are involved. You can watch the short YouTube video that illustrates how index rebuild operation put the system down in no time.

There are a few things you can try to improve the situation:

  • You can throttle index maintenance performance by reducing MAXDOP for the operation. This will reduce log generation rate at cost of increased execution time. Unfortunately, even MAXDOP=1 may not be enough in some cases
  • Alternatively, in Enterprise Edition of SQL Server, you can throttle the process with Resource Governor, limiting IOPS and CPU bandwidth. This approach may require good amount of trial and errors to achieve optimal configuration
  • In SQL Server 2017 and above, you can utilize resumable index rebuild and build the process which will frequently pause the operation allowing Availability Group queues to catch up. I am not the big fan of this approach; however, it may work when properly implemented.

Unfortunately, even with some tricks, in many cases, you’d still need to switch to asynchronous commit mode during massive index operations. Well, it is the great example that everything in SQL Server fits into “It Depends” category. Even good hardware.

Queue Monitoring

As you can guess, it is essential to monitor Availability Group queues. You need to keep an eye on both, Send and Redo queues.

  • In case of synchronous replicas, large Send queue implies potential blocking. With asynchronous replicas, large Send queue dictates potential data loss – data from the queue has not been sent to secondaries.
  • Large Redo queue implies long database recovery time in case of failover. All records from the queue need to be applied to the database before it becomes available. Moreover, in case of readable secondaries, large Redo queue indicates that secondary replica may fall behind being out of date.

The script below provides you the information about Availability Group health and replication latency along with Send and Redo queues sizes. It is available for download (link is in the end of the post)

I usually run that script from SQL Server job every a few minutes and use it to send alert to the team when Send or Redo queue sizes exceed predefined threshold. Obviously, thresholds may vary based on workload the databases handle.

Addressing High HADR_SYNC_COMMIT waits

I consider HADR_SYNC_COMMIT as the dangerous wait and want to address it when it is noticeable in the wait statistics.

Generally, there are three main factors that contribute to this wait:

  • The time, log record waits in the send queue. You can analyze this with the code from Listing 1 using the data from [Send Queue Size(KB)] and [Send Rate KB/Sec] columns. It is worth noting that the queue management process is CPU intensive, which may lead to additional latency in the systems with high CPU load.
  • Network throughput. You can troubleshoot it with network-related performance counters along with SQL Server:Availability Replica > Bytes Sent performance counter. Obviously, make sure that your network infrastructure is configured in the optimal way
  • I/O latency on secondary nodes. Synchronous commit requires COMMIT log record to be hardened in transaction log before acknowledgement is sent back to the primary node. You can monitor write latency of transaction log file using sys.dm_db_virtual_file_stats view. You can download the script to do it through the link in the end of the blog post.

While network and I/O performance may, sometimes, be addressed by hardware upgrades; it is much harder to deal with the latency, introduced by large amount of log records in very busy OLTP systems. You can reduce the overhead of queue management by utilizing CPUs with higher clock speed; however, there are some limits on what you can achieve with the hardware.

There are several things you can do when you experienced this situation:

  • Make sure that SQL Server schedulers are evenly balanced across NUMA nodes. For example, if SQL Server is using 10 cores on 2 NUMA-node server with 8 cores per node, set affinity mask to use 5 cores per node. Unevenly balanced schedules may introduce various performance issues in the system and affect Availability Groups throughput.
  • Reduce amount of log records generated in the system. Some of the options are redesigning transaction strategy avoiding autocommitted transactions; removing unused and redundant indexes; fine-tuning index FILLFACTOR property reducing the page splits in the system.
  • Rearchitect data tier in the system. It is very common that different data in the system may have different RPO (recovery point objective) requirements and tolerance to the data loss. You may consider to move some data to another Availability Group that does not require synchronous commit and/or utilize NoSQL technologies for some entities.

Finally, if you are using SQL Server prior 2016, you should consider to upgrade to the latest version of the product. SQL Server 2016 has several internal optimizations, which dramatically increase Availability Groups throughput comparing to SQL Server 2012 and 2014. It may be the simplest solution in many cases.

Figure 3 below illustrates HADR_SYNC_COMMIT waits before and after SQL Server 2016 upgrade in one of the systems. As you can see, cumulative time for HADR_SYNC_COMMIT waits had been reduced for more than 3 times after upgrade. It is also worth noting that general system performance had been improved as well.

Figure 3. HADR_SYNC_COMMIT waits before and after SQL Server 2016 upgrade

While having zero data loss is the great thing, remember about synchronous commit overhead. Monitor Availability Groups Queues and HADR_SYNC_COMMIT in the system.

Several useful scripts.

Link to YouTube video with the demo.

PS. I am teaching full day SQL Server Internals class at SQL Saturday #862 in Cork, Ireland on June 28th, 2019. You can register to pre-con here.