If you spent some time working with Microsoft SQL Server 2008, you should be aware of the Merge statement. This statement is not only allowing you to insert/update/delete data as the part of the single statement (which helps with locking and performance), it also gives you an ability to intercept column values from the source rowset – something you cannot do with regular OUTPUT clause of insert, update and delete statement.
Today I’d like to show you another hidden beauty of this statement – ability to use CTE as the Target. Basically it gives you an ability to execute merge against subset of the data from the table. There are quite a few cases where it could be beneficial – let’s think about the situation when you need to synchronize target with the source that contains data only for subset of target rows. Confusing? Let’s think about one real life example.
Let’s think about order entry system and assume that you want to have a cache and store the information about last 15 orders per customer in the system. Let’s create the table and populate it with some data.
In this example orders are sorted by ID and partitioned by customers – so bigger ID means more recent orders. As you can see – you have 100 customers with 15 orders each in the cache.
Let’s assume that every day you get the data about the new orders placed into the system. This data contains the orders for subset of the customers (obviously some customers don’t place orders that day). It could also have the orders from the new customers that you don’t have in the cache. Let’s create the table:
As you can see, in this example we added 10 orders per customer for 21 old customers (CustomerIds from 80 to 100) as well as added 10 new customers (CustomerIds from 101 to 110).
What do we want to have at the end is to update the cache for existing customers (delete first 10 old orders) and add new customers to the cache. Obviously we don’t want to touch customers who did not submit any orders during the day.
Merge statement would work perfectly here. Although if we use Data table as the target, we will have hard time to differentiate the customers who didn’t submit any data. Fortunately we can put CTE that filters out customers who don’t have any orders today and use it as the target. Let’s take a look:
So, first CTE – SourceData - does the trick – it filters out everybody who don’t have the new orders. This would be our Target. Now let’s prepare the Source - first what we need to do is to combine data from the cache with the new data – MergedData CTE does that. As result of this CTE we’ll have all old and new orders combined for the customers who submits the orders today. Next – we need to determine most recent 15 orders – basically let’s sort MergedData (use ROW_NUMBER()) based on ID in descending order. Here is SortedData CTE. And now we can use first 15 rows per customer from this CTE as the Source.
The trick is what to do next – if there is the order in SourceData that is not in the Source (top 15 from SortedData) – it means order is old and we need to delete it from the cache. “When not matched by source” does that. If order is in the Source but not in the cache – we need to insert it (“when not matched by Target“). Obviously if order is in the both places, we should ignore it. And now, if you think about SourceData CTE which is the Target for the merge – it makes perfect sense. In case if you use the dbo.Data table there – all orders from the customers who did not submit data today would not be matched by Source and would be deleted. So CTE as the Target takes care of it.
If you look at the data, you’d see that new customers (CustomerID > 100) have 10 rows in the cache with ID starting at 16. Old customers who submitted data today (CustomerID: 80..100) have last 15 orders – with ID from 11 to 25. And old customers data (CustomerID < 80) is intact.
Source code is available for download