Missing rows after updating SQL Server index key column

I'm reading the amazing "Inside Microsoft SQL Server 2008: T-SQL Querying" book (http://www.amazon.com/Inside-Microsoft-Querying-Developer-Reference/dp/0735626030). It has an interesting example showing that while one connection is updating a clustered index key column, another connection reading all rows from the table can miss a row or read the same row twice. Updating an entity key is not a good idea anyway, so this behavior seemed acceptable.

I've tried to adapt this example so that the same happens with a non-clustered index. With non-clustered indexes this behavior does not seem acceptable, since their keys are not domain keys and the underlying values may very well be changing.

To simulate this situation, we need to create a covering non-clustered index that takes at least two pages. Then we need to update its key column so that the first row (in index order) becomes the last (jumps to the second page of the index), and then jumps back again. While this jumping runs in an endless loop, we select all rows from the table in a separate connection. Very soon this second connection gets a result set in which not all rows are present.

If we put the second connection into the REPEATABLE READ transaction isolation level, the data inconsistency problem goes away, but we very soon run into deadlock errors instead. It looks like SQL Server does not handle such high-frequency page jumps very well.
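The isolation level change itself is a single statement in the reading connection, executed before the SELECT loop shown below:

    SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;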

The following SQL statements reproduce the problem.

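The scripts below are a minimal sketch of such a setup, not the exact original code: the object names, the CHAR(600) padding column and the 20-row fill are illustrative choices whose only purpose is to make the covering index span two pages.

Table structure:

    CREATE TABLE dbo.TestTable
    (
        Id      INT       NOT NULL CONSTRAINT PK_TestTable PRIMARY KEY CLUSTERED,
        SortKey INT       NOT NULL,
        Filler  CHAR(600) NOT NULL DEFAULT ('x') -- widens index rows so they exceed one 8 KB page in total
    );

    -- Covering non-clustered index: the SELECT below is satisfied entirely
    -- from this index, and 20 rows of 600+ bytes need at least two pages.
    CREATE NONCLUSTERED INDEX IX_TestTable_SortKey
        ON dbo.TestTable (SortKey)
        INCLUDE (Filler);

    INSERT INTO dbo.TestTable (Id, SortKey)
    SELECT n, n * 10
    FROM (VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10),
                 (11),(12),(13),(14),(15),(16),(17),(18),(19),(20)) AS v(n);

First connection, causing row jumping:

    -- Endlessly shuttle row Id = 1 between the front and the back of the
    -- index order, i.e. between the first and the last index page.
    WHILE 1 = 1
    BEGIN
        UPDATE dbo.TestTable SET SortKey = 1000 WHERE Id = 1; -- becomes last
        UPDATE dbo.TestTable SET SortKey = 0    WHERE Id = 1; -- becomes first again
    END;

Second connection, reading data inconsistently:

    -- Scan the covering index until a scan sees the wrong number of rows:
    -- 19 means the jumping row was missed, 21 means it was read twice.
    DECLARE @rows INT = 20;
    WHILE @rows = 20
        SELECT @rows = COUNT(*)
        FROM dbo.TestTable WITH (INDEX (IX_TestTable_SortKey));
    SELECT @rows AS RowsSeen;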
UPDATE: This SO answer shows that such behavior can also happen even with a single index page: http://stackoverflow.com/a/27765798/101685

ANOTHER UPDATE: Using the snapshot isolation level (just adding a SET TRANSACTION ISOLATION LEVEL SNAPSHOT statement to the reading connection) also solves the deadlocking problem.
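Concretely, snapshot isolation first has to be allowed at the database level; after that the reading connection only switches its isolation level (the database name here is a placeholder):

    -- Once per database:
    ALTER DATABASE TestDb SET ALLOW_SNAPSHOT_ISOLATION ON;

    -- In the reading connection, before the SELECT loop:
    SET TRANSACTION ISOLATION LEVEL SNAPSHOT;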

UPDATE 3: Actually, when you enable snapshots on your database (the "Is Read Committed Snapshot On" property in SSMS), the problem is gone even without setting the transaction isolation level explicitly.
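That SSMS property corresponds to the READ_COMMITTED_SNAPSHOT database option (database name is a placeholder again; note that switching it requires no other active connections to the database):

    ALTER DATABASE TestDb SET READ_COMMITTED_SNAPSHOT ON;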
