Skip to main content

Merging sorted collections

Recently I had a chance to implement an interesting optimization for combining two sorted sets into another sorted set in JavaScript (by "sets" I mean arrays with unique elements). Additional requirement for the algorithm was that when matches are found in two sets, element from the first set should always appear in the result.

The general approach of the implementation is the one used by SQL Server in its Merge physical join operator - since arrays are already sorted, both of them need to be iterated only once and there is no need for hashes to avoid duplicates. This approach would work even better with linked lists, since source items could be added directly into target efficiently without the need to create a separate array object.

This is one of those algorithms, which is so simple but so efficient at the same time, so it is a great pleasure to implement. Resulting gist is presented below:

Comments

Popular posts from this blog

Nginx secure link with node.js

Serving static files is a natural task for web servers. They, especially ones, having asynchronous architecture (like Nginx), are very good at such tasks.
However, usually there is an additional security logic, which should restrict access to files you've published. IIS, for example, offers deep integration with application layer, which allows custom .NET "middleware" logic injection into the request pipeline.
Node.js applications are very often published behind Nginx for various reasons and, with the help of Nginx "Secure Link" module, it's possible to offload static file serving tasks from node.js to Nginx, even if the files are not public. This module uses "shared secret" string (known to Nginx and the application) and expects a hash, based on this secret, to be present in the request to decide whether to proceed or return an error.

Secure Link module may work in 2 alternative modes (http://nginx.org/en/docs/http/ngx_http_secure_link_module.ht…

Performance of Redis sorted set operations

I was working on a feature recently, which involved several Redis "remove from sorted set" operations. Target production environment sorted sets are expected to be small and most of these calls would be trying to remove items, which do not exist in the sets. Although ZREM operation has documented LON(N) time complexity, ZSCORE has documented constant time complexity. This led me to believe, that Redis might have constant time complecity for ZREM calls when values to be removed do not exist in the set (if ZSCORE "knows" if item is NOT in the set in constant time, ZREM can do the same, right?).

Not in this case, apparently. ZSCORE documented constant time complexity is actually misleading (as many cases of asymptotic time complexity documentation for small data sets). Redis stores small sorted sets (up to 128 items by default) as "ziplists", which are essentially linked lists, optimized for memory consumption. Browsing through Redis source code confirms the…

Redis Lua scripts are not really transactions

Redis support of Lua scripts is a great feature. We use it a lot to build fast reliable queues with some very interesting requirements. You need it every time you want to decide your next Redis command, based on the result of a previous command, while guaranteeing that no one else has done anything with this result or anything else has changed in Redis. That is, the whole Redis script is an "atomic" operation.
However, I put it in quotes intentionally. My understanding of phrase "atomic operation" is that not only no one else can see it half complete while it is executing (that works so great in Redis). It should also mean, that it should never be left half complete if an error occurs in the middle (or at least, that is my wishful thinking:) ).
Yea, exactly, the second point doesn't work in Redis and there is no warning in the official docs. To be more polite (or precise), there is no rollback in Redis (referring to a comment in this SO question - http://stack…