Skip to main content

Posts

Showing posts from 2016

Performance of Redis sorted set operations

I was working on a feature recently, which involved several Redis "remove from sorted set" operations. Target production environment sorted sets are expected to be small and most of these calls would be trying to remove items, which do not exist in the sets. Although ZREM operation has documented LON(N) time complexity, ZSCORE has documented constant time complexity. This led me to believe, that Redis might have constant time complecity for ZREM calls when values to be removed do not exist in the set (if ZSCORE "knows" if item is NOT in the set in constant time, ZREM can do the same, right?). Not in this case, apparently. ZSCORE documented constant time complexity is actually misleading (as many cases of asymptotic time complexity documentation for small data sets). Redis stores small sorted sets (up to 128 items by default) as "ziplists", which are essentially linked lists, optimized for memory consumption. Browsing through Redis source code confirms th...

Finding random subsets

Suppose you have a large set of records and you need to process them in random batches over longer period of time. By "random batches" I mean subsets, containing random elements from the full set. The solution we've found working good for us is based on the following steps: Load unprocessed record ids into memory; Periodically extract a random batch of ids; Process extracted records and persist them as processed. The tricky part of the process is step 2). How do you efficiently find random subset of a big set? It turns out there is a ready-made algorithm for this - https://en.wikipedia.org/wiki/Fisher%E2%80%93Yates_shuffle . Following is the JavaScript implementation, which proved to be useful in the context of the aforementioned task:

Finding random geographical point around given coordinates

In my current project, we are using Mongo 2dsphere index to sort records by distance from certain geographical point. It's very useful to sort large data sets by distance with good performance, but unfortunately it is not possible to use it if you need to extend your sorting rules (that is - sort by other fields as well). What we needed was to randomize search results in a certain way, while retaining "sort by distance first" rule. Below is a function (in TypeScript), which proved to be very useful for this task. It returns a random geographical point anywhere around given coordinates with range limit in km. It was very interesting to find our that longitude and latitude are just sphere vector angles ( https://en.wikipedia.org/wiki/Latitude ), so a bit of trigonometry is expected.