Web dev tips

Posts

AWS EFS and ENOTEMPTY (not reading your own writes)

We often perform non-empty directory removal in our Node.js application. Obviously, one can not do this directly, directory contents first need to be cleared recursively. Of course, we could have used an NPM module like rimraf. However, we use our own simple, Promise-based recursive function with the only special error handling being ignoring ENOENT. It's been working for years without problems until we switched to Elastic File System (Amazon EFS). We started receiving ENOTEMPTY error when trying to remove a directory immediately after clearing it. It wasn't immediately clear what was happening. However, in retrospect, it was so obvious - we were observing a classic distributed system inconsistency - we were not "reading our own writes". The only solution is to retry a few times with an increasing delay. That's actually exactly what rimraf is doing. This is how it looks in our case:

Lost promises

I love Promises . I think they make modern JavaScript possible, especially on the server side. But promises are, you know, promises and some of them are literally lost! I would even say that lost promises are, to a certain degree, the buffer overflow of JavaScript. OK, it's not as widespread and it hasn't cost as many billions of dollars, but it still may be as subtle, as difficult to notice and just as devastating. At least I have encountered this issue a few times and it works like that: In the code above we simply forget to add "return" keyword before call to sideEffect3 function. This is totally OK, except when you rely on the fact that the Promise returned from giveMePromise is resolved after "side effect 3" can be observed. In our case, Promise was given, but it was lost. That sideEffect3 function is trying in vain, because it's work will never be used. I think this is just a danger of asynchronous code and such errors can only be detecte

Performance of Redis sorted set operations

I was working on a feature recently, which involved several Redis "remove from sorted set" operations. Target production environment sorted sets are expected to be small and most of these calls would be trying to remove items, which do not exist in the sets. Although ZREM operation has documented LON(N) time complexity, ZSCORE has documented constant time complexity. This led me to believe, that Redis might have constant time complecity for ZREM calls when values to be removed do not exist in the set (if ZSCORE "knows" if item is NOT in the set in constant time, ZREM can do the same, right?). Not in this case, apparently. ZSCORE documented constant time complexity is actually misleading (as many cases of asymptotic time complexity documentation for small data sets). Redis stores small sorted sets (up to 128 items by default) as "ziplists", which are essentially linked lists, optimized for memory consumption. Browsing through Redis source code confirms th

Finding random subsets

Suppose you have a large set of records and you need to process them in random batches over longer period of time. By "random batches" I mean subsets, containing random elements from the full set. The solution we've found working good for us is based on the following steps: Load unprocessed record ids into memory; Periodically extract a random batch of ids; Process extracted records and persist them as processed. The tricky part of the process is step 2). How do you efficiently find random subset of a big set? It turns out there is a ready-made algorithm for this - https://en.wikipedia.org/wiki/Fisher%E2%80%93Yates_shuffle . Following is the JavaScript implementation, which proved to be useful in the context of the aforementioned task:

Finding random geographical point around given coordinates

In my current project, we are using Mongo 2dsphere index to sort records by distance from certain geographical point. It's very useful to sort large data sets by distance with good performance, but unfortunately it is not possible to use it if you need to extend your sorting rules (that is - sort by other fields as well). What we needed was to randomize search results in a certain way, while retaining "sort by distance first" rule. Below is a function (in TypeScript), which proved to be very useful for this task. It returns a random geographical point anywhere around given coordinates with range limit in km. It was very interesting to find our that longitude and latitude are just sphere vector angles ( https://en.wikipedia.org/wiki/Latitude ), so a bit of trigonometry is expected.

Nginx secure link with node.js

Serving static files is a natural task for web servers. They, especially ones, having asynchronous architecture (like Nginx), are very good at such tasks. However, usually there is an additional security logic, which should restrict access to files you've published. IIS, for example, offers deep integration with application layer, which allows custom .NET "middleware" logic injection into the request pipeline. Node.js applications are very often published behind Nginx for various reasons and, with the help of Nginx "Secure Link" module, it's possible to offload static file serving tasks from node.js to Nginx, even if the files are not public. This module uses "shared secret" string (known to Nginx and the application) and expects a hash, based on this secret, to be present in the request to decide whether to proceed or return an error. Secure Link module may work in 2 alternative modes ( http://nginx.org/en/docs/http/ngx_http_secure_link_modul