Suppose you have a large set of records and you need to process them in random batches over a longer period of time. By "random batches" I mean subsets containing randomly chosen elements from the full set. The solution that has worked well for us consists of the following steps:
- Load unprocessed record ids into memory;
- Periodically extract a random batch of ids;
- Process extracted records and persist them as processed.
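The three steps can be sketched in Python roughly as follows. This is a minimal illustration, not the author's code. It assumes the unprocessed ids fit comfortably in memory and that the caller supplies the actual processing and the "persist as processed" step. `RandomBatcher`, `process_batch`, `handle` and `mark_processed` are hypothetical names. The post does not say how batches are drawn, so the swap-and-pop removal used here is simply one common way to sample without replacement.

```python
import random
from typing import Callable, Iterable, List, Optional


class RandomBatcher:
    """In-memory pool of unprocessed record ids (hypothetical helper).

    Each call to next_batch() removes and returns a random subset, so an id
    is handed out at most once.
    """

    def __init__(self, ids: Iterable[int], seed: Optional[int] = None) -> None:
        # Step 1: load unprocessed record ids into memory.
        self._ids: List[int] = list(ids)
        self._rng = random.Random(seed)

    def __len__(self) -> int:
        return len(self._ids)

    def next_batch(self, size: int) -> List[int]:
        # Step 2: extract a random batch of ids (without replacement).
        batch: List[int] = []
        for _ in range(min(size, len(self._ids))):
            # Swap a randomly chosen id to the end and pop it: O(1) per id,
            # unlike list.remove(), which is O(n).
            i = self._rng.randrange(len(self._ids))
            self._ids[i], self._ids[-1] = self._ids[-1], self._ids[i]
            batch.append(self._ids.pop())
        return batch


def process_batch(
    batcher: RandomBatcher,
    size: int,
    handle: Callable[[int], None],
    mark_processed: Callable[[List[int]], None],
) -> List[int]:
    """Run one tick; intended to be called periodically, e.g. by a scheduler."""
    batch = batcher.next_batch(size)
    # Step 3: process the extracted records, then persist them as processed.
    for record_id in batch:
        handle(record_id)
    mark_processed(batch)
    return batch


if __name__ == "__main__":
    done: set = set()
    batcher = RandomBatcher(range(10), seed=1)
    # Drained in a loop here for demonstration; in practice each call
    # would happen on a timer.
    while len(batcher):
        print(process_batch(batcher, 3, handle=lambda rid: None, mark_processed=done.update))
    print(sorted(done) == list(range(10)))
```

Swap-and-pop reorders the remaining ids. That does no harm here, because every draw is random anyway. Each removal stays constant-time even when the pool is large. The sketch removes ids from memory before they are processed, so a failure mid-batch would lose track of them. If that matters, you could mark only the successful ids and return the failed ones to the pool.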