I came across a system issue recently that some of our batch jobs written as Ruby Rake tasks ended up having duplicated invocations, which causes either duplicated records in database or duplicated alerts in Slack. It is definitely an infra issue but until it is fixed from infra side, I need figure out a way to address it from the coding perspective. Trying to rewrite everything as idempotent as possible is not an option and neither is enhancing application level validations or database uniqueness constraints, because not all jobs can be idempotent and not all jobs write to databases, besides the issues are caused by race conditions not from two threads but from two processes. After some research and experiment, I decided to use Redis
as a simple distribution lock solution, which is already available in our application and needs zero extra setup, and it is a single instance therefore no replication sync concern. Redis
is single-threaded architecture so no matter how many concurrent clients try to write to it, it is guaranteed to be in sequential and atomic. I figure out I could make use of this nature to fix the issue faced.
A list of commands to do a quick migration for redis.