Setup:
Four threads, T0, T1, T2 and T3, are trying to acquire the lock.
Note: timestamp ordering is ts0 < ts1 < ts2 < ts3
Description of race:
1. T0, T1 and T2 simultaneously try to acquire the lock and get inconsistent updates across the config servers. The state ends up like this:
Config0: T0 got this
Config1: T1 got this
Config2: T2 got this
2. T1 checks the current document at config0.
3. T2 checks the current document at config0.
4. T1 takes over the lock by overriding the ts with its own timestamp. Note: query is
, update is { $set: { ts: ts1 } }.
5. T2 tries to take over the lock by overriding the ts with its own timestamp. Note: query is
, update is { $set: { ts: ts2 } }. But since T1 has already updated the document, this update ends up modifying nothing.
6. After seeing T2's higher timestamp in config2, T1 backs out and registers itself for "deletion" by the lock pinger.
7. T2 finishes checking each config server, determines that it has the highest timestamp, declares that it won the tournament and prepares to finalize the lock acquisition.
8. The lock pinger picks up T1's entry and sets the lock state to 0 (unlocked).
9. T3 tries to acquire the lock, sees that the lock document once touched by T1 is in state 0, and tries to grab it. Note: query is
, update is set state to 1, ts to ts3.
10. T3 gets an "update not consistent" exception since config2 already has T2's timestamp.
11. T2 sets all config server lock documents to be owned by T2, in state 2.
13. T2 thinks it already owns the lock, so it goes ahead and grabs it.
14. T3 goes to the tournament round and, since it has a higher timestamp than T2, sets all config server lock documents to have a timestamp of ts3.
15. T3 wins the tournament and grabs the lock.
Now both T2 and T3 think they have the lock!
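
The sequence above can be replayed with a small in-memory model. This is only an illustrative sketch in Python, not the code in distlock.cpp: the helper names update_if and replay are made up for this example, and the query shapes (match on the ts value previously read in steps 4 and 5, match on T1's ts in step 8, match on state 0 in step 9) are assumptions, since the queries are not shown in the steps.

```python
# Hypothetical in-memory model: one lock document per config server.
TS0, TS1, TS2, TS3 = 0, 1, 2, 3  # ts0 < ts1 < ts2 < ts3


def update_if(doc, query, changes):
    """Apply `changes` only if every field in `query` matches; return True if modified."""
    if all(doc.get(key) == value for key, value in query.items()):
        doc.update(changes)
        return True
    return False


def replay():
    believes_held = set()
    pinger_queue = []

    # Step 1: inconsistent initial acquisition, one winner per config server.
    configs = [{"state": 1, "ts": TS0}, {"state": 1, "ts": TS1}, {"state": 1, "ts": TS2}]

    # Steps 2-3: T1 and T2 both read config0 and see ts0.
    seen_by_t1 = configs[0]["ts"]
    seen_by_t2 = configs[0]["ts"]

    # Step 4: T1's takeover succeeds (assumed query: ts equals the value read).
    t1_modified = update_if(configs[0], {"ts": seen_by_t1}, {"ts": TS1})
    # Step 5: T2's takeover matches nothing, because config0 no longer holds ts0.
    t2_modified = update_if(configs[0], {"ts": seen_by_t2}, {"ts": TS2})

    # Step 6: T1 sees a higher timestamp on config2 and registers for cleanup.
    if configs[2]["ts"] > TS1:
        pinger_queue.append(TS1)

    # Step 7: T2 finds no timestamp above its own and declares victory.
    t2_won = max(c["ts"] for c in configs) == TS2

    # Step 8: the pinger unlocks documents carrying T1's timestamp (assumed query).
    for ts in pinger_queue:
        for c in configs:
            update_if(c, {"ts": ts}, {"state": 0})

    # Steps 9-10: T3 grabs unlocked documents (assumed query: state 0).
    # config2 does not match, so the result differs between servers.
    t3_results = [update_if(c, {"state": 0}, {"state": 1, "ts": TS3}) for c in configs]
    t3_inconsistent = len(set(t3_results)) > 1

    # Steps 11 and 13: T2 finalizes based on its earlier (now stale) tournament result.
    if t2_won:
        for c in configs:
            c.update({"state": 2, "ts": TS2})
        believes_held.add("T2")

    # Steps 14-15: T3 enters the tournament with the highest timestamp and wins.
    if t3_inconsistent:
        for c in configs:
            c["ts"] = TS3
        if max(c["ts"] for c in configs) == TS3:
            believes_held.add("T3")

    return {
        "t1_modified": t1_modified,
        "t2_modified": t2_modified,
        "t3_inconsistent": t3_inconsistent,
        "believes_held": sorted(believes_held),
        "configs": configs,
    }
```

In this model, replay()["believes_held"] contains both T2 and T3, while every lock document ends up carrying ts3. T2's decision in step 7 rests on a view of the config servers that the pinger and T3 invalidate before T2 finalizes.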
To reproduce this race more easily, simply add a sleepsecs(2) right on this line and run the sync6.js test:
https://github.com/mongodb/mongo/blob/r2.6.0/src/mongo/s/distlock.cpp#L984