[SERVER-13585] Race in dist lock after winning "tournament" round Created: 14/Apr/14  Updated: 11/Jul/16  Resolved: 17/Apr/14

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 2.6.0
Fix Version/s: 2.7.0

Type: Bug Priority: Major - P3
Reporter: Randolph Tan Assignee: Randolph Tan
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Operating System: ALL
Participants:
Linked BF Score: 0

 Description   

Setup:
4 threads (T0, T1, T2 and T3) are trying to acquire the lock.
Note: time stamp ordering is ts0 < ts1 < ts2 < ts3
Description of race:
1. T0, T1 and T2 simultaneously try to acquire the lock and get inconsistent updates across the config servers. The state will end up like this:

Config0: T0 got this
Config1: T1 got this
Config2: T2 got this

2. T1 checks the current document at config0.
3. T2 checks the current document at config0.
4. T1 takes over the lock by overwriting the ts with its own timestamp. Note: query is { ts: ts0 }, update is { $set: { ts: ts1 } }.
5. T2 tries to take over the lock by overwriting the ts with its own timestamp. Note: query is { ts: ts0 }, update is { $set: { ts: ts2 } }. But since T1 already updated the document, this update ends up modifying nothing.
6. After seeing T2's higher timestamp in config2, T1 backs out and registers itself for "deletion" by the lock pinger.
7. T2 finishes checking each config server, determines that it has the highest timestamp, declares that it won the tournament and prepares to finalize the lock acquisition.
8. Lock pinger picks up T1's entry and sets the lock state to 0 (unlocked)
9. T3 tries to acquire the lock, sees that the lock document once touched by T1 is in state 0 and tries to grab it. Note: query is { ts: ts1 }, update is set state to 1, ts to ts3.
10. T3 gets update not consistent exception since config2 already has T2's timestamp.
11. T2 sets all config server lock documents to be owned by T2 to state 2.
13. T2 thinks it already owns the lock, so it goes ahead and grabs it.
14. T3 goes to the tournament round and, since it has a higher timestamp than T2, it sets all config server lock documents to have a timestamp of ts3.
15. T3 wins the tournament and grabs the lock.

Now, both T2 and T3 think they have the lock!
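
The interleaving above can be replayed deterministically. The following is only a minimal sketch, not the distlock.cpp code: the LockDoc type, the update_if helper, the concrete timestamp values, and the meaning given to states 1 and 2 ("being acquired" and "locked") are assumptions made for illustration, and the pinger and finalize updates are modeled as plain compare-and-set operations keyed on ts.

```python
from dataclasses import dataclass

TS0, TS1, TS2, TS3 = 100, 101, 102, 103  # hypothetical values, ts0 < ts1 < ts2 < ts3


@dataclass
class LockDoc:
    state: int  # assumed: 0 = unlocked, 1 = being acquired, 2 = locked
    ts: int     # timestamp of the thread that last claimed the document


def update_if(doc, query, changes):
    """Compare-and-set: apply changes only if every queried field matches."""
    if all(getattr(doc, k) == v for k, v in query.items()):
        for k, v in changes.items():
            setattr(doc, k, v)
        return True
    return False


def replay_race():
    """Replay steps 1-15 in order and return the threads that believe they hold the lock."""
    holders = set()
    # Step 1: inconsistent first round, one winner per config server.
    configs = [LockDoc(1, TS0), LockDoc(1, TS1), LockDoc(1, TS2)]

    # Steps 2-5: T1 and T2 both read ts0 at config0; only T1's update applies.
    update_if(configs[0], {"ts": TS0}, {"ts": TS1})  # modifies the document
    update_if(configs[0], {"ts": TS0}, {"ts": TS2})  # modifies nothing

    # Step 6: T1 sees a higher timestamp than its own and backs out.
    t1_backs_out = any(d.ts > TS1 for d in configs)
    # Step 7: T2 sees no timestamp higher than its own and declares victory.
    t2_won = all(d.ts <= TS2 for d in configs)

    # Step 8: the pinger unlocks the documents still carrying T1's timestamp.
    if t1_backs_out:
        for d in configs:
            update_if(d, {"ts": TS1}, {"state": 0})

    # Steps 9-10: T3 grabs the unlocked documents; config2 does not match.
    results = [update_if(d, {"ts": TS1}, {"state": 1, "ts": TS3}) for d in configs]
    t3_inconsistent = len(set(results)) > 1

    # Steps 11-13: T2 finalizes the documents it owns and takes the lock.
    if t2_won:
        for d in configs:
            update_if(d, {"ts": TS2}, {"state": 2})
        holders.add("T2")

    # Steps 14-15: T3 enters the tournament with the highest timestamp and wins.
    if t3_inconsistent and all(d.ts <= TS3 for d in configs):
        for d in configs:
            d.ts, d.state = TS3, 2
        holders.add("T3")
    return holders


print(sorted(replay_race()))  # ['T2', 'T3']
```

The key point in the sketch is that T2's victory check (step 7) runs before the pinger resets T1's documents to state 0 (step 8), so T2 never notices that T3 was able to re-enter the acquisition.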

To reproduce this race more easily, simply add a sleepsecs(2) right on this line and run the sync6.js test:
https://github.com/mongodb/mongo/blob/r2.6.0/src/mongo/s/distlock.cpp#L984



 Comments   
Comment by Githook User [ 17/Apr/14 ]

Author: Randolph Tan (renctan, randolph@10gen.com)

Message: SERVER-13585 Race in dist lock after winning "tournament" round

Remove the potential state transition from 1 to 0.
Branch: master
https://github.com/mongodb/mongo/commit/5e737823b0cd4e56e894fb504c406caa28d8fd34
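
As an illustration of what "remove the potential state transition from 1 to 0" could mean for the pinger's cleanup, here is a hedged sketch. It is not taken from the commit: the pinger_cleanup function, the allow_1_to_0 flag, and the dictionary layout of the lock document are all hypothetical, and state 1 is assumed to mean "being acquired".

```python
def pinger_cleanup(doc, ts, allow_1_to_0):
    """Try to release a queued lock entry; return True if the document changed.

    doc is a hypothetical lock document such as {"state": 1, "ts": 101}.
    """
    if doc["ts"] != ts:
        return False  # the document now belongs to another thread
    if doc["state"] == 1 and not allow_1_to_0:
        return False  # transition removed: a contended document is left alone
    doc["state"] = 0
    return True


contended = {"state": 1, "ts": 101}

# Old behavior in this sketch: the contended document becomes unlocked,
# which is what lets a late thread grab it in step 9 of the description.
old = dict(contended)
pinger_cleanup(old, 101, allow_1_to_0=True)

# With the 1 -> 0 transition removed, the document stays in state 1.
new = dict(contended)
pinger_cleanup(new, 101, allow_1_to_0=False)
```

Under these assumptions, a thread arriving late no longer finds a state 0 document while another thread is still finalizing its tournament win.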
