[SERVER-7493] Possible for read starvation to cause migration to get stuck in critical section Created: 27/Oct/12  Updated: 11/Jul/16  Resolved: 16/Nov/12

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 2.2.0
Fix Version/s: 2.2.2, 2.3.1

Type: Bug Priority: Major - P3
Reporter: Spencer Brody (Inactive) Assignee: Spencer Brody (Inactive)
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Related
related to SERVER-7298 thousands of "waiting till out of cri... Closed
related to SERVER-7472 Replication lag can cause cluster to ... Closed
is related to SERVER-7361 segfault in mongod after failed moveC... Closed
is related to SERVER-8099 use condition instead of hard loop fo... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Participants:

 Description   

If a migration aborts, it calls the done() method on MigrateFromStatus, which is what takes that server out of the critical section. That method, however, tries to acquire the read lock on the database for which the migration is taking place. While the server is in the critical section, all requests on that collection hang in setShardVersion, which waits for the server to be out of the critical section. setShardVersion also takes the database's write lock. So if many queries are coming in to that namespace on many different threads, the setShardVersion commands can cause read starvation on the database lock, preventing the migration from ever finishing.

The proposed fix is to change MigrateFromStatus::done to use a write lock rather than a read lock, so that the lock acquisition is greedy.
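
The starvation and the proposed fix can be illustrated with a toy, single-threaded model. This is a sketch under stated assumptions, not MongoDB's lock implementation: the class GreedyWriterLock, the function simulate, and the "ssv-N" thread names are hypothetical. The model assumes that readers are admitted only when no writer holds or waits for the lock, that waiting writers are served in FIFO order, and that each setShardVersion caller queues for the write lock again as long as the critical section is still in effect.

```python
from collections import deque


class GreedyWriterLock:
    """Toy model of a reader-writer lock that prefers waiting writers.

    Hypothetical class for illustration only; not MongoDB code.
    """

    def __init__(self):
        self.active_readers = 0
        self.active_writer = None
        self.waiting_writers = deque()

    def request_write(self, name):
        # Writers queue up and are served in FIFO order (assumption).
        self.waiting_writers.append(name)

    def try_read(self):
        # A reader gets in only if no writer holds or waits for the lock.
        if self.active_writer is None and not self.waiting_writers:
            self.active_readers += 1
            return True
        return False

    def grant_next_writer(self):
        # Hand the lock to the next queued writer if the lock is free.
        if (self.active_writer is None and self.active_readers == 0
                and self.waiting_writers):
            self.active_writer = self.waiting_writers.popleft()
        return self.active_writer

    def release_write(self):
        self.active_writer = None


def simulate(done_mode, threads=4, max_rounds=50):
    """Return the round in which done() gets the lock, or None if never.

    done_mode is "read" (original behaviour) or "write" (proposed fix).
    """
    lock = GreedyWriterLock()
    # Every client thread is stuck in setShardVersion and wants the write lock.
    for i in range(threads):
        lock.request_write(f"ssv-{i}")
    if done_mode == "write":
        lock.request_write("done")

    for round_no in range(1, max_rounds + 1):
        # With a read lock, done() must wait until no writer is queued.
        if done_mode == "read" and lock.try_read():
            return round_no
        holder = lock.grant_next_writer()
        lock.release_write()
        if holder == "done":
            return round_no
        # Still in the critical section, so the caller queues again (assumption).
        lock.request_write(holder)
    return None


print(simulate("read"))   # None: the reader never sees an empty writer queue
print(simulate("write"))  # 5: done() takes its turn behind the 4 queued writers
```

In this model the read-lock request never finds the writer queue empty, while a write-lock request simply takes its turn behind the setShardVersion callers that are already queued.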



 Comments   
Comment by auto [ 16/Nov/12 ]

Author: Spencer T Brody <spencer@10gen.com>, 2012-11-05T19:19:11Z

Message: Use global lock when exiting critical section because it is greedier. Also add verbose logging around exiting critical section. SERVER-7500 SERVER-7493
Branch: v2.2
https://github.com/mongodb/mongo/commit/fed35f0c0829626dddeef23c3d9b9e373fe9353f

Comment by auto [ 16/Nov/12 ]

Author: Spencer T Brody <spencer@10gen.com>, 2012-11-05T19:19:11Z

Message: Use global lock when exiting critical section because it is greedier. Also add verbose logging around exiting critical section. SERVER-7500 SERVER-7493
Branch: master
https://github.com/mongodb/mongo/commit/7eb3fc28fbd9cb0464cbf6dba5ffb497ba2088e9

Comment by Spencer Brody (Inactive) [ 27/Oct/12 ]

https://github.com/mongodb/mongo/commit/4b50937dd119852b6c076902b748286b50306401
