[SERVER-11376] cleanupOrphaned errMsg and log contains moveChunk Created: 25/Oct/13 Updated: 11/Jul/16 Resolved: 13/Jan/14 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | 2.5.3 |
| Fix Version/s: | 2.5.5 |
| Type: | Bug | Priority: | Minor - P4 |
| Reporter: | A. Jesse Jiryu Davis | Assignee: | Randolph Tan |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Description |
|
Original title:
Original Description: cleanupOrphaned uses a range deleter, which always waits up to an hour for a majority of the replica set to catch up before returning from deleteRange().
https://github.com/mongodb/mongo/blob/master/src/mongo/db/range_deleter_db_env.cpp#L121
cleanupOrphaned with the secondaryThrottle parameter also waits up to a minute after deleting each document for one secondary to replicate the delete (w=2).
https://github.com/mongodb/mongo/blob/master/src/mongo/db/dbhelpers.cpp#L405
This is an odd combination of not-quite-identical write concerns; I'm opening this ticket to investigate if this behavior is by design. |
| Comments |
| Comment by Githook User [ 13/Jan/14 ] |
|
Author: Randolph Tan (renctan, randolph@10gen.com) Message: |
| Comment by Daniel Pasette (Inactive) [ 08/Jan/14 ] |
|
Clean up the messaging only. |
| Comment by Greg Studer [ 25/Oct/13 ] |
|
The 1hr timeout logic is meant to prevent users from removing the next range of data until the majority of secondaries have caught up, but it's not well encapsulated. It does prevent the user from accidentally introducing a ton of secondary lag by using cleanupOrphaned; not sure that's a bad thing. The idea is that secondaryThrottle ensures we're replicating somewhere during a delete, and this check ensures we replicate everywhere afterwards. This behavior is as designed, but subject to improvement. |
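To make the two waits concrete, here is a minimal Python sketch of the behavior described in this ticket. It is not the server code. The function name `delete_range`, the callbacks `delete_doc` and `wait_for_replication`, and the return shape are all hypothetical; only the two write concerns (w=2 per document, majority at the end) and the two timeouts (one minute, one hour) come from the description above. What the server does when a per-document wait times out is not stated in the ticket, so the sketch simply ignores that result.

```python
from typing import Callable, Iterable, List, Tuple, Union

WriteConcern = Union[int, str]

# Timeouts as stated in the ticket description; every name here is hypothetical.
PER_DOC_TIMEOUT_SECS = 60        # secondaryThrottle: up to a minute per document
FINAL_TIMEOUT_SECS = 60 * 60     # range deleter: up to an hour after the range


def delete_range(
    docs: Iterable[object],
    delete_doc: Callable[[object], None],
    wait_for_replication: Callable[[WriteConcern, int], bool],
    secondary_throttle: bool,
) -> Tuple[bool, List[Tuple[WriteConcern, int]]]:
    """Delete docs one by one and return (majority caught up?, waits performed)."""
    waits: List[Tuple[WriteConcern, int]] = []
    for doc in docs:
        delete_doc(doc)
        if secondary_throttle:
            # "Replicate somewhere" during the delete: one secondary (w=2).
            # Assumption: the result of this wait is ignored in this sketch.
            wait_for_replication(2, PER_DOC_TIMEOUT_SECS)
            waits.append((2, PER_DOC_TIMEOUT_SECS))
    # "Replicate everywhere" afterwards: wait for a majority before returning.
    caught_up = wait_for_replication("majority", FINAL_TIMEOUT_SECS)
    waits.append(("majority", FINAL_TIMEOUT_SECS))
    return caught_up, waits
```

With `secondary_throttle` disabled only the final majority wait runs, which matches the description's point that the hour-long wait always happens regardless of the parameter.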
| Comment by Randolph Tan [ 25/Oct/13 ] |
|
Not sure what the motivation behind the 1 hr timeout was. It was basically ported from the old code as is: |
| Comment by Greg Studer [ 25/Oct/13 ] |
|
It's at least a bug because the range deleter error message is hardcoded to report "moveChunk". I agree the WC behavior is weird - let's bounce this off renctan to see what the original motivation for the 1hr loop was in general. |
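One way to picture the messaging bug is a shared helper that names its caller instead of hardcoding "moveChunk". The Python sketch below is illustrative only: the function `replication_wait_error`, its parameters, and the message wording are hypothetical and are not taken from the server source or from the actual fix.

```python
def replication_wait_error(caller: str, ns: str, waited_secs: int) -> str:
    """Build a timeout message that names the command that triggered the delete.

    Hypothetical helper: the real range deleter message text is not shown in
    this ticket, so the wording here is an assumption.
    """
    return (
        f"{caller}: timed out after {waited_secs}s waiting for "
        f"deletes in {ns} to replicate to a majority"
    )


# The same helper serves both callers, so neither is reported as the other.
orphan_msg = replication_wait_error("cleanupOrphaned", "test.coll", 3600)
move_msg = replication_wait_error("moveChunk", "test.coll", 3600)
```

Passing the caller name in keeps the range deleter usable from several commands without misattributing errors in errMsg or the log.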