[SERVER-11376] cleanupOrphaned errMsg and log contains moveChunk
Created: 25/Oct/13  Updated: 11/Jul/16  Resolved: 13/Jan/14

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 2.5.3
Fix Version/s: 2.5.5

Type: Bug
Priority: Minor - P4
Reporter: A. Jesse Jiryu Davis
Assignee: Randolph Tan
Resolution: Done
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to DOCS-1838 Document cleanup orphan data cmd Closed
is related to SERVER-8598 Add command to cleanup orphaned data ... Closed

 Description   

Original title:
"cleanupOrphaned with secondaryThrottle is both w=2 and w=majority"

Original Description:

cleanupOrphaned uses a range deleter, which always waits up to an hour for a majority of the replica set to catch up before returning from deleteRange().

https://github.com/mongodb/mongo/blob/master/src/mongo/db/range_deleter_db_env.cpp#L121

With the secondaryThrottle parameter, cleanupOrphaned also waits, after deleting each document, up to a minute for one secondary to replicate the delete (w=2).

https://github.com/mongodb/mongo/blob/master/src/mongo/db/dbhelpers.cpp#L405

This is an odd combination of not-quite-identical write concerns; I'm opening this ticket to investigate whether this behavior is by design.
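The two waits described above can be sketched as a toy timeline model. This is a hypothetical illustration, not MongoDB's range deleter. It assumes that each secondary applies every write a fixed number of seconds after the primary, that the primary counts as one acknowledging member, and that deleting one document costs the primary a made-up 1 ms. The names `ack_time` and `delete_range` are invented for the sketch. It shows how the per-document w=2 wait (60 s timeout) and the final majority wait (1 hour timeout) stack up.

```python
# Toy timeline model of the two waits (hypothetical; not MongoDB's range deleter).
# Assumption: every secondary applies each write a fixed `lag` seconds after
# the primary, and the primary counts as one acknowledging member.

W2_TIMEOUT_S = 60.0          # per-document wait for one secondary (w=2)
MAJORITY_TIMEOUT_S = 3600.0  # final wait for a majority of the replica set
DELETE_COST_S = 0.001        # assumed time for the primary to delete one document


def ack_time(write_time, lags, w):
    """Time at which w members (primary included) have applied a write made at write_time."""
    apply_times = sorted([write_time] + [write_time + lag for lag in lags])
    if w > len(apply_times):
        return float("inf")  # not enough members: the wait can never be satisfied
    return apply_times[w - 1]


def delete_range(num_docs, lags, secondary_throttle):
    """Return the time at which the range delete returns, or raise TimeoutError."""
    if num_docs == 0:
        return 0.0
    majority = (len(lags) + 1) // 2 + 1
    now = 0.0
    last_write = 0.0
    for _ in range(num_docs):
        now += DELETE_COST_S
        last_write = now
        if secondary_throttle:
            # secondaryThrottle: block after each delete until one secondary has it (w=2).
            acked = ack_time(last_write, lags, 2)
            if acked - now > W2_TIMEOUT_S:
                raise TimeoutError("no secondary replicated the delete within 60 s")
            now = acked
    # Before returning: wait for a majority to reach the last delete.
    majority_acked = ack_time(last_write, lags, majority)
    if majority_acked - now > MAJORITY_TIMEOUT_S:
        raise TimeoutError("majority did not catch up within 1 hour")
    return max(now, majority_acked)


print(delete_range(10, [0.5, 120.0], True))                # 3 members
print(delete_range(10, [0.5, 120.0, 200.0, 300.0], True))  # 5 members
```

In a three-member set, w=2 and a majority are the same two members, so in this model the final wait returns immediately after a throttled loop. In a five-member set the majority is three, and the final wait is the one that blocks on the slower secondaries.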



 Comments   
Comment by Githook User [ 13/Jan/14 ]

Author: Randolph Tan (renctan) <randolph@10gen.com>

Message: SERVER-11376 cleanupOrphaned errMsg and log contains moveChunk
Branch: master
https://github.com/mongodb/mongo/commit/24e794f5d3515469685a45703ca0d0505d4b5687

Comment by Daniel Pasette (Inactive) [ 08/Jan/14 ]

Clean up the messaging only.

Comment by Greg Studer [ 25/Oct/13 ]

The 1hr timeout logic is meant to prevent users from removing the next range of data until the majority of secondaries have caught up, but it's not well encapsulated.

It does prevent the user from accidentally introducing a ton of secondary lag by using cleanupOrphaned; I'm not sure that's a bad thing. The idea is that secondaryThrottle ensures we're replicating somewhere during a delete, and this check ensures we replicate to a majority afterwards.

This behavior is as designed, but subject to improvement. The messaging is definitely a bug, though.

Comment by Randolph Tan [ 25/Oct/13 ]

Not sure what the motivation behind the 1 hr timeout was. It was basically ported as-is from the old code:

https://github.com/mongodb/mongo/blob/4bf8648b4196aff618c68ca2d814a1a13f48c3d2/src/mongo/s/d_migrate.cpp#L217-223

Comment by Greg Studer [ 25/Oct/13 ]

It's at least a bug because the range deleter error message is hardcoded to report "moveChunk".

I agree the write concern (WC) behavior is weird; let's bounce this off renctan to see what the original motivation for the 1 hr loop was in general.
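The messaging bug (the range deleter always reporting "moveChunk") can be illustrated with a small sketch. Here the error text is built from a caller-supplied label rather than a fixed string, so cleanupOrphaned and moveChunk each report their own name. The function `format_range_delete_error`, its parameters, the namespace `test.user`, and the reason string are all hypothetical. The commit linked in this ticket may have fixed the messages differently.

```python
# Hypothetical sketch, not MongoDB's code: the range deleter receives the name of
# the command that scheduled the delete and uses it in its messages, instead of
# hardcoding "moveChunk".


def format_range_delete_error(caller, ns, min_key, max_key, reason):
    """Build an error message attributed to the command that requested the delete."""
    return f"{caller} failed to delete range [{min_key}, {max_key}) in {ns}: {reason}"


msg = format_range_delete_error(
    "cleanupOrphaned", "test.user", {"x": 0}, {"x": 10}, "timed out waiting for majority"
)
print(msg)
```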

Generated at Thu Feb 08 03:25:38 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.