[SERVER-16806] After 1 hour timeout, rangeDeleter incorrectly reports "timed out after 0 seconds" Created: 12/Jan/15  Updated: 23/Jan/15  Resolved: 21/Jan/15

Status: Closed
Project: Core Server
Component/s: Logging
Affects Version/s: 2.8.0-rc4
Fix Version/s: 3.0.0-rc6

Type: Bug Priority: Minor - P4
Reporter: Kevin Pulo Assignee: Andy Schwerin
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Backwards Compatibility: Fully Compatible
Operating System: ALL
Steps To Reproduce:

Repro is basically identical to that in SERVER-16805, i.e. deliberately prevent w:majority on the TO shard for a migration:

mlaunch init --replicaset --nodes 2 --arbiter --smallfiles --oplogSize 100 --sharded 2 --config 1 --mongos 1 --port 33333
mlaunch stop shard02 secondary
mongo --port 33333
db.getSiblingDB("config").settings.update( { "_id" : "balancer" }, { $set : { "_secondaryThrottle" : false } }, { upsert : true })
sh.enableSharding("test")
sh.shardCollection("test.test", {_id:1})
db.test.insert({_id:-1})
db.test.insert({_id:1})
sh.splitAt("test.test", {_id:0})

Participants:

 Description   

2015-01-12T21:31:51.993+1100 I SHARDING [migrateThread] Deleter starting delete for: test.test from { _id: MinKey } -> { _id: 0.0 }, with opId: 749
2015-01-12T21:31:51.993+1100 I SHARDING [migrateThread] rangeDeleter deleted 0 documents for test.test from { _id: MinKey } -> { _id: 0.0 }
2015-01-12T22:31:51.995+1100 I SHARDING [migrateThread] rangeDeleter timed out after 0 seconds while waiting for deletions to be replicated to majority nodes
2015-01-12T22:31:51.995+1100 W SHARDING [migrateThread] Failed to queue delete for migrate abort: rangeDeleter timed out after 0 seconds while waiting for deletions to be replicated to majority nodes

Note the timestamps — clearly exactly 1 hr has passed (kWTimeoutMillis in db/range_deleter.cpp:271), yet the message says "timed out after 0 seconds".



 Comments   
Comment by Githook User [ 21/Jan/15 ]

Author:

{u'username': u'andy10gen', u'name': u'Andy Schwerin', u'email': u'schwerin@mongodb.com'}

Message: SERVER-16806 Correctly report elapsed time in messages in range_deleter.cpp.
Branch: master
https://github.com/mongodb/mongo/commit/399f07467a56ac2d7334c9b627815c8610c8a8ee

Comment by Kevin Pulo [ 12/Jan/15 ]

schwerin, this w:majority (with 1hr timeout) for rangeDeleter seems to be new behaviour that wasn't present in 2.6. The base commit was 017fc448ba39e0ee731b84b21991c7a3646a8724 for SERVER-14274. So there are no equivalent messages in 2.6, because there is no w:majority and no 1 hour timeout.

Comment by Andy Schwerin [ 12/Jan/15 ]

kevin.pulo@10gen.com, can you check to see if this is a regression from 2.6?

Generated at Thu Feb 08 03:42:20 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.