[SERVER-16837] Migrations should time out after 10 hrs waiting for replication Created: 14/Jan/15  Updated: 02/Feb/15  Resolved: 21/Jan/15

Status: Closed
Project: Core Server
Component/s: Internal Code, Sharding
Affects Version/s: 2.8.0-rc4
Fix Version/s: 3.0.0-rc6

Type: Bug
Priority: Major - P3
Reporter: Kevin Pulo
Assignee: Randolph Tan
Resolution: Done
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: 2.6.6.tar.gz, 2.8.0-rc4.tar.gz
Issue Links:
Depends:
  depends on SERVER-16880 "Timer overflows at approximately 9223..." (Closed)
Backwards Compatibility: Fully Compatible
Operating System: ALL
Steps To Reproduce:
  • Start a 2-shard cluster in which each shard is a replica set with 2 data-bearing members and an arbiter.
  • Shut down the secondary on the second shard.
  • db.getSiblingDB("config").settings.update( { "_id" : "balancer" }, { $set : { "_secondaryThrottle" : false } }, { upsert : true })
    sh.enableSharding("test")
    sh.shardCollection("test.test", {_id:1})
    db.test.insert({_id:-1})
    db.test.insert({_id:1})
    sh.splitAt("test.test", {_id:0})

  • Watch the logs and check the "when" field of the locks (a scripted version of the shell steps above is sketched just after this list):

    db.getSiblingDB("config").locks.find()


Description

2.8.0-rc4 does not time out a migration after it has been waiting on the pre-commit w:majority write concern for 10 hours. This is a regression from 2.6, where such migrations are correctly aborted after 10 hours.
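A quick way to confirm that the TO-shard is blocked on the majority wait (rather than on the data transfer itself) is to check replica set health on the recipient shard. This is a sketch; the host and port come from the repro setup, and which member is currently primary is an assumption.

    // Hedged sketch: connect to the recipient (TO) shard's primary and confirm that,
    // with one data-bearing member down and only an arbiter voting, w:majority
    // cannot be satisfied for the migration's pre-commit write.
    var to = new Mongo("kev1-0.kevincm1.8331.mongodbdns.com:28003");   // assumed primary
    var status = to.getDB("admin").runCommand({ replSetGetStatus : 1 });
    status.members.forEach(function (m) {
        print(m.name + "  state=" + m.stateStr + "  health=" + m.health);
    });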

Full logs and config dumps are attached. The relevant log excerpts and shell output are below:

2.8.0-rc4

2015-01-13T02:11:12.428+0000 I SHARDING [migrateThread] starting receiving-end of migration of chunk { _id: MinKey } -> { _id: 0.0 } for collection test.test from shard_0/kev1-0.kevincm1.8331.mongodbdns.com:28000,kev1-0.kevincm1.8331.mongodbdns.com:28001 at epoch 54b4704956127074c739b3f6
2015-01-13T02:11:12.429+0000 I SHARDING [migrateThread] Deleter starting delete for: test.test from { _id: MinKey } -> { _id: 0.0 }, with opId: 31603
2015-01-13T02:11:12.429+0000 I SHARDING [migrateThread] rangeDeleter deleted 0 documents for test.test from { _id: MinKey } -> { _id: 0.0 }
2015-01-13T02:11:12.429+0000 I SHARDING [migrateThread] Waiting for replication to catch up before entering critical section
2015-01-13T02:11:13.430+0000 I SHARDING [migrateThread] Waiting for replication to catch up before entering critical section
2015-01-13T02:11:14.430+0000 I SHARDING [migrateThread] Waiting for replication to catch up before entering critical section
...
2015-01-13T12:11:04.816+0000 I SHARDING [migrateThread] Waiting for replication to catch up before entering critical section
2015-01-13T12:11:05.816+0000 I SHARDING [migrateThread] Waiting for replication to catch up before entering critical section
2015-01-13T12:11:05.816+0000 W SHARDING [migrateThread] migrate commit waiting for a majority of slaves for 'test.test' { _id: MinKey } -> { _id: 0.0 } waiting for: 54b47ec0:1
2015-01-13T12:11:06.816+0000 I SHARDING [migrateThread] Waiting for replication to catch up before entering critical section
2015-01-13T12:11:07.817+0000 I SHARDING [migrateThread] Waiting for replication to catch up before entering critical section
2015-01-13T12:11:08.817+0000 I SHARDING [migrateThread] Waiting for replication to catch up before entering critical section
2015-01-13T12:11:09.817+0000 I SHARDING [migrateThread] Waiting for replication to catch up before entering critical section
2015-01-13T12:11:10.817+0000 I SHARDING [migrateThread] Waiting for replication to catch up before entering critical section
2015-01-13T12:11:11.817+0000 I SHARDING [migrateThread] Waiting for replication to catch up before entering critical section
2015-01-13T12:11:12.818+0000 I SHARDING [migrateThread] Waiting for replication to catch up before entering critical section
2015-01-13T12:11:13.818+0000 I SHARDING [migrateThread] Waiting for replication to catch up before entering critical section
2015-01-13T12:11:14.818+0000 I SHARDING [migrateThread] Waiting for replication to catch up before entering critical section
2015-01-13T12:11:15.818+0000 I SHARDING [migrateThread] Waiting for replication to catch up before entering critical section
2015-01-13T12:11:16.818+0000 I SHARDING [migrateThread] Waiting for replication to catch up before entering critical section
2015-01-13T12:11:17.818+0000 I SHARDING [migrateThread] Waiting for replication to catch up before entering critical section
2015-01-13T12:11:18.819+0000 I SHARDING [migrateThread] Waiting for replication to catch up before entering critical section
2015-01-13T12:11:19.819+0000 I SHARDING [migrateThread] Waiting for replication to catch up before entering critical section
2015-01-13T12:11:20.819+0000 I SHARDING [migrateThread] Waiting for replication to catch up before entering critical section
2015-01-13T12:11:20.819+0000 W SHARDING [migrateThread] migrate commit waiting for a majority of slaves for 'test.test' { _id: MinKey } -> { _id: 0.0 } waiting for: 54b47ec0:1
2015-01-13T12:11:21.819+0000 I SHARDING [migrateThread] Waiting for replication to catch up before entering critical section
2015-01-13T12:11:22.819+0000 I SHARDING [migrateThread] Waiting for replication to catch up before entering critical section
...
2015-01-14T02:47:43.070+0000 I SHARDING [migrateThread] Waiting for replication to catch up before entering critical section
2015-01-14T02:47:44.071+0000 I SHARDING [migrateThread] Waiting for replication to catch up before entering critical section
2015-01-14T02:47:45.071+0000 I SHARDING [migrateThread] Waiting for replication to catch up before entering critical section
2015-01-14T02:47:46.071+0000 I SHARDING [migrateThread] Waiting for replication to catch up before entering critical section
2015-01-14T02:47:46.071+0000 W SHARDING [migrateThread] migrate commit waiting for a majority of slaves for 'test.test' { _id: MinKey } -> { _id: 0.0 } waiting for: 54b47ec0:1
2015-01-14T02:47:47.071+0000 I SHARDING [migrateThread] Waiting for replication to catch up before entering critical section
2015-01-14T02:47:48.071+0000 I SHARDING [migrateThread] Waiting for replication to catch up before entering critical section
2015-01-14T02:47:49.071+0000 I SHARDING [migrateThread] Waiting for replication to catch up before entering critical section
2015-01-14T02:47:50.072+0000 I SHARDING [migrateThread] Waiting for replication to catch up before entering critical section
2015-01-14T02:47:51.072+0000 I SHARDING [migrateThread] Waiting for replication to catch up before entering critical section

mongos> printjson(ISODate()); db.getSiblingDB("config").locks.find()
ISODate("2015-01-13T04:11:44.821Z")
{ "_id" : "configUpgrade", "state" : 0, "who" : "kev1-0:27017:1421111141:1804289383:mongosMain:846930886", "ts" : ObjectId("54b46f6556127074c739b3bf"), "process" : "kev1-0:27017:1421111141:1804289383", "when" : ISODate("2015-01-13T01:05:41.265Z"), "why" : "upgrading config database to new format v6" }
{ "_id" : "balancer", "state" : 2, "who" : "kev1-0:27017:1421111141:1804289383:Balancer:1681692777", "ts" : ObjectId("54b47ec056127074c739b40c"), "process" : "kev1-0:27017:1421111141:1804289383", "when" : ISODate("2015-01-13T02:11:12.413Z"), "why" : "doing balance round" }
{ "_id" : "test.test", "state" : 2, "who" : "kev1-0:28000:1421111452:808384670:conn14:1257153344", "ts" : ObjectId("54b47ec073e56bbf64caa124"), "process" : "kev1-0:28000:1421111452:808384670", "when" : ISODate("2015-01-13T02:11:12.422Z"),
"why" : "migrating chunk [{ : MinKey }, { : MaxKey }) in test.test" }
mongos> printjson(ISODate()); db.getSiblingDB("config").locks.find()
ISODate("2015-01-14T02:24:24.884Z")
{ "_id" : "configUpgrade", "state" : 0, "who" : "kev1-0:27017:1421111141:1804289383:mongosMain:846930886", "ts" : ObjectId("54b46f6556127074c739b3bf"), "process" : "kev1-0:27017:1421111141:1804289383", "when" : ISODate("2015-01-13T01:05:41.265Z"), "why" : "upgrading config database to new format v6" }
{ "_id" : "balancer", "state" : 2, "who" : "kev1-0:27017:1421111141:1804289383:Balancer:1681692777", "ts" : ObjectId("54b47ec056127074c739b40c"), "process" : "kev1-0:27017:1421111141:1804289383", "when" : ISODate("2015-01-13T02:11:12.413Z"), "why" : "doing balance round" }
{ "_id" : "test.test", "state" : 2, "who" : "kev1-0:28000:1421111452:808384670:conn14:1257153344", "ts" : ObjectId("54b47ec073e56bbf64caa124"), "process" : "kev1-0:28000:1421111452:808384670", "when" : ISODate("2015-01-13T02:11:12.422Z"),
"why" : "migrating chunk [{ : MinKey }, { : MaxKey }) in test.test" }

NOTE: the "when" field for the balancer and collection locks hasn't changed in roughly 24 hours.
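To make this comparison easier to repeat, the two lock documents can be polled from the mongos with something like the following (the 60-second interval is arbitrary):

    // Hedged sketch: periodically print the fields of interest for the balancer
    // and collection locks. Run in a mongos shell; stop with Ctrl-C.
    while (true) {
        var now = ISODate();
        db.getSiblingDB("config").locks.find(
            { _id : { $in : [ "balancer", "test.test" ] } }
        ).forEach(function (l) {
            print(now.toISOString() + "  " + l._id + "  state=" + l.state +
                  "  when=" + l.when.toISOString());
        });
        sleep(60 * 1000);   // sample every 60 seconds
    }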

2.6.6

2015-01-13T06:03:31.460+0000 [migrateThread] starting receiving-end of migration of chunk { _id: MinKey } -> { _id: 0.0 } for collection test.test from shard26_0/kev1-0.kevincm1.8331.mongodbdns.com:38000,kev1-0.kevincm1.8331.mongodbdns.com:38001 at epoch 54b4b3437d941a7cc774cb02
2015-01-13T06:03:31.460+0000 [migrateThread] starting new replica set monitor for replica set shard26_0 with seeds kev1-0.kevincm1.8331.mongodbdns.com:38000,kev1-0.kevincm1.8331.mongodbdns.com:38001
2015-01-13T06:03:31.462+0000 [migrateThread] allocating new ns file /data/balancer-majority-timeout-26_shard26_1_96/test.ns, filling with zeroes...
2015-01-13T06:03:31.575+0000 [migrateThread] build index on: test.test properties: { v: 1, key: { _id: 1 }, name: "_id_", ns: "test.test" }
2015-01-13T06:03:31.575+0000 [migrateThread]     added index to empty collection
2015-01-13T06:03:31.576+0000 [migrateThread] Waiting for replication to catch up before entering critical section
2015-01-13T06:03:32.576+0000 [migrateThread] Waiting for replication to catch up before entering critical section
2015-01-13T06:03:33.577+0000 [migrateThread] Waiting for replication to catch up before entering critical section
...
2015-01-13T16:03:22.105+0000 [migrateThread] Waiting for replication to catch up before entering critical section
2015-01-13T16:03:23.105+0000 [migrateThread] Waiting for replication to catch up before entering critical section
2015-01-13T16:03:23.105+0000 [migrateThread] warning: migrate commit waiting for 2 slaves for 'test.test' { _id: MinKey } -> { _id: 0.0 } waiting for: 54b4b533:2
2015-01-13T16:03:24.105+0000 [migrateThread] Waiting for replication to catch up before entering critical section
2015-01-13T16:03:25.105+0000 [migrateThread] Waiting for replication to catch up before entering critical section
2015-01-13T16:03:26.105+0000 [migrateThread] Waiting for replication to catch up before entering critical section
2015-01-13T16:03:27.106+0000 [migrateThread] Waiting for replication to catch up before entering critical section
2015-01-13T16:03:28.106+0000 [migrateThread] Waiting for replication to catch up before entering critical section
2015-01-13T16:03:29.106+0000 [migrateThread] Waiting for replication to catch up before entering critical section
2015-01-13T16:03:30.106+0000 [migrateThread] Waiting for replication to catch up before entering critical section
2015-01-13T16:03:31.106+0000 [migrateThread] Waiting for replication to catch up before entering critical section
2015-01-13T16:03:32.903+0000 [migrateThread] warning: migrate commit waiting for 2 slaves for 'test.test' { _id: MinKey } -> { _id: 0.0 } waiting for: 54b4b533:2
2015-01-13T16:03:32.905+0000 [migrateThread] warning: migrate commit waiting for 2 slaves for 'test.test' { _id: MinKey } -> { _id: 0.0 } waiting for: 54b4b533:2
2015-01-13T16:03:32.906+0000 [migrateThread] warning: migrate commit waiting for 2 slaves for 'test.test' { _id: MinKey } -> { _id: 0.0 } waiting for: 54b4b533:2
...
2015-01-13T16:04:02.894+0000 [migrateThread] warning: migrate commit waiting for 2 slaves for 'test.test' { _id: MinKey } -> { _id: 0.0 } waiting for: 54b4b533:2
2015-01-13T16:04:02.895+0000 [migrateThread] warning: migrate commit waiting for 2 slaves for 'test.test' { _id: MinKey } -> { _id: 0.0 } waiting for: 54b4b533:2
2015-01-13T16:04:02.897+0000 [migrateThread] warning: migrate commit waiting for 2 slaves for 'test.test' { _id: MinKey } -> { _id: 0.0 } waiting for: 54b4b533:2
2015-01-13T16:04:02.898+0000 [migrateThread] scoped connection to shard26_0/kev1-0.kevincm1.8331.mongodbdns.com:38000,kev1-0.kevincm1.8331.mongodbdns.com:38001 not being returned to the pool
2015-01-13T16:04:02.898+0000 [migrateThread] about to log metadata event: { _id: "kev1-0-2015-01-13T16:04:02-54b541f29fcdf3ea989d3fae", server: "kev1-0", clientAddr: ":27017", time: new Date(1421165042898), what: "moveChunk.to", ns: "test.test", details: { min: { _id: MinKey }, max: { _id: 0.0 }, step 1 of 5: 115, step 2 of 5: 0, step 3 of 5: 0, step 4 of 5: 0, note: "aborted", errmsg: "timed out waiting for commit" } }
2015-01-13T16:04:08.924+0000 [migrateThread] starting receiving-end of migration of chunk { _id: MinKey } -> { _id: 0.0 } for collection test.test from shard26_0/kev1-0.kevincm1.8331.mongodbdns.com:38000,kev1-0.kevincm1.8331.mongodbdns.com:38001 at epoch 54b4b3437d941a7cc774cb02
2015-01-13T16:04:08.926+0000 [migrateThread] warning: moveChunkCmd deleted data already in chunk # objects: 1
2015-01-13T16:04:08.926+0000 [migrateThread] Waiting for replication to catch up before entering critical section
2015-01-13T16:04:09.926+0000 [migrateThread] Waiting for replication to catch up before entering critical section
2015-01-13T16:04:09.927+0000 [migrateThread] warning: migrate commit waiting for 2 slaves for 'test.test' { _id: MinKey } -> { _id: 0.0 } waiting for: 54b541f8:3
2015-01-13T16:04:10.927+0000 [migrateThread] Waiting for replication to catch up before entering critical section
2015-01-13T16:04:11.927+0000 [migrateThread] Waiting for replication to catch up before entering critical section
2015-01-13T16:04:12.927+0000 [migrateThread] Waiting for replication to catch up before entering critical section
...
2015-01-14T02:04:00.576+0000 [migrateThread] Waiting for replication to catch up before entering critical section
2015-01-14T02:04:01.577+0000 [migrateThread] Waiting for replication to catch up before entering critical section
2015-01-14T02:04:01.577+0000 [migrateThread] warning: migrate commit waiting for 2 slaves for 'test.test' { _id: MinKey } -> { _id: 0.0 } waiting for: 54b541f8:3
2015-01-14T02:04:02.577+0000 [migrateThread] Waiting for replication to catch up before entering critical section
2015-01-14T02:04:03.577+0000 [migrateThread] Waiting for replication to catch up before entering critical section
2015-01-14T02:04:04.577+0000 [migrateThread] Waiting for replication to catch up before entering critical section
2015-01-14T02:04:05.577+0000 [migrateThread] Waiting for replication to catch up before entering critical section
2015-01-14T02:04:06.578+0000 [migrateThread] Waiting for replication to catch up before entering critical section
2015-01-14T02:04:07.578+0000 [migrateThread] Waiting for replication to catch up before entering critical section
2015-01-14T02:04:08.578+0000 [migrateThread] Waiting for replication to catch up before entering critical section
2015-01-14T02:04:09.733+0000 [migrateThread] warning: migrate commit waiting for 2 slaves for 'test.test' { _id: MinKey } -> { _id: 0.0 } waiting for: 54b541f8:3
2015-01-14T02:04:09.734+0000 [migrateThread] warning: migrate commit waiting for 2 slaves for 'test.test' { _id: MinKey } -> { _id: 0.0 } waiting for: 54b541f8:3
2015-01-14T02:04:09.735+0000 [migrateThread] warning: migrate commit waiting for 2 slaves for 'test.test' { _id: MinKey } -> { _id: 0.0 } waiting for: 54b541f8:3
...
2015-01-14T02:04:39.723+0000 [migrateThread] warning: migrate commit waiting for 2 slaves for 'test.test' { _id: MinKey } -> { _id: 0.0 } waiting for: 54b541f8:3
2015-01-14T02:04:39.725+0000 [migrateThread] warning: migrate commit waiting for 2 slaves for 'test.test' { _id: MinKey } -> { _id: 0.0 } waiting for: 54b541f8:3
2015-01-14T02:04:39.727+0000 [migrateThread] warning: migrate commit waiting for 2 slaves for 'test.test' { _id: MinKey } -> { _id: 0.0 } waiting for: 54b541f8:3
2015-01-14T02:04:39.727+0000 [migrateThread] scoped connection to shard26_0/kev1-0.kevincm1.8331.mongodbdns.com:38000,kev1-0.kevincm1.8331.mongodbdns.com:38001 not being returned to the pool
2015-01-14T02:04:39.727+0000 [migrateThread] about to log metadata event: { _id: "kev1-0-2015-01-14T02:04:39-54b5ceb79fcdf3ea989d3faf", server: "kev1-0", clientAddr: ":27017", time: new Date(1421201079727), what: "moveChunk.to", ns: "test.test", details: { min: { _id: MinKey }, max: { _id: 0.0 }, step 1 of 5: 1, step 2 of 5: 0, step 3 of 5: 0, step 4 of 5: 0, note: "aborted", errmsg: "timed out waiting for commit" } }
2015-01-14T02:04:45.754+0000 [migrateThread] starting receiving-end of migration of chunk { _id: MinKey } -> { _id: 0.0 } for collection test.test from shard26_0/kev1-0.kevincm1.8331.mongodbdns.com:38000,kev1-0.kevincm1.8331.mongodbdns.com:38001 at epoch 54b4b3437d941a7cc774cb02
2015-01-14T02:04:45.754+0000 [migrateThread] warning: moveChunkCmd deleted data already in chunk # objects: 1
2015-01-14T02:04:45.755+0000 [migrateThread] Waiting for replication to catch up before entering critical section
2015-01-14T02:04:46.755+0000 [migrateThread] Waiting for replication to catch up before entering critical section
2015-01-14T02:04:47.755+0000 [migrateThread] Waiting for replication to catch up before entering critical section
...

mongos> printjson(ISODate()); db.getSiblingDB("config").locks.find()
ISODate("2015-01-13T08:03:21.334Z")
{ "_id" : "configUpgrade", "state" : 0, "who" : "kev1-0:37017:1421128191:1804289383:mongosMain:846930886", "ts" : ObjectId("54b4b1ff7d941a7cc774cac3"), "process" : "kev1-0:37017:1421128191:1804289383", "when" : ISODate("2015-01-13T05:49:51.669Z"), "why" : "upgrading config database to new format v5" }
{ "_id" : "balancer", "state" : 2, "who" : "kev1-0:37017:1421128191:1804289383:Balancer:1681692777", "ts" : ObjectId("54b4b5337d941a7cc774cb56"), "process" : "kev1-0:37017:1421128191:1804289383", "when" : ISODate("2015-01-13T06:03:31.442Z"), "why" : "doing balance round" }
{ "_id" : "test.test", "state" : 2, "who" : "kev1-0:38000:1421129010:1611849556:conn23:745631594", "ts" : ObjectId("54b4b53363588f4e90ed5a12"), "process" : "kev1-0:38000:1421129010:1611849556", "when" : ISODate("2015-01-13T06:03:31.451Z"), "why" : "migrate-{ _id: MinKey }" }
mongos> printjson(ISODate()); db.getSiblingDB("config").locks.find()
ISODate("2015-01-14T01:41:32.572Z")
{ "_id" : "configUpgrade", "state" : 0, "who" : "kev1-0:37017:1421128191:1804289383:mongosMain:846930886", "ts" : ObjectId("54b4b1ff7d941a7cc774cac3"), "process" : "kev1-0:37017:1421128191:1804289383", "when" : ISODate("2015-01-13T05:49:51.669Z"), "why" : "upgrading config database to new format v5" }
{ "_id" : "balancer", "state" : 2, "who" : "kev1-0:37017:1421128191:1804289383:Balancer:1681692777", "ts" : ObjectId("54b541f87d941a7cc774cb57"), "process" : "kev1-0:37017:1421128191:1804289383", "when" : ISODate("2015-01-13T16:04:08.907Z"), "why" : "doing balance round" }
{ "_id" : "test.test", "state" : 2, "who" : "kev1-0:38000:1421129010:1611849556:conn23:745631594", "ts" : ObjectId("54b541f863588f4e90ed5a15"), "process" : "kev1-0:38000:1421129010:1611849556", "when" : ISODate("2015-01-13T16:04:08.916Z"), "why" : "migrate-{ _id: MinKey }" }
mongos> printjson(ISODate()); db.getSiblingDB("config").locks.find()
ISODate("2015-01-14T02:53:25.690Z")
{ "_id" : "configUpgrade", "state" : 0, "who" : "kev1-0:37017:1421128191:1804289383:mongosMain:846930886", "ts" : ObjectId("54b4b1ff7d941a7cc774cac3"), "process" : "kev1-0:37017:1421128191:1804289383", "when" : ISODate("2015-01-13T05:49:51.669Z"), "why" : "upgrading config database to new format v5" }
{ "_id" : "balancer", "state" : 2, "who" : "kev1-0:37017:1421128191:1804289383:Balancer:1681692777", "ts" : ObjectId("54b5cebd7d941a7cc774cb58"), "process" : "kev1-0:37017:1421128191:1804289383", "when" : ISODate("2015-01-14T02:04:45.736Z"), "why" : "doing balance round" }
{ "_id" : "test.test", "state" : 2, "who" : "kev1-0:38000:1421129010:1611849556:conn23:745631594", "ts" : ObjectId("54b5cebd63588f4e90ed5a18"), "process" : "kev1-0:38000:1421129010:1611849556", "when" : ISODate("2015-01-14T02:04:45.746Z"), "why" : "migrate-{ _id: MinKey }" }

NOTE: the "when" field for the balancer and collection locks changes roughly every 10 hours, each time the migration times out and is retried.
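On 2.6 the 10-hour abort/retry cycle is also visible in the config changelog, since each aborted receive attempt logs a "moveChunk.to" event. This sketch only uses fields that appear in the log lines above:

    // Hedged sketch: list the aborted receive-side migration attempts for test.test,
    // using the "moveChunk.to" changelog entries shown in the logs above.
    db.getSiblingDB("config").changelog.find(
        { what : "moveChunk.to", ns : "test.test", "details.note" : "aborted" },
        { time : 1, "details.errmsg" : 1 }
    ).sort({ time : 1 }).forEach(printjson);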



Comments
Comment by Randolph Tan [ 21/Jan/15 ]

Confirmed that SERVER-16880 fixes this.

Comment by Kevin Pulo [ 14/Jan/15 ]

Looks like the lock has now been auto-forced (after ~24 hrs), and the balancer is trying to re-run and re-migrate the chunk, which of course fails because a migration is already in progress on the TO-shard.

mongos> printjson(ISODate()); db.getSiblingDB("config").locks.find()
ISODate("2015-01-14T10:24:34.171Z")
{ "_id" : "configUpgrade", "state" : 0, "who" : "kev1-0:27017:1421111141:1804289383:mongosMain:846930886", "ts" : ObjectId("54b46f6556127074c739b3bf"), "process" : "kev1-0:27017:1421111141:1804289383", "when" : ISODate("2015-01-13T01:05:41.265Z"), "why" : "upgrading config database to new format v6" }
{ "_id" : "balancer", "state" : 0, "who" : "kev1-0:27017:1421111141:1804289383:Balancer:1681692777", "ts" : ObjectId("54b643e056127074c739c97a"), "process" : "kev1-0:27017:1421111141:1804289383", "when" : ISODate("2015-01-14T10:24:32.314Z"), "why" : "doing balance round" }
{ "_id" : "test.test", "state" : 0, "who" : "kev1-0:28000:1421111452:808384670:conn14:1257153344", "ts" : ObjectId("54b643e073e56bbf64cac149"), "process" : "kev1-0:28000:1421111452:808384670", "when" : ISODate("2015-01-14T10:24:32.323Z"), "why" : "migrating chunk [{ : MinKey }, { : MaxKey }) in test.test" }
mongos> printjson(ISODate()); db.getSiblingDB("config").locks.find()
ISODate("2015-01-14T10:27:07.297Z")
{ "_id" : "configUpgrade", "state" : 0, "who" : "kev1-0:27017:1421111141:1804289383:mongosMain:846930886", "ts" : ObjectId("54b46f6556127074c739b3bf"), "process" : "kev1-0:27017:1421111141:1804289383", "when" : ISODate("2015-01-13T01:05:41.265Z"), "why" : "upgrading config database to new format v6" }
{ "_id" : "balancer", "state" : 0, "who" : "kev1-0:27017:1421111141:1804289383:Balancer:1681692777", "ts" : ObjectId("54b6447656127074c739c998"), "process" : "kev1-0:27017:1421111141:1804289383", "when" : ISODate("2015-01-14T10:27:02.741Z"), "why" : "doing balance round" }
{ "_id" : "test.test", "state" : 0, "who" : "kev1-0:28000:1421111452:808384670:conn14:1257153344", "ts" : ObjectId("54b6447673e56bbf64cac176"), "process" : "kev1-0:28000:1421111452:808384670", "when" : ISODate("2015-01-14T10:27:02.750Z"), "why" : "migrating chunk [{ : MinKey }, { : MaxKey }) in test.test" }

Shard 1 primary (FROM-shard):

2015-01-14T10:24:22.296+0000 I SHARDING [conn14] received moveChunk request: { moveChunk: "test.test", from: "shard_0/kev1-0.kevincm1.8331.mongodbdns.com:28000,kev1-0.kevincm1.8331.mongodbdns.com:28001", to: "shard_1/kev1-0.kevincm1.8331.mongodbdns.com:28003,kev1-0.kevincm1.8331.mongodbdns.com:28004", fromShard: "shard_0", toShard: "shard_1", min: { _id: MinKey }, max: { _id: 0.0 }, maxChunkSizeBytes: 67108864, shardId: "test.test-_id_MinKey", configdb: "balancer-majority-timeout-config-0.kevincm1.8331.mongodbdns.com:27019", secondaryThrottle: false, waitForDelete: false, maxTimeMS: 0, epoch: ObjectId('54b4704956127074c739b3f6') }
2015-01-14T10:24:22.297+0000 I SHARDING [conn14] distributed lock 'test.test/kev1-0:28000:1421111452:808384670' acquired, ts : 54b643d673e56bbf64cac146
2015-01-14T10:24:22.297+0000 I SHARDING [conn14] about to log metadata event: { _id: "kev1-0-2015-01-14T10:24:22-54b643d673e56bbf64cac147", server: "kev1-0", clientAddr: "172.31.2.226:33312", time: new Date(1421231062297), what: "moveChunk.start", ns: "test.test", details: { min: { _id: MinKey }, max: { _id: 0.0 }, from: "shard_0", to: "shard_1" } }
2015-01-14T10:24:22.302+0000 I SHARDING [conn14] remotely refreshing metadata for test.test based on current shard version 1|2||54b4704956127074c739b3f6, current metadata version is 1|2||54b4704956127074c739b3f6
2015-01-14T10:24:22.302+0000 I SHARDING [conn14] metadata of collection test.test already up to date (shard version : 1|2||54b4704956127074c739b3f6, took 0ms)
2015-01-14T10:24:22.302+0000 I SHARDING [conn14] moveChunk request accepted at version 1|2||54b4704956127074c739b3f6
2015-01-14T10:24:22.302+0000 I SHARDING [conn14] moveChunk number of documents: 1
2015-01-14T10:24:22.302+0000 W SHARDING [conn14] moveChunk failed to engage TO-shard in the data transfer: migrate already in progress
2015-01-14T10:24:22.302+0000 I SHARDING [conn14] MigrateFromStatus::done About to acquire global lock to exit critical section
2015-01-14T10:24:22.302+0000 I SHARDING [conn14] distributed lock 'test.test/kev1-0:28000:1421111452:808384670' unlocked.
2015-01-14T10:24:22.302+0000 I SHARDING [conn14] about to log metadata event: { _id: "kev1-0-2015-01-14T10:24:22-54b643d673e56bbf64cac148", server: "kev1-0", clientAddr: "172.31.2.226:33312", time: new Date(1421231062302), what: "moveChunk.from", ns: "test.test", details: { min: { _id: MinKey }, max: { _id: 0.0 }, step 1 of 6: 0, step 2 of 6: 5, to: "shard_1", from: "shard_0", note: "aborted", errmsg: "moveChunk failed to engage TO-shard in the data transfer: migrate already in progress" } }
2015-01-14T10:24:25.703+0000 I SHARDING [LockPinger] cluster balancer-majority-timeout-config-0.kevincm1.8331.mongodbdns.com:27019 pinged successfully at Wed Jan 14 10:24:25 2015 by distributed lock pinger 'balancer-majority-timeout-config-0.kevincm1.8331.mongodbdns.com:27019/kev1-0:28000:1421111452:808384670', sleeping for 30000ms
2015-01-14T10:24:29.763+0000 W NETWORK  [ReplicaSetMonitorWatcher] Failed to connect to 172.31.2.226:28004, reason: errno:111 Connection refused
2015-01-14T10:24:32.323+0000 I SHARDING [conn14] received moveChunk request: { moveChunk: "test.test", from: "shard_0/kev1-0.kevincm1.8331.mongodbdns.com:28000,kev1-0.kevincm1.8331.mongodbdns.com:28001", to: "shard_1/kev1-0.kevincm1.8331.mongodbdns.com:28003,kev1-0.kevincm1.8331.mongodbdns.com:28004", fromShard: "shard_0", toShard: "shard_1", min: { _id: MinKey }, max: { _id: 0.0 }, maxChunkSizeBytes: 67108864, shardId: "test.test-_id_MinKey", configdb: "balancer-majority-timeout-config-0.kevincm1.8331.mongodbdns.com:27019", secondaryThrottle: false, waitForDelete: false, maxTimeMS: 0, epoch: ObjectId('54b4704956127074c739b3f6') }
2015-01-14T10:24:32.324+0000 I SHARDING [conn14] distributed lock 'test.test/kev1-0:28000:1421111452:808384670' acquired, ts : 54b643e073e56bbf64cac149
2015-01-14T10:24:32.324+0000 I SHARDING [conn14] about to log metadata event: { _id: "kev1-0-2015-01-14T10:24:32-54b643e073e56bbf64cac14a", server: "kev1-0", clientAddr: "172.31.2.226:33312", time: new Date(1421231072324), what: "moveChunk.start", ns: "test.test", details: { min: { _id: MinKey }, max: { _id: 0.0 }, from: "shard_0", to: "shard_1" } }
2015-01-14T10:24:32.329+0000 I SHARDING [conn14] remotely refreshing metadata for test.test based on current shard version 1|2||54b4704956127074c739b3f6, current metadata version is 1|2||54b4704956127074c739b3f6
2015-01-14T10:24:32.329+0000 I SHARDING [conn14] metadata of collection test.test already up to date (shard version : 1|2||54b4704956127074c739b3f6, took 0ms)
2015-01-14T10:24:32.329+0000 I SHARDING [conn14] moveChunk request accepted at version 1|2||54b4704956127074c739b3f6
2015-01-14T10:24:32.329+0000 I SHARDING [conn14] moveChunk number of documents: 1
2015-01-14T10:24:32.329+0000 W SHARDING [conn14] moveChunk failed to engage TO-shard in the data transfer: migrate already in progress
2015-01-14T10:24:32.329+0000 I SHARDING [conn14] MigrateFromStatus::done About to acquire global lock to exit critical section
2015-01-14T10:24:32.329+0000 I SHARDING [conn14] distributed lock 'test.test/kev1-0:28000:1421111452:808384670' unlocked.
2015-01-14T10:24:32.329+0000 I SHARDING [conn14] about to log metadata event: { _id: "kev1-0-2015-01-14T10:24:32-54b643e073e56bbf64cac14b", server: "kev1-0", clientAddr: "172.31.2.226:33312", time: new Date(1421231072329), what: "moveChunk.from", ns: "test.test", details: { min: { _id: MinKey }, max: { _id: 0.0 }, step 1 of 6: 0, step 2 of 6: 5, to: "shard_1", from: "shard_0", note: "aborted", errmsg: "moveChunk failed to engage TO-shard in the data transfer: migrate already in progress" } }

Shard 2 primary (TO-shard):

2015-01-14T10:25:46.132+0000 I SHARDING [migrateThread] Waiting for replication to catch up before entering critical section
2015-01-14T10:25:47.132+0000 I SHARDING [migrateThread] Waiting for replication to catch up before entering critical section
2015-01-14T10:25:48.132+0000 I SHARDING [migrateThread] Waiting for replication to catch up before entering critical section
2015-01-14T10:25:49.132+0000 I SHARDING [migrateThread] Waiting for replication to catch up before entering critical section
2015-01-14T10:25:50.132+0000 I SHARDING [migrateThread] Waiting for replication to catch up before entering critical section
2015-01-14T10:25:51.133+0000 I SHARDING [migrateThread] Waiting for replication to catch up before entering critical section
2015-01-14T10:25:51.133+0000 W SHARDING [migrateThread] migrate commit waiting for a majority of slaves for 'test.test' { _id: MinKey } -> { _id: 0.0 } waiting for: 54b47ec0:1
2015-01-14T10:25:52.133+0000 I SHARDING [migrateThread] Waiting for replication to catch up before entering critical section
2015-01-14T10:25:53.133+0000 I SHARDING [migrateThread] Waiting for replication to catch up before entering critical section
2015-01-14T10:25:54.133+0000 I SHARDING [migrateThread] Waiting for replication to catch up before entering critical section
2015-01-14T10:25:55.133+0000 I SHARDING [migrateThread] Waiting for replication to catch up before entering critical section
2015-01-14T10:25:56.134+0000 I SHARDING [migrateThread] Waiting for replication to catch up before entering critical section
2015-01-14T10:25:57.134+0000 I SHARDING [migrateThread] Waiting for replication to catch up before entering critical section

Mongos:

2015-01-14T02:45:35.365+0000 I NETWORK  [mongosMain] connection accepted from 172.31.2.226:53633 #18233 (8 connections now open)
2015-01-14T02:45:36.297+0000 W NETWORK  [ReplicaSetMonitorWatcher] Failed to connect to 172.31.2.226:28004, reason: errno:111 Connection refused
2015-01-14T02:45:40.373+0000 I NETWORK  [mongosMain] connection accepted from 172.31.2.226:53729 #18234 (9 connections now open)
2015-01-14T02:45:44.116+0000 I SHARDING [LockPinger] cluster balancer-majority-timeout-config-0.kevincm1.8331.mongodbdns.com:27019 pinged successfully at Wed Jan 14 02:45:44 2015 by distributed lock pinger 'balancer-majority-timeout-config-0.kevincm1.8331.mongodbdns.com:27019/kev1-0:27017:1421111141:1804289383', sleeping for 30000ms
2015-01-14T02:45:45.484+0000 I NETWORK  [conn18233] end connection 172.31.2.226:53633 (8 connections now open)
2015-01-14T02:45:45.484+0000 I NETWORK  [conn18234] end connection 172.31.2.226:53729 (7 connections now open)
2015-01-14T02:45:45.484+0000 I NETWORK  [mongosMain] connection accepted from 172.31.2.226:53882 #18235 (8 connections now open)
2015-01-14T02:45:46.298+0000 W NETWORK  [ReplicaSetMonitorWatcher] Failed to connect to 172.31.2.226:28004, reason: errno:111 Connection refused
2015-01-14T02:45:50.485+0000 I NETWORK  [mongosMain] connection accepted from 172.31.2.226:53986 #18236 (9 connections now open)
2015-01-14T02:45:55.605+0000 I NETWORK  [conn18235] end connection 172.31.2.226:53882 (8 connections now open)
2015-01-14T02:45:55.605+0000 I NETWORK  [conn18236] end connection 172.31.2.226:53986 (8 connections now open)
2015-01-14T02:45:55.605+0000 I NETWORK  [mongosMain] connection accepted from 172.31.2.226:54119 #18237 (8 connections now open)
2015-01-14T02:45:56.299+0000 W NETWORK  [ReplicaSetMonitorWatcher] Failed to connect to 172.31.2.226:28004, reason: errno:111 Connection refused
2015-01-14T02:46:00.606+0000 I NETWORK  [mongosMain] connection accepted from 172.31.2.226:54233 #18238 (9 connections now open)
2015-01-14T02:46:05.722+0000 I NETWORK  [conn18237] end connection 172.31.2.226:54119 (8 connections now open)
2015-01-14T02:46:05.722+0000 I NETWORK  [conn18238] end connection 172.31.2.226:54233 (8 connections now open)
2015-01-14T02:46:05.723+0000 I NETWORK  [mongosMain] connection accepted from 172.31.2.226:54383 #18239 (8 connections now open)
2015-01-14T02:46:06.300+0000 W NETWORK  [ReplicaSetMonitorWatcher] Failed to connect to 172.31.2.226:28004, reason: errno:111 Connection refused
2015-01-14T02:46:06.893+0000 I SHARDING [Balancer] moveChunk result: { cause: { active: true, ns: "test.test", from: "shard_0/kev1-0.kevincm1.8331.mongodbdns.com:28000,kev1-0.kevincm1.8331.mongodbdns.com:28001", min: { _id: MinKey }, max: { _id: 0.0 }, shardKeyPattern: { _id: 1.0 }, state: "catchup", counts: { cloned: 1, clonedBytes: 18, catchup: 0, steady: 0 }, ok: 0.0, errmsg: "" }, ok: 0.0, errmsg: "_recvChunkCommit failed!" }
2015-01-14T02:46:06.894+0000 I SHARDING [Balancer] balancer move failed: { cause: { active: true, ns: "test.test", from: "shard_0/kev1-0.kevincm1.8331.mongodbdns.com:28000,kev1-0.kevincm1.8331.mongodbdns.com:28001", min: { _id: MinKey }, max: { _id: 0.0 }, shardKeyPattern: { _id: 1.0 }, state: "catchup", counts: { cloned: 1, clonedBytes: 18, catchup: 0, steady: 0 }, ok: 0.0, errmsg: "" }, ok: 0.0, errmsg: "_recvChunkCommit failed!" } from: shard_0 to: shard_1 chunk:  min: { _id: MinKey } max: { _id: 0.0 }
2015-01-14T02:46:06.900+0000 I SHARDING [Balancer] distributed lock 'balancer/kev1-0:27017:1421111141:1804289383' unlocked.
2015-01-14T02:46:10.731+0000 I NETWORK  [mongosMain] connection accepted from 172.31.2.226:54479 #18240 (9 connections now open)
2015-01-14T02:46:14.117+0000 I SHARDING [LockPinger] cluster balancer-majority-timeout-config-0.kevincm1.8331.mongodbdns.com:27019 pinged successfully at Wed Jan 14 02:46:14 2015 by distributed lock pinger 'balancer-majority-timeout-config-0.kevincm1.8331.mongodbdns.com:27019/kev1-0:27017:1421111141:1804289383', sleeping for 30000ms
2015-01-14T02:46:15.844+0000 I NETWORK  [conn18239] end connection 172.31.2.226:54383 (8 connections now open)
2015-01-14T02:46:15.845+0000 I NETWORK  [conn18240] end connection 172.31.2.226:54479 (7 connections now open)
2015-01-14T02:46:15.845+0000 I NETWORK  [mongosMain] connection accepted from 172.31.2.226:54627 #18241 (8 connections now open)
2015-01-14T02:46:16.301+0000 W NETWORK  [ReplicaSetMonitorWatcher] Failed to connect to 172.31.2.226:28004, reason: errno:111 Connection refused
2015-01-14T02:46:16.902+0000 I SHARDING [Balancer] distributed lock 'balancer/kev1-0:27017:1421111141:1804289383' acquired, ts : 54b5d87856127074c739b40e
2015-01-14T02:46:16.909+0000 I SHARDING [Balancer]  ns: test.test going to move { _id: "test.test-_id_MinKey", ns: "test.test", min: { _id: MinKey }, max: { _id: 0.0 }, version: Timestamp 1000|1, versionEpoch: ObjectId('54b4704956127074c739b3f6'), lastmod: Timestamp 1000|1, lastmodEpoch: ObjectId('54b4704956127074c739b3f6'), shard: "shard_0" } from: shard_0 to: shard_1 tag []
2015-01-14T02:46:16.909+0000 I SHARDING [Balancer] moving chunk ns: test.test moving ( ns: test.test, shard: shard_0:shard_0/kev1-0.kevincm1.8331.mongodbdns.com:28000,kev1-0.kevincm1.8331.mongodbdns.com:28001, lastmod: 1|1||000000000000000000000000, min: { _id: MinKey }, max: { _id: 0.0 }) shard_0:shard_0/kev1-0.kevincm1.8331.mongodbdns.com:28000,kev1-0.kevincm1.8331.mongodbdns.com:28001 -> shard_1:shard_1/kev1-0.kevincm1.8331.mongodbdns.com:28003,kev1-0.kevincm1.8331.mongodbdns.com:28004
2015-01-14T02:46:16.922+0000 I SHARDING [Balancer] moveChunk result: { cause: { ok: 0.0, errmsg: "migrate already in progress" }, ok: 0.0, errmsg: "moveChunk failed to engage TO-shard in the data transfer: migrate already in progress" }
2015-01-14T02:46:16.922+0000 I SHARDING [Balancer] balancer move failed: { cause: { ok: 0.0, errmsg: "migrate already in progress" }, ok: 0.0, errmsg: "moveChunk failed to engage TO-shard in the data transfer: migrate already in progress" } from: shard_0 to: shard_1 chunk:  min: { _id: MinKey } max: { _id: 0.0 }
2015-01-14T02:46:16.927+0000 I SHARDING [Balancer] distributed lock 'balancer/kev1-0:27017:1421111141:1804289383' unlocked.
