Looks like the distributed lock has now been auto-forced (after ~24 hours), and the balancer is trying to re-run the migration for the chunk, which of course keeps failing because the TO-shard still reports a migration in progress.

mongos> printjson(ISODate()); db.getSiblingDB("config").locks.find()
ISODate("2015-01-14T10:24:34.171Z")
{ "_id" : "configUpgrade", "state" : 0, "who" : "kev1-0:27017:1421111141:1804289383:mongosMain:846930886", "ts" : ObjectId("54b46f6556127074c739b3bf"), "process" : "kev1-0:27017:1421111141:1804289383", "when" : ISODate("2015-01-13T01:05:41.265Z"), "why" : "upgrading config database to new format v6" }
{ "_id" : "balancer", "state" : 0, "who" : "kev1-0:27017:1421111141:1804289383:Balancer:1681692777", "ts" : ObjectId("54b643e056127074c739c97a"), "process" : "kev1-0:27017:1421111141:1804289383", "when" : ISODate("2015-01-14T10:24:32.314Z"), "why" : "doing balance round" }
{ "_id" : "test.test", "state" : 0, "who" : "kev1-0:28000:1421111452:808384670:conn14:1257153344", "ts" : ObjectId("54b643e073e56bbf64cac149"), "process" : "kev1-0:28000:1421111452:808384670", "when" : ISODate("2015-01-14T10:24:32.323Z"), "why" : "migrating chunk [{ : MinKey }, { : MaxKey }) in test.test" }
mongos> printjson(ISODate()); db.getSiblingDB("config").locks.find()
ISODate("2015-01-14T10:27:07.297Z")
{ "_id" : "configUpgrade", "state" : 0, "who" : "kev1-0:27017:1421111141:1804289383:mongosMain:846930886", "ts" : ObjectId("54b46f6556127074c739b3bf"), "process" : "kev1-0:27017:1421111141:1804289383", "when" : ISODate("2015-01-13T01:05:41.265Z"), "why" : "upgrading config database to new format v6" }
{ "_id" : "balancer", "state" : 0, "who" : "kev1-0:27017:1421111141:1804289383:Balancer:1681692777", "ts" : ObjectId("54b6447656127074c739c998"), "process" : "kev1-0:27017:1421111141:1804289383", "when" : ISODate("2015-01-14T10:27:02.741Z"), "why" : "doing balance round" }
{ "_id" : "test.test", "state" : 0, "who" : "kev1-0:28000:1421111452:808384670:conn14:1257153344", "ts" : ObjectId("54b6447673e56bbf64cac176"), "process" : "kev1-0:28000:1421111452:808384670", "when" : ISODate("2015-01-14T10:27:02.750Z"), "why" : "migrating chunk [{ : MinKey }, { : MaxKey }) in test.test" }
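
For reference, the state above can be polled from the mongos with something like the following (a sketch; state > 0 in config.locks means the lock is held or in the process of being taken, and the changelog query pulls the most recent migration events for the namespace):

mongos> // distributed locks currently held or being acquired
mongos> db.getSiblingDB("config").locks.find({ state: { $gt: 0 } }).pretty()
mongos> // most recent moveChunk events recorded for test.test
mongos> db.getSiblingDB("config").changelog.find({ ns: "test.test", what: /^moveChunk/ }).sort({ time: -1 }).limit(5).pretty()
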
Shard 1 primary (FROM-shard):
2015-01-14T10:24:22.296+0000 I SHARDING [conn14] received moveChunk request: { moveChunk: "test.test", from: "shard_0/kev1-0.kevincm1.8331.mongodbdns.com:28000,kev1-0.kevincm1.8331.mongodbdns.com:28001", to: "shard_1/kev1-0.kevincm1.8331.mongodbdns.com:28003,kev1-0.kevincm1.8331.mongodbdns.com:28004", fromShard: "shard_0", toShard: "shard_1", min: { _id: MinKey }, max: { _id: 0.0 }, maxChunkSizeBytes: 67108864, shardId: "test.test-_id_MinKey", configdb: "balancer-majority-timeout-config-0.kevincm1.8331.mongodbdns.com:27019", secondaryThrottle: false, waitForDelete: false, maxTimeMS: 0, epoch: ObjectId('54b4704956127074c739b3f6') }
2015-01-14T10:24:22.297+0000 I SHARDING [conn14] distributed lock 'test.test/kev1-0:28000:1421111452:808384670' acquired, ts : 54b643d673e56bbf64cac146
2015-01-14T10:24:22.297+0000 I SHARDING [conn14] about to log metadata event: { _id: "kev1-0-2015-01-14T10:24:22-54b643d673e56bbf64cac147", server: "kev1-0", clientAddr: "172.31.2.226:33312", time: new Date(1421231062297), what: "moveChunk.start", ns: "test.test", details: { min: { _id: MinKey }, max: { _id: 0.0 }, from: "shard_0", to: "shard_1" } }
2015-01-14T10:24:22.302+0000 I SHARDING [conn14] remotely refreshing metadata for test.test based on current shard version 1|2||54b4704956127074c739b3f6, current metadata version is 1|2||54b4704956127074c739b3f6
2015-01-14T10:24:22.302+0000 I SHARDING [conn14] metadata of collection test.test already up to date (shard version : 1|2||54b4704956127074c739b3f6, took 0ms)
2015-01-14T10:24:22.302+0000 I SHARDING [conn14] moveChunk request accepted at version 1|2||54b4704956127074c739b3f6
2015-01-14T10:24:22.302+0000 I SHARDING [conn14] moveChunk number of documents: 1
2015-01-14T10:24:22.302+0000 W SHARDING [conn14] moveChunk failed to engage TO-shard in the data transfer: migrate already in progress
2015-01-14T10:24:22.302+0000 I SHARDING [conn14] MigrateFromStatus::done About to acquire global lock to exit critical section
2015-01-14T10:24:22.302+0000 I SHARDING [conn14] distributed lock 'test.test/kev1-0:28000:1421111452:808384670' unlocked.
2015-01-14T10:24:22.302+0000 I SHARDING [conn14] about to log metadata event: { _id: "kev1-0-2015-01-14T10:24:22-54b643d673e56bbf64cac148", server: "kev1-0", clientAddr: "172.31.2.226:33312", time: new Date(1421231062302), what: "moveChunk.from", ns: "test.test", details: { min: { _id: MinKey }, max: { _id: 0.0 }, step 1 of 6: 0, step 2 of 6: 5, to: "shard_1", from: "shard_0", note: "aborted", errmsg: "moveChunk failed to engage TO-shard in the data transfer: migrate already in progress" } }
2015-01-14T10:24:25.703+0000 I SHARDING [LockPinger] cluster balancer-majority-timeout-config-0.kevincm1.8331.mongodbdns.com:27019 pinged successfully at Wed Jan 14 10:24:25 2015 by distributed lock pinger 'balancer-majority-timeout-config-0.kevincm1.8331.mongodbdns.com:27019/kev1-0:28000:1421111452:808384670', sleeping for 30000ms
2015-01-14T10:24:29.763+0000 W NETWORK [ReplicaSetMonitorWatcher] Failed to connect to 172.31.2.226:28004, reason: errno:111 Connection refused
2015-01-14T10:24:32.323+0000 I SHARDING [conn14] received moveChunk request: { moveChunk: "test.test", from: "shard_0/kev1-0.kevincm1.8331.mongodbdns.com:28000,kev1-0.kevincm1.8331.mongodbdns.com:28001", to: "shard_1/kev1-0.kevincm1.8331.mongodbdns.com:28003,kev1-0.kevincm1.8331.mongodbdns.com:28004", fromShard: "shard_0", toShard: "shard_1", min: { _id: MinKey }, max: { _id: 0.0 }, maxChunkSizeBytes: 67108864, shardId: "test.test-_id_MinKey", configdb: "balancer-majority-timeout-config-0.kevincm1.8331.mongodbdns.com:27019", secondaryThrottle: false, waitForDelete: false, maxTimeMS: 0, epoch: ObjectId('54b4704956127074c739b3f6') }
2015-01-14T10:24:32.324+0000 I SHARDING [conn14] distributed lock 'test.test/kev1-0:28000:1421111452:808384670' acquired, ts : 54b643e073e56bbf64cac149
2015-01-14T10:24:32.324+0000 I SHARDING [conn14] about to log metadata event: { _id: "kev1-0-2015-01-14T10:24:32-54b643e073e56bbf64cac14a", server: "kev1-0", clientAddr: "172.31.2.226:33312", time: new Date(1421231072324), what: "moveChunk.start", ns: "test.test", details: { min: { _id: MinKey }, max: { _id: 0.0 }, from: "shard_0", to: "shard_1" } }
2015-01-14T10:24:32.329+0000 I SHARDING [conn14] remotely refreshing metadata for test.test based on current shard version 1|2||54b4704956127074c739b3f6, current metadata version is 1|2||54b4704956127074c739b3f6
2015-01-14T10:24:32.329+0000 I SHARDING [conn14] metadata of collection test.test already up to date (shard version : 1|2||54b4704956127074c739b3f6, took 0ms)
2015-01-14T10:24:32.329+0000 I SHARDING [conn14] moveChunk request accepted at version 1|2||54b4704956127074c739b3f6
2015-01-14T10:24:32.329+0000 I SHARDING [conn14] moveChunk number of documents: 1
2015-01-14T10:24:32.329+0000 W SHARDING [conn14] moveChunk failed to engage TO-shard in the data transfer: migrate already in progress
2015-01-14T10:24:32.329+0000 I SHARDING [conn14] MigrateFromStatus::done About to acquire global lock to exit critical section
2015-01-14T10:24:32.329+0000 I SHARDING [conn14] distributed lock 'test.test/kev1-0:28000:1421111452:808384670' unlocked.
2015-01-14T10:24:32.329+0000 I SHARDING [conn14] about to log metadata event: { _id: "kev1-0-2015-01-14T10:24:32-54b643e073e56bbf64cac14b", server: "kev1-0", clientAddr: "172.31.2.226:33312", time: new Date(1421231072329), what: "moveChunk.from", ns: "test.test", details: { min: { _id: MinKey }, max: { _id: 0.0 }, step 1 of 6: 0, step 2 of 6: 5, to: "shard_1", from: "shard_0", note: "aborted", errmsg: "moveChunk failed to engage TO-shard in the data transfer: migrate already in progress" } }
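
(For reference, the shard-side request logged above is what the balancer issues each round; the equivalent migration can also be requested by hand from a mongos to reproduce the failure outside the balancer. A sketch for this chunk:)

mongos> // ask mongos to move the chunk containing { _id: MinKey } to shard_1
mongos> db.adminCommand({ moveChunk: "test.test", find: { _id: MinKey }, to: "shard_1" })
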
Shard 2 primary (TO-shard):
2015-01-14T10:25:46.132+0000 I SHARDING [migrateThread] Waiting for replication to catch up before entering critical section
2015-01-14T10:25:47.132+0000 I SHARDING [migrateThread] Waiting for replication to catch up before entering critical section
2015-01-14T10:25:48.132+0000 I SHARDING [migrateThread] Waiting for replication to catch up before entering critical section
2015-01-14T10:25:49.132+0000 I SHARDING [migrateThread] Waiting for replication to catch up before entering critical section
2015-01-14T10:25:50.132+0000 I SHARDING [migrateThread] Waiting for replication to catch up before entering critical section
2015-01-14T10:25:51.133+0000 I SHARDING [migrateThread] Waiting for replication to catch up before entering critical section
2015-01-14T10:25:51.133+0000 W SHARDING [migrateThread] migrate commit waiting for a majority of slaves for 'test.test' { _id: MinKey } -> { _id: 0.0 } waiting for: 54b47ec0:1
2015-01-14T10:25:52.133+0000 I SHARDING [migrateThread] Waiting for replication to catch up before entering critical section
2015-01-14T10:25:53.133+0000 I SHARDING [migrateThread] Waiting for replication to catch up before entering critical section
2015-01-14T10:25:54.133+0000 I SHARDING [migrateThread] Waiting for replication to catch up before entering critical section
2015-01-14T10:25:55.133+0000 I SHARDING [migrateThread] Waiting for replication to catch up before entering critical section
2015-01-14T10:25:56.134+0000 I SHARDING [migrateThread] Waiting for replication to catch up before entering critical section
2015-01-14T10:25:57.134+0000 I SHARDING [migrateThread] Waiting for replication to catch up before entering critical section
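
The recipient is stuck waiting for majority write concern here because one of shard_1's members (172.31.2.226:28004, the same one the ReplicaSetMonitor keeps failing to reach) is down. Assuming shell access to the shard_1 primary, a quick check would be something like:

shard_1:PRIMARY> // list replica set members with their state and health (health 0 = unreachable)
shard_1:PRIMARY> rs.status().members.forEach(function(m) { print(m.name + "  " + m.stateStr + "  health=" + m.health); })
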
Mongos:
2015-01-14T02:45:35.365+0000 I NETWORK [mongosMain] connection accepted from 172.31.2.226:53633 #18233 (8 connections now open)
2015-01-14T02:45:36.297+0000 W NETWORK [ReplicaSetMonitorWatcher] Failed to connect to 172.31.2.226:28004, reason: errno:111 Connection refused
2015-01-14T02:45:40.373+0000 I NETWORK [mongosMain] connection accepted from 172.31.2.226:53729 #18234 (9 connections now open)
2015-01-14T02:45:44.116+0000 I SHARDING [LockPinger] cluster balancer-majority-timeout-config-0.kevincm1.8331.mongodbdns.com:27019 pinged successfully at Wed Jan 14 02:45:44 2015 by distributed lock pinger 'balancer-majority-timeout-config-0.kevincm1.8331.mongodbdns.com:27019/kev1-0:27017:1421111141:1804289383', sleeping for 30000ms
2015-01-14T02:45:45.484+0000 I NETWORK [conn18233] end connection 172.31.2.226:53633 (8 connections now open)
2015-01-14T02:45:45.484+0000 I NETWORK [conn18234] end connection 172.31.2.226:53729 (7 connections now open)
2015-01-14T02:45:45.484+0000 I NETWORK [mongosMain] connection accepted from 172.31.2.226:53882 #18235 (8 connections now open)
2015-01-14T02:45:46.298+0000 W NETWORK [ReplicaSetMonitorWatcher] Failed to connect to 172.31.2.226:28004, reason: errno:111 Connection refused
2015-01-14T02:45:50.485+0000 I NETWORK [mongosMain] connection accepted from 172.31.2.226:53986 #18236 (9 connections now open)
2015-01-14T02:45:55.605+0000 I NETWORK [conn18235] end connection 172.31.2.226:53882 (8 connections now open)
2015-01-14T02:45:55.605+0000 I NETWORK [conn18236] end connection 172.31.2.226:53986 (8 connections now open)
2015-01-14T02:45:55.605+0000 I NETWORK [mongosMain] connection accepted from 172.31.2.226:54119 #18237 (8 connections now open)
2015-01-14T02:45:56.299+0000 W NETWORK [ReplicaSetMonitorWatcher] Failed to connect to 172.31.2.226:28004, reason: errno:111 Connection refused
2015-01-14T02:46:00.606+0000 I NETWORK [mongosMain] connection accepted from 172.31.2.226:54233 #18238 (9 connections now open)
2015-01-14T02:46:05.722+0000 I NETWORK [conn18237] end connection 172.31.2.226:54119 (8 connections now open)
2015-01-14T02:46:05.722+0000 I NETWORK [conn18238] end connection 172.31.2.226:54233 (8 connections now open)
2015-01-14T02:46:05.723+0000 I NETWORK [mongosMain] connection accepted from 172.31.2.226:54383 #18239 (8 connections now open)
2015-01-14T02:46:06.300+0000 W NETWORK [ReplicaSetMonitorWatcher] Failed to connect to 172.31.2.226:28004, reason: errno:111 Connection refused
2015-01-14T02:46:06.893+0000 I SHARDING [Balancer] moveChunk result: { cause: { active: true, ns: "test.test", from: "shard_0/kev1-0.kevincm1.8331.mongodbdns.com:28000,kev1-0.kevincm1.8331.mongodbdns.com:28001", min: { _id: MinKey }, max: { _id: 0.0 }, shardKeyPattern: { _id: 1.0 }, state: "catchup", counts: { cloned: 1, clonedBytes: 18, catchup: 0, steady: 0 }, ok: 0.0, errmsg: "" }, ok: 0.0, errmsg: "_recvChunkCommit failed!" }
2015-01-14T02:46:06.894+0000 I SHARDING [Balancer] balancer move failed: { cause: { active: true, ns: "test.test", from: "shard_0/kev1-0.kevincm1.8331.mongodbdns.com:28000,kev1-0.kevincm1.8331.mongodbdns.com:28001", min: { _id: MinKey }, max: { _id: 0.0 }, shardKeyPattern: { _id: 1.0 }, state: "catchup", counts: { cloned: 1, clonedBytes: 18, catchup: 0, steady: 0 }, ok: 0.0, errmsg: "" }, ok: 0.0, errmsg: "_recvChunkCommit failed!" } from: shard_0 to: shard_1 chunk: min: { _id: MinKey } max: { _id: 0.0 }
2015-01-14T02:46:06.900+0000 I SHARDING [Balancer] distributed lock 'balancer/kev1-0:27017:1421111141:1804289383' unlocked.
2015-01-14T02:46:10.731+0000 I NETWORK [mongosMain] connection accepted from 172.31.2.226:54479 #18240 (9 connections now open)
2015-01-14T02:46:14.117+0000 I SHARDING [LockPinger] cluster balancer-majority-timeout-config-0.kevincm1.8331.mongodbdns.com:27019 pinged successfully at Wed Jan 14 02:46:14 2015 by distributed lock pinger 'balancer-majority-timeout-config-0.kevincm1.8331.mongodbdns.com:27019/kev1-0:27017:1421111141:1804289383', sleeping for 30000ms
2015-01-14T02:46:15.844+0000 I NETWORK [conn18239] end connection 172.31.2.226:54383 (8 connections now open)
2015-01-14T02:46:15.845+0000 I NETWORK [conn18240] end connection 172.31.2.226:54479 (7 connections now open)
2015-01-14T02:46:15.845+0000 I NETWORK [mongosMain] connection accepted from 172.31.2.226:54627 #18241 (8 connections now open)
2015-01-14T02:46:16.301+0000 W NETWORK [ReplicaSetMonitorWatcher] Failed to connect to 172.31.2.226:28004, reason: errno:111 Connection refused
2015-01-14T02:46:16.902+0000 I SHARDING [Balancer] distributed lock 'balancer/kev1-0:27017:1421111141:1804289383' acquired, ts : 54b5d87856127074c739b40e
2015-01-14T02:46:16.909+0000 I SHARDING [Balancer] ns: test.test going to move { _id: "test.test-_id_MinKey", ns: "test.test", min: { _id: MinKey }, max: { _id: 0.0 }, version: Timestamp 1000|1, versionEpoch: ObjectId('54b4704956127074c739b3f6'), lastmod: Timestamp 1000|1, lastmodEpoch: ObjectId('54b4704956127074c739b3f6'), shard: "shard_0" } from: shard_0 to: shard_1 tag []
2015-01-14T02:46:16.909+0000 I SHARDING [Balancer] moving chunk ns: test.test moving ( ns: test.test, shard: shard_0:shard_0/kev1-0.kevincm1.8331.mongodbdns.com:28000,kev1-0.kevincm1.8331.mongodbdns.com:28001, lastmod: 1|1||000000000000000000000000, min: { _id: MinKey }, max: { _id: 0.0 }) shard_0:shard_0/kev1-0.kevincm1.8331.mongodbdns.com:28000,kev1-0.kevincm1.8331.mongodbdns.com:28001 -> shard_1:shard_1/kev1-0.kevincm1.8331.mongodbdns.com:28003,kev1-0.kevincm1.8331.mongodbdns.com:28004
2015-01-14T02:46:16.922+0000 I SHARDING [Balancer] moveChunk result: { cause: { ok: 0.0, errmsg: "migrate already in progress" }, ok: 0.0, errmsg: "moveChunk failed to engage TO-shard in the data transfer: migrate already in progress" }
2015-01-14T02:46:16.922+0000 I SHARDING [Balancer] balancer move failed: { cause: { ok: 0.0, errmsg: "migrate already in progress" }, ok: 0.0, errmsg: "moveChunk failed to engage TO-shard in the data transfer: migrate already in progress" } from: shard_0 to: shard_1 chunk: min: { _id: MinKey } max: { _id: 0.0 }
2015-01-14T02:46:16.927+0000 I SHARDING [Balancer] distributed lock 'balancer/kev1-0:27017:1421111141:1804289383' unlocked.
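
Until the stuck recipient on shard_1 is cleared, the retry loop above will keep repeating every balancing round. One way to quiet it while investigating is to disable the balancer from any mongos (a sketch; sh.stopBalancer() would also work, but it waits for the in-progress round to finish):

mongos> sh.setBalancerState(false)
mongos> sh.getBalancerState()   // should now return false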