|
The log level is high enough on the config servers that they report every find command they receive. I grepped the log file for every find command against config.chunks for the testDB.bar namespace, and every single one of them had nreturned:1. This means no query against the config servers ever returned more than one chunk for that namespace, so the shard could never have observed metadata in which the split appeared to have succeeded. I feel like there must be something else going on to explain why the splitChunk command on the shard didn't fail when it should have.
> grep "find" migration_ignore_interrupts_logs.txt | grep "chunks" | grep "testDB.bar" | grep "nreturned"
|
[js_test:migration_ignore_interrupts] 2016-09-23T16:06:15.383+0000 c23265| 2016-09-23T16:06:14.704+0000 I COMMAND [conn14] command config.chunks appName: "MongoDB Shell" command: find { find: "chunks", filter: { ns: "testDB.bar", lastmod: { $gte: Timestamp 0|0 } }, sort: { lastmod: 1 }, readConcern: { level: "majority", afterOpTime: { ts: Timestamp 1474646774000|4, t: 3 } }, maxTimeMS: 30000 } planSummary: IXSCAN { ns: 1, lastmod: 1 } keysExamined:1 docsExamined:1 cursorExhausted:1 numYields:0 nreturned:1 reslen:518 locks:{ Global: { acquireCount: { r: 2 } }, Database: { acquireCount: { r: 1 } }, Collection: { acquireCount: { r: 1 } } } protocol:op_command 0ms
|
[js_test:migration_ignore_interrupts] 2016-09-23T16:06:26.519+0000 c23264| 2016-09-23T16:06:14.707+0000 I COMMAND [conn18] command config.chunks appName: "MongoDB Shell" command: find { find: "chunks", filter: { ns: "testDB.bar", lastmod: { $gte: Timestamp 0|0 } }, sort: { lastmod: 1 }, readConcern: { level: "majority", afterOpTime: { ts: Timestamp 1474646774000|5, t: 3 } }, maxTimeMS: 30000 } planSummary: IXSCAN { ns: 1, lastmod: 1 } keysExamined:1 docsExamined:1 cursorExhausted:1 numYields:0 nreturned:1 reslen:518 locks:{ Global: { acquireCount: { r: 2 } }, Database: { acquireCount: { r: 1 } }, Collection: { acquireCount: { r: 1 } } } protocol:op_command 0ms
|
[js_test:migration_ignore_interrupts] 2016-09-23T16:06:26.622+0000 c23264| 2016-09-23T16:06:14.721+0000 I COMMAND [conn12] command config.chunks appName: "MongoDB Shell" command: find { find: "chunks", filter: { ns: "testDB.bar", lastmod: { $gte: Timestamp 0|0 } }, sort: { lastmod: 1 }, readConcern: { level: "majority", afterOpTime: { ts: Timestamp 1474646774000|7, t: 3 } }, maxTimeMS: 30000 } planSummary: IXSCAN { ns: 1, lastmod: 1 } keysExamined:1 docsExamined:1 cursorExhausted:1 numYields:0 nreturned:1 reslen:518 locks:{ Global: { acquireCount: { r: 2 } }, Database: { acquireCount: { r: 1 } }, Collection: { acquireCount: { r: 1 } } } protocol:op_command 0ms
|
[js_test:migration_ignore_interrupts] 2016-09-23T16:06:26.631+0000 c23264| 2016-09-23T16:06:14.723+0000 I COMMAND [conn12] command config.chunks appName: "MongoDB Shell" command: find { find: "chunks", filter: { ns: "testDB.bar" }, sort: { lastmod: -1 }, readConcern: { level: "majority", afterOpTime: { ts: Timestamp 1474646774000|7, t: 3 } }, limit: 1, maxTimeMS: 30000 } planSummary: IXSCAN { ns: 1, lastmod: 1 } keysExamined:1 docsExamined:1 cursorExhausted:1 numYields:0 nreturned:1 reslen:518 locks:{ Global: { acquireCount: { r: 2 } }, Database: { acquireCount: { r: 1 } }, Collection: { acquireCount: { r: 1 } } } protocol:op_command 0ms
|
[js_test:migration_ignore_interrupts] 2016-09-23T16:06:26.673+0000 c23264| 2016-09-23T16:06:14.730+0000 I COMMAND [conn18] command config.chunks appName: "MongoDB Shell" command: find { find: "chunks", filter: { ns: "testDB.bar", lastmod: { $gte: Timestamp 0|0 } }, sort: { lastmod: 1 }, readConcern: { level: "majority", afterOpTime: { ts: Timestamp 1474646774000|8, t: 3 } }, maxTimeMS: 30000 } planSummary: IXSCAN { ns: 1, lastmod: 1 } keysExamined:1 docsExamined:1 cursorExhausted:1 numYields:0 nreturned:1 reslen:518 locks:{ Global: { acquireCount: { r: 2 } }, Database: { acquireCount: { r: 1 } }, Collection: { acquireCount: { r: 1 } } } protocol:op_command 0ms
|
[js_test:migration_ignore_interrupts] 2016-09-23T16:06:26.992+0000 c23265| 2016-09-23T16:06:26.350+0000 I COMMAND [conn18] command config.chunks appName: "MongoDB Shell" command: find { find: "chunks", filter: { ns: "testDB.bar", lastmod: { $gte: Timestamp 0|0 } }, sort: { lastmod: 1 }, readConcern: { level: "majority", afterOpTime: { ts: Timestamp 1474646782000|3, t: 4 } }, maxTimeMS: 30000 } planSummary: IXSCAN { ns: 1, lastmod: 1 } keysExamined:1 docsExamined:1 cursorExhausted:1 numYields:0 nreturned:1 reslen:518 locks:{ Global: { acquireCount: { r: 2 } }, Database: { acquireCount: { r: 1 } }, Collection: { acquireCount: { r: 1 } } } protocol:op_command 0ms
|
[js_test:migration_ignore_interrupts] 2016-09-23T16:06:27.740+0000 c23266| 2016-09-23T16:06:26.360+0000 I COMMAND [conn21] command config.chunks command: find { find: "chunks", filter: { ns: "testDB.bar" }, sort: { lastmod: -1 }, readConcern: { level: "majority", afterOpTime: { ts: Timestamp 1474646786000|1, t: 4 } }, limit: 1, maxTimeMS: 30000 } planSummary: IXSCAN { ns: 1, lastmod: 1 } keysExamined:1 docsExamined:1 cursorExhausted:1 numYields:0 nreturned:1 reslen:498 locks:{ Global: { acquireCount: { r: 2 } }, Database: { acquireCount: { r: 1 } }, Collection: { acquireCount: { r: 1 } } } protocol:op_command 0ms
|
[js_test:migration_ignore_interrupts] 2016-09-23T16:06:44.144+0000 c23266| 2016-09-23T16:06:44.099+0000 I COMMAND [conn21] command config.chunks command: find { find: "chunks", filter: { ns: "testDB.bar" }, sort: { lastmod: -1 }, readConcern: { level: "majority", afterOpTime: { ts: Timestamp 1474646803000|1, t: 5 } }, limit: 1, maxTimeMS: 30000 } planSummary: IXSCAN { ns: 1, lastmod: 1 } keysExamined:1 docsExamined:1 cursorExhausted:1 numYields:0 nreturned:1 reslen:498 locks:{ Global: { acquireCount: { r: 2 } }, Database: { acquireCount: { r: 1 } }, Collection: { acquireCount: { r: 1 } } } protocol:op_command 0ms
|
[js_test:migration_ignore_interrupts] 2016-09-23T16:07:49.118+0000 c23264| 2016-09-23T16:07:48.966+0000 I COMMAND [conn12] command config.chunks appName: "MongoDB Shell" command: find { find: "chunks", filter: { ns: "testDB.bar" }, sort: { lastmod: -1 }, readConcern: { level: "majority", afterOpTime: { ts: Timestamp 1474646868000|3, t: 10 } }, limit: 1, maxTimeMS: 30000 } planSummary: IXSCAN { ns: 1, lastmod: 1 } keysExamined:1 docsExamined:1 cursorExhausted:1 numYields:0 nreturned:1 reslen:518 locks:{ Global: { acquireCount: { r: 2 } }, Database: { acquireCount: { r: 1 } }, Collection: { acquireCount: { r: 1 } } } protocol:op_command 0ms
|
[js_test:migration_ignore_interrupts] 2016-09-23T16:07:49.347+0000 c23266| 2016-09-23T16:07:48.985+0000 I COMMAND [conn17] command config.chunks appName: "MongoDB Shell" command: find { find: "chunks", filter: { ns: "testDB.bar", lastmod: { $gte: Timestamp 0|0 } }, sort: { lastmod: 1 }, readConcern: { level: "majority", afterOpTime: { ts: Timestamp 1474646868000|5, t: 10 } }, maxTimeMS: 30000 } planSummary: IXSCAN { ns: 1, lastmod: 1 } keysExamined:1 docsExamined:1 cursorExhausted:1 numYields:0 nreturned:1 reslen:518 locks:{ Global: { acquireCount: { r: 2 } }, Database: { acquireCount: { r: 1 } }, Collection: { acquireCount: { r: 1 } } } protocol:op_command 0ms
|
[js_test:migration_ignore_interrupts] 2016-09-23T16:07:49.583+0000 c23266| 2016-09-23T16:07:49.491+0000 I COMMAND [conn20] command config.chunks command: find { find: "chunks", filter: { ns: "testDB.bar", lastmod: { $gte: Timestamp 0|0 } }, sort: { lastmod: 1 }, readConcern: { level: "majority", afterOpTime: { ts: Timestamp 1474646869000|1, t: 10 } }, maxTimeMS: 30000 } planSummary: IXSCAN { ns: 1, lastmod: 1 } keysExamined:1 docsExamined:1 cursorExhausted:1 numYields:0 nreturned:1 reslen:518 locks:{ Global: { acquireCount: { r: 2 } }, Database: { acquireCount: { r: 1 } }, Collection: { acquireCount: { r: 1 } } } protocol:op_command 0ms
|
[js_test:migration_ignore_interrupts] 2016-09-23T16:07:50.065+0000 c23264| 2016-09-23T16:07:49.812+0000 I COMMAND [conn20] command config.chunks appName: "MongoDB Shell" command: find { find: "chunks", filter: { ns: "testDB.bar", lastmod: { $gte: Timestamp 0|0 } }, sort: { lastmod: 1 }, readConcern: { level: "majority", afterOpTime: { ts: Timestamp 1474646869000|4, t: 10 } }, maxTimeMS: 30000 } planSummary: IXSCAN { ns: 1, lastmod: 1 } keysExamined:1 docsExamined:1 cursorExhausted:1 numYields:0 nreturned:1 reslen:518 locks:{ Global: { acquireCount: { r: 2 } }, Database: { acquireCount: { r: 1 } }, Collection: { acquireCount: { r: 1 } } } protocol:op_command 0ms
|
[js_test:migration_ignore_interrupts] 2016-09-23T16:07:50.319+0000 c23266| 2016-09-23T16:07:49.809+0000 I COMMAND [conn17] command config.chunks appName: "MongoDB Shell" command: find { find: "chunks", filter: { ns: "testDB.bar", lastmod: { $gte: Timestamp 0|0 } }, sort: { lastmod: 1 }, readConcern: { level: "majority", afterOpTime: { ts: Timestamp 1474646869000|3, t: 10 } }, maxTimeMS: 30000 } planSummary: IXSCAN { ns: 1, lastmod: 1 } keysExamined:1 docsExamined:1 cursorExhausted:1 numYields:0 nreturned:1 reslen:518 locks:{ Global: { acquireCount: { r: 2 } }, Database: { acquireCount: { r: 1 } }, Collection: { acquireCount: { r: 1 } } } protocol:op_command 0ms
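
As a quick cross-check, the same pipeline can be reduced to a histogram of the nreturned values (same log file as above; if the claim holds, the only bucket is nreturned:1):

> grep "find" migration_ignore_interrupts_logs.txt | grep "chunks" | grep "testDB.bar" | grep -o "nreturned:[0-9]*" | sort | uniq -c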
|
|
|
Timeline of the relevant config opTimes, as seen from the shard-side code:
Term 3
_configsvrCommitChunkSplit: 1474646774000|10 (config c23264 primary; waits for, but presumably never reaches, majority on this opTime)
Term 4
find on config.chunks: 1474646782000|3 (config c23265 secondary; the opTime with which the shard's read-concern-majority find runs)
find on config.chunks: 1474646786000|1 (mongos sends its metadata-refresh find with this opTime, after the split command succeeds)
// c23264 becomes primary!!!
|
|
2016-09-23T16:06:14.135+0000 c23264| 2016-09-23T16:06:12.665+0000 I REPL [rsSync] transition to primary complete; database writes are now permitted
|
|
|
// Mongos sends the splitChunk command to shard 23261 (request 218)
|
|
2016-09-23T16:06:26.576+0000 s23267| 2016-09-23T16:06:14.723+0000 I COMMAND [conn1] splitting chunk [{ a: MinKey },{ a: MaxKey }) in collection testDB.bar on shard shard0000
|
2016-09-23T16:06:26.577+0000 s23267| 2016-09-23T16:06:14.723+0000 D ASIO [conn1] startCommand: RemoteCommand 218 -- target:ip-10-136-48-200:23261 db:admin cmd:{ splitChunk: "testDB.bar", configdb: "test-configRS/ip-10-136-48-200:23264,ip-10-136-48-200:23265,ip-10-136-48-200:23266", from: "shard0000", keyPattern: { a: 1.0 }, shardVersion: [ Timestamp 1000|0, ObjectId('57e552f69434a18aef7df006') ], min: { a: MinKey }, max: { a: MaxKey }, chunkVersion: [ Timestamp 1000|0, ObjectId('57e552f69434a18aef7df006') ], splitKeys: [ { a: 10.0 } ] }
|
|
|
// c23264 receives the split chunk commit (_configsvrCommitChunkSplit), presumably sent from shard 23261
|
|
2016-09-23T16:06:26.674+0000 c23264| 2016-09-23T16:06:14.730+0000 D COMMAND [conn18] run command admin.$cmd { _configsvrCommitChunkSplit: "testDB.bar", collEpoch: ObjectId('57e552f69434a18aef7df006'), min: { a: MinKey }, max: { a: MaxKey }, splitPoints: [ { a: 10.0 } ], shard: "shard0000", w: "majority", wtimeout: 15000 }
|
|
|
// c23264 runs applyOps as part of the _configsvrCommitChunkSplit command
|
|
2016-09-23T16:06:26.678+0000 c23264| 2016-09-23T16:06:14.730+0000 D COMMAND [conn18] run command config.$cmd { applyOps: [ { op: "u", b: true, ns: "config.chunks", o: { _id: "testDB.bar-a_MinKey", lastmod: Timestamp 1000|1, lastmodEpoch: ObjectId('57e552f69434a18aef7df006'), ns: "testDB.bar", min: { a: MinKey }, max: { a: 10.0 }, shard: "shard0000" }, o2: { _id: "testDB.bar-a_MinKey" } }, { op: "u", b: true, ns: "config.chunks", o: { _id: "testDB.bar-a_10.0", lastmod: Timestamp 1000|2, lastmodEpoch: ObjectId('57e552f69434a18aef7df006'), ns: "testDB.bar", min: { a: 10.0 }, max: { a: MaxKey }, shard: "shard0000" }, o2: { _id: "testDB.bar-a_10.0" } } ], preCondition: [ { ns: "config.chunks", q: { query: { ns: "testDB.bar", min: { a: MinKey }, max: { a: MaxKey } }, orderby: { lastmod: -1 } }, res: { lastmodEpoch: ObjectId('57e552f69434a18aef7df006'), shard: "shard0000" } } ], writeConcern: { w: 0, wtimeout: 0 } }
|
2016-09-23T16:06:26.684+0000 c23264| 2016-09-23T16:06:14.731+0000 I COMMAND [conn18] command config.chunks appName: "MongoDB Shell" command: applyOps { applyOps: [ { op: "u", b: true, ns: "config.chunks", o: { _id: "testDB.bar-a_MinKey", lastmod: Timestamp 1000|1, lastmodEpoch: ObjectId('57e552f69434a18aef7df006'), ns: "testDB.bar", min: { a: MinKey }, max: { a: 10.0 }, shard: "shard0000" }, o2: { _id: "testDB.bar-a_MinKey" } }, { op: "u", b: true, ns: "config.chunks", o: { _id: "testDB.bar-a_10.0", lastmod: Timestamp 1000|2, lastmodEpoch: ObjectId('57e552f69434a18aef7df006'), ns: "testDB.bar", min: { a: 10.0 }, max: { a: MaxKey }, shard: "shard0000" }, o2: { _id: "testDB.bar-a_10.0" } } ], preCondition: [ { ns: "config.chunks", q: { query: { ns: "testDB.bar", min: { a: MinKey }, max: { a: MaxKey } }, orderby: { lastmod: -1 } }, res: { lastmodEpoch: ObjectId('57e552f69434a18aef7df006'), shard: "shard0000" } } ], writeConcern: { w: 0, wtimeout: 0 } } numYields:0 reslen:152 locks:{ Global: { acquireCount: { r: 9, w: 2, W: 3 } }, Database: { acquireCount: { r: 2, w: 1, W: 1 } }, Collection: { acquireCount: { r: 2, W: 1 } }, Metadata: { acquireCount: { w: 1 } }, Mutex: { acquireCount: { W: 1 } }, oplog: { acquireCount: { w: 1 } } } protocol:op_query 0ms
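
The preCondition in that applyOps is what makes the commit conditional: the two chunk upserts apply only if the newest chunk document covering [MinKey, MaxKey) for the namespace still has the expected lastmodEpoch and shard. The guard corresponds to a query one could run by hand against a config server (a sketch; host/port taken from this test's topology):

> mongo --host ip-10-136-48-200:23264 config --eval 'printjson(db.chunks.find({ns: "testDB.bar", min: {a: MinKey}, max: {a: MaxKey}}).sort({lastmod: -1}).limit(1).next())'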
|
|
|
// c23264 is waiting for the _configsvrCommitChunkSplit's last write to make it to majority...
|
|
2016-09-23T16:06:26.691+0000 c23264| 2016-09-23T16:06:14.731+0000 D REPL [conn18] Waiting for write concern. OpTime: { ts: Timestamp 1474646774000|10, t: 3 }, write concern: { w: "majority", wtimeout: 30000 }
|
|
...
|
// c23266 becomes primary!!!
|
|
2016-09-23T16:06:27.424+0000 c23266| 2016-09-23T16:06:22.733+0000 I REPL [rsSync] transition to primary complete; database writes are now permitted
|
|
|
// c23264 still thinks it is primary and is still waiting for opTime majority...
|
|
2016-09-23T16:06:26.726+0000 c23264| 2016-09-23T16:06:26.346+0000 D REPL [conn18] Required snapshot optime: { ts: Timestamp 1474646774000|10, t: 3 } is not yet part of the current 'committed' snapshot: { optime: { ts: Timestamp 1474646774000|8, t: 3 }, name-id: "105" }
|
|
|
// c23264 gets told to step down (this test is running in the stepdown suite).
|
|
2016-09-23T16:06:26.708+0000 c23264| 2016-09-23T16:06:26.345+0000 D COMMAND [conn26] run command admin.$cmd { replSetStepDown: 10.0, force: true }
|
2016-09-23T16:06:26.809+0000 c23264| 2016-09-23T16:06:26.345+0000 D COMMAND [conn26] command: replSetStepDown
|
|
|
// c23264 also discovers the other primary via the new term, 4
|
|
2016-09-23T16:06:27.117+0000 c23264| 2016-09-23T16:06:26.347+0000 I REPL [ReplicationExecutor] can't see a majority of the set, relinquishing primary
|
2016-09-23T16:06:27.118+0000 c23264| 2016-09-23T16:06:26.347+0000 I REPL [ReplicationExecutor] Stepping down from primary in response to heartbeat
|
2016-09-23T16:06:27.118+0000 c23264| 2016-09-23T16:06:26.347+0000 D REPL [ReplicationExecutor] Updating _lastCommittedOpTime to { ts: Timestamp 1474646782000|3, t: 4 }
|
2016-09-23T16:06:27.119+0000 c23264| 2016-09-23T16:06:26.347+0000 I REPL [ReplicationExecutor] stepping down from primary, because a new term has begun: 4
|
2016-09-23T16:06:27.122+0000 c23264| 2016-09-23T16:06:26.347+0000 I NETWORK [conn35] received client metadata from 10.136.48.200:49170 conn35: { driver: { name: "MongoDB Internal Client", version: "3.3.14-3-g948f978" }, os: { type: "Linux", name: "Red Hat Enterprise Linux Server release 6.2 (Santiago)", architecture: "x86_64", version: "Kernel 2.6.32-220.el6.x86_64" } }
|
2016-09-23T16:06:27.129+0000 c23264| 2016-09-23T16:06:26.347+0000 I COMMAND [conn26] Attempting to step down in response to replSetStepDown command
|
2016-09-23T16:06:27.334+0000 c23264| 2016-09-23T16:06:26.349+0000 I REPL [ReplicationExecutor] stepping down from primary, because a new term has begun: 4
|
2016-09-23T16:06:27.335+0000 c23264| 2016-09-23T16:06:26.349+0000 I REPL [ReplicationExecutor] Member ip-10-136-48-200:23266 is now in state PRIMARY
|
|
|
// c23264 manages to return a response for _configsvrCommitChunkSplit, rather than dropping the connection first. Presumably the response includes a write concern error.
|
|
2016-09-23T16:06:27.341+0000 c23264| 2016-09-23T16:06:26.349+0000 I COMMAND [conn18] command admin.$cmd appName: "MongoDB Shell" command: _configsvrCommitChunkSplit { _configsvrCommitChunkSplit: "testDB.bar", collEpoch: ObjectId('57e552f69434a18aef7df006'), min: { a: MinKey }, max: { a: MaxKey }, splitPoints: [ { a: 10.0 } ], shard: "shard0000", w: "majority", wtimeout: 15000 } numYields:0 reslen:308 locks:{ Global: { acquireCount: { r: 11, w: 4, W: 3 } }, Database: { acquireCount: { r: 2, w: 3, W: 1 } }, Collection: { acquireCount: { r: 2, w: 1, W: 1 } }, Metadata: { acquireCount: { w: 3, W: 1 } }, Mutex: { acquireCount: { W: 1 } }, oplog: { acquireCount: { w: 2 } } } protocol:op_command 11619ms
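
The reslen:308 is consistent with more than a bare { ok: 1 } response. The logs don't include the response body, but a write concern timeout in this code path would look roughly like the following (illustrative shape, not taken from these logs):

{ ok: 1.0, writeConcernError: { code: 64, errmsg: "waiting for replication timed out" } }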
|
|
|
// c23265 (a secondary) runs a find with majority read concern, presumably issued by shard 23261, which is running the splitChunk command. splitChunk on shard 23261 runs this read-concern-majority find immediately after the _configsvrCommitChunkSplit command, regardless of the commit's result.

// For splitChunk to succeed despite any error in the _configsvrCommitChunkSplit response, this find must return the documents that a successful _configsvrCommitChunkSplit would have produced, i.e. both post-split chunks.
|
|
2016-09-23T16:06:26.983+0000 c23265| 2016-09-23T16:06:26.350+0000 D COMMAND [conn18] run command config.$cmd { find: "chunks", filter: { ns: "testDB.bar", lastmod: { $gte: Timestamp 0|0 } }, sort: { lastmod: 1 }, readConcern: { level: "majority", afterOpTime: { ts: Timestamp 1474646782000|3, t: 4 } }, maxTimeMS: 30000 }
|
2016-09-23T16:06:26.992+0000 c23265| 2016-09-23T16:06:26.350+0000 I COMMAND [conn18] command config.chunks appName: "MongoDB Shell" command: find { find: "chunks", filter: { ns: "testDB.bar", lastmod: { $gte: Timestamp 0|0 } }, sort: { lastmod: 1 }, readConcern: { level: "majority", afterOpTime: { ts: Timestamp 1474646782000|3, t: 4 } }, maxTimeMS: 30000 } planSummary: IXSCAN { ns: 1, lastmod: 1 } keysExamined:1 docsExamined:1 cursorExhausted:1 numYields:0 nreturned:1 reslen:518 locks:{ Global: { acquireCount: { r: 2 } }, Database: { acquireCount: { r: 1 } }, Collection: { acquireCount: { r: 1 } } } protocol:op_command 0ms
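
For reference, that read can be reproduced from the shell against the same secondary (a sketch; host/port taken from this test's topology, and the afterOpTime clause omitted). Two chunks here would mean the split is visible; one means it is not:

> mongo --host ip-10-136-48-200:23265 --eval 'db.getMongo().setSlaveOk(); var res = db.getSiblingDB("config").runCommand({find: "chunks", filter: {ns: "testDB.bar", lastmod: {$gte: Timestamp(0, 0)}}, sort: {lastmod: 1}, readConcern: {level: "majority"}, maxTimeMS: 30000}); print("chunks seen: " + res.cursor.firstBatch.length);'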
|
|
|
// Mongos gets an ok response for splitChunk from shard 23261 (request 218).
|
|
2016-09-23T16:06:26.747+0000 s23267| 2016-09-23T16:06:26.357+0000 D ASIO [NetworkInterfaceASIO-ShardRegistry-0] Request 218 finished with response: { ok: 1.0 }
|
|
|
// Mongos reloads the chunk metadata, which might show either [MinKey, MaxKey) (no split) or [MinKey, 10) and [10, MaxKey) (split happened).
|
|
2016-09-23T16:06:26.748+0000 s23267| 2016-09-23T16:06:26.357+0000 D ASIO [conn1] startCommand: RemoteCommand 230 -- target:ip-10-136-48-200:23266 db:config expDate:2016-09-23T16:06:56.357+0000 cmd:{ find: "chunks", filter: { ns: "testDB.bar" }, sort: { lastmod: -1 }, readConcern: { level: "majority", afterOpTime: { ts: Timestamp 1474646786000|1, t: 4 } }, limit: 1, maxTimeMS: 30000 }
|
2016-09-23T16:06:26.752+0000 s23267| 2016-09-23T16:06:26.360+0000 D ASIO [NetworkInterfaceASIO-ShardRegistry-0] Request 230 finished with response: { cursor: { firstBatch: [ { _id: "testDB.bar-a_MinKey", ns: "testDB.bar", min: { a: MinKey }, max: { a: MaxKey }, shard: "shard0000", lastmod: Timestamp 1000|0, lastmodEpoch: ObjectId('57e552f69434a18aef7df006') } ], id: 0, ns: "config.chunks" }, ok: 1.0 }
|
|
...
|
// c23265 becomes primary!!!
|
|
2016-09-23T16:06:40.373+0000 c23265| 2016-09-23T16:06:40.364+0000 I REPL [rsSync] transition to primary complete; database writes are now permitted
|
|
|
// Rollback of c23264!
|
|
2016-09-23T16:06:41.371+0000 c23264| 2016-09-23T16:06:41.350+0000 I REPL [rsBackgroundSync] Starting rollback due to OplogStartMissing: our last op time fetched: { ts: Timestamp 1474646786000|3, t: 3 }. source's GTE: { ts: Timestamp 1474646786000|3, t: 4 } hashes: (3095623275416609491/80873941472298891)
|
2016-09-23T16:06:41.371+0000 c23264| 2016-09-23T16:06:41.350+0000 I REPL [rsBackgroundSync] beginning rollback
|
2016-09-23T16:06:41.371+0000 c23264| 2016-09-23T16:06:41.350+0000 I REPL [rsBackgroundSync] rollback 0
|
2016-09-23T16:06:41.372+0000 c23264| 2016-09-23T16:06:41.350+0000 I REPL [rsBackgroundSync] transition to ROLLBACK
|
2016-09-23T16:06:41.388+0000 c23264| 2016-09-23T16:06:41.351+0000 I REPL [rsBackgroundSync] rollback 1
|
2016-09-23T16:06:41.396+0000 c23264| 2016-09-23T16:06:41.353+0000 I REPL [rsBackgroundSync] rollback 2 FindCommonPoint
|
2016-09-23T16:06:41.401+0000 c23264| 2016-09-23T16:06:41.353+0000 I REPL [rsBackgroundSync] rollback our last optime: Sep 23 16:06:26:3
|
2016-09-23T16:06:41.401+0000 c23264| 2016-09-23T16:06:41.353+0000 I REPL [rsBackgroundSync] rollback their last optime: Sep 23 16:06:40:4
|
2016-09-23T16:06:41.401+0000 c23264| 2016-09-23T16:06:41.353+0000 I REPL [rsBackgroundSync] rollback diff in end of log times: -14 seconds
|
2016-09-23T16:06:41.402+0000 c23264| 2016-09-23T16:06:41.353+0000 I REPL [rsBackgroundSync] rollback 3 fixup
|
2016-09-23T16:06:41.419+0000 c23264| 2016-09-23T16:06:41.356+0000 I REPL [rsBackgroundSync] rollback 4 n:3
|
2016-09-23T16:06:41.420+0000 c23264| 2016-09-23T16:06:41.356+0000 I REPL [rsBackgroundSync] minvalid={ ts: Timestamp 1474646800000|4, t: 5 }
|
2016-09-23T16:06:41.420+0000 c23264| 2016-09-23T16:06:41.356+0000 D QUERY [rsBackgroundSync] Only one plan is available; it will be run but will not be cached. query: {} sort: {} projection: {}, planSummary: COLLSCAN
|
2016-09-23T16:06:41.421+0000 c23264| 2016-09-23T16:06:41.356+0000 D QUERY [rsBackgroundSync] Only one plan is available; it will be run but will not be cached. query: {} sort: {} projection: {}, planSummary: COLLSCAN
|
2016-09-23T16:06:41.421+0000 c23264| 2016-09-23T16:06:41.356+0000 I REPL [rsBackgroundSync] rollback 4.6
|
2016-09-23T16:06:41.422+0000 c23264| 2016-09-23T16:06:41.356+0000 I REPL [rsBackgroundSync] rollback 4.7
|
2016-09-23T16:06:41.422+0000 c23264| 2016-09-23T16:06:41.356+0000 D QUERY [rsBackgroundSync] Using idhack: query: { _id: "ip-10-136-48-200-2016-09-23T16:06:14.731+0000-57e552f6a80d2dd9117d7d93" } sort: {} projection: {}
|
2016-09-23T16:06:41.423+0000 c23264| 2016-09-23T16:06:41.356+0000 D QUERY [rsBackgroundSync] Using idhack: query: { _id: "ip-10-136-48-200-2016-09-23T16:06:14.731+0000-57e552f6a80d2dd9117d7d93" } sort: {} projection: {}
|
2016-09-23T16:06:41.423+0000 c23264| 2016-09-23T16:06:41.356+0000 D QUERY [rsBackgroundSync] Using idhack: query: { _id: "testDB.bar-a_10.0" } sort: {} projection: {}
|
2016-09-23T16:06:41.424+0000 c23264| 2016-09-23T16:06:41.357+0000 D QUERY [rsBackgroundSync] Using idhack: { _id: "testDB.bar-a_10.0" }
|
2016-09-23T16:06:41.424+0000 c23264| 2016-09-23T16:06:41.357+0000 D QUERY [rsBackgroundSync] Using idhack: query: { _id: "testDB.bar-a_MinKey" } sort: {} projection: {}
|
2016-09-23T16:06:41.424+0000 c23264| 2016-09-23T16:06:41.357+0000 D QUERY [rsBackgroundSync] Using idhack: { _id: "testDB.bar-a_MinKey" }
|
2016-09-23T16:06:41.425+0000 c23264| 2016-09-23T16:06:41.357+0000 D QUERY [rsBackgroundSync] Using idhack: query: { _id: "ip-10-136-48-200:23261:1474646751:-1877369597973131435" } sort: {} projection: {}
|
2016-09-23T16:06:41.426+0000 c23264| 2016-09-23T16:06:41.357+0000 D QUERY [rsBackgroundSync] Using idhack: { _id: "ip-10-136-48-200:23261:1474646751:-1877369597973131435" }
|
2016-09-23T16:06:41.426+0000 c23264| 2016-09-23T16:06:41.357+0000 D QUERY [rsBackgroundSync] Using idhack: query: { _id: "ip-10-136-48-200:23263:1474646754:1666493946181649802" } sort: {} projection: {}
|
2016-09-23T16:06:41.427+0000 c23264| 2016-09-23T16:06:41.357+0000 D QUERY [rsBackgroundSync] Using idhack: { _id: "ip-10-136-48-200:23263:1474646754:1666493946181649802" }
|
2016-09-23T16:06:41.427+0000 c23264| 2016-09-23T16:06:41.357+0000 D QUERY [rsBackgroundSync] Using idhack: query: { _id: "ip-10-136-48-200:23267:1474646747:-6264026123060590303" } sort: {} projection: {}
|
2016-09-23T16:06:41.427+0000 c23264| 2016-09-23T16:06:41.357+0000 D QUERY [rsBackgroundSync] Using idhack: { _id: "ip-10-136-48-200:23267:1474646747:-6264026123060590303" }
|
2016-09-23T16:06:41.428+0000 c23264| 2016-09-23T16:06:41.357+0000 I REPL [rsBackgroundSync] rollback 5 d:2 u:4
|
2016-09-23T16:06:41.428+0000 c23264| 2016-09-23T16:06:41.357+0000 I REPL [rsBackgroundSync] rollback 6
|
2016-09-23T16:06:41.428+0000 c23264| 2016-09-23T16:06:41.357+0000 D REPL [rsBackgroundSync] rollback truncate oplog after { ts: Timestamp 1474646774000|8, t: 3 }
|
2016-09-23T16:06:41.429+0000 c23264| 2016-09-23T16:06:41.357+0000 D QUERY [rsBackgroundSync] Running query: query: {} sort: {} projection: {}
|
2016-09-23T16:06:41.430+0000 c23264| 2016-09-23T16:06:41.357+0000 D QUERY [rsBackgroundSync] Collection admin.system.roles does not exist. Using EOF plan: query: {} sort: {} projection: {}
|
2016-09-23T16:06:41.431+0000 c23264| 2016-09-23T16:06:41.358+0000 I COMMAND [rsBackgroundSync] query admin.system.roles planSummary: EOF ntoreturn:0 ntoskip:0 keysExamined:0 docsExamined:0 cursorExhausted:1 numYields:0 nreturned:0 reslen:20 locks:{ Global: { acquireCount: { r: 23, w: 10, W: 1 } }, Database: { acquireCount: { r: 6, w: 1, W: 9 } }, Collection: { acquireCount: { r: 5 } }, oplog: { acquireCount: { R: 1, W: 1 } } } 0ms
|
2016-09-23T16:06:41.431+0000 c23264| 2016-09-23T16:06:41.358+0000 I REPL [rsBackgroundSync] rollback done
|
2016-09-23T16:06:41.432+0000 c23264| 2016-09-23T16:06:41.358+0000 I REPL [rsBackgroundSync] rollback finished
|
2016-09-23T16:06:41.433+0000 c23264| 2016-09-23T16:06:41.358+0000 D REPL [replication-0] Reporter sending slave oplog progress to upstream updater ip-10-136-48-200:23265: { replSetUpdatePosition: 1, optimes: [ { durableOpTime: { ts: Timestamp 1474646774000|8, t: 3 }, appliedOpTime: { ts: Timestamp 1474646774000|8, t: 3 }, memberId: 0, cfgver: 1 }, { durableOpTime: { ts: Timestamp 1474646786000|3, t: 4 }, appliedOpTime: { ts: Timestamp 1474646786000|3, t: 4 }, memberId: 1, cfgver: 1 }, { durableOpTime: { ts: Timestamp 1474646786000|3, t: 4 }, appliedOpTime: { ts: Timestamp 1474646786000|3, t: 4 }, memberId: 2, cfgver: 1 } ], $replData: { term: 5, lastOpCommitted: { ts: Timestamp 1474646786000|3, t: 4 }, lastOpVisible: { ts: Timestamp 0|0, t: -1 }, configVersion: 1, replicaSetId: ObjectId('57e552d4a80d2dd9117d7d89'), primaryIndex: 1, syncSourceIndex: 1 } }
|
2016-09-23T16:06:41.436+0000 c23264| 2016-09-23T16:06:41.358+0000 D ASIO [replication-0] startCommand: RemoteCommand 343 -- target:ip-10-136-48-200:23265 db:admin cmd:{ replSetUpdatePosition: 1, optimes: [ { durableOpTime: { ts: Timestamp 1474646774000|8, t: 3 }, appliedOpTime: { ts: Timestamp 1474646774000|8, t: 3 }, memberId: 0, cfgver: 1 }, { durableOpTime: { ts: Timestamp 1474646786000|3, t: 4 }, appliedOpTime: { ts: Timestamp 1474646786000|3, t: 4 }, memberId: 1, cfgver: 1 }, { durableOpTime: { ts: Timestamp 1474646786000|3, t: 4 }, appliedOpTime: { ts: Timestamp 1474646786000|3, t: 4 }, memberId: 2, cfgver: 1 } ], $replData: { term: 5, lastOpCommitted: { ts: Timestamp 1474646786000|3, t: 4 }, lastOpVisible: { ts: Timestamp 0|0, t: -1 }, configVersion: 1, replicaSetId: ObjectId('57e552d4a80d2dd9117d7d89'), primaryIndex: 1, syncSourceIndex: 1 } }
|
2016-09-23T16:06:41.436+0000 c23264| 2016-09-23T16:06:41.358+0000 D REPL [rsSync] Can't go live (tryToGoLiveAsASecondary) as state != recovering.
|
|
...
|
// Mongos reloads the chunk metadata a few times in the meantime...
|
|
...
|
|
// Mongos sends a moveChunk command, which finds [MinKey, MaxKey) in the chunk metadata. This leads to the JS test failure, because the test later asserts that [MinKey, 10) was moved.
|
|
[js_test:migration_ignore_interrupts] 2016-09-23T16:07:49.121+0000 s23267| 2016-09-23T16:07:48.966+0000 D ASIO [conn6] startCommand: RemoteCommand 382 -- target:ip-10-136-48-200:23265 db:admin cmd:{ _configsvrMoveChunk: 1, _id: "testDB.bar-a_MinKey", ns: "testDB.bar", min: { a: MinKey }, max: { a: MaxKey }, shard: "shard0000", lastmod: Timestamp 1000|0, lastmodEpoch: ObjectId('57e552f69434a18aef7df006'), toShard: "shard0001", maxChunkSizeBytes: 67108864, secondaryThrottle: {}, waitForDelete: true, writeConcern: { w: "majority", wtimeout: 15000 } }
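
For context, the test's subsequent expectation has roughly this shape (a hypothetical reconstruction, not the literal code of migration_ignore_interrupts.js; configDB is assumed to be a connection to the config database):

// Hypothetical: after the moveChunk, [MinKey, 10) should live on shard0001.
var minKeyChunk = configDB.chunks.findOne({ns: "testDB.bar", min: {a: MinKey}});
assert.eq(10, minKeyChunk.max.a);          // the split should have capped it at a: 10
assert.eq("shard0001", minKeyChunk.shard); // and the moveChunk should have moved it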
|
|
The split command on testDB.bar succeeds, splitting testDB.bar at {a: 10}:
|
2016-09-23T16:06:26.576+0000 s23267| 2016-09-23T16:06:14.723+0000 I COMMAND [conn1] splitting chunk [{ a: MinKey },{ a: MaxKey }) in collection testDB.bar on shard shard0000
|
2016-09-23T16:06:26.577+0000 s23267| 2016-09-23T16:06:14.723+0000 D ASIO [conn1] startCommand: RemoteCommand 218 -- target:ip-10-136-48-200:23261 db:admin cmd:{ splitChunk: "testDB.bar", configdb: "test-configRS/ip-10-136-48-200:23264,ip-10-136-48-200:23265,ip-10-136-48-200:23266", from: "shard0000", keyPattern: { a: 1.0 }, shardVersion: [ Timestamp 1000|0, ObjectId('57e552f69434a18aef7df006') ], min: { a: MinKey }, max: { a: MaxKey }, chunkVersion: [ Timestamp 1000|0, ObjectId('57e552f69434a18aef7df006') ], splitKeys: [ { a: 10.0 } ] }
|
...
|
2016-09-23T16:06:26.747+0000 s23267| 2016-09-23T16:06:26.357+0000 D ASIO [NetworkInterfaceASIO-ShardRegistry-0] Request 218 finished with response: { ok: 1.0 }
|
Then the moveChunk is issued on whichever chunk contains {a: 0}, which should be chunk [MinKey, 10) but is instead chunk [MinKey, MaxKey), as if the split never happened:
|
2016-09-23T16:07:49.121+0000 s23267| 2016-09-23T16:07:48.966+0000 D ASIO [conn6] startCommand: RemoteCommand 382 -- target:ip-10-136-48-200:23265 db:admin cmd:{ _configsvrMoveChunk: 1, _id: "testDB.bar-a_MinKey", ns: "testDB.bar", min: { a: MinKey }, max: { a: MaxKey }, shard: "shard0000", lastmod: Timestamp 1000|0, lastmodEpoch: ObjectId('57e552f69434a18aef7df006'), toShard: "shard0001", maxChunkSizeBytes: 67108864, secondaryThrottle: {}, waitForDelete: true, writeConcern: { w: "majority", wtimeout: 15000 } }
|
2016-09-23T16:07:50.085+0000 c23265| 2016-09-23T16:07:49.807+0000 D ASIO [conn15] startCommand: RemoteCommand 1537 -- target:ip-10-136-48-200:23261 db:admin cmd:{ moveChunk: "testDB.bar", shardVersion: [ Timestamp 1000|0, ObjectId('57e552f69434a18aef7df006') ], epoch: ObjectId('57e552f69434a18aef7df006'), configdb: "test-configRS/ip-10-136-48-200:23264,ip-10-136-48-200:23265,ip-10-136-48-200:23266", fromShard: "shard0000", toShard: "shard0001", min: { a: MinKey }, max: { a: MaxKey }, chunkVersion: [ Timestamp 1000|0, ObjectId('57e552f69434a18aef7df006') ], maxChunkSizeBytes: 67108864, waitForDelete: true, takeDistLock: false }
|
This happens because the split is rolled back on c23264:
|
2016-09-23T16:06:40.373+0000 c23265| 2016-09-23T16:06:40.364+0000 I REPL [rsSync] transition to primary complete; database writes are now permitted
|
|
2016-09-23T16:06:41.371+0000 c23264| 2016-09-23T16:06:41.350+0000 I REPL [rsBackgroundSync] Starting rollback due to OplogStartMissing: our last op time fetched: { ts: Timestamp 1474646786000|3, t: 3 }. source's GTE: { ts: Timestamp 1474646786000|3, t: 4 } hashes: (3095623275416609491/80873941472298891)
|
...

2016-09-23T16:06:41.428+0000 c23264| 2016-09-23T16:06:41.357+0000 D REPL [rsBackgroundSync] rollback truncate oplog after { ts: Timestamp 1474646774000|8, t: 3 }

...
|
2016-09-23T16:06:41.431+0000 c23264| 2016-09-23T16:06:41.358+0000 I REPL [rsBackgroundSync] rollback done
|
2016-09-23T16:06:41.432+0000 c23264| 2016-09-23T16:06:41.358+0000 I REPL [rsBackgroundSync] rollback finished
|
That splitChunk nonetheless reported success makes no sense: splitChunk sends the split commit command to the config server with majority write concern, and if that majority write fails, the command can only succeed if the refreshed chunk metadata on the shard shows that the split went through after all. That refresh reads config.chunks on the config server with majority read concern.
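
Put differently, the success condition the shard should be enforcing amounts to roughly the following (a minimal sketch of the decision, not the actual splitChunk server code; commitRes stands for the _configsvrCommitChunkSplit response and chunks for the result of the majority read shown earlier):

// Sketch: splitChunk may only report success if the commit cleanly
// succeeded, or if the majority-committed metadata already shows both
// post-split chunks [MinKey, 10) and [10, MaxKey).
function splitLooksSuccessful(commitRes, chunks) {
    if (commitRes.ok && !commitRes.writeConcernError) {
        return true;
    }
    return chunks.length === 2;
}

In this run the commit presumably came back with a write concern error and the majority read returned a single chunk, so neither branch should have produced success, yet mongos saw request 218 finish with { ok: 1.0 }.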
|
|