[SERVER-26328] When a source shard aborts a migration due to killOp, the destination shard cloned two documents rather than one Created: 26/Sep/16  Updated: 06/Jan/17  Resolved: 06/Jan/17

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Dianna Hohensee (Inactive) Assignee: Dianna Hohensee (Inactive)
Resolution: Cannot Reproduce Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Operating System: ALL
Sprint: Sharding 2016-10-10, Sharding 2016-10-31, Sharding 2016-11-21, Sharding 2016-12-12, Sharding 2017-01-02
Participants:
Linked BF Score: 0

 Description   

I'm not sure how this managed to happen. It occurs in BF-3608.

2016-09-23T16:07:49.719+0000 d23261| 2016-09-23T16:07:49.707+0000 I COMMAND  [conn4] going to kill op: 497
2016-09-23T16:07:49.720+0000 d23261| 2016-09-23T16:07:49.711+0000 W COMMAND  [conn1] failpoint: moveChunkHangAtStep3 set to: { mode: 0, data: {} }
 
 
 
2016-09-23T16:07:49.795+0000 d23261| 2016-09-23T16:07:49.793+0000 I SHARDING [conn11] moveChunk data transfer progress: { active: true, sessionId: "shard0000_shard0001_57e553552ab4c67e7602e6d7", ns: "testDB.bar", from: "ip-10-136-48-200:23261", min: { a: MinKey }, max: { a: MaxKey }, shardKeyPattern: { a: 1.0 }, state: "clone", counts: { cloned: 2, clonedBytes: 66, catchup: 0, steady: 0 }, ok: 1.0 } my mem used: 0

"cloned" is 2 here, if relevant. The failpoint is released as well, right after the killOp, possibly starting something before the killOp marking the txn takes affect. But there's only one document in that chunk, so shouldn't matter...

A .count() on the destination shard's collection then also returns 2, when only 1 document should have been cloned.

2016-09-23T16:07:50.770+0000 [thread1] Error: [1] != [2] are not equal : shard1 accessed the xfermods log despite donor migration abortion :
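
For reference, a minimal mongo-shell sketch of the sequence the test appears to exercise (the connection variables and the hard-coded opid are illustrative, not the test's actual code):

// Pause the donor's moveChunk partway through cloning (illustrative setup).
assert.commandWorked(donor.getDB("admin").runCommand({ configureFailPoint: "moveChunkHangAtStep3", mode: "alwaysOn" }));

// Kill the in-progress moveChunk on the donor (opid 497 in the log above),
// then release the failpoint so the aborted migration can unwind.
donor.getDB("admin").killOp(497);
assert.commandWorked(donor.getDB("admin").runCommand({ configureFailPoint: "moveChunkHangAtStep3", mode: "off" }));

// The chunk holds a single document, so the recipient should have cloned at
// most one; the count observed in this BF was 2.
assert.eq(1, recipient.getDB("testDB").bar.count());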



 Comments   
Comment by Dianna Hohensee (Inactive) [ 06/Jan/17 ]

The BF this is related to is not reproducible, and it has never appeared again since September 2016. After spending a long time trying and failing to diagnose it, I've concluded this is no longer useful to pursue.

Comment by Dianna Hohensee (Inactive) [ 03/Nov/16 ]

A few of the queries have { ..., sort: { lastmod: -1 }, ..., limit: 1, ... }. I'll look more into the command sent to refresh the metadata.
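
For illustration, the two query shapes seen against config.chunks in the config server logs quoted in this ticket look like this when run from the shell (configConn, a connection to the config server, is assumed):

// Full refresh: every chunk for the collection at or beyond a known version.
configConn.getDB("config").chunks.find({ ns: "testDB.bar", lastmod: { $gte: Timestamp(0, 0) } }).sort({ lastmod: 1 });

// Newest-chunk check: only the chunk with the highest lastmod.
configConn.getDB("config").chunks.find({ ns: "testDB.bar" }).sort({ lastmod: -1 }).limit(1);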

Comment by Spencer Brody (Inactive) [ 03/Nov/16 ]

The log level on the config servers is high enough that they report information about every find command they receive. I grepped the log file for every find command on the testDB.bar namespace, and every single one of them had nreturned:1. That means no query against the config servers ever returned more than one chunk for that namespace, so there's no way the shard could have seen the split as successful. I feel like there must be something else going on to explain why the splitChunk command on the shard didn't fail when it should have.

> grep "find" migration_ignore_interrupts_logs.txt | grep "chunks" | grep "testDB.bar" | grep "nreturned"
[js_test:migration_ignore_interrupts] 2016-09-23T16:06:15.383+0000 c23265| 2016-09-23T16:06:14.704+0000 I COMMAND  [conn14] command config.chunks appName: "MongoDB Shell" command: find { find: "chunks", filter: { ns: "testDB.bar", lastmod: { $gte: Timestamp 0|0 } }, sort: { lastmod: 1 }, readConcern: { level: "majority", afterOpTime: { ts: Timestamp 1474646774000|4, t: 3 } }, maxTimeMS: 30000 } planSummary: IXSCAN { ns: 1, lastmod: 1 } keysExamined:1 docsExamined:1 cursorExhausted:1 numYields:0 nreturned:1 reslen:518 locks:{ Global: { acquireCount: { r: 2 } }, Database: { acquireCount: { r: 1 } }, Collection: { acquireCount: { r: 1 } } } protocol:op_command 0ms
[js_test:migration_ignore_interrupts] 2016-09-23T16:06:26.519+0000 c23264| 2016-09-23T16:06:14.707+0000 I COMMAND  [conn18] command config.chunks appName: "MongoDB Shell" command: find { find: "chunks", filter: { ns: "testDB.bar", lastmod: { $gte: Timestamp 0|0 } }, sort: { lastmod: 1 }, readConcern: { level: "majority", afterOpTime: { ts: Timestamp 1474646774000|5, t: 3 } }, maxTimeMS: 30000 } planSummary: IXSCAN { ns: 1, lastmod: 1 } keysExamined:1 docsExamined:1 cursorExhausted:1 numYields:0 nreturned:1 reslen:518 locks:{ Global: { acquireCount: { r: 2 } }, Database: { acquireCount: { r: 1 } }, Collection: { acquireCount: { r: 1 } } } protocol:op_command 0ms
[js_test:migration_ignore_interrupts] 2016-09-23T16:06:26.622+0000 c23264| 2016-09-23T16:06:14.721+0000 I COMMAND  [conn12] command config.chunks appName: "MongoDB Shell" command: find { find: "chunks", filter: { ns: "testDB.bar", lastmod: { $gte: Timestamp 0|0 } }, sort: { lastmod: 1 }, readConcern: { level: "majority", afterOpTime: { ts: Timestamp 1474646774000|7, t: 3 } }, maxTimeMS: 30000 } planSummary: IXSCAN { ns: 1, lastmod: 1 } keysExamined:1 docsExamined:1 cursorExhausted:1 numYields:0 nreturned:1 reslen:518 locks:{ Global: { acquireCount: { r: 2 } }, Database: { acquireCount: { r: 1 } }, Collection: { acquireCount: { r: 1 } } } protocol:op_command 0ms
[js_test:migration_ignore_interrupts] 2016-09-23T16:06:26.631+0000 c23264| 2016-09-23T16:06:14.723+0000 I COMMAND  [conn12] command config.chunks appName: "MongoDB Shell" command: find { find: "chunks", filter: { ns: "testDB.bar" }, sort: { lastmod: -1 }, readConcern: { level: "majority", afterOpTime: { ts: Timestamp 1474646774000|7, t: 3 } }, limit: 1, maxTimeMS: 30000 } planSummary: IXSCAN { ns: 1, lastmod: 1 } keysExamined:1 docsExamined:1 cursorExhausted:1 numYields:0 nreturned:1 reslen:518 locks:{ Global: { acquireCount: { r: 2 } }, Database: { acquireCount: { r: 1 } }, Collection: { acquireCount: { r: 1 } } } protocol:op_command 0ms
[js_test:migration_ignore_interrupts] 2016-09-23T16:06:26.673+0000 c23264| 2016-09-23T16:06:14.730+0000 I COMMAND  [conn18] command config.chunks appName: "MongoDB Shell" command: find { find: "chunks", filter: { ns: "testDB.bar", lastmod: { $gte: Timestamp 0|0 } }, sort: { lastmod: 1 }, readConcern: { level: "majority", afterOpTime: { ts: Timestamp 1474646774000|8, t: 3 } }, maxTimeMS: 30000 } planSummary: IXSCAN { ns: 1, lastmod: 1 } keysExamined:1 docsExamined:1 cursorExhausted:1 numYields:0 nreturned:1 reslen:518 locks:{ Global: { acquireCount: { r: 2 } }, Database: { acquireCount: { r: 1 } }, Collection: { acquireCount: { r: 1 } } } protocol:op_command 0ms
[js_test:migration_ignore_interrupts] 2016-09-23T16:06:26.992+0000 c23265| 2016-09-23T16:06:26.350+0000 I COMMAND  [conn18] command config.chunks appName: "MongoDB Shell" command: find { find: "chunks", filter: { ns: "testDB.bar", lastmod: { $gte: Timestamp 0|0 } }, sort: { lastmod: 1 }, readConcern: { level: "majority", afterOpTime: { ts: Timestamp 1474646782000|3, t: 4 } }, maxTimeMS: 30000 } planSummary: IXSCAN { ns: 1, lastmod: 1 } keysExamined:1 docsExamined:1 cursorExhausted:1 numYields:0 nreturned:1 reslen:518 locks:{ Global: { acquireCount: { r: 2 } }, Database: { acquireCount: { r: 1 } }, Collection: { acquireCount: { r: 1 } } } protocol:op_command 0ms
[js_test:migration_ignore_interrupts] 2016-09-23T16:06:27.740+0000 c23266| 2016-09-23T16:06:26.360+0000 I COMMAND  [conn21] command config.chunks command: find { find: "chunks", filter: { ns: "testDB.bar" }, sort: { lastmod: -1 }, readConcern: { level: "majority", afterOpTime: { ts: Timestamp 1474646786000|1, t: 4 } }, limit: 1, maxTimeMS: 30000 } planSummary: IXSCAN { ns: 1, lastmod: 1 } keysExamined:1 docsExamined:1 cursorExhausted:1 numYields:0 nreturned:1 reslen:498 locks:{ Global: { acquireCount: { r: 2 } }, Database: { acquireCount: { r: 1 } }, Collection: { acquireCount: { r: 1 } } } protocol:op_command 0ms
[js_test:migration_ignore_interrupts] 2016-09-23T16:06:44.144+0000 c23266| 2016-09-23T16:06:44.099+0000 I COMMAND  [conn21] command config.chunks command: find { find: "chunks", filter: { ns: "testDB.bar" }, sort: { lastmod: -1 }, readConcern: { level: "majority", afterOpTime: { ts: Timestamp 1474646803000|1, t: 5 } }, limit: 1, maxTimeMS: 30000 } planSummary: IXSCAN { ns: 1, lastmod: 1 } keysExamined:1 docsExamined:1 cursorExhausted:1 numYields:0 nreturned:1 reslen:498 locks:{ Global: { acquireCount: { r: 2 } }, Database: { acquireCount: { r: 1 } }, Collection: { acquireCount: { r: 1 } } } protocol:op_command 0ms
[js_test:migration_ignore_interrupts] 2016-09-23T16:07:49.118+0000 c23264| 2016-09-23T16:07:48.966+0000 I COMMAND  [conn12] command config.chunks appName: "MongoDB Shell" command: find { find: "chunks", filter: { ns: "testDB.bar" }, sort: { lastmod: -1 }, readConcern: { level: "majority", afterOpTime: { ts: Timestamp 1474646868000|3, t: 10 } }, limit: 1, maxTimeMS: 30000 } planSummary: IXSCAN { ns: 1, lastmod: 1 } keysExamined:1 docsExamined:1 cursorExhausted:1 numYields:0 nreturned:1 reslen:518 locks:{ Global: { acquireCount: { r: 2 } }, Database: { acquireCount: { r: 1 } }, Collection: { acquireCount: { r: 1 } } } protocol:op_command 0ms
[js_test:migration_ignore_interrupts] 2016-09-23T16:07:49.347+0000 c23266| 2016-09-23T16:07:48.985+0000 I COMMAND  [conn17] command config.chunks appName: "MongoDB Shell" command: find { find: "chunks", filter: { ns: "testDB.bar", lastmod: { $gte: Timestamp 0|0 } }, sort: { lastmod: 1 }, readConcern: { level: "majority", afterOpTime: { ts: Timestamp 1474646868000|5, t: 10 } }, maxTimeMS: 30000 } planSummary: IXSCAN { ns: 1, lastmod: 1 } keysExamined:1 docsExamined:1 cursorExhausted:1 numYields:0 nreturned:1 reslen:518 locks:{ Global: { acquireCount: { r: 2 } }, Database: { acquireCount: { r: 1 } }, Collection: { acquireCount: { r: 1 } } } protocol:op_command 0ms
[js_test:migration_ignore_interrupts] 2016-09-23T16:07:49.583+0000 c23266| 2016-09-23T16:07:49.491+0000 I COMMAND  [conn20] command config.chunks command: find { find: "chunks", filter: { ns: "testDB.bar", lastmod: { $gte: Timestamp 0|0 } }, sort: { lastmod: 1 }, readConcern: { level: "majority", afterOpTime: { ts: Timestamp 1474646869000|1, t: 10 } }, maxTimeMS: 30000 } planSummary: IXSCAN { ns: 1, lastmod: 1 } keysExamined:1 docsExamined:1 cursorExhausted:1 numYields:0 nreturned:1 reslen:518 locks:{ Global: { acquireCount: { r: 2 } }, Database: { acquireCount: { r: 1 } }, Collection: { acquireCount: { r: 1 } } } protocol:op_command 0ms
[js_test:migration_ignore_interrupts] 2016-09-23T16:07:50.065+0000 c23264| 2016-09-23T16:07:49.812+0000 I COMMAND  [conn20] command config.chunks appName: "MongoDB Shell" command: find { find: "chunks", filter: { ns: "testDB.bar", lastmod: { $gte: Timestamp 0|0 } }, sort: { lastmod: 1 }, readConcern: { level: "majority", afterOpTime: { ts: Timestamp 1474646869000|4, t: 10 } }, maxTimeMS: 30000 } planSummary: IXSCAN { ns: 1, lastmod: 1 } keysExamined:1 docsExamined:1 cursorExhausted:1 numYields:0 nreturned:1 reslen:518 locks:{ Global: { acquireCount: { r: 2 } }, Database: { acquireCount: { r: 1 } }, Collection: { acquireCount: { r: 1 } } } protocol:op_command 0ms
[js_test:migration_ignore_interrupts] 2016-09-23T16:07:50.319+0000 c23266| 2016-09-23T16:07:49.809+0000 I COMMAND  [conn17] command config.chunks appName: "MongoDB Shell" command: find { find: "chunks", filter: { ns: "testDB.bar", lastmod: { $gte: Timestamp 0|0 } }, sort: { lastmod: 1 }, readConcern: { level: "majority", afterOpTime: { ts: Timestamp 1474646869000|3, t: 10 } }, maxTimeMS: 30000 } planSummary: IXSCAN { ns: 1, lastmod: 1 } keysExamined:1 docsExamined:1 cursorExhausted:1 numYields:0 nreturned:1 reslen:518 locks:{ Global: { acquireCount: { r: 2 } }, Database: { acquireCount: { r: 1 } }, Collection: { acquireCount: { r: 1 } } } protocol:op_command 0ms

Comment by Dianna Hohensee (Inactive) [ 02/Nov/16 ]

spencer, could you take a look at this to see if it's a rollback bug? I've tried to put the relevant logs in the comments.

Comment by Dianna Hohensee (Inactive) [ 01/Nov/16 ]

Shard code running:

Term 3
_configsvrCommitChunkSplit: 1474646774000|10, config c23264 primary; waits for this opTime to become majority-committed but presumably never sees it do so
Term 4
find on config.chunks: 1474646782000|3, config c23265 secondary; the opTime with which the find with read concern majority runs
find on config.chunks: 1474646786000|1, the opTime mongos sends with its command after the split command succeeds

// c23264 becomes primary!!!
 
2016-09-23T16:06:14.135+0000 c23264| 2016-09-23T16:06:12.665+0000 I REPL     [rsSync] transition to primary complete; database writes are now permitted
 
 
// Mongos sends split command to shard 23261 (request 218)
 
2016-09-23T16:06:26.576+0000 s23267| 2016-09-23T16:06:14.723+0000 I COMMAND  [conn1] splitting chunk [{ a: MinKey },{ a: MaxKey }) in collection testDB.bar on shard shard0000
2016-09-23T16:06:26.577+0000 s23267| 2016-09-23T16:06:14.723+0000 D ASIO     [conn1] startCommand: RemoteCommand 218 -- target:ip-10-136-48-200:23261 db:admin cmd:{ splitChunk: "testDB.bar", configdb: "test-configRS/ip-10-136-48-200:23264,ip-10-136-48-200:23265,ip-10-136-48-200:23266", from: "shard0000", keyPattern: { a: 1.0 }, shardVersion: [ Timestamp 1000|0, ObjectId('57e552f69434a18aef7df006') ], min: { a: MinKey }, max: { a: MaxKey }, chunkVersion: [ Timestamp 1000|0, ObjectId('57e552f69434a18aef7df006') ], splitKeys: [ { a: 10.0 } ] }
 
 
// c23264 receives the split chunk commit command, _configsvrCommitChunkSplit, presumably sent by shard 23261
 
2016-09-23T16:06:26.674+0000 c23264| 2016-09-23T16:06:14.730+0000 D COMMAND  [conn18] run command admin.$cmd { _configsvrCommitChunkSplit: "testDB.bar", collEpoch: ObjectId('57e552f69434a18aef7df006'), min: { a: MinKey }, max: { a: MaxKey }, splitPoints: [ { a: 10.0 } ], shard: "shard0000", w: "majority", wtimeout: 15000 }
 
 
// c23264 runs applyOps as part of _configsvrCommitChunkSplit command
 
2016-09-23T16:06:26.678+0000 c23264| 2016-09-23T16:06:14.730+0000 D COMMAND  [conn18] run command config.$cmd { applyOps: [ { op: "u", b: true, ns: "config.chunks", o: { _id: "testDB.bar-a_MinKey", lastmod: Timestamp 1000|1, lastmodEpoch: ObjectId('57e552f69434a18aef7df006'), ns: "testDB.bar", min: { a: MinKey }, max: { a: 10.0 }, shard: "shard0000" }, o2: { _id: "testDB.bar-a_MinKey" } }, { op: "u", b: true, ns: "config.chunks", o: { _id: "testDB.bar-a_10.0", lastmod: Timestamp 1000|2, lastmodEpoch: ObjectId('57e552f69434a18aef7df006'), ns: "testDB.bar", min: { a: 10.0 }, max: { a: MaxKey }, shard: "shard0000" }, o2: { _id: "testDB.bar-a_10.0" } } ], preCondition: [ { ns: "config.chunks", q: { query: { ns: "testDB.bar", min: { a: MinKey }, max: { a: MaxKey } }, orderby: { lastmod: -1 } }, res: { lastmodEpoch: ObjectId('57e552f69434a18aef7df006'), shard: "shard0000" } } ], writeConcern: { w: 0, wtimeout: 0 } }
2016-09-23T16:06:26.684+0000 c23264| 2016-09-23T16:06:14.731+0000 I COMMAND  [conn18] command config.chunks appName: "MongoDB Shell" command: applyOps { applyOps: [ { op: "u", b: true, ns: "config.chunks", o: { _id: "testDB.bar-a_MinKey", lastmod: Timestamp 1000|1, lastmodEpoch: ObjectId('57e552f69434a18aef7df006'), ns: "testDB.bar", min: { a: MinKey }, max: { a: 10.0 }, shard: "shard0000" }, o2: { _id: "testDB.bar-a_MinKey" } }, { op: "u", b: true, ns: "config.chunks", o: { _id: "testDB.bar-a_10.0", lastmod: Timestamp 1000|2, lastmodEpoch: ObjectId('57e552f69434a18aef7df006'), ns: "testDB.bar", min: { a: 10.0 }, max: { a: MaxKey }, shard: "shard0000" }, o2: { _id: "testDB.bar-a_10.0" } } ], preCondition: [ { ns: "config.chunks", q: { query: { ns: "testDB.bar", min: { a: MinKey }, max: { a: MaxKey } }, orderby: { lastmod: -1 } }, res: { lastmodEpoch: ObjectId('57e552f69434a18aef7df006'), shard: "shard0000" } } ], writeConcern: { w: 0, wtimeout: 0 } } numYields:0 reslen:152 locks:{ Global: { acquireCount: { r: 9, w: 2, W: 3 } }, Database: { acquireCount: { r: 2, w: 1, W: 1 } }, Collection: { acquireCount: { r: 2, W: 1 } }, Metadata: { acquireCount: { w: 1 } }, Mutex: { acquireCount: { W: 1 } }, oplog: { acquireCount: { w: 1 } } } protocol:op_query 0ms
 
 
// c23264 is waiting for the _configsvrCommitChunkSplit's last write to make it to majority...
 
2016-09-23T16:06:26.691+0000 c23264| 2016-09-23T16:06:14.731+0000 D REPL     [conn18] Waiting for write concern. OpTime: { ts: Timestamp 1474646774000|10, t: 3 }, write concern: { w: "majority", wtimeout: 30000 }
 
....

// c23266 becomes primary!!!
 
2016-09-23T16:06:27.424+0000 c23266| 2016-09-23T16:06:22.733+0000 I REPL     [rsSync] transition to primary complete; database writes are now permitted
 
 
// c23264 still thinks it is primary and is still waiting for opTime majority...
 
2016-09-23T16:06:26.726+0000 c23264| 2016-09-23T16:06:26.346+0000 D REPL     [conn18] Required snapshot optime: { ts: Timestamp 1474646774000|10, t: 3 } is not yet part of the current 'committed' snapshot: { optime: { ts: Timestamp 1474646774000|8, t: 3 }, name-id: "105" }
 
 
// c23264 gets told to step down (this test is running in the stepdown suite).
 
2016-09-23T16:06:26.708+0000 c23264| 2016-09-23T16:06:26.345+0000 D COMMAND  [conn26] run command admin.$cmd { replSetStepDown: 10.0, force: true }
2016-09-23T16:06:26.809+0000 c23264| 2016-09-23T16:06:26.345+0000 D COMMAND  [conn26] command: replSetStepDown
 
 
// c23264 also discovers the other primary via a new term #4
 
2016-09-23T16:06:27.117+0000 c23264| 2016-09-23T16:06:26.347+0000 I REPL     [ReplicationExecutor] can't see a majority of the set, relinquishing primary
2016-09-23T16:06:27.118+0000 c23264| 2016-09-23T16:06:26.347+0000 I REPL     [ReplicationExecutor] Stepping down from primary in response to heartbeat
2016-09-23T16:06:27.118+0000 c23264| 2016-09-23T16:06:26.347+0000 D REPL     [ReplicationExecutor] Updating _lastCommittedOpTime to { ts: Timestamp 1474646782000|3, t: 4 }
2016-09-23T16:06:27.119+0000 c23264| 2016-09-23T16:06:26.347+0000 I REPL     [ReplicationExecutor] stepping down from primary, because a new term has begun: 4
2016-09-23T16:06:27.122+0000 c23264| 2016-09-23T16:06:26.347+0000 I NETWORK  [conn35] received client metadata from 10.136.48.200:49170 conn35: { driver: { name: "MongoDB Internal Client", version: "3.3.14-3-g948f978" }, os: { type: "Linux", name: "Red Hat Enterprise Linux Server release 6.2 (Santiago)", architecture: "x86_64", version: "Kernel 2.6.32-220.el6.x86_64" } }
2016-09-23T16:06:27.129+0000 c23264| 2016-09-23T16:06:26.347+0000 I COMMAND  [conn26] Attempting to step down in response to replSetStepDown command
2016-09-23T16:06:27.334+0000 c23264| 2016-09-23T16:06:26.349+0000 I REPL     [ReplicationExecutor] stepping down from primary, because a new term has begun: 4
2016-09-23T16:06:27.335+0000 c23264| 2016-09-23T16:06:26.349+0000 I REPL     [ReplicationExecutor] Member ip-10-136-48-200:23266 is now in state PRIMARY
 
 
// c23264 manages to return a response to _configsvrCommitChunkSplit rather than dropping the connection first. Presumably the response includes a write concern error.
 
2016-09-23T16:06:27.341+0000 c23264| 2016-09-23T16:06:26.349+0000 I COMMAND  [conn18] command admin.$cmd appName: "MongoDB Shell" command: _configsvrCommitChunkSplit { _configsvrCommitChunkSplit: "testDB.bar", collEpoch: ObjectId('57e552f69434a18aef7df006'), min: { a: MinKey }, max: { a: MaxKey }, splitPoints: [ { a: 10.0 } ], shard: "shard0000", w: "majority", wtimeout: 15000 } numYields:0 reslen:308 locks:{ Global: { acquireCount: { r: 11, w: 4, W: 3 } }, Database: { acquireCount: { r: 2, w: 3, W: 1 } }, Collection: { acquireCount: { r: 2, w: 1, W: 1 } }, Metadata: { acquireCount: { w: 3, W: 1 } }, Mutex: { acquireCount: { W: 1 } }, oplog: { acquireCount: { w: 2 } } } protocol:op_command 11619ms
 
 
// c23265 (a secondary) runs a find with majority read concern, presumably from shard 23261 (which is running the splitChunk command). splitChunk on shard 23261 runs a find with read concern majority immediately after the _configsvrCommitChunkSplit command, regardless of its result.
// In order for splitChunk to succeed despite any errors in the _configsvrCommitChunkSplit response, this find must return the documents resulting from a successful _configsvrCommitChunkSplit.
 
2016-09-23T16:06:26.983+0000 c23265| 2016-09-23T16:06:26.350+0000 D COMMAND  [conn18] run command config.$cmd { find: "chunks", filter: { ns: "testDB.bar", lastmod: { $gte: Timestamp 0|0 } }, sort: { lastmod: 1 }, readConcern: { level: "majority", afterOpTime: { ts: Timestamp 1474646782000|3, t: 4 } }, maxTimeMS: 30000 }
2016-09-23T16:06:26.992+0000 c23265| 2016-09-23T16:06:26.350+0000 I COMMAND  [conn18] command config.chunks appName: "MongoDB Shell" command: find { find: "chunks", filter: { ns: "testDB.bar", lastmod: { $gte: Timestamp 0|0 } }, sort: { lastmod: 1 }, readConcern: { level: "majority", afterOpTime: { ts: Timestamp 1474646782000|3, t: 4 } }, maxTimeMS: 30000 } planSummary: IXSCAN { ns: 1, lastmod: 1 } keysExamined:1 docsExamined:1 cursorExhausted:1 numYields:0 nreturned:1 reslen:518 locks:{ Global: { acquireCount: { r: 2 } }, Database: { acquireCount: { r: 1 } }, Collection: { acquireCount: { r: 1 } } } protocol:op_command 0ms
 
 
// Mongos gets response ok for splitChunk on shard 23261 (request 218).
 
2016-09-23T16:06:26.747+0000 s23267| 2016-09-23T16:06:26.357+0000 D ASIO     [NetworkInterfaceASIO-ShardRegistry-0] Request 218 finished with response: { ok: 1.0 }
 
 
// Mongos reloads chunk metadata, which might either have [MinKey, MaxKey) (no split) or [MinKey, 10) and [10, MaxKey) (split happened).
 
2016-09-23T16:06:26.748+0000 s23267| 2016-09-23T16:06:26.357+0000 D ASIO     [conn1] startCommand: RemoteCommand 230 -- target:ip-10-136-48-200:23266 db:config expDate:2016-09-23T16:06:56.357+0000 cmd:{ find: "chunks", filter: { ns: "testDB.bar" }, sort: { lastmod: -1 }, readConcern: { level: "majority", afterOpTime: { ts: Timestamp 1474646786000|1, t: 4 } }, limit: 1, maxTimeMS: 30000 }
2016-09-23T16:06:26.752+0000 s23267| 2016-09-23T16:06:26.360+0000 D ASIO     [NetworkInterfaceASIO-ShardRegistry-0] Request 230 finished with response: { cursor: { firstBatch: [ { _id: "testDB.bar-a_MinKey", ns: "testDB.bar", min: { a: MinKey }, max: { a: MaxKey }, shard: "shard0000", lastmod: Timestamp 1000|0, lastmodEpoch: ObjectId('57e552f69434a18aef7df006') } ], id: 0, ns: "config.chunks" }, ok: 1.0 }
 
...

// c23265 becomes primary!!!
 
2016-09-23T16:06:40.373+0000 c23265| 2016-09-23T16:06:40.364+0000 I REPL     [rsSync] transition to primary complete; database writes are now permitted
 
 
// Rollback of c23264!
 
2016-09-23T16:06:41.371+0000 c23264| 2016-09-23T16:06:41.350+0000 I REPL     [rsBackgroundSync] Starting rollback due to OplogStartMissing: our last op time fetched: { ts: Timestamp 1474646786000|3, t: 3 }. source's GTE: { ts: Timestamp 1474646786000|3, t: 4 } hashes: (3095623275416609491/80873941472298891)
2016-09-23T16:06:41.371+0000 c23264| 2016-09-23T16:06:41.350+0000 I REPL     [rsBackgroundSync] beginning rollback
2016-09-23T16:06:41.371+0000 c23264| 2016-09-23T16:06:41.350+0000 I REPL     [rsBackgroundSync] rollback 0
2016-09-23T16:06:41.372+0000 c23264| 2016-09-23T16:06:41.350+0000 I REPL     [rsBackgroundSync] transition to ROLLBACK
2016-09-23T16:06:41.388+0000 c23264| 2016-09-23T16:06:41.351+0000 I REPL     [rsBackgroundSync] rollback 1
2016-09-23T16:06:41.396+0000 c23264| 2016-09-23T16:06:41.353+0000 I REPL     [rsBackgroundSync] rollback 2 FindCommonPoint
2016-09-23T16:06:41.401+0000 c23264| 2016-09-23T16:06:41.353+0000 I REPL     [rsBackgroundSync] rollback our last optime:   Sep 23 16:06:26:3
2016-09-23T16:06:41.401+0000 c23264| 2016-09-23T16:06:41.353+0000 I REPL     [rsBackgroundSync] rollback their last optime: Sep 23 16:06:40:4
2016-09-23T16:06:41.401+0000 c23264| 2016-09-23T16:06:41.353+0000 I REPL     [rsBackgroundSync] rollback diff in end of log times: -14 seconds
2016-09-23T16:06:41.402+0000 c23264| 2016-09-23T16:06:41.353+0000 I REPL     [rsBackgroundSync] rollback 3 fixup
2016-09-23T16:06:41.419+0000 c23264| 2016-09-23T16:06:41.356+0000 I REPL     [rsBackgroundSync] rollback 4 n:3
2016-09-23T16:06:41.420+0000 c23264| 2016-09-23T16:06:41.356+0000 I REPL     [rsBackgroundSync] minvalid={ ts: Timestamp 1474646800000|4, t: 5 }
2016-09-23T16:06:41.420+0000 c23264| 2016-09-23T16:06:41.356+0000 D QUERY    [rsBackgroundSync] Only one plan is available; it will be run but will not be cached. query: {} sort: {} projection: {}, planSummary: COLLSCAN
2016-09-23T16:06:41.421+0000 c23264| 2016-09-23T16:06:41.356+0000 D QUERY    [rsBackgroundSync] Only one plan is available; it will be run but will not be cached. query: {} sort: {} projection: {}, planSummary: COLLSCAN
2016-09-23T16:06:41.421+0000 c23264| 2016-09-23T16:06:41.356+0000 I REPL     [rsBackgroundSync] rollback 4.6
2016-09-23T16:06:41.422+0000 c23264| 2016-09-23T16:06:41.356+0000 I REPL     [rsBackgroundSync] rollback 4.7
2016-09-23T16:06:41.422+0000 c23264| 2016-09-23T16:06:41.356+0000 D QUERY    [rsBackgroundSync] Using idhack: query: { _id: "ip-10-136-48-200-2016-09-23T16:06:14.731+0000-57e552f6a80d2dd9117d7d93" } sort: {} projection: {}
2016-09-23T16:06:41.423+0000 c23264| 2016-09-23T16:06:41.356+0000 D QUERY    [rsBackgroundSync] Using idhack: query: { _id: "ip-10-136-48-200-2016-09-23T16:06:14.731+0000-57e552f6a80d2dd9117d7d93" } sort: {} projection: {}
2016-09-23T16:06:41.423+0000 c23264| 2016-09-23T16:06:41.356+0000 D QUERY    [rsBackgroundSync] Using idhack: query: { _id: "testDB.bar-a_10.0" } sort: {} projection: {}
2016-09-23T16:06:41.424+0000 c23264| 2016-09-23T16:06:41.357+0000 D QUERY    [rsBackgroundSync] Using idhack: { _id: "testDB.bar-a_10.0" }
2016-09-23T16:06:41.424+0000 c23264| 2016-09-23T16:06:41.357+0000 D QUERY    [rsBackgroundSync] Using idhack: query: { _id: "testDB.bar-a_MinKey" } sort: {} projection: {}
2016-09-23T16:06:41.424+0000 c23264| 2016-09-23T16:06:41.357+0000 D QUERY    [rsBackgroundSync] Using idhack: { _id: "testDB.bar-a_MinKey" }
2016-09-23T16:06:41.425+0000 c23264| 2016-09-23T16:06:41.357+0000 D QUERY    [rsBackgroundSync] Using idhack: query: { _id: "ip-10-136-48-200:23261:1474646751:-1877369597973131435" } sort: {} projection: {}
2016-09-23T16:06:41.426+0000 c23264| 2016-09-23T16:06:41.357+0000 D QUERY    [rsBackgroundSync] Using idhack: { _id: "ip-10-136-48-200:23261:1474646751:-1877369597973131435" }
2016-09-23T16:06:41.426+0000 c23264| 2016-09-23T16:06:41.357+0000 D QUERY    [rsBackgroundSync] Using idhack: query: { _id: "ip-10-136-48-200:23263:1474646754:1666493946181649802" } sort: {} projection: {}
2016-09-23T16:06:41.427+0000 c23264| 2016-09-23T16:06:41.357+0000 D QUERY    [rsBackgroundSync] Using idhack: { _id: "ip-10-136-48-200:23263:1474646754:1666493946181649802" }
2016-09-23T16:06:41.427+0000 c23264| 2016-09-23T16:06:41.357+0000 D QUERY    [rsBackgroundSync] Using idhack: query: { _id: "ip-10-136-48-200:23267:1474646747:-6264026123060590303" } sort: {} projection: {}
2016-09-23T16:06:41.427+0000 c23264| 2016-09-23T16:06:41.357+0000 D QUERY    [rsBackgroundSync] Using idhack: { _id: "ip-10-136-48-200:23267:1474646747:-6264026123060590303" }
2016-09-23T16:06:41.428+0000 c23264| 2016-09-23T16:06:41.357+0000 I REPL     [rsBackgroundSync] rollback 5 d:2 u:4
2016-09-23T16:06:41.428+0000 c23264| 2016-09-23T16:06:41.357+0000 I REPL     [rsBackgroundSync] rollback 6
2016-09-23T16:06:41.428+0000 c23264| 2016-09-23T16:06:41.357+0000 D REPL     [rsBackgroundSync] rollback truncate oplog after { ts: Timestamp 1474646774000|8, t: 3 }
2016-09-23T16:06:41.429+0000 c23264| 2016-09-23T16:06:41.357+0000 D QUERY    [rsBackgroundSync] Running query: query: {} sort: {} projection: {}
2016-09-23T16:06:41.430+0000 c23264| 2016-09-23T16:06:41.357+0000 D QUERY    [rsBackgroundSync] Collection admin.system.roles does not exist. Using EOF plan: query: {} sort: {} projection: {}
2016-09-23T16:06:41.431+0000 c23264| 2016-09-23T16:06:41.358+0000 I COMMAND  [rsBackgroundSync] query admin.system.roles planSummary: EOF ntoreturn:0 ntoskip:0 keysExamined:0 docsExamined:0 cursorExhausted:1 numYields:0 nreturned:0 reslen:20 locks:{ Global: { acquireCount: { r: 23, w: 10, W: 1 } }, Database: { acquireCount: { r: 6, w: 1, W: 9 } }, Collection: { acquireCount: { r: 5 } }, oplog: { acquireCount: { R: 1, W: 1 } } } 0ms
2016-09-23T16:06:41.431+0000 c23264| 2016-09-23T16:06:41.358+0000 I REPL     [rsBackgroundSync] rollback done
2016-09-23T16:06:41.432+0000 c23264| 2016-09-23T16:06:41.358+0000 I REPL     [rsBackgroundSync] rollback finished
2016-09-23T16:06:41.433+0000 c23264| 2016-09-23T16:06:41.358+0000 D REPL     [replication-0] Reporter sending slave oplog progress to upstream updater ip-10-136-48-200:23265: { replSetUpdatePosition: 1, optimes: [ { durableOpTime: { ts: Timestamp 1474646774000|8, t: 3 }, appliedOpTime: { ts: Timestamp 1474646774000|8, t: 3 }, memberId: 0, cfgver: 1 }, { durableOpTime: { ts: Timestamp 1474646786000|3, t: 4 }, appliedOpTime: { ts: Timestamp 1474646786000|3, t: 4 }, memberId: 1, cfgver: 1 }, { durableOpTime: { ts: Timestamp 1474646786000|3, t: 4 }, appliedOpTime: { ts: Timestamp 1474646786000|3, t: 4 }, memberId: 2, cfgver: 1 } ], $replData: { term: 5, lastOpCommitted: { ts: Timestamp 1474646786000|3, t: 4 }, lastOpVisible: { ts: Timestamp 0|0, t: -1 }, configVersion: 1, replicaSetId: ObjectId('57e552d4a80d2dd9117d7d89'), primaryIndex: 1, syncSourceIndex: 1 } }
2016-09-23T16:06:41.436+0000 c23264| 2016-09-23T16:06:41.358+0000 D ASIO     [replication-0] startCommand: RemoteCommand 343 -- target:ip-10-136-48-200:23265 db:admin cmd:{ replSetUpdatePosition: 1, optimes: [ { durableOpTime: { ts: Timestamp 1474646774000|8, t: 3 }, appliedOpTime: { ts: Timestamp 1474646774000|8, t: 3 }, memberId: 0, cfgver: 1 }, { durableOpTime: { ts: Timestamp 1474646786000|3, t: 4 }, appliedOpTime: { ts: Timestamp 1474646786000|3, t: 4 }, memberId: 1, cfgver: 1 }, { durableOpTime: { ts: Timestamp 1474646786000|3, t: 4 }, appliedOpTime: { ts: Timestamp 1474646786000|3, t: 4 }, memberId: 2, cfgver: 1 } ], $replData: { term: 5, lastOpCommitted: { ts: Timestamp 1474646786000|3, t: 4 }, lastOpVisible: { ts: Timestamp 0|0, t: -1 }, configVersion: 1, replicaSetId: ObjectId('57e552d4a80d2dd9117d7d89'), primaryIndex: 1, syncSourceIndex: 1 } }
2016-09-23T16:06:41.436+0000 c23264| 2016-09-23T16:06:41.358+0000 D REPL     [rsSync] Can't go live (tryToGoLiveAsASecondary) as state != recovering.
 
...

// Mongos reloads the chunk metadata a few times in the meantime...
 
...
 
// Mongos sends a moveChunk command, which finds [MinKey, MaxKey) in the chunk metadata. This leads to the JS test failure, because the test later asserts that [MinKey, 10) was moved.
 
[js_test:migration_ignore_interrupts] 2016-09-23T16:07:49.121+0000 s23267| 2016-09-23T16:07:48.966+0000 D ASIO     [conn6] startCommand: RemoteCommand 382 -- target:ip-10-136-48-200:23265 db:admin cmd:{ _configsvrMoveChunk: 1, _id: "testDB.bar-a_MinKey", ns: "testDB.bar", min: { a: MinKey }, max: { a: MaxKey }, shard: "shard0000", lastmod: Timestamp 1000|0, lastmodEpoch: ObjectId('57e552f69434a18aef7df006'), toShard: "shard0001", maxChunkSizeBytes: 67108864, secondaryThrottle: {}, waitForDelete: true, writeConcern: { w: "majority", wtimeout: 15000 } }

Comment by Dianna Hohensee (Inactive) [ 28/Oct/16 ]

The split command on testDB.bar succeeds, splitting testDB.bar at {a:10} 

2016-09-23T16:06:26.576+0000 s23267| 2016-09-23T16:06:14.723+0000 I COMMAND  [conn1] splitting chunk [{ a: MinKey },{ a: MaxKey }) in collection testDB.bar on shard shard0000
2016-09-23T16:06:26.577+0000 s23267| 2016-09-23T16:06:14.723+0000 D ASIO     [conn1] startCommand: RemoteCommand 218 -- target:ip-10-136-48-200:23261 db:admin cmd:{ splitChunk: "testDB.bar", configdb: "test-configRS/ip-10-136-48-200:23264,ip-10-136-48-200:23265,ip-10-136-48-200:23266", from: "shard0000", keyPattern: { a: 1.0 }, shardVersion: [ Timestamp 1000|0, ObjectId('57e552f69434a18aef7df006') ], min: { a: MinKey }, max: { a: MaxKey }, chunkVersion: [ Timestamp 1000|0, ObjectId('57e552f69434a18aef7df006') ], splitKeys: [ { a: 10.0 } ] }
...
2016-09-23T16:06:26.747+0000 s23267| 2016-09-23T16:06:26.357+0000 D ASIO     [NetworkInterfaceASIO-ShardRegistry-0] Request 218 finished with response: { ok: 1.0 }

Then the moveChunk is issued on whichever chunk contains {a:0}, which should be chunk [MinKey, 10), but is instead chunk [MinKey, MaxKey), as if the split never happened 

2016-09-23T16:07:49.121+0000 s23267| 2016-09-23T16:07:48.966+0000 D ASIO     [conn6] startCommand: RemoteCommand 382 -- target:ip-10-136-48-200:23265 db:admin cmd:{ _configsvrMoveChunk: 1, _id: "testDB.bar-a_MinKey", ns: "testDB.bar", min: { a: MinKey }, max: { a: MaxKey }, shard: "shard0000", lastmod: Timestamp 1000|0, lastmodEpoch: ObjectId('57e552f69434a18aef7df006'), toShard: "shard0001", maxChunkSizeBytes: 67108864, secondaryThrottle: {}, waitForDelete: true, writeConcern: { w: "majority", wtimeout: 15000 } }
2016-09-23T16:07:50.085+0000 c23265| 2016-09-23T16:07:49.807+0000 D ASIO     [conn15] startCommand: RemoteCommand 1537 -- target:ip-10-136-48-200:23261 db:admin cmd:{ moveChunk: "testDB.bar", shardVersion: [ Timestamp 1000|0, ObjectId('57e552f69434a18aef7df006') ], epoch: ObjectId('57e552f69434a18aef7df006'), configdb: "test-configRS/ip-10-136-48-200:23264,ip-10-136-48-200:23265,ip-10-136-48-200:23266", fromShard: "shard0000", toShard: "shard0001", min: { a: MinKey }, max: { a: MaxKey }, chunkVersion: [ Timestamp 1000|0, ObjectId('57e552f69434a18aef7df006') ], maxChunkSizeBytes: 67108864, waitForDelete: true, takeDistLock: false }
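
For reference, if the test drives this through the find form of moveChunk (an assumption; it could also pass explicit bounds), the shell-level command would look roughly like:

// mongos resolves { a: 0 } against its cached routing table; per the
// _configsvrMoveChunk command above, that resolved to the original
// [MinKey, MaxKey) chunk instead of [MinKey, 10).
db.adminCommand({ moveChunk: "testDB.bar", find: { a: 0 }, to: "shard0001" });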

This happens because the split is rolled back,

2016-09-23T16:06:40.373+0000 c23265| 2016-09-23T16:06:40.364+0000 I REPL     [rsSync] transition to primary complete; database writes are now permitted
 
2016-09-23T16:06:41.371+0000 c23264| 2016-09-23T16:06:41.350+0000 I REPL     [rsBackgroundSync] Starting rollback due to OplogStartMissing: our last op time fetched: { ts: Timestamp 1474646786000|3, t: 3 }. source's GTE: { ts: Timestamp 1474646786000|3, t: 4 } hashes: (3095623275416609491/80873941472298891)
2016-09-23T16:06:41.371+0000 c23264| 2016-09-23T16:06:41.350+0000 I REPL     [rsBackgroundSync] beginning rollback
2016-09-23T16:06:41.371+0000 c23264| 2016-09-23T16:06:41.350+0000 I REPL     [rsBackgroundSync] rollback 0
2016-09-23T16:06:41.372+0000 c23264| 2016-09-23T16:06:41.350+0000 I REPL     [rsBackgroundSync] transition to ROLLBACK
2016-09-23T16:06:41.388+0000 c23264| 2016-09-23T16:06:41.351+0000 I REPL     [rsBackgroundSync] rollback 1
2016-09-23T16:06:41.396+0000 c23264| 2016-09-23T16:06:41.353+0000 I REPL     [rsBackgroundSync] rollback 2 FindCommonPoint
2016-09-23T16:06:41.401+0000 c23264| 2016-09-23T16:06:41.353+0000 I REPL     [rsBackgroundSync] rollback our last optime:   Sep 23 16:06:26:3
2016-09-23T16:06:41.401+0000 c23264| 2016-09-23T16:06:41.353+0000 I REPL     [rsBackgroundSync] rollback their last optime: Sep 23 16:06:40:4
2016-09-23T16:06:41.401+0000 c23264| 2016-09-23T16:06:41.353+0000 I REPL     [rsBackgroundSync] rollback diff in end of log times: -14 seconds
2016-09-23T16:06:41.402+0000 c23264| 2016-09-23T16:06:41.353+0000 I REPL     [rsBackgroundSync] rollback 3 fixup
2016-09-23T16:06:41.417+0000 c23264| 2016-09-23T16:06:41.356+0000 I REPL     [rsBackgroundSync] rollback 3.5
2016-09-23T16:06:41.419+0000 c23264| 2016-09-23T16:06:41.356+0000 I REPL     [rsBackgroundSync] rollback 4 n:3
2016-09-23T16:06:41.420+0000 c23264| 2016-09-23T16:06:41.356+0000 I REPL     [rsBackgroundSync] minvalid={ ts: Timestamp 1474646800000|4, t: 5 }
2016-09-23T16:06:41.420+0000 c23264| 2016-09-23T16:06:41.356+0000 D QUERY    [rsBackgroundSync] Only one plan is available; it will be run but will not be cached. query: {} sort: {} projection: {}, planSummary: COLLSCAN
2016-09-23T16:06:41.421+0000 c23264| 2016-09-23T16:06:41.356+0000 D QUERY    [rsBackgroundSync] Only one plan is available; it will be run but will not be cached. query: {} sort: {} projection: {}, planSummary: COLLSCAN
2016-09-23T16:06:41.421+0000 c23264| 2016-09-23T16:06:41.356+0000 I REPL     [rsBackgroundSync] rollback 4.6
2016-09-23T16:06:41.422+0000 c23264| 2016-09-23T16:06:41.356+0000 I REPL     [rsBackgroundSync] rollback 4.7
2016-09-23T16:06:41.422+0000 c23264| 2016-09-23T16:06:41.356+0000 D QUERY    [rsBackgroundSync] Using idhack: query: { _id: "ip-10-136-48-200-2016-09-23T16:06:14.731+0000-57e552f6a80d2dd9117d7d93" } sort: {} projection: {}
2016-09-23T16:06:41.423+0000 c23264| 2016-09-23T16:06:41.356+0000 D QUERY    [rsBackgroundSync] Using idhack: query: { _id: "ip-10-136-48-200-2016-09-23T16:06:14.731+0000-57e552f6a80d2dd9117d7d93" } sort: {} projection: {}
2016-09-23T16:06:41.423+0000 c23264| 2016-09-23T16:06:41.356+0000 D QUERY    [rsBackgroundSync] Using idhack: query: { _id: "testDB.bar-a_10.0" } sort: {} projection: {}
2016-09-23T16:06:41.424+0000 c23264| 2016-09-23T16:06:41.357+0000 D QUERY    [rsBackgroundSync] Using idhack: { _id: "testDB.bar-a_10.0" }
2016-09-23T16:06:41.424+0000 c23264| 2016-09-23T16:06:41.357+0000 D QUERY    [rsBackgroundSync] Using idhack: query: { _id: "testDB.bar-a_MinKey" } sort: {} projection: {}
2016-09-23T16:06:41.424+0000 c23264| 2016-09-23T16:06:41.357+0000 D QUERY    [rsBackgroundSync] Using idhack: { _id: "testDB.bar-a_MinKey" }
2016-09-23T16:06:41.425+0000 c23264| 2016-09-23T16:06:41.357+0000 D QUERY    [rsBackgroundSync] Using idhack: query: { _id: "ip-10-136-48-200:23261:1474646751:-1877369597973131435" } sort: {} projection: {}
2016-09-23T16:06:41.426+0000 c23264| 2016-09-23T16:06:41.357+0000 D QUERY    [rsBackgroundSync] Using idhack: { _id: "ip-10-136-48-200:23261:1474646751:-1877369597973131435" }
2016-09-23T16:06:41.426+0000 c23264| 2016-09-23T16:06:41.357+0000 D QUERY    [rsBackgroundSync] Using idhack: query: { _id: "ip-10-136-48-200:23263:1474646754:1666493946181649802" } sort: {} projection: {}
2016-09-23T16:06:41.427+0000 c23264| 2016-09-23T16:06:41.357+0000 D QUERY    [rsBackgroundSync] Using idhack: { _id: "ip-10-136-48-200:23263:1474646754:1666493946181649802" }
2016-09-23T16:06:41.427+0000 c23264| 2016-09-23T16:06:41.357+0000 D QUERY    [rsBackgroundSync] Using idhack: query: { _id: "ip-10-136-48-200:23267:1474646747:-6264026123060590303" } sort: {} projection: {}
2016-09-23T16:06:41.427+0000 c23264| 2016-09-23T16:06:41.357+0000 D QUERY    [rsBackgroundSync] Using idhack: { _id: "ip-10-136-48-200:23267:1474646747:-6264026123060590303" }
2016-09-23T16:06:41.428+0000 c23264| 2016-09-23T16:06:41.357+0000 I REPL     [rsBackgroundSync] rollback 5 d:2 u:4
2016-09-23T16:06:41.428+0000 c23264| 2016-09-23T16:06:41.357+0000 I REPL     [rsBackgroundSync] rollback 6
2016-09-23T16:06:41.428+0000 c23264| 2016-09-23T16:06:41.357+0000 D REPL     [rsBackgroundSync] rollback truncate oplog after { ts: Timestamp 1474646774000|8, t: 3 }
2016-09-23T16:06:41.429+0000 c23264| 2016-09-23T16:06:41.357+0000 D QUERY    [rsBackgroundSync] Running query: query: {} sort: {} projection: {}
2016-09-23T16:06:41.430+0000 c23264| 2016-09-23T16:06:41.357+0000 D QUERY    [rsBackgroundSync] Collection admin.system.roles does not exist. Using EOF plan: query: {} sort: {} projection: {}
2016-09-23T16:06:41.431+0000 c23264| 2016-09-23T16:06:41.358+0000 I COMMAND  [rsBackgroundSync] query admin.system.roles planSummary: EOF ntoreturn:0 ntoskip:0 keysExamined:0 docsExamined:0 cursorExhausted:1 numYields:0 nreturned:0 reslen:20 locks:{ Global: { acquireCount: { r: 23, w: 10, W: 1 } }, Database: { acquireCount: { r: 6, w: 1, W: 9 } }, Collection: { acquireCount: { r: 5 } }, oplog: { acquireCount: { R: 1, W: 1 } } } 0ms
2016-09-23T16:06:41.431+0000 c23264| 2016-09-23T16:06:41.358+0000 I REPL     [rsBackgroundSync] rollback done
2016-09-23T16:06:41.432+0000 c23264| 2016-09-23T16:06:41.358+0000 I REPL     [rsBackgroundSync] rollback finished

which makes no sense: splitChunk sends the split commit command to the config server with majority write concern, and if that majority write fails, splitChunk can only succeed if the refreshed chunk metadata on the shard shows that the split went through. That refresh reads config.chunks on the config server with majority read concern.
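
A rough mongo-shell-flavored sketch of that decision path, with commitChunkSplitOnConfig(), refreshChunksWithMajorityRead(), and hasBothHalves() as illustrative stand-ins rather than real server functions:

// Sketch of the shard-side splitChunk decision path described above.
function splitChunkOnShardSketch() {
    var commitRes = commitChunkSplitOnConfig({        // _configsvrCommitChunkSplit, w: "majority"
        ns: "testDB.bar",
        min: { a: MinKey }, max: { a: MaxKey },
        splitPoints: [ { a: 10 } ]
    });

    if (!commitRes.ok || commitRes.writeConcernError) {
        // The commit failed or its majority write did not stick: re-read
        // config.chunks with readConcern majority and only succeed if both
        // halves of the split are visible.
        var chunks = refreshChunksWithMajorityRead("testDB.bar");
        if (!hasBothHalves(chunks, { a: 10 })) {
            return { ok: 0 };
        }
    }
    return { ok: 1 };  // in this BF the shard ended up here despite the rollback
}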
