[SERVER-22543] multi_write_target.js should not rely on the order of shard ids Created: 15/Jan/16 Updated: 18/Nov/16 Resolved: 10/Feb/16 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | None |
| Fix Version/s: | 3.2.4, 3.3.2 |
| Type: | Bug | Priority: | Minor - P4 |
| Reporter: | Spencer Jackson | Assignee: | Kaloian Manassiev |
| Resolution: | Done | Votes: | 0 |
| Labels: | test-only | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Backport Completed: | |
| Sprint: | Sharding 10 (02/19/16) |
| Participants: |
| Description |
sharding_csrs_continuous_config_stepdown_WT failed on enterprise-rhel-62-64-bit: multi_write_target.js. BF Ticket Generated by spencer.jackson |
| Comments |
| Comment by Githook User [ 17/Feb/16 ] | ||||||||||||||||||||||||||
|
Author: Kaloian Manassiev (kaloianm, kaloian.manassiev@mongodb.com) Message: | ||||||||||||||||||||||||||
| Comment by Githook User [ 10/Feb/16 ] | ||||||||||||||||||||||||||
|
Author: Kaloian Manassiev (kaloianm, kaloian.manassiev@mongodb.com) Message: | ||||||||||||||||||||||||||
| Comment by Kaloian Manassiev [ 09/Feb/16 ] | ||||||||||||||||||||||||||
|
This test failed because this line in the test returned the shards in the order shard0000, shard0002, shard0001 rather than sorted by shard id. This is evident from the actual move chunk commands, which are executed later on:
Because the rest of the test relies on the shards being sorted, the later checks did not match. The only explanation I have for this outcome is that the third config server in the CSRS set was not part of the majority when the shard entries were inserted and then, when replaying the oplog, happened to insert shard0002 first, so that it got a lower RecordId than shard0001. This is very likely, since one of the nodes in the CSRS set was lagging behind for a while:
In the above example, memberID 1 is server 20514:
Later on, it is this host that becomes the new primary after the stepdown thread runs:
Based on this evidence, the effect is expected, and the test needs to be fixed so that it does not rely on this order. | ||||||||||||||||||||||||||
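One way to make a test independent of the order in which shard documents come back is to sort them by shard id before using them. The snippet below is only a minimal sketch of that idea in plain JavaScript, not necessarily the change that was committed for this ticket; the helper name sortShardsById and the sample documents are hypothetical, with the sample order taken from the failure described above.

```javascript
// Hypothetical shard documents, in the order observed in the failing run.
const shardsAsReturned = [
    {_id: "shard0000"},
    {_id: "shard0002"},
    {_id: "shard0001"},
];

// Hypothetical helper: returns a copy of the shard list sorted by _id,
// leaving the input array untouched.
function sortShardsById(shards) {
    return shards.slice().sort(function(a, b) {
        if (a._id < b._id) return -1;
        if (a._id > b._id) return 1;
        return 0;
    });
}

// The rest of a test can index into this array without depending on
// the order in which the documents were stored or returned.
const shards = sortShardsById(shardsAsReturned);
```

In the mongo shell, the same effect can probably be had by asking the server to sort, for example by adding .sort({_id: 1}) to the find() on config.shards, assuming the test reads the shard list with such a query.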
| Comment by Spencer Jackson [ 15/Jan/16 ] | ||||||||||||||||||||||||||
|
https://logkeeper.mongodb.org/build/569964cb9041300b275639eb/test/5699688e9041300b2756ae2a#L7526
Looks like something with sharding? |