[SERVER-32692] Make zbigMapReduce.js, sharding_balance4.js, and bulk_shard_insert.js more resilient under slow machines Created: 12/Jan/18 Updated: 30/Oct/23 Resolved: 23/Sep/19 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | None |
| Fix Version/s: | 4.3.1 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Jack Mulrow | Assignee: | Matthew Saltz (Inactive) |
| Resolution: | Fixed | Votes: | 1 |
| Labels: | gm-ack |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: | |
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Backport Requested: | v4.0, v3.6 |
| Sprint: | Sharding 2019-09-09, Sharding 2019-09-23, Sharding 2019-10-07 |
| Participants: | |
| Linked BF Score: | 37 |
| Description |
|
zbigMapReduce.js fails occasionally because more than 5 migrations manage to finish after the start of either of the two bulk writes it executes, so the write never establishes a shard version and the test fails. Similarly, sharding_balance4.js and bulk_shard_insert.js occasionally fail because more than 10 migrations complete during the course of a find command, exhausting mongos's retry attempts and failing the test. Modifying the tests to retry a couple of times on StaleShardVersion should make them fail less often. We can also consider making a generic override for read commands that retries on StaleShardVersion errors, so it can be loaded into tests that involve frequent migrations. |
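A minimal sketch of the retry idea in the description, written as plain JavaScript so it runs outside the shell. The helper name `retryOnStaleShardVersion`, the stand-in `makeFlakyFind`, and the use of 63 as the numeric StaleShardVersion code are assumptions made for illustration; this is not the change that was committed for this ticket.

```javascript
// Hypothetical helper, not part of the MongoDB test libraries.
// 63 is assumed here to be the numeric error code for StaleShardVersion.
const STALE_SHARD_VERSION = 63;

// Runs `operation` up to `maxAttempts` times, retrying only when it throws
// an error whose `code` is StaleShardVersion. Any other error propagates
// immediately so unrelated failures are not masked.
function retryOnStaleShardVersion(operation, maxAttempts) {
    let lastError;
    for (let attempt = 1; attempt <= maxAttempts; attempt++) {
        try {
            return operation();
        } catch (e) {
            if (e.code !== STALE_SHARD_VERSION) {
                throw e;
            }
            lastError = e;
        }
    }
    // Every attempt hit a stale version: surface the last such error.
    throw lastError;
}

// Stand-in for a find that fails `failures` times with a stale version
// (as if migrations kept committing underneath it) and then succeeds.
function makeFlakyFind(failures) {
    let calls = 0;
    return function() {
        calls++;
        if (calls <= failures) {
            const err = new Error("stale shard version");
            err.code = STALE_SHARD_VERSION;
            throw err;
        }
        return {ok: 1, attempts: calls};
    };
}

const result = retryOnStaleShardVersion(makeFlakyFind(2), 3);
console.log(result.attempts);
```

In a real test the wrapped operation would be the find or bulk write itself. Retrying only on the one error code keeps the test sensitive to any other anomaly, which is the concern raised in the comments.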
| Comments |
| Comment by Githook User [ 18/Nov/19 ] |
|
Author: {'username': 'saltzm', 'email': 'matthew.saltz@mongodb.com', 'name': 'Matthew Saltz'} Message: (cherry picked from commit 1e0f4f8136e640d90093476695bb07b851da2da9) |
| Comment by Githook User [ 04/Nov/19 ] |
|
Author: {'username': 'saltzm', 'email': 'matthew.saltz@mongodb.com', 'name': 'Matthew Saltz'} Message: (cherry picked from commit 1e0f4f8136e640d90093476695bb07b851da2da9) |
| Comment by Githook User [ 23/Sep/19 ] |
|
Author: {'name': 'Matthew Saltz', 'username': 'saltzm', 'email': 'matthew.saltz@mongodb.com'}Message: |
| Comment by Jack Mulrow [ 01/Aug/19 ] |
|
Yeah I think throttling the balancer for these tests would help. |
| Comment by Matthew Saltz (Inactive) [ 01/Aug/19 ] |
|
Being able to throttle the balancer actually seems like a useful feature in general: some parameter that lets you specify max migrations per second, or a parameter that says how long to sleep in between rounds. It should be easy to implement and backportable too. Let me know if I should create a ticket for that. Not sure if there's a good way to do it client side? jack.mulrow, does that seem like it'd help? |
| Comment by Kaloian Manassiev [ 01/Aug/19 ] |
Yes, this is what I meant. However what Randolph proposes above also seems legit. |
| Comment by Randolph Tan [ 01/Aug/19 ] |
|
If there is a way to throttle the balancer to take a small pause after each migration, then I think that would help too. |
| Comment by Matthew Saltz (Inactive) [ 01/Aug/19 ] |
|
By "lowering constants" do you mean e.g. inserting less data? |
| Comment by Matthew Saltz (Inactive) [ 01/Aug/19 ] |
|
If kMaxNumStaleVersionRetries were a server parameter I'd say we should increase that value in the test, but it's not. |
| Comment by Kaloian Manassiev [ 01/Aug/19 ] |
|
matthew.saltz, are you suggesting bumping kMaxNumStaleVersionRetries? I have no recollection of how this value was reached (renctan, do you?), but I don't think doubling it is out of the question, as long as we also obey the MaxTimeMS. Alternatively, can we just lower some constants in the test so it is more lightweight? |
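The constraint mentioned above, more retries but still bounded by a time limit, can be sketched as follows. This is only an illustration: `retryWithinBudget`, its parameters, and the injectable `now` clock are hypothetical names, and the sketch does not reflect how mongos implements its retry loop.

```javascript
// Hypothetical sketch: retry a failing operation up to `maxRetries` times,
// but give up early once `timeBudgetMs` has elapsed since the first attempt.
// `now` is injectable so the behavior can be exercised with a fake clock.
function retryWithinBudget(operation, isRetryable, maxRetries, timeBudgetMs, now = Date.now) {
    const deadline = now() + timeBudgetMs;
    let attempts = 0;
    while (true) {
        try {
            attempts++;
            return operation();
        } catch (e) {
            const retriesUsed = attempts - 1;
            // Stop on a non-retryable error, when the retry count is
            // exhausted, or when the time budget has run out.
            if (!isRetryable(e) || retriesUsed >= maxRetries || now() >= deadline) {
                throw e;
            }
        }
    }
}

// Usage: an operation that succeeds on its third call, with generous limits.
let demoCalls = 0;
const demoResult = retryWithinBudget(
    function() {
        demoCalls++;
        if (demoCalls < 3) {
            throw new Error("transient");
        }
        return "done";
    },
    function() { return true; },  // treat every error as retryable in this demo
    10,
    60 * 1000);
console.log(demoResult, demoCalls);
```

With both limits in place, doubling the retry count cannot make an operation run longer than the time budget allows, whichever limit is hit first ends the loop.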
| Comment by Kaloian Manassiev [ 18/Jan/18 ] |
|
jack.mulrow, I am not sure that adding retries to these tests is the right solution, because I think it defeats their purpose, which is to make sure no anomalies happen under some form of stress. How useful these tests are is a different question. max.hirschhorn, can we just blacklist these two tests in the DEBUG suites so we clear some red? |