[SERVER-29160] Sharding commonly uses write concern timeouts of 15 seconds and these are timing out in migration related operations and causing BFs
Created: 12/May/17 Updated: 30/Oct/23 Resolved: 25/Sep/18 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | None |
| Fix Version/s: | 3.6.9, 4.0.4, 4.1.4 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Dianna Hohensee (Inactive) | Assignee: | Misha Tyulenev |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | sharding-wfbf-day |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: |
|
| Backwards Compatibility: | Fully Compatible |
| Backport Requested: | v4.0, v3.6 |
| Sprint: | Sharding 2018-09-24, Sharding 2018-10-08 |
| Participants: | |
| Linked BF Score: | 25 |
| Description |
|
Sharding's writeConcern timeouts for writes performed throughout the migration process should be raised to prevent BFs. This write specifically caused the linked BF. Any other related writes whose timeouts can be raised without seriously affecting the rest of the system should be raised as well. Proposing a 30 second timeout rather than the 15 second timeout that is the norm in sharding.
Suggested fix: we have 20 different kMajorityWriteConcern values defined in anonymous namespaces, but most of them are still related, so we can add the timeout durations to write_concern_options.h and use those shared durations instead of the locally defined ones.
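A minimal sketch of the centralization suggested above, assuming a shared header that exports named timeout durations and a helper that builds a majority write concern from them. The namespace `wc_sketch`, the constants `kWriteConcernTimeoutSharding` and `kWriteConcernTimeoutMigration`, the `WriteConcern` struct and `majorityWithTimeout()` are all hypothetical illustrations, not the actual contents of write_concern_options.h.

```cpp
#include <chrono>

// Hypothetical sketch of shared timeout constants, in the spirit of moving the
// per-file kMajorityWriteConcern definitions into write_concern_options.h.
// None of these names are taken from the MongoDB source tree.
namespace wc_sketch {

using Milliseconds = std::chrono::milliseconds;

// The 15 second timeout described as the norm for sharding-internal writes.
constexpr Milliseconds kWriteConcernTimeoutSharding{15000};

// The proposed 30 second timeout for writes performed during migrations.
constexpr Milliseconds kWriteConcernTimeoutMigration{30000};

// Minimal stand-in for a write concern: a "w" value plus a wtimeout.
struct WriteConcern {
    const char* w;
    Milliseconds wtimeout;
};

// Builds a { w: "majority", wtimeout: <timeout> } write concern so that call
// sites pick a named duration instead of redefining their own constant.
constexpr WriteConcern majorityWithTimeout(Milliseconds timeout) {
    return WriteConcern{"majority", timeout};
}

// Example call-site constants that would replace local anonymous-namespace copies.
constexpr WriteConcern kMajorityWriteConcern =
    majorityWithTimeout(kWriteConcernTimeoutSharding);
constexpr WriteConcern kMigrationMajorityWriteConcern =
    majorityWithTimeout(kWriteConcernTimeoutMigration);

}  // namespace wc_sketch
```

The design point is that bumping migration writes to 30 seconds becomes a change to one named constant rather than an edit to each of the scattered definitions.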
|
| Comments |
| Comment by Githook User [ 03/Oct/18 ] |
|
Author: Misha Tyulenev (misha@mongodb.com, username: mikety) Message: |
| Comment by Githook User [ 03/Oct/18 ] |
|
Author: Misha Tyulenev (misha@mongodb.com, username: mikety) Message: |
| Comment by Githook User [ 29/Sep/18 ] |
|
Author: Misha Tyulenev (misha@mongodb.com, username: mikety) Message: |
| Comment by Githook User [ 28/Sep/18 ] |
|
Author: Misha Tyulenev (misha@mongodb.com, username: mikety) Message: (cherry picked from commit 3f618b86df0473ab905cc4a0ad78f4be8d3428e3) |
| Comment by Githook User [ 25/Sep/18 ] |
|
Author: Misha Tyulenev (misha@mongodb.com, username: mikety) Message: |
| Comment by Misha Tyulenev [ 10/Sep/18 ] |
|
It's used for the default in commands.cpp: https://github.com/mongodb/mongo/blob/master/src/mongo/db/commands.cpp#L74 |
| Comment by Esha Maharishi (Inactive) [ 10/Sep/18 ] |
|
Looks great! One question, why do we need kWriteConcernTimeoutUserCommand? |
| Comment by Misha Tyulenev [ 10/Sep/18 ] |
|
esha.maharishi please ack the approach outlined in the description |
| Comment by Dianna Hohensee (Inactive) [ 12/Oct/17 ] |
|
Linking BF-6834 because it has similar issues, though it does not involve migration-related commands. It also looks a bit odd: it shows a {{writeConcern: { w: "majority", wtimeout: 15000 }}} 15 second timeout, but the operation takes 39 seconds to complete and finishes after the test has already failed. Perhaps there was a 30 second network timeout on the config server side, while the write on the shard had a 15 second timeout. |
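The hypothesis above can be sketched as a simplified timing model, assuming that wtimeout only bounds the wait for majority acknowledgement (not the time the shard spends doing the write locally) and that the caller gives up after its own network timeout. The `simulate()` function, the `Outcome` struct and the 24 s of local work and 20 s of replication lag used below are hypothetical, chosen only so the numbers line up with the 39 seconds in the BF.

```cpp
#include <algorithm>
#include <chrono>

using Milliseconds = std::chrono::milliseconds;

// Hypothetical outcome of one request from a caller (e.g. the config server)
// to a shard that performs a w: "majority" write.
struct Outcome {
    Milliseconds shardFinishesAt;  // when the shard-side operation returns
    bool shardWriteConcernError;   // wtimeout expired before majority ack
    bool callerTimedOut;           // caller gave up before the shard replied
};

// Simplified model (assumption): total shard time = local work + write
// concern wait, where the write concern wait is capped by wtimeout.
constexpr Outcome simulate(Milliseconds localWork,
                           Milliseconds replicationLag,
                           Milliseconds wtimeout,
                           Milliseconds callerNetworkTimeout) {
    // The write concern wait ends at whichever comes first: the majority
    // acknowledgement or the wtimeout.
    const Milliseconds wcWait = std::min(replicationLag, wtimeout);
    const Milliseconds finish = localWork + wcWait;
    return Outcome{finish, replicationLag > wtimeout, finish > callerNetworkTimeout};
}
```

With `simulate(24000ms, 20000ms, 15000ms, 30000ms)` the shard returns a write concern error at 39 s, while the caller has already timed out at 30 s, which matches the "completes after the test fails" pattern described in the comment.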
| Comment by Dianna Hohensee (Inactive) [ 16/Jun/17 ] |
|
BF-5723's scenario is startCommit timing out after 30 seconds, followed closely by the migrateThread timing out (and failing the migration) after 15 seconds. Consider raising one of those timeouts. However