[SERVER-36315] After stepdown the CSRS dist lock manager keeps trying to unlock locks
Created: 26/Jul/18  Updated: 29/Oct/23  Resolved: 26/Sep/19
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | 3.4.16, 3.6.6, 4.0.0 |
| Fix Version/s: | 3.6.15, 4.0.13, 4.2.1, 4.3.1, 3.4.24 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Kaloian Manassiev | Assignee: | Kevin Pulo |
| Resolution: | Fixed | Votes: | 2 |
| Labels: | neweng |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: | |
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Backport Requested: | v4.2, v4.0, v3.6, v3.4 |
| Sprint: | Sharding 2019-07-29, Sharding 2019-08-12, Sharding 2019-08-26, Sharding 2019-10-07 |
| Participants: | |
| Case: | (copied to CRM) |
| Description |
It looks like after stepdown of the config server primary, if the dist lock manager held any locks (or failed to unlock any) while it was primary, it will keep trying to unlock them even though it is no longer primary. This pollutes the logs with repeated failed-unlock messages.
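A minimal sketch of the failure mode described above, using hypothetical names (`DistLockUnlocker`, `queueFailedUnlock`; this is not the actual server code): a background loop retries pending unlocks on a fixed interval without ever consulting the node's replication state, so after a stepdown the same failures are logged forever. It also matches the observation in the comments that each new failed unlock adds one more duplicate message per interval.

```cpp
#include <chrono>
#include <iostream>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

class DistLockUnlocker {
public:
    // Each lock that could not be released while primary is queued here.
    void queueFailedUnlock(std::string lockId) {
        std::lock_guard<std::mutex> lg(_mutex);
        _pending.push_back(std::move(lockId));
    }

    // The retry loop never checks whether this node is still primary, so
    // after a stepdown it keeps attempting (and logging) the same unlocks.
    void retryLoop() {
        while (true) {
            {
                std::lock_guard<std::mutex> lg(_mutex);
                for (const auto& lockId : _pending) {
                    // On a secondary the unlock write to config.locks is
                    // rejected, the entry stays queued, and this same
                    // message is logged again on the next interval.
                    std::cout << "could not unlock distributed lock "
                              << lockId << ", will retry" << std::endl;
                }
            }
            std::this_thread::sleep_for(std::chrono::seconds(30));
        }
    }

private:
    std::mutex _mutex;
    std::vector<std::string> _pending;
};
```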
| Comments |
| Comment by Githook User [ 26/Sep/19 ] |
Author: Kevin Pulo (devkev) &lt;kevin.pulo@mongodb.com&gt;
Message: (cherry picked from commit 3197ace779d6e0f023bff9e39bd80e17e3c4790d)
| Comment by Scott Glajch [ 05/Aug/19 ] |
Those are some related log messages, if that helps. It seems that every time the (thread58) message appears, another duplicate log message is added per interval.
| Comment by Scott Glajch [ 05/Aug/19 ] |
Actually, I spoke too soon: the message is still showing up, though only on the first replica that we restarted, and only at 1 message per minute (instead of 30K messages per minute).
| Comment by Scott Glajch [ 05/Aug/19 ] |
We just had this happen to us. After not knowing exactly how to get these messages to stop on the two secondary config server nodes, we ended up just re-initial-syncing the two replicas. Luckily the config servers don't house much data at all, so that only took a few minutes to complete.
| Comment by Kaloian Manassiev [ 24/Jun/19 ] |
I do think this bug is a pretty bad experience: it is 100% guaranteed that upon step-down of a CSRS primary, the secondaries will keep trying to unlock locks, and as we put more and more locks on the CSRS, this will keep filling the logs. We should be able to register the dist lock manager with the ReplicationCoordinatorExternalState and mark it as stepped up/stepped down on replication state transitions (only for the CSRS). This is still not 100% race-condition proof, but at the very least it will not try to unlock any more locks on secondaries. Putting it back in needs triage.
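A hedged sketch of the mitigation proposed in this comment, with hypothetical names (`onStepUp`/`onStepDown`, `retryPendingUnlocksOnce`); the actual fix that shipped may differ. The idea is to track primary state from replication state transitions and gate the retry attempts on it:

```cpp
#include <atomic>
#include <mutex>
#include <string>
#include <vector>

class DistLockUnlocker {
public:
    // Invoked on replication state transitions, e.g. registered with the
    // ReplicationCoordinatorExternalState (CSRS only, per the comment).
    void onStepUp() { _isPrimary.store(true); }
    void onStepDown() { _isPrimary.store(false); }

    // One pass of the periodic retry; called from the background loop.
    void retryPendingUnlocksOnce() {
        if (!_isPrimary.load()) {
            // Not primary: the unlock writes would be rejected anyway, so
            // skip the attempt instead of spamming the log every interval.
            return;
        }
        std::lock_guard<std::mutex> lg(_mutex);
        for (const auto& lockId : _pending) {
            // ... issue the unlock write against config.locks here ...
            (void)lockId;
        }
    }

private:
    std::atomic<bool> _isPrimary{false};
    std::mutex _mutex;
    std::vector<std::string> _pending;
};
```

As the comment itself notes, a flag check like this is still racy around the transition, but it stops the steady-state unlock retries (and the resulting log spam) on secondaries.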