[SERVER-79779] AsyncResultsMerger leaks shard cursor when getMore fails due to not primary error Created: 07/Aug/23 Updated: 29/Oct/23 Resolved: 25/Sep/23 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | 7.2.0-rc0, 7.0.3 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Jordi Serra Torrens | Assignee: | Foteini Alvanaki |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Attachments: | |
| Issue Links: | |
| Assigned Teams: | Query Execution |
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Backport Requested: | v7.1, v7.0 |
| Sprint: | QE 2023-08-21, QE 2023-09-04, QE 2023-09-18, QE 2023-10-02 |
| Participants: | |
| Linked BF Score: | 119 |
| Description |
|
After mongos has established a cursor within a multi-document transaction, if a shard steps down, subsequent requests will fail with a NotWritablePrimary error (or they can fail later with InterruptedDueToReplStateChange). In this case, AsyncResultsMerger will not attempt to clean up the shard cursors, because NotWritablePrimary and InterruptedDueToReplStateChange are not part of the list of errors for which it issues killCursors. This causes the shard cursor to be leaked. NotWritablePrimary and InterruptedDueToReplStateChange (or the NotPrimaryError category?) should be made part of that list. Edit: 'LockTimeout' errors can also occur, and AsyncResultsMerger will not attempt to clean up cursors in that case either. |
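The proposed change can be pictured with a minimal C++ sketch. Everything here is hypothetical: `RemoteError`, `isNotPrimaryCategory`, and `shouldKillShardCursor` are illustrative names, not the server's actual types or functions, and the entries already on the existing list are not reproduced because the ticket does not state them. The sketch only shows the errors named above being treated as reasons to send killCursors, and assumes that a cursor id of 0 means no cursor is open on the shard.

```cpp
// Hypothetical sketch; these names are illustrative and are not the server's API.
enum class RemoteError {
    Ok,
    NotWritablePrimary,
    InterruptedDueToReplStateChange,
    LockTimeout,
    Other,  // stands in for every error this ticket does not name
};

// Rough stand-in for a "not primary" error category check (assumed membership).
constexpr bool isNotPrimaryCategory(RemoteError e) {
    return e == RemoteError::NotWritablePrimary ||
        e == RemoteError::InterruptedDueToReplStateChange;
}

// Decide whether the merger should send killCursors to a shard after a failed getMore.
// Assumption: cursorId == 0 means there is no open cursor on the shard.
constexpr bool shouldKillShardCursor(long long cursorId, RemoteError e) {
    return cursorId != 0 && e != RemoteError::Ok &&
        (isNotPrimaryCategory(e) || e == RemoteError::LockTimeout);
}
```

With this predicate, a getMore that fails with any of the three errors named in the description would trigger cleanup of the shard cursor, while errors the ticket does not mention (`Other` here) are left to whatever the existing list already decides.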
| Comments |
| Comment by Githook User [ 28/Sep/23 ] |
|
Author: {'name': 'Foteini Alvanaki', 'email': 'foteini.alvanaki@mongodb.com', 'username': ''}Message: |
| Comment by Githook User [ 26/Sep/23 ] |
|
Author: {'name': 'Billy Donahue', 'email': 'billy.donahue@mongodb.com', 'username': 'BillyDonahue'}Message: |
| Comment by Githook User [ 25/Sep/23 ] |
|
Author: {'name': 'Foteini Alvanaki', 'email': 'foteini.alvanaki@mongodb.com', 'username': ''}Message: |
| Comment by Githook User [ 22/Sep/23 ] |
|
Author: {'name': 'Foteini Alvanaki', 'email': 'foteini.alvanaki@mongodb.com', 'username': ''}Message: |
| Comment by Foteini Alvanaki [ 14/Aug/23 ] |
|
Even though the change to check for isNotPrimary category error has been merged, there have been test failures again. I am investigating in which cases cursors are left open. |
| Comment by Githook User [ 11/Aug/23 ] |
|
Author: {'name': 'Foteini Alvanaki', 'email': 'foteini.alvanaki@mongodb.com', 'username': ''}Message: |
| Comment by Kevin Cherkauer [ 08/Aug/23 ] |
|
foteini.alvanaki@mongodb.com Assigning this to you, as it is a hot BF and needs an owner, and it doesn't look like you have a BF currently. |
| Comment by Kevin Cherkauer [ 08/Aug/23 ] |
|
jordi.serra-torrens@mongodb.com Thank you, I moved these back to QE and will look for an owner. |
| Comment by Jordi Serra Torrens [ 07/Aug/23 ] |
|
I'm wondering what the reasoning is for the current short list of errors for which AsyncResultsMerger issues killCursors, and whether we could take a more holistic approach to it. |