[SERVER-34810] Session cache refresh can erroneously kill cursors that are still in use
Created: 02/May/18 Updated: 29/Oct/23 Resolved: 03/Jul/18 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | 3.6.6, 4.0.1, 4.1.1 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | David Storch | Assignee: | Misha Tyulenev |
| Resolution: | Fixed | Votes: | 5 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: |
|
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Backport Requested: | v4.0 |
| Steps To Reproduce: | This is a race condition which is only easy to reproduce consistently by instrumenting the server. The following patch causes the server to sleep for some time during LogicalSessionCache::_refresh():
Start a standalone mongod with --setParameter enableTestCommands=true. From one mongo shell, force a session refresh by running the following:
While the server is sleeping inside the session refresh, run the following:
When the session refresh completes, the cursor will no longer be open. You can observe this by running cursor.itcount() and receiving a CursorNotFound error. |
| Sprint: | Sharding 2018-05-21, Sharding 2018-06-04, Sharding 2018-06-18, Sharding 2018-07-02, Sharding 2018-07-16 |
| Participants: |
| Case: | (copied to CRM) |
| Linked BF Score: | 16 |
| Description |
|
Session information is stored in the system.sessions collection in the config database. Information about active sessions is cached in the LogicalSessionCache. The cache is periodically refreshed, which both
Suppose that a cache refresh is happening concurrently with a startSession command. A cursor belonging to the new session can be killed unexpectedly, while the client is still using it, if the session record has not yet been written out to the system.sessions collection. The cache refresh code attempts to write new sessions out to system.sessions prior to killing any cursors. However, nothing synchronizes these two steps, so a new session can come into being after the new sessions have been written out but before the cursors are killed. This means that the following can take place:
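The window described above can be illustrated with a toy, single-threaded model in Python. This is only a sketch under stated assumptions, not the server's code: `ToySessionCache`, its methods, and the `between_steps` hook are hypothetical names, the real LogicalSessionCache is multi-threaded C++, and the kill criterion (a cursor whose session has no persisted record) is taken from the description above.

```python
class ToySessionCache:
    """Hypothetical toy model of the race; NOT MongoDB's LogicalSessionCache."""

    def __init__(self):
        self.active = set()     # sessions known to the in-memory cache
        self.persisted = set()  # stand-in for config.system.sessions
        self.open_cursors = {}  # cursor id -> owning session id

    def start_session(self, session_id):
        self.active.add(session_id)

    def open_cursor(self, cursor_id, session_id):
        self.open_cursors[cursor_id] = session_id

    def refresh(self, between_steps=None):
        # Step 1: write out the sessions known at this moment.
        self.persisted |= set(self.active)

        # Unsynchronized window: a client may start a session here.
        if between_steps is not None:
            between_steps()

        # Step 2: kill cursors whose session has no persisted record.
        killed = [cid for cid, sid in self.open_cursors.items()
                  if sid not in self.persisted]
        for cid in killed:
            del self.open_cursors[cid]
        return killed


cache = ToySessionCache()
cache.start_session("s1")
cache.open_cursor(1, "s1")


def racing_client():
    # Runs between the two refresh steps, like a concurrent startSession.
    cache.start_session("s2")
    cache.open_cursor(2, "s2")


killed = cache.refresh(between_steps=racing_client)
print(killed)  # [2]: the cursor of the session created mid-refresh
```

In this model the cursor for `s1` survives because its session was written out in step 1, while the cursor for `s2` is killed even though the client opened it moments earlier. A later read on that cursor would then fail, which corresponds to the CursorNotFound error in the reproduction steps.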
Fix Implementation
The issue is caused by a race in LogicalSessionCache. |
| Comments |
| Comment by Marc Smith [ 19/Dec/18 ] |
|
My problem turned out to be that my mongos servers were sitting behind a load balancer. I didn't know Mongo requires "sticky sessions" to mongos. Since session affinity wasn't working with Kubernetes services, my solution was to give each application server its own mongos server in the same Kubernetes pod.
|
| Comment by Eric Milkie [ 19/Dec/18 ] |
|
Please file a new SERVER ticket with details of the problem so we can investigate anew. This message could be produced by many different issues. |
| Comment by Julius Sakalys [ 19/Dec/18 ] |
|
I can second what Marc said (https://jira.mongodb.org/browse/SERVER-34810?focusedCommentId=2084929&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-2084929). Getting lots of these on 4.0.2. |
| Comment by Marc Smith [ 09/Dec/18 ] |
|
I am also receiving "MongoError: Cursor not found" a LOT running 4.0.4. I don't know if this is the cause but I think it warrants being reopened. |
| Comment by Githook User [ 03/Jul/18 ] |
|
Author: {'username': 'mikety', 'name': 'Misha Tyulenev', 'email': 'misha@mongodb.com'}
Message: (cherry picked from commit 57d7938c49da06122d4d43054ff89e1881d0209f) |
| Comment by Githook User [ 03/Jul/18 ] |
|
Author: {'username': 'mikety', 'name': 'Misha Tyulenev', 'email': 'misha@mongodb.com'}
Message: (cherry picked from commit 57d7938c49da06122d4d43054ff89e1881d0209f) |
| Comment by Githook User [ 03/Jul/18 ] |
|
Author: {'username': 'mikety', 'name': 'Misha Tyulenev', 'email': 'misha@mongodb.com'}
Message: |
| Comment by Simon Tretter [ 27/Jun/18 ] |
|
Did this fix make it into the 4.0 release? Thanks
Update 28/Jun/18: It did not, or at least the issue still exists. Am I the only one who considers this bug a deal breaker? |
| Comment by David Storch [ 02/May/18 ] |
|
kaloian.manassiev, this is causing the parallel suites to fail in Evergreen, so I think it would be a good idea to schedule a fix. |