[SERVER-38388] Reuse of ended session causes error when used again during logical sessions cache reaping Created: 04/Dec/18 Updated: 08/Jan/24 Resolved: 28/Jan/19 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | 4.0.5, 4.1.3, 4.1.5 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Matt Broadstone | Assignee: | Blake Oler |
| Resolution: | Works as Designed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||
| Operating System: | ALL | ||||||||
| Sprint: | Sharding 2018-12-17, Sharding 2018-12-31, Sharding 2019-01-14, Sharding 2019-01-28, Sharding 2019-02-11 | ||||||||
| Participants: | |||||||||
| Description |
|
A user provided the following repro:
Which would result in a cursor id XXXX not found error after ~3min in most runs. After investigation it appears what is happening is that the node driver creates an implicit session for the initial count command, sends an endSessions for that session after the command completes, then reuses that same session for the following aggregate and getMore commands. The error occurs when the session is used on a getMore during the session reaping process, which is more easily accomplished when using a tailable cursor given the regularity of getMore s sent to the server. |
| Comments |
| Comment by Blake Oler [ 28/Jan/19 ] |
|
Closing after offline discussion with matt.broadstone – nothing new was illuminated as a result of the discussion. We simply decided that Jason's points were valid and that pursuing further action wasn't worth investing any additional time. |
| Comment by Mira Carey [ 08/Jan/19 ] |
|
Unfortunately, there's no great way to offer a better diagnostic for this kind of thing because the contract with sessions is that individual logical session ids are never re-used. Regardless of whether a session has ended via endSessions (or timed out and been reaped), once it's gone, the server has no notion it has existed. And because we don't allocate sessions with any kind of central synchronization, we have no way of knowing (when we see a session for the first time), if we've ever seen it before. That said, the likely hood of guid duplication (outside of vms started up simultaneously without real entropy), is incredibly low. We could add a mode where we keep around a capped list of sessions that have ended for each server, but even then re-use wouldn't be a hard error (it'd be expected behavior in certain races), so all you could use it for would be diagnostics. And as for the specific bug here, it's working as designed to have your cursor killed when you call endSession. That's precisely how that's supposed to work. |
| Comment by Matt Broadstone [ 04/Dec/18 ] |
|
The driver sessions specification indicates that if the driver is using a server session pool, it should not send endSessions for each session, but rather return it to the pool to be reaped later. endSessions should only be send during client destruction, so this is a bug in the driver that exposed a likely minor (and difficult to reproduce) bug in the server. |