[SERVER-38388] Reuse of ended session causes error when used again during logical sessions cache reaping Created: 04/Dec/18  Updated: 08/Jan/24  Resolved: 28/Jan/19

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 4.0.5, 4.1.3, 4.1.5
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Matt Broadstone Assignee: Blake Oler
Resolution: Works as Designed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to NODE-1719 cursor id xxx not found [collection.w... Closed
Operating System: ALL
Sprint: Sharding 2018-12-17, Sharding 2018-12-31, Sharding 2019-01-14, Sharding 2019-01-28, Sharding 2019-02-11
Participants:

 Description   

A user provided the following repro:

const url = 'mongodb://localhost:31000/test?replicaSet=rs';
 
MongoClient.connect(url, (err, client) => {
  if (err) return console.dir(err);
 
  let db = client.db();
  let cursor = db.collection('test').find({});
  let time = new Date().getTime();
  cursor.count((err, result) => {
    let stream = client.db().collection('test').watch([], { fullDocument: 'updateLookup' });
    console.log('watching...');
    stream.on('change', change => console.dir(change));
    stream.on('error', err => {
      let errTime = new Date().getTime();
      console.log(`error occured at: ${(new Date()).toISOString()}, in: ${errTime - time}ms`);
      console.dir(err);
      client.close();
    });
  });
});

Which would result in a cursor id XXXX not found error after ~3min in most runs.

After investigation it appears what is happening is that the node driver creates an implicit session for the initial count command, sends an endSessions for that session after the command completes, then reuses that same session for the following aggregate and getMore commands. The error occurs when the session is used on a getMore during the session reaping process, which is more easily accomplished when using a tailable cursor given the regularity of getMore s sent to the server.



 Comments   
Comment by Blake Oler [ 28/Jan/19 ]

Closing after offline discussion with matt.broadstone – nothing new was illuminated as a result of the discussion. We simply decided that Jason's points were valid and that pursuing further action wasn't worth investing any additional time.

Comment by Mira Carey [ 08/Jan/19 ]

Unfortunately, there's no great way to offer a better diagnostic for this kind of thing because the contract with sessions is that individual logical session ids are never re-used. Regardless of whether a session has ended via endSessions (or timed out and been reaped), once it's gone, the server has no notion it has existed. And because we don't allocate sessions with any kind of central synchronization, we have no way of knowing (when we see a session for the first time), if we've ever seen it before. That said, the likely hood of guid duplication (outside of vms started up simultaneously without real entropy), is incredibly low.

We could add a mode where we keep around a capped list of sessions that have ended for each server, but even then re-use wouldn't be a hard error (it'd be expected behavior in certain races), so all you could use it for would be diagnostics.

And as for the specific bug here, it's working as designed to have your cursor killed when you call endSession. That's precisely how that's supposed to work.

Comment by Matt Broadstone [ 04/Dec/18 ]

The driver sessions specification indicates that if the driver is using a server session pool, it should not send endSessions for each session, but rather return it to the pool to be reaped later. endSessions should only be send during client destruction, so this is a bug in the driver that exposed a likely minor (and difficult to reproduce) bug in the server.

Generated at Thu Feb 08 04:48:49 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.