[SERVER-36831] LogicalSessionCache on mongos does not correctly report active operations Created: 23/Aug/18  Updated: 29/Oct/23  Resolved: 21/Sep/18

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 3.6.7, 4.0.1, 4.1.2
Fix Version/s: 3.6.9, 4.0.4, 4.1.3

Type: Task Priority: Major - P3
Reporter: Esha Maharishi (Inactive) Assignee: Blake Oler
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Backwards Compatibility: Fully Compatible
Backport Requested:
v4.0, v3.6
Sprint: Sharding 2018-09-24
Participants:

 Description   

Investigation

Generically speaking (on any node), the LogicalSessionCache on that node uses its ServiceLiaison's getActiveOpSessions() method to collect the set of currently active operations that are associated with a session.

(This is to collect the set of long-running operations that have been active since the last time the LogicalSessionCache updated the sessions records. New operations that have come in since the last time the LogicalSessionCache updated the sessions records are collected through LogicalSessionCache::vivify() in each node's service entry point).
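The two collection paths described above can be sketched as a small model. This is only an illustration, not the server's code: `SessionCacheModel`, `SessionId`, and the `refresh()` signature are hypothetical names, and the liaison's result is passed in as a plain set.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical stand-in for a logical session id; not the server's type.
using SessionId = std::string;

// Simplified model of the cache's bookkeeping between two refreshes.
class SessionCacheModel {
public:
    // Called from the service entry point for each new operation on a session.
    void vivify(const SessionId& lsid) {
        _seenSinceLastRefresh.insert(lsid);
    }

    // Returns the sessions whose records should be marked as recently used:
    // sessions vivified since the last refresh, plus sessions with
    // long-running operations as reported by the liaison.
    std::set<SessionId> refresh(const std::set<SessionId>& activeOpSessions) {
        std::set<SessionId> toRefresh;
        toRefresh.swap(_seenSinceLastRefresh);  // pending set is now empty
        toRefresh.insert(activeOpSessions.begin(), activeOpSessions.end());
        return toRefresh;
    }

private:
    std::set<SessionId> _seenSinceLastRefresh;
};
```

In this model, a session that was vivified once and then goes quiet is reported by one refresh only, while a session with a long-running operation keeps being reported for as long as the liaison returns it.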

The ServiceLiaison interface has a mongod and mongos implementation.

ServiceLiaison*Mongod*::getActiveOpSessions() walks through all Clients currently on the ServiceContext to collect the currently active operations.

However, the ServiceLiaison*Mongos*::getActiveOpSessions() simply looks at the open cursors currently in the CursorManager. As a result, even if there is no activity on these cursors, sessions with open cursors will not be reaped.

It looks like this is a copy-paste error, and that the lines in ServiceLiaisonMongos::getActiveOpSessions() should instead be in ServiceLiaisonMongos::getOpenCursorSessions().

But, I think it's worth taking a step back to examine the ServiceLiaison on mongos and mongod to make sure it is capturing the right information - why doesn't the ServiceLiaisonMongos walk through Clients on its ServiceContext to capture long-running operations on sessions (the way ServiceLiaisonMongod does) that are not associated with a cursor and do not touch any shards?
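To make the difference between the two behaviors concrete, here is a minimal model of both. It is a sketch under assumptions: `ClientModel`, `CursorModel`, and both function names are hypothetical, and the real implementations work on the ServiceContext and CursorManager rather than on plain vectors.

```cpp
#include <cassert>
#include <optional>
#include <set>
#include <string>
#include <vector>

using SessionId = std::string;  // hypothetical stand-in for a logical session id

// Hypothetical, simplified views of the two data sources discussed above.
struct ClientModel {
    std::optional<SessionId> lsid;  // session of the operation being run, if any
};
struct CursorModel {
    SessionId lsid;  // session that opened the cursor
};

// What the mongod liaison is described as doing: walk the clients and collect
// the sessions of operations that are running right now.
std::set<SessionId> activeOpSessionsFromClients(const std::vector<ClientModel>& clients) {
    std::set<SessionId> out;
    for (const auto& client : clients) {
        if (client.lsid) out.insert(*client.lsid);
    }
    return out;
}

// What the mongos liaison is described as doing before the fix: report every
// session that owns an open cursor, whether or not the cursor is in use.
std::set<SessionId> activeOpSessionsFromCursors(const std::vector<CursorModel>& cursors) {
    std::set<SessionId> out;
    for (const auto& cursor : cursors) out.insert(cursor.lsid);
    return out;
}
```

With one idle client and one open cursor owned by some session, the client walk reports nothing while the cursor walk still reports that session as active, which is why such a session is never reaped.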

This bug also affects 4.0.1 and 3.6.7.

Proposed Fix

We have agreed that the two service liaisons should behave in the same manner. We have also agreed that ServiceLiaison*Mongod* has the correct implementation of both getActiveOpSessions() and getOpenCursorSessions().

  1. Move the implementation of ServiceLiaison*Mongos*::getActiveOpSessions() to ServiceLiaison*Mongos*::getOpenCursorSessions(), to match the ServiceLiaison*Mongod*::getOpenCursorSessions() implementation.
  2. Mirror the implementation of ServiceLiaison*Mongos*::getActiveOpSessions() to match ServiceLiaison*Mongod*::getActiveOpSessions().
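The two steps can be sketched as follows. This is a simplified model under assumptions, not the committed change: `MongosLiaisonSketch`, `ClientState`, and `OpenCursor` are hypothetical names, and the real methods read from the ServiceContext and CursorManager.

```cpp
#include <cassert>
#include <optional>
#include <set>
#include <string>
#include <utility>
#include <vector>

using SessionId = std::string;  // hypothetical stand-in for a logical session id

struct ClientState {
    std::optional<SessionId> runningOpSession;  // empty when the client is idle
};
struct OpenCursor {
    SessionId owningSession;
};

// Model of the mongos liaison after the proposed fix.
class MongosLiaisonSketch {
public:
    MongosLiaisonSketch(std::vector<ClientState> clients, std::vector<OpenCursor> cursors)
        : _clients(std::move(clients)), _cursors(std::move(cursors)) {}

    // Step 2: mirror the mongod behavior by walking the clients.
    std::set<SessionId> getActiveOpSessions() const {
        std::set<SessionId> out;
        for (const auto& client : _clients) {
            if (client.runningOpSession) out.insert(*client.runningOpSession);
        }
        return out;
    }

    // Step 1: the cursor walk that previously lived in getActiveOpSessions().
    std::set<SessionId> getOpenCursorSessions() const {
        std::set<SessionId> out;
        for (const auto& cursor : _cursors) out.insert(cursor.owningSession);
        return out;
    }

private:
    std::vector<ClientState> _clients;
    std::vector<OpenCursor> _cursors;
};
```

After the change, a session that only owns an idle cursor shows up in getOpenCursorSessions() but not in getActiveOpSessions(), so it no longer counts as active.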


 Comments   
Comment by Githook User [ 04/Oct/18 ]

Author:

{'name': 'Blake Oler', 'email': 'blake.oler@mongodb.com', 'username': 'BlakeIsBlake'}

Message: SERVER-36831 Report active operations correctly on the ServiceLiaison for mongos

(cherry picked from 19a5f8726fed990bd75d2a0426f53f691ee82b97)
Branch: v4.0
https://github.com/mongodb/mongo/commit/ad174d983b327a1355ec52cfb441171eb370e0f8

Comment by Githook User [ 04/Oct/18 ]

Author:

{'name': 'Blake Oler', 'email': 'blake.oler@mongodb.com', 'username': 'BlakeIsBlake'}

Message: SERVER-36831 Report active operations correctly on the ServiceLiaison for mongos

(cherry picked from 19a5f8726fed990bd75d2a0426f53f691ee82b97)
Branch: v3.6
https://github.com/mongodb/mongo/commit/6375b12faf20b84b0a2e714be91cc726d3092231

Comment by Githook User [ 21/Sep/18 ]

Author:

{'name': 'Blake Oler', 'email': 'blake.oler@mongodb.com', 'username': 'BlakeIsBlake'}

Message: SERVER-36831 Report active operations correctly on the ServiceLiaison for mongos
Branch: master
https://github.com/mongodb/mongo/commit/19a5f8726fed990bd75d2a0426f53f691ee82b97

Comment by Misha Tyulenev [ 20/Sep/18 ]

blake.oler ack the fix

Comment by Blake Oler [ 20/Sep/18 ]

misha.tyulenev ack proposed fix?

Comment by Misha Tyulenev [ 19/Sep/18 ]

I agree with esha.maharishi. mongos needs to kill cursors that are not running but are still managed by the CursorManager.
This means that it first needs to sync the sessions that are open and active by walking through the Clients on the ServiceContext, and then build the list of session ids to kill by taking the difference between the sessions collection and the currently open sessions from the CursorManager.
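The difference step could look roughly like the sketch below. The direction is an assumption, since the comment does not spell it out: here the sessions to kill are those that still own open cursors but no longer have a record in the sessions collection. `sessionsToKill` and `SessionId` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <set>
#include <string>

using SessionId = std::string;  // hypothetical stand-in for a logical session id

// Assumed direction: sessions with open cursors minus sessions that still have
// a record in the sessions collection. Both inputs are ordered sets, which is
// what std::set_difference requires.
std::set<SessionId> sessionsToKill(const std::set<SessionId>& openCursorSessions,
                                   const std::set<SessionId>& sessionsCollection) {
    std::set<SessionId> out;
    std::set_difference(openCursorSessions.begin(), openCursorSessions.end(),
                        sessionsCollection.begin(), sessionsCollection.end(),
                        std::inserter(out, out.end()));
    return out;
}
```

This only works if the active sessions were synced to the sessions collection first, as the comment says. Otherwise a session with a running operation could look expired and have its cursors killed.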

Generated at Thu Feb 08 04:44:13 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.