[SERVER-51074] VectorClock is not gossiped correctly on newly started sessions Created: 21/Sep/20  Updated: 29/Oct/23  Resolved: 24/Sep/20

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 4.7.0
Fix Version/s: 4.8.0

Type: Bug Priority: Major - P3
Reporter: Tommaso Tocci Assignee: Tommaso Tocci
Resolution: Fixed Votes: 0
Labels: PM-1645-Milestone-3, bkp, sharding-csrs-stepdown-also
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
depends on SERVER-51102 Use new $currentOp aggregation pipeli... Closed
Documented
is documented by DOCS-13896 Investigate changes in SERVER-51074: ... Closed
Problem/Incident
is caused by SERVER-47914 Move clusterTime from LogicalClock to... Closed
Related
is related to SERVER-49970 Prefer the VectorClock's ConfigTime t... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v4.7
Sprint: Sharding 2020-10-05
Participants:
Linked BF Score: 28

 Description   

In order to gossip the different components of the vector clock, we attach them to both command requests and their replies before sending them over the wire. This is done by calling the VectorClock::gossipOut() function.

This function inspects the tags of the current session to decide which components of the vector clock will be gossiped. If no session is found, the default tags (passed to the function) are used for this decision.

The problem is that a session can exist but still be in the early kPending state, in which case it won't have any of the other tags.

This case is not considered by the VectorClock::gossipOut() function. In fact, if a session in the pending state is found, we always treat the communication as external and the default tags (passed to the function) are ignored.

This results in the VectorClock components not being gossiped on any newly started session.
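
The tag selection described above can be illustrated with a minimal, self-contained sketch. It is not the server's actual code: the names TagMask, GossipSet, chooseComponentsBuggy, chooseComponentsFixed and the bit values of kPending and kInternalClient are hypothetical stand-ins, and the "fixed" variant only reflects the approach discussed in the comments (treat a pending session as if it were absent).

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical stand-ins for the session tag mask; the bit values are assumptions.
using TagMask = std::uint32_t;
constexpr TagMask kPending = 1u << 0;
constexpr TagMask kInternalClient = 1u << 1;

// Which set of vector clock components would be gossiped (illustrative only).
enum class GossipSet { kExternal, kInternal };

// Behaviour described in the report: any existing session overrides the
// default tags, even when that session is still pending and has no other tags.
GossipSet chooseComponentsBuggy(std::optional<TagMask> sessionTags, TagMask defaultTags) {
    const TagMask tags = sessionTags.value_or(defaultTags);
    return (tags & kInternalClient) ? GossipSet::kInternal : GossipSet::kExternal;
}

// Approach discussed in the comments: a pending session is treated like an
// absent one, so the default tags are honoured.
GossipSet chooseComponentsFixed(std::optional<TagMask> sessionTags, TagMask defaultTags) {
    TagMask tags = defaultTags;
    if (sessionTags && !(*sessionTags & kPending)) {
        tags = *sessionTags;
    }
    return (tags & kInternalClient) ? GossipSet::kInternal : GossipSet::kExternal;
}

// Small demonstration of the difference on a newly started (pending) session.
inline void runExamples() {
    // Pending session, internal defaults: the buggy variant falls back to external.
    assert(chooseComponentsBuggy(kPending, kInternalClient) == GossipSet::kExternal);
    // The fixed variant honours the internal default tags.
    assert(chooseComponentsFixed(kPending, kInternalClient) == GossipSet::kInternal);
    // No session at all: both variants use the defaults.
    assert(chooseComponentsBuggy(std::nullopt, kInternalClient) == GossipSet::kInternal);
    assert(chooseComponentsFixed(std::nullopt, kInternalClient) == GossipSet::kInternal);
}
```

Calling runExamples() exercises the pending-session case: under these assumptions the first variant drops the internal components, while the second gossips them as the defaults request.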



 Comments   
Comment by Githook User [ 24/Sep/20 ]

Author: Tommaso Tocci (toto-dev) <tommaso.tocci@mongodb.com>

Message: SERVER-51074 VectorClock is not gossiped correctly on newly started sessions
Branch: master
https://github.com/mongodb/mongo/commit/ca5fb3dfb5dc0edef0bc92ab7c2e9aeb4a95b9be

Comment by Tommaso Tocci [ 23/Sep/20 ]

This fix requires the changes in SERVER-51102; otherwise it will make the count_plan_summary.js test fail. As a side effect of gossiping more components of the vector clock, the size of the command description in the currentOp log increases. The currentOp command can no longer be used to inspect the current operation reliably; its aggregation pipeline counterpart ($currentOp) should be used instead.

Comment by Tommaso Tocci [ 22/Sep/20 ]

Yes kevin.pulo, this was the idea. A pending session needs to be treated as if it were absent.

Comment by Kevin Pulo [ 22/Sep/20 ]

So I guess the solution is to treat a session in kPending the same as if it were absent (i.e. honour the default tags), right?

Comment by Tommaso Tocci [ 21/Sep/20 ]

Due to this bug, since SERVER-49970, the configOpTime is not propagated properly through the cluster, causing several bugs.

Generated at Thu Feb 08 05:24:26 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.