[SERVER-46308] Investigate dependency between commit point (lastCommitted) and cluster time Created: 21/Feb/20  Updated: 29/Oct/23  Resolved: 04/Oct/21

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: 5.1.0-rc0

Type: Bug Priority: Major - P3
Reporter: Lingzhi Deng Assignee: Randolph Tan
Resolution: Fixed Votes: 0
Labels: sharding-wfbf-day
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File test.js    
Issue Links:
Related
related to SERVER-35663 Replication recovery does not update ... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: Sharding 2020-04-20, Sharding 2020-05-04, Sharding 2020-05-18, Sharding 2020-07-13, Sharding 2020-06-01, Sharding 2020-06-15, Sharding 2020-06-29, Sharding 2020-07-27, Sharding 2020-08-10, Sharding 2020-08-24, Sharding 2020-09-21, Sharding 2020-10-05, Sharding 2020-10-19, Sharding 2020-11-02, Sharding 2020-11-16, Sharding 2020-11-30, Sharding 2020-12-14, Sharding 2020-12-28, Sharding 2021-01-11, Sharding 2021-01-25, Sharding 2021-02-22, Sharding 2021-03-08, Sharding 2021-03-22, Sharding 2021-04-05, Sharding 2021-04-19, Sharding 2021-05-03
Participants:

 Description   

computeOperationTime returns the replication lastCommittedOpTime for majority reads or the lastAppliedOpTime otherwise. And we have this dassert that makes sure that neither of them should be beyond the cluster time.

We have a dassert in ReplicationCoordinatorImpl::_setMyLastAppliedOpTimeAndWallTime to assert that the lastAppliedOpTime should never advance beyond the cluster time. However, we do not have a corresponding dassert for the lastCommittedOpTime.

Normally, for internal communications between replset members, we have the LogicalTimeMetadataHook to parse cluster time metadata and advance the cluster time on receiving a network message. For example, when a node receives a heartbeat response, it parses the cluster time metadata as part of the network interface hooks before handing it off to repl to process the heartbeat response. And when the node processes the heartbeat, it could advance the commit point on hearing a more recent commit point. So the assumption that the commit point is never ahead of the cluster time is normally correct because we parse the cluster time metadata first.

However, if a heartbeat response comes from an arbiter, it could contain a more recent commit point without cluster time metadata, simply because logical clock is disabled for arbiters. So theoretically, the following could happen:
1. secondary's current knowledge of the cluster time is 90
2. secondary receives a heartbeat response without cluster time metadata from an arbiter that has a commit point 100
3. secondary processes the heartbeat and advances its commit point (lastCommitted) to 100.
4. A majority read on the secondary returns operation time 100 from computeOperationTime
5. dassert is hit because the secondary's logical clock is 90, being < the operation time.

The fact that computeOperationTime returns lastCommitted for majority reads seems weird. Because for majority reads, they dont actually read at lastCommitted but the committed snapshot. And repl guarantees that the committed snapshot is never ahead of the lastApplied, (whereas lastCommitted could). So I think it is more correct for computeOperationTime to return the committed snapshot for majority reads.

This ticket should investigate whether what mentioned above could actually happen with arbiters. No matter which time we decide computeOperationTime should return for majority reads, we should add a corresponding dassert in repl like we did for the lastApplied to enforce that assumption. And we should have a targeted test for it.



 Comments   
Comment by Vivian Ge (Inactive) [ 06/Oct/21 ]

Updating the fixversion since branching activities occurred yesterday. This ticket will be in rc0 when it’s been triggered. For more active release information, please keep an eye on #server-release. Thank you!

Comment by Githook User [ 04/Oct/21 ]

Author:

{'name': 'Randolph Tan', 'email': 'randolph@10gen.com', 'username': 'renctan'}

Message: SERVER-46308 Use current committed snapshot time instead of last comm…
Branch: master
https://github.com/mongodb/mongo/commit/e80a562c2135ad6e0b79f41efb993e3485052ac0

Comment by Randolph Tan [ 30/Sep/21 ]

Attached test.js demonstrating the scenario as described.

Generated at Thu Feb 08 05:11:06 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.