[SERVER-35377] Operations on new clients get latest in-memory clusterTime as operationTime Created: 04/Jun/18  Updated: 29/Oct/23  Resolved: 14/Jun/18

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: 3.6.7, 4.0.0-rc6, 4.1.1

Type: Bug Priority: Major - P3
Reporter: Jack Mulrow Assignee: Misha Tyulenev
Resolution: Fixed Votes: 0
Labels: Sharding36Regression
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
  Backports
  Depends
  Duplicate
    is duplicated by SERVER-35156 secondary reads return cluster time a... Closed
  Problem/Incident
    is caused by SERVER-34843 Mongod can return operationTime great... Closed
  Related
    related to SERVER-31887 clusterTime advanced on primary witho... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested: v4.0, v3.6
Sprint: Sharding 2018-06-18
Linked BF Score: 60

 Description   

When computing operationTime for a response, the "client operation time" at the start of a command is compared to the time at the end. There are some error contexts where there is no start operation time, so if the start time is a null logical time, the latest in-memory cluster time is returned.

The start time is computed by getting the opTime of the last op on the current client. If a client hasn't been used before, though, this can be null, resulting in a null start time, and the returned operationTime will be the latest in-memory time even for successful operations.
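A minimal sketch of this path, assuming illustrative stand-in types (the LogicalTime struct, the computeOperationTime signature, and all values below are hypothetical, not the actual server code):

    #include <cstdint>
    #include <iostream>

    // Illustrative stand-in for the server's LogicalTime; a null logical
    // time is represented here as all-zero.
    struct LogicalTime {
        uint64_t secs = 0;
        bool isNull() const { return secs == 0; }
    };

    // Simplified model of the behavior after SERVER-34843: when the start
    // time is null (the client has no last op), the latest in-memory
    // clusterTime is returned, even for a successful operation.
    LogicalTime computeOperationTime(LogicalTime startOperationTime,
                                     LogicalTime endOperationTime,
                                     LogicalTime latestInMemoryClusterTime) {
        if (startOperationTime.isNull()) {
            return latestInMemoryClusterTime;  // may not be in this node's oplog
        }
        return endOperationTime;
    }

    int main() {
        LogicalTime newClientStart{};   // null: this client has never run an op
        LogicalTime endOfCommand{100};  // latest opTime actually in the oplog
        LogicalTime inMemoryTime{105};  // gossiped clusterTime, ahead of the oplog
        auto opTime = computeOperationTime(newClientStart, endOfCommand, inMemoryTime);
        std::cout << "operationTime=" << opTime.secs << " (not in the oplog)\n";
    }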

This does not violate causal consistency, but it can create problems in our test infrastructure when the no-op writer is off, because the latest in-memory clusterTime can be higher than the latest opTime in a replica set primary's oplog. In particular, when secondaries force metadata refreshes on the primary, they use the operationTime of _forceRouterConfigUpdate to do a waitUntilOpTime (which bypasses forced no-op writes) and can hang forever.
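As a hedged illustration of the hang (the Replica type and waitUntilOpTime signature below are toy stand-ins, not the real replication coordinator API):

    #include <chrono>
    #include <condition_variable>
    #include <cstdint>
    #include <iostream>
    #include <mutex>

    // Toy model of the hang described above.
    struct Replica {
        std::mutex m;
        std::condition_variable cv;
        uint64_t lastApplied = 100;  // no-op writer off: nothing new is written

        // Blocks until lastApplied >= target, like waitUntilOpTime. If the
        // target came from the in-memory clusterTime (e.g. 105) and nothing
        // advances lastApplied, the wait would block forever; we bound it
        // with a timeout here only so the example terminates.
        bool waitUntilOpTime(uint64_t target, std::chrono::milliseconds timeout) {
            std::unique_lock<std::mutex> lk(m);
            return cv.wait_for(lk, timeout, [&] { return lastApplied >= target; });
        }
    };

    int main() {
        Replica primary;
        // 105 came from the gossiped in-memory clusterTime, not the oplog.
        bool satisfied = primary.waitUntilOpTime(105, std::chrono::milliseconds(200));
        std::cout << (satisfied ? "done" : "timed out: would hang forever") << '\n';
    }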

Possible solutions would be:

  • Distinguishing between not having a start time at all and running on a new client (possibly with a boost::optional or new sentinel logical time value)
  • Changing computeOperationTime to return the lastAppliedOpTime instead of the latest in-memory time when there is no start time
  • Using the lastAppliedOpTime as the start client operation time if the client has no last op (see the sketch after this list)
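The third option is what the fix commits in the comments below describe. A minimal sketch, assuming the same illustrative LogicalTime stand-in as above (the function name is hypothetical, not the actual patch):

    #include <cstdint>
    #include <iostream>

    struct LogicalTime {
        uint64_t secs = 0;
        bool isNull() const { return secs == 0; }
    };

    // The fix at the point where the start time is derived: a client with
    // no last op is given lastAppliedOpTime instead of a null time, so it
    // can no longer fall into the branch that returns the in-memory
    // clusterTime for a successful operation.
    LogicalTime startOperationTimeFor(LogicalTime clientLastOpTime,
                                      LogicalTime lastAppliedOpTime) {
        return clientLastOpTime.isNull() ? lastAppliedOpTime : clientLastOpTime;
    }

    int main() {
        LogicalTime noLastOp{};        // new client
        LogicalTime lastApplied{100};  // guaranteed to exist in the oplog
        auto start = startOperationTimeFor(noLastOp, lastApplied);
        std::cout << "start time=" << start.secs << " (taken from the oplog)\n";
    }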

This is new behavior introduced by the refactoring in SERVER-34843. It also exacerbates SERVER-31887, since successful requests can receive an operationTime not in the primary's oplog.



 Comments   
Comment by Misha Tyulenev [ 02/Aug/18 ]

glajchs I don't think this issue is directly related, but I wouldn't exclude it completely. Check whether this backport reduces the amount of logging you currently see.

Comment by Scott Glajch [ 30/Jul/18 ]

Could this have anything to do with the issue I was seeing in this comment here:
https://jira.mongodb.org/browse/SERVER-30842?focusedCommentId=1926477&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-1926477

Basically after upgrading to 3.6, when I do the stepdown (so there's now a primary in the shard that is 3.6), I see 80-150K log lines in the first minute or two that look something like this:
2018-06-21T01:12:03.689+0000 I REPL [ShardServerCatalogCacheLoader-328] Not setting the last OpTime for this Client from { ts: Timestamp(1529543523, 606), t: 9 } to the current system time of { ts: Timestamp(1529543523, 588), t: 9 } as that would be moving the OpTime backwards. This should only happen if there was a rollback recently

The number of these messages does diminish over time, but they never seem to fully go away.

Comment by Githook User [ 27/Jul/18 ]

Author:

{'username': 'mikety', 'name': 'Misha Tyulenev', 'email': 'misha@mongodb.com'}

Message: SERVER-35377 use lastAppliedOpTime when there is no clientTime available in operationTime calculation

(cherry picked from commit bf17a390cac465e19a8bd243620b1010d0bad2bb)
(cherry picked from commit 6212273ec79659f28dff13d66a57ba2f2b1aab4a)
Branch: v3.6
https://github.com/mongodb/mongo/commit/e3651194c351605fe1e5aaf34cd8feb9c8f522d4

Comment by Githook User [ 14/Jun/18 ]

Author:

{'username': 'mikety', 'name': 'Misha Tyulenev', 'email': 'misha@mongodb.com'}

Message: SERVER-35377 use lastAppliedOpTime when there is no clientTime available in operationTime calculation

(cherry picked from commit bf17a390cac465e19a8bd243620b1010d0bad2bb)
Branch: v4.0
https://github.com/mongodb/mongo/commit/6212273ec79659f28dff13d66a57ba2f2b1aab4a

Comment by Githook User [ 14/Jun/18 ]

Author:

{'username': 'mikety', 'name': 'Misha Tyulenev', 'email': 'misha@mongodb.com'}

Message: SERVER-35377 use lastAppliedOpTime when there is no clientTime available in operationTime calculation
Branch: master
https://github.com/mongodb/mongo/commit/bf17a390cac465e19a8bd243620b1010d0bad2bb

Comment by Jack Mulrow [ 08/Jun/18 ]

kaloian.manassiev SERVER-34843 hasn't actually been backported to 3.6 yet, so this is currently only a problem in 4.0. With the periodic no-op writer enabled this isn't a big issue (only when the first refresh triggered by a secondary uses a new client might it have to wait for a no-op), but it's causing a lot of noise in our tests, so it would be good to get a fix in as soon as possible.

Comment by Jack Mulrow [ 04/Jun/18 ]

max.hirschhorn I think so. Before the refactoring in SERVER-34843, operationTime should always have been a value actually present in some primary's oplog (the last applied or committed opTime for reads, and the latest opTime on the client for writes), so if the repro in SERVER-35156 doesn't fail without SERVER-34843, it's probably caused by this issue.
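For illustration only, a hedged sketch of that pre-refactoring selection (the enum, function name, and values are hypothetical, not the actual 3.6 code); every branch returns a value that some node actually wrote to its oplog:

    #include <cstdint>

    enum class OpKind { ReadLocal, ReadMajority, Write };

    struct OpTime { uint64_t secs = 0; };

    OpTime preRefactorOperationTime(OpKind kind,
                                    OpTime lastApplied,    // in the oplog
                                    OpTime lastCommitted,  // in the oplog
                                    OpTime clientLastOp) { // op this client wrote
        switch (kind) {
            case OpKind::ReadLocal:    return lastApplied;
            case OpKind::ReadMajority: return lastCommitted;
            case OpKind::Write:        return clientLastOp;
        }
        return OpTime{};
    }

    int main() {
        // A write's operationTime was the client's own last opTime (99 here).
        OpTime t = preRefactorOperationTime(OpKind::Write, {100}, {98}, {99});
        return t.secs == 99 ? 0 : 1;
    }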

Comment by Max Hirschhorn [ 04/Jun/18 ]

jack.mulrow, misha.tyulenev, is this the cause of SERVER-35156? The "find" command in the steps to reproduce would have been the first operation we sent to the secondary.
