[SERVER-27543] Create new metadata for oplog queries Created: 29/Dec/16  Updated: 07/Apr/17  Resolved: 15/Feb/17

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: 3.4.4, 3.5.4

Type: Task Priority: Major - P3
Reporter: Spencer Brody (Inactive) Assignee: Judah Schvimer
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
is depended on by SERVER-27403 Consider term and rbid when validatin... Closed
Related
related to SERVER-27668 Remove unnecessary fields in ReplSetM... Backlog
is related to SERVER-27545 Include RBID in replSet metadata of c... Closed
Backwards Compatibility: Fully Compatible
Backport Requested:
v3.4
Sprint: Repl 2017-01-23, Repl 2017-02-13, Repl 2017-03-06
Participants:
Linked BF Score: 0

 Description   

Oplog queries require different metadata from heartbeats. There are fields that each need that the other doesn't, and there are fields that one should consider that the other should not. As we are defining the role that each of these commands play in the replication process, the metadata that each replication command carries must be clarified as well. We will now have ReplSetMetadata with $replData like we have now, but we will add in a new OplogQueryMetadata with $oplogQueryData. The OplogQueryMetadata can have the primaryIndex, the lastOpCommitted, the syncSourceIndex, the lastOpApplied, and the rbid from after the query. The OplogFetcher will then request both types of metadata. For backwards compatibility we need to keep ReplSetMetadata the same as it is now, but in the future (3.8) we can strip it down to term, configVersion, replsetID, and lastOpCommitted (as long as sharding needs it outside of oplog querying), to deduplicate the fields in OplogQueryMetadata only needed for oplog querying. We will keep having the OplogFetcher request both types of metadata since it needs fields from both. We are only removing fields that other commands like heartbeating do not need.



 Comments   
Comment by Githook User [ 06/Apr/17 ]

Author:

{u'username': u'judahschvimer', u'name': u'Judah Schvimer', u'email': u'judah@mongodb.com'}

Message: SERVER-27543 Process OplogQueryMetadata with backwards and forwards compatibility

(cherry picked from commit 7284884f3d8f3cf1d1489579180c2637efcc42b2)
Branch: v3.4
https://github.com/mongodb/mongo/commit/d453e01da24dc03cb67f17e46b5ced9dbe947fbb

Comment by Githook User [ 06/Apr/17 ]

Author:

{u'username': u'judahschvimer', u'name': u'Judah Schvimer', u'email': u'judah@mongodb.com'}

Message: SERVER-27543 send OplogQueryMetadata with OplogFetcher queries

(cherry picked from commit 859dfb093328ae9129f18952df4f25b123977a38)
Branch: v3.4
https://github.com/mongodb/mongo/commit/cbe53c003584cae8efad1715aeac181c8f8d2eb7

Comment by Githook User [ 06/Apr/17 ]

Author:

{u'username': u'judahschvimer', u'name': u'Judah Schvimer', u'email': u'judah@mongodb.com'}

Message: SERVER-27543 Create new metadata for oplog queries

(cherry picked from commit 8daf13374403272ace73340f25fce573fa36a55b)
Branch: v3.4
https://github.com/mongodb/mongo/commit/b4e4f6c70b4324b8e5ffd8f292922c201ed7c92d

Comment by Githook User [ 15/Feb/17 ]

Author:

{u'username': u'judahschvimer', u'name': u'Judah Schvimer', u'email': u'judah@mongodb.com'}

Message: SERVER-27543 Process OplogQueryMetadata with backwards and forwards compatibility
Branch: master
https://github.com/mongodb/mongo/commit/7284884f3d8f3cf1d1489579180c2637efcc42b2

Comment by Githook User [ 20/Jan/17 ]

Author:

{u'username': u'judahschvimer', u'name': u'Judah Schvimer', u'email': u'judah@mongodb.com'}

Message: SERVER-27543 send OplogQueryMetadata with OplogFetcher queries
Branch: master
https://github.com/mongodb/mongo/commit/859dfb093328ae9129f18952df4f25b123977a38

Comment by Githook User [ 12/Jan/17 ]

Author:

{u'username': u'judahschvimer', u'name': u'Judah Schvimer', u'email': u'judah@mongodb.com'}

Message: SERVER-27543 Create new metadata for oplog queries
Branch: master
https://github.com/mongodb/mongo/commit/8daf13374403272ace73340f25fce573fa36a55b

Comment by Misha Tyulenev [ 09/Jan/17 ]

spencer Im leaning towards adding a separate metadata for this value in 3.6. $replData has several fields related to repl only and its turned on by request. The new metadata will be always present in the OP_COMMANDREPLY for read or write commands and will has only one field.
If this will be accepted in the design then it can be deprecated from the $replData.

Comment by Spencer Brody (Inactive) [ 09/Jan/17 ]

Looks like we use lastOpVisible in exactly once place (outside of logging) and that's for deciding whether to re-evaluate our sync source: https://github.com/mongodb/mongo/blob/cdc7af4c6d453b8c1ce2319d0cd3b50074609d87/src/mongo/db/repl/topology_coordinator_impl.cpp#L2447

Using the last applied optime there rather than the current definition of lastOpVisible would probably be totally fine (and arguably more correct), but I don't think we would want to remove that part of the check entirely. So I see the options as 1) do nothing 2) Change the meaning of the lastOpVisible field (yet again) to mean last applied op (removing the max-ing with last client optime) 3) add a new field to the ReplSetMetadata lastOpApplied, stop using lastOpVisible in 3.6, remove it in 3.8.

I'm currently leaning towards option 3, though I'd like to confirm with misha.tyulenev and kaloian.manassiev whether lastOpVisible with its current semantics is needed for the Causal Consistency work.

Comment by Siyuan Zhou [ 30/Dec/16 ]

I changed the uses of lastVisibleOpTime in sharding to lastCommittedOpTime in SERVER-24630, so I think we are fine.

Comment by Judah Schvimer [ 30/Dec/16 ]

SERVER-21059 changed lastVisibleOpTime from the above description to its current form. We need to make sure we do not create a regression there.

Comment by Siyuan Zhou [ 30/Dec/16 ]

Following up Misha's suggestion, here is the spec of lastOpVisible:

As part of the response to a read or write command, metadata can be returned that provides an optime that can be useful in subsequent readAfterOpTime reads. This optime, “lastVisibleOpTime”, will indicate an optime of certain provenance, depending on the properties of the read or write command that produced it:

read with readConcern level majority:
lastVisibleOpTime is the optime of the last op in the oplog, in the snapshot blessed as ‘committed’.

read with readConcern level local:
lastVisibleOpTime is the optime of the last op in the oplog.

write with writeConcern majority:
lastVisibleOpTime is the optime of the write (if successful) or the previous successful write at the time the write was determined to have failed or a no-op. This operation waits for the op with the lastVisibleOpTime to replicate and sync to disk to a majority of nodes and for the primary to bless a snapshot with that op as ‘committed’; this allows a subsequent readConcern level majority read on the same primary node to read the data without any need to supply a readAfterOpTime.

write with any writeConcern other than majority:
lastVisibleOpTime is the same as described for writeConcern majority, except that the operation does not wait for disk syncing or the blessing of a snapshot.

Comment by Siyuan Zhou [ 30/Dec/16 ]

We used lastOpVisible in several places and always assume it's the last applied optime. Probably, we can just change it from max(the last client optime, the last committed snapshot) to max(the last client optime, the last applied) to match its name.

Here're all the places where we use lastOpVisible, excluding tests and mocks.

mongo/mongo/src/mongo/db/repl
data_replicator_external_state_impl.cpp
- DataReplicatorExternalStateImpl::shouldStopFetching, line 75:  << source << ", OpTime " << metadata.getLastOpVisible()
WithSyncSourceHavingNoSyncSourceStopsTheOplogFetcher::_doTest, line 724:  ASSERT_EQUALS(metadata.getLastOpVisible(), dataReplicatorExternalState->syncSourceLastOpTime);
 
oplog_fetcher.cpp
- OplogFetcher::_callback, line 514:  << metadata.getLastOpVisible().toString()
 
topology_coordinator_impl.cpp
- TopologyCoordinatorImpl::shouldChangeSyncSource, line 2439:  std::max(metadata.getLastOpVisible(), _hbdata.at(currentSourceIndex).getAppliedOpTime());

With this change, SERVER-27403 could be fixed in an easier way, but we need to extra work to support multiversion, either by requiring minimal old version to upgrade or by falling back to other ways to detect rollback.

Comment by Misha Tyulenev [ 30/Dec/16 ]

Note: Causal Consistency (https://goo.gl/wsJP91) design spec expect it to satisfy the https://goo.gl/9cnVxh spec

Generated at Thu Feb 08 04:15:27 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.