[SERVER-32843] OpObserver's callback methods must not have return values Created: 22/Jan/18  Updated: 30/Oct/23  Resolved: 01/Mar/18

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: 3.7.3

Type: Improvement Priority: Major - P3
Reporter: Kaloian Manassiev Assignee: ADAM Martin (Inactive)
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
is depended on by SERVER-32897 Separate ReplicationOpObserver from O... Backlog
Backwards Compatibility: Fully Compatible
Sprint: Platforms 2018-01-29, Platforms 2018-02-12, Platforms 2018-02-26, Platforms 2018-03-12
Participants:

 Description   

The OpObserverRegistry was introduced as an abstraction to allow decoupling making data modifications from the side effects, which need to happen as a result of these modifications, such as op log writes, retryable writes, etc.

Some of the OpObserver's methods currently return the OpTime which resulted from logging the operation to the replication oplog. In addition, in certain cases, the OpTime resulting from an earlier OpObserver might be needed by a later one, which is the case with retryable writes.

In order to support these requirements, the OpObserver(s) chain should have access to some common per-operation structure, where runtime information could be persisted.

The specification is:

  • Add a decoration on the `OperationContext` and place it in the protected section of `OpObserver`. It should contain an `std::vector<OpTime> reservedOpTimes`.
  • Change all `OpObserverRegistry` methods to have a scoped guard, which clears that vector.
  • Change the `repl::logOp` call to invariant that this vector is empty and populate it with the set of timestamps, which it produced.
  • Change the `OpObserverRegistry's onDropCollection` and `onRenameCollection` methods to not call `_forEachObserver`, but instead loop and invariant that the return values of the chained calls are null `OpTimes` (`isNull()`) and instead end in:
    `if (reservedOpTimes.empty()) return OpTime();
    else { invariant(reservedOpTimes.size() == 1); return reservedOpTimes[0];}

    `
    Change `OpObserverImpl`'s `onDropCollection` and `onRenameCollection` methods to return null `OpTimes`.



 Comments   
Comment by Githook User [ 01/Mar/18 ]

Author:

{'email': 'adam.martin@10gen.com', 'name': 'ADAM David Alan Martin', 'username': 'adamlsd'}

Message: SERVER-32843 Allow multiple times in OpObservers

The OpObserverRegistry was introduced as an abstraction to allow
decoupling making data modifications from the side effects, which
need to happen as a result of these modifications, such as op log
writes, retryable writes, etc.

Some of the OpObserver's methods currently return the OpTime which
resulted from logging the operation to the replication oplog. In
addition, in certain cases, the OpTime resulting from an earlier
OpObserver might be needed by a later one, which is the case with
retryable writes.

In order to support these requirements, the OpObserver(s) chain
should have access to some common per-operation structure, where
runtime information could be persisted.
Branch: master
https://github.com/mongodb/mongo/commit/296fde1259d29e081069fde1c69bb9ae083932b1

Comment by Spencer Brody (Inactive) [ 25/Jan/18 ]

lgtm

Comment by Kaloian Manassiev [ 25/Jan/18 ]

judah.schvimer, spencer to your questions - I just updated the comment.

Comment by Spencer Brody (Inactive) [ 25/Jan/18 ]

I don't think repl::logOp should be responsible for clearing the vector, I think that should be done by the OpObserverRegistry after running the last observer and saving aside whatever it needs to return. repl::logOp should invariant that the vector is empty when it starts.

I also don't like the term 'uncommittedOpTimes' as the term 'uncommitted' can mean different things at different layers. I'd prefer either 'reservedOpTimes', 'generatedOpTimes', or something similar.

Comment by Judah Schvimer [ 25/Jan/18 ]

So all OpObservers that the registry calls will return OpTime() for onDropCollection and onRenameCollection?

Comment by Kaloian Manassiev [ 25/Jan/18 ]

The final specification we came up with is:

  • Add a decoration on the OperationContext and place it in the protected section of OpObserver. It should contains an std::vector<OpTime> reservedOpTimes.
  • Change all OpObserverRegistry methods to have a scoped guard, which clears that vector.
  • Change the repl::logOp call to invariant that this vector is empty and populate it with the set of timestamps, which it produced.
  • Change the OpObserverRegistry's onDropCollection and onRenameCollection methods to not call _forEachObserver, but use loop and invariant that the return values of the chained calls are null OpTimes (isNull()) and instead end in:
    • if (reservedOpTimes.empty()) return OpTime();
    • else { invariant(reservedOpTimes.size() == 1); return reservedOpTimes[0];}
  • Change OpObserverImpl's onDropCollection and onRenameCollection methods to return null OpTimes.
Comment by Spencer Brody (Inactive) [ 25/Jan/18 ]

The overall approach sounds good, though I'm not sold on the name TransactionContext. Counterproposal: how about we just decorate the OperationContext directly with the vector of OpTimes, and call that vector 'reservedOpTimes'?

For this to work we'll have to ensure that whatever OpObserver generates the OpTimes (currently OpObserverImpl, but likely to soon be a separate ReplicationOpObserver) runs first. I think we can do this by making sure the OpObserverRegistry guarantees that it will run observers in the order they are registered, and make sure to always register the observer that calls logOp first.

Comment by Kaloian Manassiev [ 25/Jan/18 ]

It is the latter, yes.

If we can define that one invocation of the OpObserver chain belongs to one "logical write" and than one WUOW can have multiple "logical writes", we need OpObservers later in the chain to access the OpTimes that OpObservers earlier in the chain generated for the same "logical write".

I am saying OpTimes (in plural) because onInserts for example generates more than one OpTime and if are to move the CSS to a separate OpObserver, it will need to access all of them.

Based on these requirements, does the proposal in this comment sound reasonable?

Comment by Spencer Brody (Inactive) [ 24/Jan/18 ]

kaloian.manassiev, are you saying that an OpObserver for a later write needs the optime(s) generated by an OpObserver for an earlier write? Or just that later OpObservers in the chain of registered OpObservers for a single write might need access to the OpTime of the write it is handling?

The latter makes sense to me, as it's something that the current handlers in the single OpObserver we have today have access to. If it's the former I'd like to understand better why it's needed.

Comment by Matthew Russotto [ 23/Jan/18 ]

The complication is we can have multiple OpObserver calls within the same write unit of work. If the later OpObservers need context that is specifically about a single OpObserver call (and I think they do), this wouldn't do what we want.

We do need something which records uncommitted changes for a transaction, but that needs to go elsewhere (something hung off the Session object is what the design says now)

Comment by Kaloian Manassiev [ 23/Jan/18 ]

Yes, this new decoration is for passing OpTimes (or other state) from one OpObserver call to the next and it will be essentially a stack of all OpLog entries, which happened as part of one WUOW.

Perhaps it should be called WriteUnitOfWorkContext then? It should be empty when WUOW begins, then OpObserver calls append to it and at commit time it is reset back to empty. How does that sound? Is this something also which would be useful for tracking the in-progress transaction state, because this is essentially what this is - a log of the uncommitted changes for a transaction?

Comment by Matthew Russotto [ 23/Jan/18 ]

I like the idea of putting it on the OperationContext rather the ReplClientInfo. I'm not sure it's sufficient for passing opTimes from opObserver to opObserver (is that what this is for?). Reason being there's no rule saying an opObserver be called only once in an operation context, so you wouldn't know where to start. It might be necessary to explicitly have some object hoding state for the opObserver calls.

Comment by Kaloian Manassiev [ 22/Jan/18 ]

I propose to introduce a decoration on the OperationContext called TransactionContext, which contains the following field:

  • std::vector<OpTime> uncommittedOpTimes - the set of op times which have been generated as part of this OpObserver chain invocation. In most of the cases, this field will contain only one element, but it needs to be a vector for the purposes of onInserts, which can generate more than one OpLog entry.

The logOp call will be changed to look for the OpObserver context and if there is one will append the timestamps it generates to it. At transaction commit time, the ReplClientInfo's lastCommitted time will be advanced to the highest of the uncommittedOpTimes (which should be the latest).

CC matthew.russotto

Generated at Thu Feb 08 04:31:27 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.