[SERVER-34818] changeStream option startAtClusterTime should be a raw BSON Timestamp Created: 02/May/18 Updated: 29/Oct/23 Resolved: 17/May/18 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Querying |
| Affects Version/s: | None |
| Fix Version/s: | 4.0.0-rc0 |
| Type: | Task | Priority: | Major - P3 |
| Reporter: | Daniel Aprahamian (Inactive) | Assignee: | Charlie Swanson |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||||||
| Backwards Compatibility: | Fully Compatible | ||||||||||||
| Participants: | |||||||||||||
| Description |
|
In 4.0, we will be allowing users to specify a startAtClusterTime option. Currently, the server expects the option to be a BSON document of the form:
This naming not intuitive, as a clusterTime object (as defined in server response $clusterTime.clusterTime) does not conform to this format. We should either modify the startAtClusterTime field to expect a raw BSON timestamp, or choose a more intuitive name for the field. |
| Comments |
| Comment by Githook User [ 17/May/18 ] | ||||||||||||||||
|
Author: {'email': 'charlie.swanson@mongodb.com', 'username': 'cswanson310', 'name': 'Charlie Swanson'}Message: Also changes the type of the argument from an object with a single 'ts' | ||||||||||||||||
| Comment by Spencer Brody (Inactive) [ 07/May/18 ] | ||||||||||||||||
|
We need to enable Cloud Automation to upgrade from 3.7.5 to 4.0.0 without needing to re-run setFCV. We don't need to guarantee that applications written against 3.7.5 can upgrade to 4.0.0 without a code change - that is, we're still allowed to change our public APIs up until at least RC0 (and probably could for GA if it was important enough). | ||||||||||||||||
| Comment by Jeremy Mikola [ 07/May/18 ] | ||||||||||||||||
This just stood out as a minor API inconsistency worth addressing while we still can. I understand if there are insufficient resources to address this for the 4.0 release. I'm not sure how this relates to supporting an upgrade path from 3.7.5 to 4.0.0, since we don't document the $changeStream operator or expect users to run it directly in their own aggregation pipelines – unless you referring to beta drivers that have already implemented a watch() helper (or perhaps ensuring the 3.7.5 shell can be used against a 4.0.0 server). | ||||||||||||||||
| Comment by Misha Tyulenev [ 04/May/18 ] | ||||||||||||||||
|
As long as this value will not cause the clusterTime to advance validation is not needed. In the case of afterClusterTime the validation is needed because otherwise it will hang waiting for read concern if the value is chosen far in the future. | ||||||||||||||||
| Comment by Charlie Swanson [ 04/May/18 ] | ||||||||||||||||
|
Oh, also - jmikola's guess of where this time will come from seems good to me, though asya might know more - I seem to recall some customers were asking for this, independent of our need for it for resuming a stream which hasn't seen anything. | ||||||||||||||||
| Comment by Charlie Swanson [ 04/May/18 ] | ||||||||||||||||
|
jmikola I'm not sure if this is too late - it'd certainly take some additional server effort to change the parser, and we'd likely have to support both formats because we want to support upgrade from 3.7.5 to 4.0.0 (I've been told). My hunch is yes unless you feel strongly about this. misha.tyulenev why should we validate that? We chose not to, since the only thing that would happen is that your stream would get no notifications. We don't use this value to modify the system's clock at all. We thought about adding an assert though. | ||||||||||||||||
| Comment by Misha Tyulenev [ 04/May/18 ] | ||||||||||||||||
|
Note that the server should validate that the user provided time is not greater than the clusterTime. | ||||||||||||||||
| Comment by Jeremy Mikola [ 04/May/18 ] | ||||||||||||||||
Given what we now know, is there any reason to keep the "ts" field in 4.0 or does this fall under "too late to change it now"? Looking at the current implementation of mongodb/specifications#311 for SPEC-1057, the drivers will accept a BSON timestamp as an option. I suppose drivers can simply wrap the BSON timestamp (be it from the session or a user-provided value) in a "ts" field, but I do think it'd be preferable to clean up the $changeStream API if possible.
This leads me to believe that any user-supplied value would come from one of three places: Session.getOperationTime(), an op log document (if users are iterating manually), or be a manually constructed BSON timestamp (likely a timestamp integer with an increment of zero). We can rule out Session.getClusterTime() because we tell users those are opaque documents and they generally shouldn't extract its "clusterTime" field on their own. | ||||||||||||||||
| Comment by Charlie Swanson [ 03/May/18 ] | ||||||||||||||||
|
Hello all! I hope I can provide some clarification here. It's my understanding that an 'operationTime' is one particular cluster time. That is, out of all the values of Timestamps that have been used as a "cluster time" (returned with $clusterTime), an operationTime has to be one of them. I would have proposed 'startAtTime' for the name of this option, but I wanted to make it clear that it was not an ISODate or something like that. 'startAtOperationTime' seems fine, but logically you should be able to start at any of the cluster times, so 'startAtClusterTime' seemed broader to me. I think that using either the 'operationTime' or the 'clusterTime' for this is fine from a correctness point of view, so if Misha has objections to looking within the '$clusterTime' field then using operationTime is good. To Jeremy's point, the "ts" field is absolutely superfluous, but we decided to use an object before because we weren't sure if a Timestamp would always be sufficient as a way of representing a cluster time (that project was in-flight at the time of the original '$_resumeWithClusterTime' option which we renamed here to 'startAtClusterTime'). To Jeff's question, the semantics are to include any operations that happened exactly at the provided time. If I recall the decision making processes, it didn't seem to matter much so we went with 'include it'. Finally, One point in response to Jeremy's concern is that we might expect a user to provide a timestamp gathered from the oplog (e.g. maybe they use 'the oldest thing in the oplog' or 'the time from the operation right before I created that collection'). Let me know if I missed anything. | ||||||||||||||||
| Comment by Misha Tyulenev [ 03/May/18 ] | ||||||||||||||||
Just to clarify: the purpose of the signature is to validate that the clusterTime is in fact generated by the server to prevent malicious attacks where server's oplog optime is driven to the max and then server cant take any writes. Hence its not ignored but can be bypassed in some cases for performance. | ||||||||||||||||
| Comment by Jeremy Mikola [ 03/May/18 ] | ||||||||||||||||
I'd argue that the "ts" field is superfluous. As a user-facing API, it makes sense that the $changeStream option should take one of the following two types:
This all depends on how we expect users to provide this value. If we expect users to craft their own BSON timestamps or supply one from Session.operationTime, accepting a BSON timestamp type directly make sense. If we only expect users to feed in values from a Session's clusterTime property, than the opaque cluster time document makes sense.
I concur. When a few of us talked this through yesterday I understood that startAtClusterTime was basically a stand-in for a resume token when a change stream has not yet seen any change documents (and thus hasn't yet captured a resume token). I think "after" makes more sense in that context. | ||||||||||||||||
| Comment by Misha Tyulenev [ 03/May/18 ] | ||||||||||||||||
|
I stick to my suggestion that drivers should treat $clusterTime as a blackbox object and only pass it so we will have more flexibility to make server changes without backwards compatibility dependencies. I think charlie.swanson can address change stream semantics. My take on all the afterClusterTime field variants (startAtClusterTime, atClusterTime etc) - use operationTime derived values as it was added to indicate when the last command has been performed so in the case of afterClusterTime the secondary or primary on other shard wait only as long as needed. | ||||||||||||||||
| Comment by Jeffrey Yemin [ 03/May/18 ] | ||||||||||||||||
|
I'd also like clarification on the semantics of this field. Is it similar to resumeAfter in that it includes only changes that occurred after the given timestamp? If so, use the word after in the field name rather than at, as at implies that it includes changes that occurred at a time equal to the given timestamp. | ||||||||||||||||
| Comment by Shane Harvey [ 03/May/18 ] | ||||||||||||||||
|
Copying my suggestions from SPEC-1057. In I think what the startAtClusterTime field should accept is an operationTime:
Instead of a $clusterTime:
I propose we rename the field to startAtOperationTime since it does not accept a cluster time. I think this makes sense because the operationTime field is already documented as corresponding to the timestamp of an oplog entry: https://docs.mongodb.com/manual/reference/method/db.runCommand/#command-response |