[SERVER-32496] Start tailing from the beginning of changes history Created: 31/Dec/17  Updated: 10/Mar/23

Status: Backlog
Project: Core Server
Component/s: Aggregation Framework
Affects Version/s: 3.6.0, 3.6.1
Fix Version/s: None

Type: Improvement Priority: Minor - P4
Reporter: Maurizio Casimirri Assignee: Backlog - Query Execution
Resolution: Unresolved Votes: 8
Labels: change-streams-improvements
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Duplicate
is duplicated by SERVER-45126 Allow change stream to resume from th... Closed
Related
related to CSHARP-2146 System.IO.EndOfStreamException when i... Closed
is related to SERVER-50903 No error when change stream's startAt... Closed
Assigned Teams:
Query Execution
Backwards Compatibility: Minor Change

 Description   

As far as I understand, change streams currently allow tailing only from a resumeAfter token or from the tail of the stream.

For many use cases that is not enough: a client that does not hold a resume token may want to start from the beginning of the change history.

Something like {{ {resumeAfter: 'oldest'} }} for the $changeStream stage would be great. Although the post-update lookup would lack proper semantics for old events, streaming ETL with aggregation in external pipelines would definitely benefit from it.
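
To make the client-side decision concrete, here is a minimal sketch of how an application might choose its change stream options today. It is not the feature requested here. It assumes PyMongo's `watch()` keyword names `resume_after` and `start_at_operation_time` (the latter requires MongoDB 4.0 or later, so it does not apply to the 3.6 versions this ticket affects). The helper name `build_watch_kwargs` and the `oldest_ts` parameter are hypothetical, and how the caller obtains `oldest_ts` is left open.

```python
from typing import Any, Dict, Optional


def build_watch_kwargs(saved_token: Optional[dict],
                       oldest_ts: Optional[Any]) -> Dict[str, Any]:
    """Pick keyword arguments for a change stream (hypothetical helper).

    saved_token: resume token stored by the client, or None if it has none.
    oldest_ts:   timestamp of the oldest operation the caller knows about,
                 or None. Obtaining it is outside the scope of this sketch.
    """
    if saved_token is not None:
        # Normal case: continue right after the last event we processed.
        return {"resume_after": saved_token}
    if oldest_ts is not None:
        # No token: ask to start at the oldest known operation time.
        return {"start_at_operation_time": oldest_ts}
    # Nothing known: default behaviour, i.e. start at the tail of the stream.
    return {}
```

With a real driver the result would be passed on as `collection.watch(**build_watch_kwargs(token, ts))`; the sketch itself needs no database connection.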

It would also be possible to always return the currently available version of the document in the lookup field.

I believe it would simplify the initial ingestion of events without resorting to other means (like starting the $changeStream the first time and touching all the documents).

If $changeStream relies on oplog.rs, then ideally the same kind of option should apply as a fallback when a token refers to an operation that has been truncated from the oplog (restart from the first operation found, or from the tail?).
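
The two fallback choices raised in that question can be sketched with a toy model. This is an illustration only, not server behaviour: the oplog is modelled as a sorted list of integer timestamps, and `resolve_start` and its `fallback` parameter are hypothetical names.

```python
from bisect import bisect_right
from typing import List, Optional


def resolve_start(oplog_ts: List[int],
                  token_ts: Optional[int],
                  fallback: str = "oldest") -> int:
    """Return the index of the first retained oplog entry to deliver.

    oplog_ts: sorted timestamps of the entries still in the (toy) oplog.
    token_ts: timestamp of the last event the client processed, or None.
    fallback: "oldest" restarts from the first retained operation,
              "tail" delivers only operations that arrive later.
    """
    if fallback not in ("oldest", "tail"):
        raise ValueError("fallback must be 'oldest' or 'tail'")
    if token_ts is not None and oplog_ts and token_ts >= oplog_ts[0]:
        # The token's entry is still retained: resume with the next one.
        return bisect_right(oplog_ts, token_ts)
    # No token, or the token refers to an operation that was truncated.
    return 0 if fallback == "oldest" else len(oplog_ts)
```

For example, with retained timestamps `[5, 6, 7, 8]`, a token at 6 resumes at index 2, while a token at 3 (already truncated) yields index 0 under "oldest" and index 4 under "tail".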

Such a feature is quite important for building a reliable ETL that ensures at-least-once semantics.



 Comments   
Comment by Asya Kamsky [ 13/Dec/19 ]

> Such a feature is quite important to build a reliable ETL that ensures at-least-once semantic.

Note that this feature would only be able to guarantee at most once - without a token, there is no way to know how many events were missed before the beginning of the oplog.
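
The point about missed events can be illustrated with a toy simulation. It models the oplog as a fixed-size buffer that drops its oldest entries, which is an assumption made for illustration; `simulate` and its parameters are hypothetical names.

```python
from collections import deque
from typing import List, Tuple


def simulate(num_writes: int, oplog_capacity: int) -> Tuple[List[int], int]:
    """Write num_writes events into a capped toy oplog, then read it
    from the oldest retained entry.

    Returns (delivered, missed): the events a reader starting at the
    oldest entry would see, and how many events had already rolled off.
    """
    oplog = deque(maxlen=oplog_capacity)  # oldest entries drop off when full
    for seq in range(1, num_writes + 1):
        oplog.append(seq)
    delivered = list(oplog)
    # Known only to the simulation; a reader without a token cannot compute it.
    missed = num_writes - len(delivered)
    return delivered, missed
```

Here `simulate(5, 3)` delivers events 3, 4 and 5 and reports 2 missed events. A real reader sees only the delivered list, so the count of missed events is invisible to it.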

 

Comment by David McKay [ 23/Jan/18 ]

I'm building a Kafka Connect source plugin that supports MongoDB, and this issue is one I am currently trying to solve.

The ability to ask for the earliest document in the oplog would be perfect. Better yet would be the ability to pass a filter and start from the oldest matching record.
