[SERVER-32496] Start tailing from the beginning of changes history Created: 31/Dec/17 Updated: 10/Mar/23 |
|
| Status: | Backlog |
| Project: | Core Server |
| Component/s: | Aggregation Framework |
| Affects Version/s: | 3.6.0, 3.6.1 |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Minor - P4 |
| Reporter: | Maurizio Casimirri | Assignee: | Backlog - Query Execution |
| Resolution: | Unresolved | Votes: | 8 |
| Labels: | change-streams-improvements |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Assigned Teams: | Query Execution |
| Backwards Compatibility: | Minor Change |
| Case: | (copied to CRM) |
| Description |
|
As far as I understand, change streams currently allow tailing either from a resumeAfter token or from the current end of the stream. For many use cases that is not enough: when a client does not hold a resume token, we may want to start from the beginning of the available history instead. Something like {{ {resumeAfter: 'oldest'}}} for $changeStream would be great.

Although the post-update lookup would lack proper semantics for old events, streaming ETL with aggregation in external pipelines would definitely benefit from this. It would also be possible to always return the currently available version of the document in the lookup field. I believe this would simplify the initial ingestion of events without resorting to other means (such as starting the $changeStream for the first time and then touching every document).

If $changeStream relies on oplog.rs, then ideally the same kind of option should also serve as a fallback when a token refers to an operation that has already been truncated from the oplog (restart from the first operation still available, or from the tail?). Such a feature is quite important for building a reliable ETL that ensures at-least-once semantics. |
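A toy model may help pin down the requested semantics. The sketch below is purely illustrative and is not MongoDB's API: `CappedOplog`, `open_stream`, the `OLDEST` sentinel and the `on_truncated` policy are hypothetical names, and dense integer positions stand in for oplog timestamps. It contrasts today's behaviour (no token: start at the tail; valid token: resume after it) with the proposed "oldest" start and the proposed fallback for a token whose operation has been truncated. The default `"error"` policy only roughly mirrors the failure a stale token produces today.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Iterator, Union

# Hypothetical sentinel mirroring the proposed {resumeAfter: 'oldest'} option.
OLDEST = "oldest"


@dataclass(frozen=True)
class Event:
    ts: int       # dense integer position standing in for an oplog timestamp
    doc_id: str
    op: str


class CappedOplog:
    """Toy stand-in for oplog.rs that keeps only the newest `capacity` entries."""

    def __init__(self, capacity: int) -> None:
        self._entries: Deque[Event] = deque(maxlen=capacity)
        self._next_ts = 1

    def append(self, doc_id: str, op: str) -> Event:
        event = Event(self._next_ts, doc_id, op)
        self._next_ts += 1
        self._entries.append(event)  # a full deque silently drops its oldest entry
        return event

    def open_stream(
        self,
        resume_after: Union[int, str, None] = None,
        on_truncated: str = "error",  # hypothetical policy: "error", "oldest" or "tail"
    ) -> Iterator[Event]:
        """Return the already-written events a newly opened stream would replay."""
        entries = list(self._entries)
        if resume_after is None:
            # Current behaviour without a token: start at the tail, replay nothing.
            return iter([])
        if resume_after == OLDEST:
            # Proposed: start from the oldest entry still retained.
            return iter(entries)
        if not entries or resume_after >= entries[0].ts:
            # Token still inside the retained window (or log empty): replay what follows it.
            return iter([e for e in entries if e.ts > resume_after])
        # The token's operation has been truncated from the log.
        if on_truncated == "oldest":
            return iter(entries)
        if on_truncated == "tail":
            return iter([])
        raise LookupError(f"resume point {resume_after} is no longer in the log")


log = CappedOplog(capacity=3)
for i in range(5):
    log.append(f"doc{i}", "insert")  # positions 1..5; only 3, 4 and 5 survive

print([e.ts for e in log.open_stream(OLDEST)])                     # [3, 4, 5]
print([e.ts for e in log.open_stream(resume_after=3)])             # [4, 5]
print([e.ts for e in log.open_stream(1, on_truncated="oldest")])   # [3, 4, 5]
```

Whether a truncated token should fall back to the oldest entry or to the tail is left open in the request, which is why the sketch makes it an explicit policy rather than picking one.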
| Comments |
| Comment by Asya Kamsky [ 13/Dec/19 ] |
|
> Such a feature is quite important to build a reliable ETL that ensures at-least-once semantic.

Note that this feature would only be able to guarantee at-most-once delivery: without a resume token, there is no way to know how many events were missed before the beginning of the oplog.
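The point can be made concrete with a small simulation (plain Python, not MongoDB code; `replay_from_oldest` is an illustrative name). Two different write histories that leave the same entries in a capped log look identical to a token-less consumer replaying from the oldest entry, even though one history lost events and the other did not. Each retained event is delivered once and each dropped event never, which is at-most-once.

```python
from collections import deque
from typing import List, Sequence, Tuple


def replay_from_oldest(history: Sequence[str], capacity: int) -> Tuple[List[str], int]:
    """Write `history` into a capped log, then replay it from the oldest retained entry.

    Returns (events the consumer sees, events silently dropped). A token-less consumer
    only ever observes the first value; the second is ground truth it cannot recover.
    """
    log = deque(history, maxlen=capacity)  # keeps only the newest `capacity` events
    seen = list(log)
    return seen, len(history) - len(seen)


# Two different histories leave the same three events in the log...
seen_a, dropped_a = replay_from_oldest(["x", "y", "z"], capacity=3)
seen_b, dropped_b = replay_from_oldest(["a", "b", "x", "y", "z"], capacity=3)

# ...so the consumer's view is identical, although history B lost two events.
print(seen_a == seen_b, dropped_a, dropped_b)  # True 0 2
```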
|
| Comment by David McKay [ 23/Jan/18 ] |
|
I'm building a Kafka Connect source plugin that supports MongoDB, and this issue is one I am currently trying to solve. The ability to ask for the earliest document in the oplog would be perfect; better still would be the ability to pass a filter and start from the oldest matching record. |
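The filtered variant could behave roughly as follows. This is a hypothetical sketch over an in-memory, oldest-first list of simplified oplog-like entries (`ts`, `ns`, `op`); `start_from_oldest_match` is an illustrative name, not an existing MongoDB or Kafka Connect API. One filter chooses the starting point, and an optional second filter decides which later entries are emitted.

```python
from itertools import dropwhile
from typing import Any, Callable, Dict, Iterable, Iterator, Optional

# Simplified oplog-like entry; real oplog.rs documents carry more fields.
Entry = Dict[str, Any]


def start_from_oldest_match(
    entries: Iterable[Entry],
    start_filter: Callable[[Entry], bool],
    stream_filter: Optional[Callable[[Entry], bool]] = None,
) -> Iterator[Entry]:
    """Skip everything older than the oldest entry matching `start_filter`; from there
    on, yield each entry that passes `stream_filter` (every entry when it is None).

    `entries` is assumed to be ordered oldest-first, like a forward oplog scan.
    """
    tail = dropwhile(lambda e: not start_filter(e), entries)
    if stream_filter is None:
        return tail
    return (e for e in tail if stream_filter(e))


def is_users(entry: Entry) -> bool:
    return entry["ns"] == "shop.users"


oplog = [
    {"ts": 1, "ns": "shop.orders", "op": "i"},
    {"ts": 2, "ns": "shop.users", "op": "i"},
    {"ts": 3, "ns": "shop.orders", "op": "u"},
    {"ts": 4, "ns": "shop.users", "op": "u"},
]
print([e["ts"] for e in start_from_oldest_match(oplog, is_users)])            # [2, 3, 4]
print([e["ts"] for e in start_from_oldest_match(oplog, is_users, is_users)])  # [2, 4]
```

Keeping the start filter separate from the stream filter lets a connector begin at, say, the first event for one collection while still emitting everything after it.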