[SERVER-58952] Implement operation to check if documents in collection have changed Created: 29/Jul/21 Updated: 24/Feb/23 |
|
| Status: | Backlog |
| Project: | Core Server |
| Component/s: | Catalog, Querying |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Evan Nixon | Assignee: | Backlog - Query Execution |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Assigned Teams: |
Query Execution
|
| Sprint: | QE 2021-08-09 |
| Participants: |
| Description |
|
Is there a way to let users determine "has db.myCollection changed at all since <time>" that is more lightweight than opening a change stream and listening for change events? It would be useful to be able to check whether a collection has changed in any way since a given time.

Motivation

The Atlas Search "synonyms" feature defines synonym data in a user collection. Search automatically detects changes to that collection and updates synonym artifacts on change. Search uses change streams to detect those changes today, but it has no way to incrementally modify synonym artifacts from change event information, so mongot must replicate "from scratch" on each change. Change streams are an effective way to detect a change to a collection, but, if possible, we would like to be able to ask "has this collection changed?" in a less resource-intensive way. |
| Comments |
| Comment by Oren Ovadia [ 24/Feb/23 ] | |
|
> Isn't there an optimization when there's ts comparison that avoids scanning the entire oplog?
We would issue multiple such queries, with <time> increasing for each one, so from our perspective at least it would be very similar to using a change stream. | |
| Comment by Asya Kamsky [ 23/Feb/23 ] | |
|
> It would still need to scan the whole oplog
Isn't there an optimization when there's ts comparison that avoids scanning the entire oplog? | |
| Comment by Katya Kamenieva [ 04/Nov/22 ] | |
|
Moving to backlog; there are no plans for now to implement such a feature. | |
| Comment by Evan Nixon [ 12/Aug/21 ] | |
|
One alternative approach could be to directly query the oplog for events in a namespace after a specific time. This would scan a similar number of oplog documents as the change stream approach, but avoids overhead from transforming oplog entries to change events. | |
| Comment by Charlie Swanson [ 12/Aug/21 ] | |
|
Could you try whether you could answer this with a query on the oplog? Something like
It would still need to scan the whole oplog, which is not as good as we could do with some custom logic, but it's worth a try as a speedup in the meantime. |
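
The oplog-query idea raised in the comments above can be sketched roughly as follows. This is a minimal illustration in Python and not necessarily the query the commenters had in mind. The helper names `build_change_filter` and `collection_changed_since` are hypothetical. The sketch assumes that oplog entries carry `ns`, `ts`, and `op` fields and that the `oplog` argument behaves like a PyMongo collection for `local.oplog.rs` (anything with a `find_one(filter)` method works). It also assumes that only plain inserts, updates, and deletes matter; writes inside transactions and commands such as drop or rename are recorded under other namespaces and would need extra handling.

```python
from typing import Any, Dict


def build_change_filter(namespace: str, since_ts: Any) -> Dict[str, Any]:
    """Build a filter matching any write to `namespace` after `since_ts`.

    Assumption: `since_ts` is comparable with the oplog `ts` field
    (with PyMongo this would be a bson.Timestamp).
    """
    return {
        "ns": namespace,                    # e.g. "db.myCollection"
        "ts": {"$gt": since_ts},            # strictly after the given time
        "op": {"$in": ["i", "u", "d"]},     # inserts, updates, deletes only
    }


def collection_changed_since(oplog: Any, namespace: str, since_ts: Any) -> bool:
    """Return True if at least one matching oplog entry exists.

    `oplog` is assumed to expose find_one(filter), as a PyMongo
    collection does; a single match is enough, so nothing is iterated.
    """
    return oplog.find_one(build_change_filter(namespace, since_ts)) is not None
```

With PyMongo this might be called as `collection_changed_since(client["local"]["oplog.rs"], "db.myCollection", Timestamp(seconds, 0))`, where `Timestamp` comes from the `bson` package. How much of the oplog the server has to scan for such a filter is the open question discussed in the comments.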