[SERVER-58952] Implement operation to check if documents in collection have changed Created: 29/Jul/21  Updated: 24/Feb/23

Status: Backlog
Project: Core Server
Component/s: Catalog, Querying
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Evan Nixon Assignee: Backlog - Query Execution
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Assigned Teams:
Query Execution
Sprint: QE 2021-08-09
Participants:

 Description   

Is there a way to let users determine "has db.myCollection changed at all since <time>" that is more lightweight than opening a change stream and listening for change events?

It would be useful to be able to check if a collection has changed in any way since some time.

Motivation

The Atlas Search "synonyms" feature stores synonym data in a user collection. Search automatically detects changes to that collection and updates its synonym artifacts whenever the collection changes.

Today, Search uses change streams to detect changes to that collection, but it has no way to incrementally modify synonym artifacts from change event information. mongot must replicate "from scratch" on each change.

Change streams are an effective way to detect a change to a collection, but, if possible, we would like to be able to ask "has this collection changed?" in a less resource-intensive way.



 Comments   
Comment by Oren Ovadia [ 24/Feb/23 ]

> Isn't there an optimization when there's ts comparison that avoids scanning the entire oplog?
 
Note we need to know of any change that happened to the collection as time advances, so even if we run a command like:

db.oplog.rs.count({ns: "db.myCollection", ts: {$gte: <time>}})

we would have to issue it repeatedly, with <time> increasing on each call. So, at least from our perspective, it would be very similar to using a change stream.
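
The repeated polling described above can be sketched roughly as follows. This is only an illustration and not something proposed in this ticket: `oplogFilter`, `makeChangePoller`, and the `findNewest` callback are hypothetical names. The callback is assumed to return the newest oplog entry matching a filter, or null if there is none. The sketch uses `$gt` instead of `$gte` so that an entry that was already seen is not reported again on the next poll.

```javascript
// Hypothetical helper: builds the oplog filter for one polling round.
// Uses $gt (not $gte as in the snippet above) so the entry at sinceTs itself
// is excluded; a change exactly at the starting time is therefore not reported.
function oplogFilter(ns, sinceTs) {
  return { ns: ns, ts: { $gt: sinceTs } };
}

// Hypothetical poller. findNewest(filter) is an assumed callback that returns
// the newest oplog entry matching the filter, or null when nothing matches.
function makeChangePoller(ns, startTs, findNewest) {
  let lastSeenTs = startTs;
  return function poll() {
    const newest = findNewest(oplogFilter(ns, lastSeenTs));
    if (newest === null) {
      return false; // nothing new for this namespace since the last poll
    }
    lastSeenTs = newest.ts; // advance <time> so the next poll only sees later entries
    return true;
  };
}
```

In a shell session, `findNewest` could for example be backed by a `find` on the oplog collection sorted by `{ $natural: -1 }` with `limit(1)`. Each call to `poll()` still issues a new oplog query, which is why this ends up looking much like a change stream from the caller's side.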

Comment by Asya Kamsky [ 23/Feb/23 ]

> It would still need to scan the whole oplog

Isn't there an optimization when there's ts comparison that avoids scanning the entire oplog?

Comment by Katya Kamenieva [ 04/Nov/22 ]

Moving to backlog; there are no plans for now to implement such a feature.

Comment by Evan Nixon [ 12/Aug/21 ]

One alternative approach could be to directly query the oplog for events in a namespace after a specific time.

This would scan a similar number of oplog documents as the change stream approach, but would avoid the overhead of transforming oplog entries into change events.

Comment by Charlie Swanson [ 12/Aug/21 ]

Could you check whether you can answer this with a query on the oplog? Something like:

db.oplog.rs.count({ns: "db.myCollection", ts: {$gte: <time>}})

It would still need to scan the whole oplog, which is not as good as what we could do with some custom logic, but it's worth a try as a speedup in the meantime.
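
As a variation on the query above, a plain yes/no check does not need a full count; it only needs to know whether at least one matching entry exists. A minimal sketch, assuming a shell-style collection object whose `find(...)` returns a cursor with `limit` and `hasNext`; `hasChangedSince` is a hypothetical helper name, not an existing server command:

```javascript
// Hypothetical helper: returns true if the oplog contains at least one entry
// for namespace `ns` with a timestamp at or after `since`.
// `oplog` is assumed to be a shell-style collection handle for the oplog
// (for example, oplog.rs in the local database).
function hasChangedSince(oplog, ns, since) {
  // limit(1) lets the query stop at the first match instead of counting all
  // of them. If nothing matches, the scan still runs to the end of the oplog.
  return oplog.find({ ns: ns, ts: { $gte: since } }).limit(1).hasNext();
}
```

This keeps the same filter as the `count` version and only changes how the result is consumed, so it has the same scanning cost in the case where the collection has not changed.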

Generated at Thu Feb 08 05:45:56 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.