[SERVER-36941] Option to provide "before image" with change streams Created: 30/Aug/18  Updated: 29/Oct/23  Resolved: 29/Mar/22

Status: Closed
Project: Core Server
Component/s: Querying
Affects Version/s: None
Fix Version/s: 6.0.0-rc0

Type: Improvement Priority: Major - P3
Reporter: Arnie Listhaus Assignee: Bernard Gorman
Resolution: Fixed Votes: 38
Labels: change-streams-improvements
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
is duplicated by SERVER-48186 Get fullDocument and before/after wit... Closed
is duplicated by SERVER-37040 Change Stream - Pass deleted object t... Closed
is duplicated by SERVER-41236 Change Notification Stream should als... Closed
Related
related to SERVER-45806 Record pre-images on updates and dele... Closed
related to SERVER-41559 Can not fetch changed array elements ... Backlog
is related to SERVER-58272 Change Streams for complex nested fields Closed
is related to SERVER-56074 Add deleteLookup support on Change St... Closed
Backwards Compatibility: Fully Compatible
Participants:
Case:

 Description   

It would be useful to be able to see the data as it was before the change made to it when monitoring via change streams.

 



 Comments   
Comment by Andy Schwerin [ 03/Oct/22 ]

ywu@stripe.com, I'm afraid that documenting that behavior prior to 6.0 is a mistake on our part. We had a prototype implementation on earlier releases that we used for some targeted use cases at MongoDB, but we did not feel that implementation was appropriate for general release. It relied on putting document pre-images into the oplog, which negatively impacted the amount of oplog (and change events) that could be retained, especially for users with large document sizes. The implementation in 6.0 and later instead relies on the primary and secondaries storing pre-image documents as necessary when they apply updates. This reduces pressure on the oplog considerably, which, in a quirk of the MongoDB implementation, also substantially reduces pressure on the write-ahead log, reducing the I/O burden of the feature.

It is possible to activate the pre-6.0 feature (it requires changing the configuration files on the replica nodes), but it is not supported and we would not be able to support any problems you encountered with it.

Comment by Yang Wu [ 03/Oct/22 ]

Thanks for the clarification, it's helpful - I see there is a `fullDocumentBeforeChange` option in 4.4's changeStream operator https://www.mongodb.com/docs/v4.4/reference/operator/aggregation/changeStream/

 

Looking at the code, it's doing a lookup in the oplog with the UUID, which seems to be doing the right thing – are there any known correctness or performance issues if we use it to get the pre-image?

Comment by Andy Schwerin [ 01/Oct/22 ]

Please check out the documentation for instructions about getting precise pre- and post-images. The code you have linked in your comment, ywu@stripe.com, is the code for the non-precise behavior which is still available.

Comment by Yang Wu [ 01/Oct/22 ]

Hi, it looks like there is still a race condition when we do the post-image lookup https://github.com/mongodb/mongo/blob/master/src/mongo/db/pipeline/document_source_change_stream_add_post_image.cpp#L203-L205

i.e., if I do:

t1: insert, t2: update, t3: delete

and I read change stream for t2 after t3, the full doc will be null – is that expected when we say "point-in-time"? Or is there a way to get the post-image when the update actually happened at t2?
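The t1/t2/t3 scenario above can be illustrated with a toy in-memory model. This is only a sketch of the two strategies being contrasted, not MongoDB's implementation: `ToyCollection` and its methods are hypothetical names, "lookup at read time" stands in for the `updateLookup`-style behavior, and "stored image" stands in for a point-in-time post-image recorded when the write was applied.

```python
class ToyCollection:
    """Hypothetical in-memory stand-in for a collection plus its change events."""

    def __init__(self):
        self.docs = {}    # current state, keyed by _id
        self.events = []  # change events in commit order

    def insert(self, _id, doc):
        self.docs[_id] = dict(doc, _id=_id)
        self._record("insert", _id)

    def update(self, _id, fields):
        self.docs[_id].update(fields)
        self._record("update", _id)

    def delete(self, _id):
        del self.docs[_id]
        self._record("delete", _id)

    def _record(self, op, _id):
        # Copy the document as it is right now, at write time.
        current = self.docs.get(_id)
        image = dict(current) if current is not None else None
        self.events.append({"op": op, "_id": _id, "post_image": image})

    def lookup_at_read_time(self, event):
        """Fetch the current document when the event is read (may be gone)."""
        return self.docs.get(event["_id"])

    @staticmethod
    def stored_image(event):
        """Return the image captured when the write was applied."""
        return event["post_image"]


coll = ToyCollection()
coll.insert(1, {"x": 1})   # t1
coll.update(1, {"x": 2})   # t2
coll.delete(1)             # t3

update_event = coll.events[1]
print(coll.lookup_at_read_time(update_event))  # None: the document was deleted at t3
print(coll.stored_image(update_event))         # {'x': 2, '_id': 1}
```

In this model, reading the t2 event after t3 yields nothing from the read-time lookup, while the stored image still reflects the document as of t2.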

Comment by Pavel Kalugin [ 28/Jun/22 ]

Looks like it did https://www.mongodb.com/docs/v6.0/changeStreams/#change-streams-with-document-pre--and-post-images

Comment by Brian Gaeddert [ 28/Jun/22 ]

Hi,

Did this make it into MongoDB 6.0?

Comment by Bernard Gorman [ 04/May/22 ]

Hi divanshu.aggarwal@devrev.ai,

While we're not yet in a position to commit to a specific release date, we hope to make this feature available in MongoDB 6.0 on Atlas in June.

Best regards,
Bernard

Comment by Divanshu Aggarwal [ 04/May/22 ]

Hi,

Is there an estimated time for releasing this on MongoDB Atlas? It has been a real need for most of us for quite some time.

Comment by Bernard Gorman [ 29/Mar/22 ]

The feature requested in this ticket has now been completed, via a project spanning many other SERVER tickets, and will be available in MongoDB 6.0. Users will be able to request a point-in-time pre-image, a point-in-time post-image, or both. Users will also be able to enable or disable pre/post-image retention per-collection, and to specify for how long such pre- and post-images are retained.
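As a rough sketch of what these options look like in MongoDB 6.0, the snippet below builds the relevant command documents as plain Python dicts, so it runs without a server. The helper function names are hypothetical. The field names (`changeStreamPreAndPostImages`, `fullDocument`, `fullDocumentBeforeChange`, and `changeStreamOptions.preAndPostImages.expireAfterSeconds`) are taken from the 6.0 documentation as best I recall and should be checked against the release you run. Note that in this sketch the retention time is set through a cluster parameter, not per collection.

```python
def enable_images_command(collection):
    """collMod command that turns on pre/post-image recording for one collection."""
    return {
        "collMod": collection,
        "changeStreamPreAndPostImages": {"enabled": True},
    }


def retention_command(seconds):
    """setClusterParameter command limiting how long stored images are kept."""
    return {
        "setClusterParameter": {
            "changeStreamOptions": {
                "preAndPostImages": {"expireAfterSeconds": seconds}
            }
        }
    }


def change_stream_stage(pre_image=True, post_image=True, required=False):
    """$changeStream stage requesting point-in-time images.

    "whenAvailable" yields null when an image is missing;
    "required" makes the stream fail instead.
    """
    mode = "required" if required else "whenAvailable"
    options = {}
    if pre_image:
        options["fullDocumentBeforeChange"] = mode
    if post_image:
        options["fullDocument"] = mode
    return {"$changeStream": options}
```

With a driver, the first two documents would be sent as database commands (the second against `admin`), and the stage options map onto the driver's change stream parameters; in recent PyMongo releases these should be `full_document` and `full_document_before_change` on `watch()`, though that is worth confirming for your driver version.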

Comment by Bernard Gorman [ 29/Mar/22 ]

Hi ataramina@shorecg.com, william_luo@trendmicro.com, rupert.madden.abbott@yellowdog.co,

To answer your questions:

So the plan is to no longer include this feature in a 5.0 rapid release?

This feature just missed the cut-off date for inclusion in the 5.3 rapid release. Our plan is therefore to make it available in the next scheduled release, which is MongoDB 6.0. Given the complexity of the feature, we don't plan to backport it to the 5.0.x release line.

how is this going to interact with the 16MB limit? Let's say I have both "before" and "fullDocument" on. My change stream will now break if the original document is greater than 8MB?

If you request both the pre- and post-image in a scenario where your documents are large, it's possible that the resulting change stream event may exceed the 16MB limit. However, this limit only applies to documents as they exit the aggregation pipeline; documents that are still being processed within the pipeline can exceed it. You can therefore, for instance, apply a projection to the pre- and post-images to include only a subset of necessary fields, which will shrink the final output to below the 16MB limit.
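A minimal sketch of such a projection, assuming the consumer only needs a handful of fields from each image: the helper below (a hypothetical name) builds a `$project` stage that keeps the same subset of fields from both `fullDocumentBeforeChange` and `fullDocument`, plus some event metadata. The pipeline is built as plain Python data so the example runs without a server.

```python
def trimmed_image_pipeline(fields):
    """Build a change stream pipeline that keeps only `fields` from each image."""
    projection = {
        # Event metadata that consumers usually need.
        "operationType": 1,
        "ns": 1,
        "documentKey": 1,
        "clusterTime": 1,
    }
    for name in fields:
        projection["fullDocumentBeforeChange." + name] = 1
        projection["fullDocument." + name] = 1
    # An inclusion-style $project keeps _id by default; in a change stream
    # event _id is the resume token, so it should not be projected out.
    return [{"$project": projection}]


pipeline = trimmed_image_pipeline(["status", "total"])
```

The resulting list would be passed as the pipeline argument when opening the change stream, alongside the options that request the pre- and post-images.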

Is this feature going to optionally include an "after" as well as a "before"? ... I can see it would be possible to add something downstream that merged the before with the updated/removed fields but I think this could get quite complicated, and it would be nicer to have something more out of the box!

Yes! You will be able to request a point-in-time pre-image, a point-in-time post-image, or both.

Thanks for your continued interest in this feature - we hope you'll be very pleased with the final result.

Best regards,
Bernard

Comment by Rupert Madden-Abbott [ 23/Feb/22 ]

Hi,

Is this feature going to optionally include an "after" as well as a "before"?

Currently, update "fullDocuments" are not necessarily accurate if any other operation interleaves before the lookup. It would be useful to avoid the need for that lookup.

I can see it would be possible to add something downstream that merged the before with the updated/removed fields but I think this could get quite complicated, and it would be nicer to have something more out of the box!

Also, how is this going to interact with the 16MB limit? Let's say I have both "before" and "fullDocument" on. My change stream will now break if the original document is greater than 8MB?

We would also love this feature to be available in 5 if possible!

Comment by William Luo [ 23/Feb/22 ]

I was disappointed to hear this too. I wish to see this available in a 5.0 rapid release.

Comment by Andrzej Taramina [ 21/Feb/22 ]

So the plan is to no longer include this feature in a 5.0 rapid release?  That is a bit disappointing, since this is something that we would like to see as well, and sooner rather than later.

Comment by Bernard Gorman [ 11/Feb/22 ]

Hi oraiches@zadarastorage.com - while we're not in a position to commit to an exact release timeline yet, our current plan is to deliver this feature in MongoDB 6.0.

Comment by Oded Raiches [ 03/Feb/22 ]

I would be happy to see this coming as well. Is there any update on the estimated release time?

Comment by Warren Edwards [ 17/Jan/22 ]

Very happy to see that this is being worked on! We're in the process of a significant re-architecture and this functionality is very important for us, both for updates and deletions.

Being able to see the old values after updating a document in MongoDB would be perfect, as we can then use this to clear old values from caches instead of having to resort to saving old values in a separate field.

Comment by Bernard Gorman [ 16/Nov/21 ]

Hi geotzinos@gmail.com,

I'm very pleased to tell you that we are in the middle of implementing this feature at present, and we expect it to be available in an upcoming Rapid Release version of MongoDB prior to the 6.0 LTS release (please see here for more information about our Rapid Release schedule). Once this project is complete, change streams will be able to retrieve point-in-time pre-images and/or post-images for any collection on which the feature is enabled.

Hope this helps!

Best regards,
Bernard

Comment by George Tzinos [ 16/Nov/21 ]

Hello everyone! This is a really urgent need for most of us, any update on this?

Comment by Stan Yeshchenko [ 06/Jul/21 ]

Any update based on @Giambattista's comment? 

Comment by Giambattista Bloisi [ 01/Jul/21 ]

Since SERVER-45806 has been closed and the pre-image can be made available in the oplog, it should now be possible to make pre-image information available in the stream API. The pre-image is not only a useful and nice feature: it would allow the watch API to be used as a proper Change Data Capture event stream with full document information. The current API runs into problems when fetching the full document content if multiple updates are executed in quick succession. The worst case is when an update is immediately followed by a delete event: in that case, with the current API, it may not be possible to fetch the full document when processing the update event.

Also, it would be nice to have a helper method that reconstructs the resulting document given the pre-image and the update operation.
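A helper of that kind might look roughly like the sketch below. It is not an existing MongoDB or driver API: `apply_update_description` is a hypothetical name, and the code assumes the update event carries an `updateDescription` with `updatedFields` (dotted paths), `removedFields`, and optionally `truncatedArrays` entries of the form `{"field": ..., "newSize": ...}`. It also assumes that field names do not themselves contain dots and that truncations should be applied before updated fields.

```python
import copy


def _parent(doc, parts, create):
    """Return the container holding the last path component, or None if absent."""
    node = doc
    for part in parts[:-1]:
        if isinstance(node, list):
            node = node[int(part)]
        elif create:
            node = node.setdefault(part, {})
        else:
            node = node.get(part)
            if node is None:
                return None
    return node


def apply_update_description(pre_image, update_description):
    """Rebuild a post-image from a pre-image and an updateDescription."""
    post = copy.deepcopy(pre_image)  # never mutate the caller's pre-image

    # 1. Shrink arrays that the update truncated.
    for trunc in update_description.get("truncatedArrays", []):
        parts = trunc["field"].split(".")
        parent = _parent(post, parts, create=False)
        key = int(parts[-1]) if isinstance(parent, list) else parts[-1]
        del parent[key][trunc["newSize"]:]

    # 2. Set new or changed fields, including array elements by index.
    for path, value in update_description.get("updatedFields", {}).items():
        parts = path.split(".")
        parent = _parent(post, parts, create=True)
        if isinstance(parent, list):
            index = int(parts[-1])
            if index == len(parent):
                parent.append(value)  # element added at the end of the array
            else:
                parent[index] = value
        else:
            parent[parts[-1]] = value

    # 3. Drop removed fields.
    for path in update_description.get("removedFields", []):
        parts = path.split(".")
        parent = _parent(post, parts, create=False)
        if isinstance(parent, dict):
            parent.pop(parts[-1], None)

    return post
```

Replacement-style updates would not need this: there the event already carries the whole new document.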

Comment by Jim Blackhurst [ 30/Aug/19 ]

Being able to request `fullDocument` for delete events would be useful in conjunction with TTL indexes, where documents being expired out of the database (deleted) could be picked up by a change stream and archived in another storage solution. 

Comment by Benjamin Perlmutter (Inactive) [ 19/Aug/19 ]

Hi Asya - the thought is that if Change Streams gave the previous state of the document, all the info desired would be in the change stream event. 

 

Since it isn't there, they'll have to find the document and then perform the change and capture the flow on this end. While findAndModify doesn't incur an extra read (update needs to find the document, then update) - it does mean the developer flow to capture the change will need to be built on the update side (not the change streams side) and thus, the capturing of the change will incur a read on update (rather than read of the change streams event).

Generated at Thu Feb 08 04:44:31 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.