[SERVER-55062] Change stream events can exceed 16MB with no workaround Created: 09/Mar/21  Updated: 14/Dec/23  Resolved: 17/Apr/23

Status: Closed
Project: Core Server
Component/s: Aggregation Framework
Affects Version/s: None
Fix Version/s: 7.0.0-rc0, 6.0.9

Type: Bug Priority: Major - P3
Reporter: Charlie Swanson Assignee: Backlog - Query Execution
Resolution: Fixed Votes: 7
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Duplicate
is duplicated by SERVER-64592 Make sure events with large documentK... Closed
Problem/Incident
Related
related to SERVER-53387 Large internal metadata can trigger B... Backlog
related to KAFKA-381 Support change stream split large events Backlog
is related to SERVER-81295 Cannot resume V2 changeStream pipelin... Closed
is related to SERVER-67699 Add tracking for when change stream e... Closed
is related to KAFKA-247 Recreate change stream from the point... Closed
Assigned Teams:
Query Execution
Backwards Compatibility: Fully Compatible
Operating System: ALL
Participants:
Case:

 Description   

Each document is limited to 16MB, but because a change event can be required to report both the post-update document and the update description, the event can easily exceed 16MB.

For example, this can be reproduced with the following:
1. Insert a 10 million character string into the DB
2. Update the document to include a different 10 million character string
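The arithmetic behind the reproduction can be sketched as follows (a hedged illustration, assuming the stream requests the post-image, e.g. via fullDocument: "updateLookup"; the constants are from the steps above, not from server code):

```python
# Why the reproduction above exceeds the 16MB BSON limit: the update's change
# event must carry the ~10MB string twice, once in fullDocument (post-update
# document) and once in updateDescription.updatedFields.

MAX_BSON_SIZE = 16 * 1024 * 1024   # 16MB BSON document limit
field_chars = 10_000_000           # 10 million character string from step 1/2

# Each string fits comfortably in a single document on the write path...
assert field_chars < MAX_BSON_SIZE

# ...but the change event duplicates the payload across fullDocument and
# updateDescription, so its total size exceeds what BSON can serialize.
event_payload = field_chars * 2
assert event_payload > MAX_BSON_SIZE
print(f"event payload ~{event_payload / 1024**2:.1f}MB vs the 16MB limit")
```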



 Comments   
Comment by Katya Kamenieva [ 26/Sep/23 ]

This feature was backported to 6.0.9: Docs
Be advised that there is currently an issue, SERVER-81295, that impacts 6.0. We are working on resolving it promptly.

Comment by Katya Kamenieva [ 13/Jul/23 ]

alex.svatukhin@mongodb.com It is expected to be released in 6.0.9

Comment by Alex Svatukhin [ 10/Jul/23 ]

bernard.gorman@mongodb.com could I ask for a timeline on the $changeStreamSplitLargeEvent backport to 6.0? Which exact 6.0 versions will support it?

Comment by Yue Wang [ 28/Jun/23 ]

Hi, bernard.gorman@mongodb.com Gentle ping. Could you help find someone to look at this, please?

Comment by Yue Wang [ 21/Jun/23 ]

Hi, bernard.gorman@mongodb.com

May I ask for some help taking a look at our potential patch for this issue in Mongo 4.4?

We (Stripe) are still using mongo 4.4. Backporting this fix to 4.4 is cumbersome and risky, given how much has changed between the versions, so we are looking into an easier workaround to bypass this issue.

One approach we have identified is to increase the size limit of the CES output to 32MB while leaving the write path limit as is (16MB).

  • After digging into the code, we found that BSONObj supports a customized size trait. So we created a new trait with a larger limit and applied it to the change stream proxy where it was failing here.
  • It then fails with another stack trace that we don't understand.
  • So we aggressively updated the Document::toBson() method to apply the new trait globally for the Document class.

All the changes are in the PR linked above.

 

After applying the change, we were surprised to find that:

  • updating a document with a 16MB field succeeds
  • updating a 17MB field fails with the size limit, which is expected
  • the change stream returning a 32MB result succeeded (wow)

Though we don't yet understand how and why this works, this is the exact behavior we want to achieve.

 

So I'd like to ask the Mongo team:

  • Does this change make sense?
  • Can we assume that Document::toBson() is only applied on the read path, so that this change is safe?
    • If not, could you point us to the right place to update to achieve our goal?
  • Are there any other small workarounds that you can think of?
  • Are there any other debugging tricks, so that we can see more useful stack traces?

 

Thanks,

Yue obo Document DB team @ Stripe Inc.

Comment by Bernard Gorman [ 09/Jun/23 ]

Hi yuewang@stripe.com: full documentation of the new stage, including a description of its behaviours and an example use-case, is available here. Please note that this documentation relates to an upcoming, unreleased version of the MongoDB server, and is therefore still subject to change. For any change event which exceeds the maximum size, the $changeStreamSplitLargeEvent stage will break it apart and return a sequence of event fragments to the client, each of which is under 16MB.
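A rough sketch of what consumer-side handling of those fragments can look like, based on the fragment metadata described in the documentation referenced above: each fragment carries a splitEvent subdocument of the form {"fragment": n, "of": total}, with the event's remaining top-level fields spread across the fragments. The merge helper and the sample fragments below are illustrative, not real server output:

```python
def reassemble(fragments):
    """Merge an ordered run of change event fragments into one event.

    Assumes each fragment has a splitEvent subdocument {"fragment": n,
    "of": total} and that fragments arrive in order, as documented for
    $changeStreamSplitLargeEvent.
    """
    total = fragments[0]["splitEvent"]["of"]
    assert len(fragments) == total, "incomplete fragment run"
    event = {}
    for i, frag in enumerate(fragments, start=1):
        assert frag["splitEvent"]["fragment"] == i, "out-of-order fragment"
        for key, value in frag.items():
            if key != "splitEvent":  # drop the bookkeeping field
                event[key] = value
    return event

# Synthetic example (field values are placeholders, not real server output):
frags = [
    {"_id": "tok", "operationType": "update",
     "splitEvent": {"fragment": 1, "of": 2}},
    {"_id": "tok", "fullDocument": {"a": "x" * 10},
     "splitEvent": {"fragment": 2, "of": 2}},
]
merged = reassemble(frags)
```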

Comment by Yue Wang [ 09/Jun/23 ]

bernard.gorman@mongodb.com Is there a design doc for this? At a high level, is this change transparent to change stream consumers, or do they have to change the logic on the consumer side to merge the split events themselves?

Comment by Bernard Gorman [ 09/Jun/23 ]

Hi yuewang@stripe.com - we do intend to backport the $changeStreamSplitLargeEvent stage to a future release of 6.0, but unfortunately it's not feasible to backport it any further than that.

Comment by Yue Wang [ 09/Jun/23 ]

Hi bernard.gorman@mongodb.com Do we plan to backport this feature/fix to previous mongo versions? We are running on an older version of MongoDB, and this is definitely a bug for us that is blocking us from proceeding with some projects.

Comment by Bernard Gorman [ 17/Apr/23 ]

Hi palani@atomicfi.com and others,

I am pleased to report that this issue has been addressed in MongoDB 7.0! While we continue to recommend that users should avoid the BSONObjectTooLarge exception by reducing the size of their change stream events where possible - for instance, by not requesting pre- and post-images or by using $project to retrieve only those fields that are necessary for their application - we have also introduced a new stage called $changeStreamSplitLargeEvent to address use-cases where these approaches are not feasible. The $changeStreamSplitLargeEvent stage, upon seeing an event which would exceed the 16MB limit, will split that event into a series of <16MB fragments which are returned to the client sequentially. The forthcoming MongoDB 7.0 documentation will include more details on this stage, its usage and behaviours.
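The two mitigations described above can be combined in one pipeline, sketched below as Python data. This is an illustrative example only: the excluded field name is hypothetical, and the commented-out watch() call assumes pymongo. Note that $changeStreamSplitLargeEvent must be the final stage of the pipeline:

```python
# Change stream pipeline: trim events first with $project, then split anything
# still over 16MB with $changeStreamSplitLargeEvent (must be the last stage).
pipeline = [
    # Exclude a large field the application does not need (hypothetical name).
    {"$project": {"fullDocument.payload": 0}},
    # Split any remaining oversized event into <16MB fragments.
    {"$changeStreamSplitLargeEvent": {}},
]

# With a live connection (not runnable without a server):
# from pymongo import MongoClient
# coll = MongoClient()["mydb"]["mycoll"]
# with coll.watch(pipeline, full_document="updateLookup") as stream:
#     for event in stream:
#         handle(event)  # oversized events arrive as ordered fragments
```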

Comment by Palani Thangaraj [ 14/Dec/22 ]

Please fix this; this is definitely a bug in functionality.

Comment by Gatsby Lee [ 04/Mar/22 ]

This is pretty critical.

I think this is a bug in functionality.

Generated at Thu Feb 08 05:35:20 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.