- Type: Task
- Resolution: Unresolved
- Priority: Major - P3
- None
- Affects Version/s: None
- Component/s: None
- Atlas Streams
- Sprint 63
It's possible for a user to insert a large event into a collection, then see a change stream $source error like:
BSONObjectTooLarge: Change stream $source . failed: Executor error during getMore :: caused by :: BSONObj size: 17201172 (0x1067814) is invalid. Size must be between 0 and 16793600(16MB) First element: _id: { _data: "8267321B560000000C2B042C0100296E5A1004DA0DA361E906436D9A885CF53EB93904463C6F7065726174696F6E54797065003C696E736572740046646F63756D656E744B65790046465F..." }: generic server error
Currently, this error fails the processor.
In this ticket, we want to change the behavior to emit a DLQ message for this event, then keep reading. The DLQ message will look like:
{
  processorName: ...,
  dlqTime: ...,
  errInfo: { reason: <a helpful error message for the 16MB issue> },
  doc: <some truncated view of the change event, depending on what we can get back from the server>
}
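As a rough illustration of the message shape above, here is a minimal Python sketch that builds such a DLQ document from the server error text. The helper name `build_large_event_dlq_message`, its parameters, and the wording of the reason string are all hypothetical. It also assumes the resume token can be recovered from the error, which depends on what the server returns.

```python
import re
from datetime import datetime, timezone

# Matches the size reported in the server error text, e.g. "BSONObj size: 17201172".
_SIZE_RE = re.compile(r"BSONObj size: (\d+)")


def build_large_event_dlq_message(processor_name, error_text, resume_token=None, now=None):
    """Build a DLQ document for a change event that exceeded the BSON size limit.

    Hypothetical helper: only the top-level field names come from the ticket.
    """
    match = _SIZE_RE.search(error_text)
    size = int(match.group(1)) if match else None

    reason = "Change stream event exceeded the 16MB BSON size limit"
    if size is not None:
        reason += f" (event size: {size} bytes)"
    reason += "; consider adding $changeStreamSplitLargeEvent to $source.config.pipeline"

    # Truncated view: keep only the resume token, if one could be recovered (assumption).
    doc = {}
    if resume_token is not None:
        doc["_id"] = resume_token

    return {
        "processorName": processor_name,
        "dlqTime": now or datetime.now(timezone.utc),
        "errInfo": {"reason": reason},
        "doc": doc,
    }


# Example usage with placeholder values.
msg = build_large_event_dlq_message(
    "exampleProcessor",
    "BSONObjectTooLarge: ... BSONObj size: 17201172 (0x1067814) is invalid.",
    resume_token={"_data": "<resume token from the error>"},
)
```

Keeping the resume token under `doc._id` mirrors where it sits in a change event, so a user could pass it back to the change stream to locate the oversized event.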
We discussed three behavior options with the product team and aligned on option 1:
1. DLQ a message corresponding to the large event. The message should contain a truncated view of the data plus metadata that helps the user find the event in their change stream (i.e., a resumeToken).
2. Fail the processor with a clearer error that recommends setting $source.config.pipeline = [$changeStreamSplitLargeEvent].
3. Try to automatically use $changeStreamSplitLargeEvent.
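For reference, the pipeline setting mentioned in the second option could look roughly like the sketch below, written as a Python dict. The connection, database, and collection names are placeholders, and the exact `$source` layout is an assumption based on the `$source.config.pipeline` path named in this ticket.

```python
# Hypothetical $source stage; connection, database, and collection names are placeholders.
source_stage = {
    "$source": {
        "connectionName": "exampleConnection",
        "db": "exampleDb",
        "coll": "exampleColl",
        "config": {
            # $changeStreamSplitLargeEvent must be the final stage of the change stream pipeline.
            "pipeline": [{"$changeStreamSplitLargeEvent": {}}],
        },
    }
}
```

With this stage in place, the server splits oversized events into fragments rather than failing the getMore, so downstream stages would need to handle fragmented events.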