[SERVER-66289] $out incorrectly throws BSONObj size error on v5.0.8 Created: 06/May/22  Updated: 29/Oct/23  Resolved: 16/Aug/22

Status: Closed
Project: Core Server
Component/s: Query Execution, Shell
Affects Version/s: 5.0.8
Fix Version/s: 4.4.18, 5.0.14, 6.0.3, 6.1.0-rc0

Type: Bug Priority: Major - P3
Reporter: Awanish Raj Assignee: Mihai Andrei
Resolution: Fixed Votes: 2
Labels: $out, aggregation, bug
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File screenshot-1.png     PNG File screenshot-2.png     PNG File screenshot-3.png    
Issue Links:
related to SERVER-68845 BSONObjectTooLarge when $merge during... (Closed)
is related to SERVER-71238 BSONObjectTooLarge (Closed)
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v6.0, v5.0, v4.4
Steps To Reproduce:

Steps followed:

  1. Launch an M10 cluster on Atlas and load the sample dataset.
  2. In the shell, switch to the sample_mflix database: use sample_mflix
  3. Run the aggregation command: db.movies.aggregate([{$out: "movies2"}])

 

On version 4.4.13:

The command runs successfully and movies2 collection is created.

 

On version 5.0.8:

The command fails with the error:

PlanExecutor error during aggregation :: caused by :: BSONObj size: 16836845 (0x100E8ED) is invalid. Size must be between 0 and 16793600(16MB) First element: insert: "tmp.agg_out.045d5052-e014-4830-8d16-5b985ebfc46e"

Sprint: QE 2022-05-30, QE 2022-06-13, QE 2022-06-27, QE 2022-07-11, QE 2022-07-25, QE 2022-08-08, QE 2022-08-22
Participants:
Case:
Linked BF Score: 170

 Description   

In version 5.0, an aggregation with $out as the final stage throws the BSONObj size (16MB) error even though the individual documents are much smaller. The server appears to be measuring the total size of all the documents in the result set rather than each document individually.
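The fix commits are titled "Update write size estimation logic in DocumentSourceWriter", which suggests the server batches the result documents into internal insert commands against the tmp.agg_out collection and underestimates the serialized size of a batch. The following is a hypothetical Node-runnable simulation (not the actual server code; all constants and names are illustrative) of how an estimate that ignores per-document overhead can produce a command just over the internal 16,793,600-byte limit, matching the error in this ticket:

```javascript
// Illustrative simulation of the suspected batching bug. The real limits
// exist in the server (BSONObjMaxUserSize / BSONObjMaxInternalSize); the
// overhead and envelope byte counts below are assumptions for the sketch.

const BSON_MAX_USER = 16777216;      // 16 MB: max size of a user document
const BSON_MAX_INTERNAL = 16793600;  // 16 MB + 16 KB: limit in the error message
const DOC_BYTES = 1000;              // each result document ~1 KB, far below 16 MB
const PER_DOC_OVERHEAD = 7;          // per-document framing the naive estimate ignores
const CMD_ENVELOPE = 100;            // "insert", collection name, etc.

function buildBatch(overheadInEstimate) {
  const perDocEstimate = DOC_BYTES + (overheadInEstimate ? PER_DOC_OVERHEAD : 0);
  let estimated = 0, docs = 0;
  // Keep appending documents while the *estimate* says the batch still fits.
  while (estimated + perDocEstimate <= BSON_MAX_USER) {
    estimated += perDocEstimate;
    docs += 1;
  }
  // The real serialized command always pays the overhead and the envelope.
  const actual = docs * (DOC_BYTES + PER_DOC_OVERHEAD) + CMD_ENVELOPE;
  return { docs, actual };
}

const naive = buildBatch(false); // estimate ignores overhead -> oversized command
const fixed = buildBatch(true);  // estimate includes overhead -> command fits

console.log(naive.actual > BSON_MAX_INTERNAL);  // true: "BSONObj size ... is invalid"
console.log(fixed.actual <= BSON_MAX_INTERNAL); // true
```

Note that no single document is anywhere near 16 MB here; only the assembled insert command crosses the limit, which matches the report that individual documents are much smaller than the size in the error.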



 Comments   
Comment by Githook User [ 05/Oct/22 ]

Author:

{'name': 'Mihai Andrei', 'email': 'mihai.andrei@10gen.com', 'username': 'mtandrei'}

Message: SERVER-66289 Update write size estimation logic in DocumentSourceWriter

(cherry picked from commit 707ba0a0ade42c4540b9cabaaf5a257de944cc3e)
(cherry picked from commit c172ccd37516f3c2118f349817cdb1841a2486b9)
(cherry picked from commit 3f2cd9485a807eaeabc60bd99653cffd2942f662)
Branch: v4.4
https://github.com/mongodb/mongo/commit/dad5d0a196ffb05b2c5c8b315c33ca46c8b65934

Comment by Githook User [ 04/Oct/22 ]

Author:

{'name': 'Mihai Andrei', 'email': 'mihai.andrei@10gen.com', 'username': 'mtandrei'}

Message: SERVER-66289 Update write size estimation logic in DocumentSourceWriter

(cherry picked from commit 707ba0a0ade42c4540b9cabaaf5a257de944cc3e)
(cherry picked from commit c172ccd37516f3c2118f349817cdb1841a2486b9)
Branch: v5.0
https://github.com/mongodb/mongo/commit/3f2cd9485a807eaeabc60bd99653cffd2942f662

Comment by Githook User [ 04/Oct/22 ]

Author:

{'name': 'Mihai Andrei', 'email': 'mihai.andrei@10gen.com', 'username': 'mtandrei'}

Message: SERVER-66289 Update write size estimation logic in DocumentSourceWriter

(cherry picked from commit 707ba0a0ade42c4540b9cabaaf5a257de944cc3e)
Branch: v6.0
https://github.com/mongodb/mongo/commit/c172ccd37516f3c2118f349817cdb1841a2486b9

Comment by Mihai Andrei [ 03/Oct/22 ]

The real commit that wound up in master: 

https://github.com/mongodb/mongo/commit/707ba0a0ade42c4540b9cabaaf5a257de944cc3e

(note that https://github.com/mongodb/mongo/commit/7b7fe658db948e6f5a4a6c30d4590d7866c59371 was reverted)

I wonder why the githook didn't pick this one up... oh well

Comment by Githook User [ 30/Sep/22 ]

Author:

{'name': 'Uladzimir Makouski', 'email': 'uladzimir.makouski@mongodb.com', 'username': 'umakouski'}

Message: Revert "SERVER-66289 Update write size estimation logic in DocumentSourceWriter"

This reverts commit dff46e92439898e1012f93ee7ca82d35ad88dae5.
Branch: v6.0
https://github.com/mongodb/mongo/commit/814ad525328e887add796abd0940563da33d0224

Comment by Githook User [ 29/Sep/22 ]

Author:

{'name': 'Mihai Andrei', 'email': 'mihai.andrei@10gen.com', 'username': 'mtandrei'}

Message: SERVER-66289 Update write size estimation logic in DocumentSourceWriter

(cherry picked from commit 7b7fe658db948e6f5a4a6c30d4590d7866c59371)
Branch: v6.0
https://github.com/mongodb/mongo/commit/dff46e92439898e1012f93ee7ca82d35ad88dae5

Comment by Lars Van Casteren [ 14/Sep/22 ]

Thanks!

Comment by Kyle Suarez [ 14/Sep/22 ]

larsvancasteren@gmail.com, the fix in this ticket is present in 6.1.0-rc0; however, the team has not yet backported the fix to previous branches. We've requested backport to the v6.0, v5.0 and v4.4 versions. Please continue to watch this ticket for updates and thank you for your patience.

Kyle

Comment by Lars Van Casteren [ 14/Sep/22 ]

MongoDB 4.4.15

When running a $out aggregation on the primary node without any readPreference, it runs successfully.
When running the same $out aggregation with the secondary readPreference approach, we get this error:

BSONObj size: 16953146 (0x102AF3A) is invalid.
Size must be between 0 and 16793600(16MB) First element:
insert: "tmp.agg_out.55125f65-0ef0-4c1f-a7ef-dec311c99612"

For completeness: https://www.mongodb.com/community/forums/t/mongodb-4-4-15-bsonobjecttoolarge-aggregation-out-update/186983

Gr,
L

Comment by Githook User [ 27/Jul/22 ]

Author:

{'name': 'Mihai Andrei', 'email': 'mihai.andrei@10gen.com', 'username': 'mtandrei'}

Message: SERVER-66289 Update write size estimation logic in DocumentSourceWriter
Branch: master
https://github.com/mongodb/mongo/commit/7b7fe658db948e6f5a4a6c30d4590d7866c59371

Comment by Mihai Andrei [ 01/Jun/22 ]

Hi awanish@assetplus.io,

Thanks for filing this ticket and reporting this issue! After doing some investigation, there does appear to be an issue with how the server handles batches of writes when running $out with secondary read preference. One thing I would like to note, however, is that when investigating, I was able to reproduce the issue on 4.4 as well as 5.0. Would you be able to confirm that when you ran your aggregation with secondary read preference on 4.4, the aggregate succeeded? Also, which shell did you use? I would like to rule out the possibility of a separate issue (that of the shell or drivers not respecting read preference).

Comment by Chris Kelly [ 24/May/22 ]

Hello,

Since this issue appears to behave inconsistently depending on where the query is run from, I'm going to forward this to Driver Escalation for further investigation, given that:

  • Not all driver versions support targeting $out operations to replica set secondary nodes (check your driver documentation to see when your driver added support for $out running on a secondary).
  • The same query on the same data behaves differently based on where it is run (Atlas vs. local replica set).
  • This may affect $merge as well, which carries the same warning in the docs.

Christopher

Comment by Clayton Taylor [ 12/May/22 ]

Seeing this issue as well; a few details from my team's end:

  1. When running code locally on a Windows workstation pointed at an Atlas cluster, the aggregation succeeds. When the same code executes from a container in a Linux-based K8s cluster pointed at the same Atlas cluster, the aggregation fails with:
    • // Anonymized method executions
      Error while indexing: Command aggregate failed: PlanExecutor error during aggregation :: caused by :: BSONObj size: 16813584 (0x1008E10) is invalid. Size must be between 0 and 16793600(16MB) First element: insert: "tmp.agg_out.d03d9e58-98fd-44ed-bed0-4242d6ffafce".
      MongoDB.Driver.MongoCommandException: Command aggregate failed: PlanExecutor error during aggregation :: caused by :: BSONObj size: 16813584 (0x1008E10) is invalid. Size must be between 0 and 16793600(16MB) First element: insert: "tmp.agg_out.d03d9e58-98fd-44ed-bed0-4242d6ffafce".
          at MongoDB.Driver.Core.WireProtocol.CommandUsingCommandMessageWireProtocol`1.ProcessResponse(ConnectionId connectionId, CommandMessage responseMessage)
          at MongoDB.Driver.Core.WireProtocol.CommandUsingCommandMessageWireProtocol`1.ExecuteAsync(IConnection connection, CancellationToken cancellationToken)
          at MongoDB.Driver.Core.Servers.Server.ServerChannel.ExecuteProtocolAsync[TResult](IWireProtocol`1 protocol, ICoreSession session, CancellationToken cancellationToken)
          at MongoDB.Driver.Core.Operations.CommandOperationBase`1.ExecuteProtocolAsync(IChannelSource channelSource, ICoreSessionHandle session, ReadPreference readPreference, CancellationToken cancellationToken)
          at MongoDB.Driver.Core.Operations.WriteCommandOperation`1.ExecuteAsync(IWriteBinding binding, CancellationToken cancellationToken)
          at MongoDB.Driver.Core.Operations.AggregateToCollectionOperation.ExecuteAsync(IWriteBinding binding, CancellationToken cancellationToken)
          at MongoDB.Driver.OperationExecutor.ExecuteWriteOperationAsync[TResult](IWriteBinding binding, IWriteOperation`1 operation, CancellationToken cancellationToken)
          at MongoDB.Driver.MongoCollectionImpl`1.ExecuteWriteOperationAsync[TResult](IClientSessionHandle session, IWriteOperation`1 operation, CancellationToken cancellationToken)
          at MongoDB.Driver.MongoCollectionImpl`1.AggregateAsync[TResult](IClientSessionHandle session, PipelineDefinition`2 pipeline, AggregateOptions options, CancellationToken cancellationToken)
          at MongoDB.Driver.MongoCollectionImpl`1.UsingImplicitSessionAsync[TResult](Func`2 funcAsync, CancellationToken cancellationToken)
          at **REDACTED**
          at **REDACTED**
      

  2. Setting the readPreference to primary as a workaround worked for us (kudos to @Awanish Raj).
  3. From what I've read about batchSize being set on the cursor, the total size of each batch must be less than 16MB, not just each document. I wouldn't expect the $out stage to be done in batches, but it seemed worth mentioning.
Comment by Awanish Raj [ 10/May/22 ]

The issue happens only if readPreference was "secondary". If no readPreference is given, the command succeeds.
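For reference, the failing and succeeding invocations can be written in mongosh as follows. This is a shell fragment that requires a live replica set (collection names taken from the steps above), shown for illustration only:

```
// Fails on affected versions: routing the read to a secondary triggers
// the oversized internal insert batches.
db.getMongo().setReadPref("secondary");
db.movies.aggregate([{ $out: "movies2" }]);

// Workaround reported in this ticket: use the default (primary) read preference.
db.getMongo().setReadPref("primary");
db.movies.aggregate([{ $out: "movies2" }]);
```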

Comment by Awanish Raj [ 10/May/22 ]

So far, the error pops up when using Realm Triggers (in production) and the MongoDB Compass shell (in testing) to run this command. When I ran the same command from mongosh in a terminal, it succeeded without issue.

Comment by Awanish Raj [ 06/May/22 ]

I have verified that the aggregation succeeds on v4.4.13, but fails on v5.0.8 and v5.3.1.

Comment by Awanish Raj [ 06/May/22 ]

I have tested it with v5.3 on Atlas. The issue remains.

Generated at Thu Feb 08 06:05:00 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.