[SERVER-76027] Limit memory usage for bulkWrite (mongos) Created: 12/Apr/23  Updated: 19/Jan/24  Resolved: 19/Jan/24

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 7.3.0-rc0

Type: Task Priority: Major - P3
Reporter: Lingzhi Deng Assignee: Sean Zimmerman
Resolution: Fixed Votes: 0
Labels: milestone-2
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
depends on SERVER-72792 Implement progress bookkeeping for in... Closed
Related
related to SERVER-72794 Implement cursor response for bulkWri... Closed
related to SERVER-76026 Limit memory usage for bulkWrite (mon... Closed
Assigned Teams:
Replication
Backwards Compatibility: Fully Compatible
Sprint: Repl 2024-01-22
Participants:

 Description   

Unlike other cluster commands, mongos can't use remoteCursors for bulkWrite: as a router, it must consume the write results from the shards before it can determine what to do next. Mongos therefore needs to cache the response for each individual WriteOp, and there is no memory limit on that cache today.

Similar to SERVER-76026, this ticket limits the memory used (e.g. capped at 100MB) by the WriteOp responses cached on mongos. This effectively also limits the memory usage of each bulkWrite response cursor.

The end goal is to limit the memory usage when we build or cache responses for bulkWrite ops. Once this limit is hit, we should stop executing the remaining operations in this bulkWrite.
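
The capping behaviour described above could be sketched roughly as follows. This is an illustrative Python sketch, not the server's C++ implementation: `run_bulk_write`, `approx_size`, and `DEFAULT_LIMIT_BYTES` are hypothetical names, and measuring a reply by its JSON encoding is an assumption (a real router would more likely use the BSON size of each reply).

```python
import json
from typing import Callable, Iterable, List, Optional, Tuple

# Hypothetical cap; the ticket only suggests "e.g. capped at 100MB".
DEFAULT_LIMIT_BYTES = 100 * 1024 * 1024


def approx_size(reply: dict) -> int:
    """Approximate a reply's footprint by its JSON encoding (an assumption)."""
    return len(json.dumps(reply).encode("utf-8"))


def run_bulk_write(
    ops: Iterable[dict],
    execute: Callable[[dict], dict],
    limit_bytes: int = DEFAULT_LIMIT_BYTES,
) -> Tuple[List[dict], Optional[int]]:
    """Execute ops in order, caching each reply until the cap is reached.

    Returns (cached_replies, stopped_at), where stopped_at is the index of
    the first op that was NOT executed, or None if every op ran.
    """
    cached: List[dict] = []
    used = 0
    for idx, op in enumerate(ops):
        # Check the budget before executing, so no further op runs once
        # the cached replies have reached the limit.
        if used >= limit_bytes:
            return cached, idx
        reply = execute(op)
        cached.append(reply)
        used += approx_size(reply)
    return cached, None
```

Because the check happens before each op, the cache in this sketch can overshoot the cap by at most the size of one reply; whether the actual change behaves the same way is not stated in this ticket.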



 Comments   
Comment by Githook User [ 19/Jan/24 ]

Author:

{'name': 'seanzimm', 'email': '102551488+seanzimm@users.noreply.github.com', 'username': 'seanzimm'}

Message: SERVER-76027: Limit bulkWrite memory usage on mongos (#18115)

GitOrigin-RevId: 81e883615c2f01286d16dd1aa763485245c20899
Branch: master
https://github.com/mongodb/mongo/commit/83e4ffafb01399f99ae20b457fb83099668b2312

Comment by Lingzhi Deng [ 14/Apr/23 ]

More detailed explanation:
Once this limit is hit, we should stop executing the remaining operations in this bulkWrite and return an error for the first operation we stop at. If the bulkWrite is in a transaction, we will abort it, but we need to make sure the response doesn't carry the transient transaction error label.
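
One way to picture the reply handling in the paragraph above is the sketch below. Everything in it is hypothetical: `finish_bulk_write`, the reply shape, and the `LIMIT_ERROR_NAME` placeholder are invented for illustration, and the ticket does not say which error is actually returned. Only the label name `TransientTransactionError` is MongoDB's own. The label is presumably withheld because a driver that sees it retries the whole transaction, and a retry would most likely hit the same limit.

```python
from typing import Callable, Iterable, List, Optional

TRANSIENT_LABEL = "TransientTransactionError"
# Hypothetical placeholder; the ticket does not name the actual error.
LIMIT_ERROR_NAME = "BulkWriteMemoryLimitReached"


def finish_bulk_write(
    replies: List[dict],
    stopped_at: Optional[int],
    in_transaction: bool = False,
    abort: Optional[Callable[[], None]] = None,
    labels: Iterable[str] = (),
) -> dict:
    """Build the final result after the op loop has ended.

    stopped_at is the index of the first op that was not executed because
    the memory limit was hit, or None if every op ran. labels stands in for
    whatever error labels the abort path would otherwise attach.
    """
    result = {"replies": list(replies), "errorLabels": []}
    if stopped_at is None:
        return result

    # Report the limit as an error on the first op we stopped at.
    result["replies"].append(
        {"ok": 0, "idx": stopped_at, "error": LIMIT_ERROR_NAME}
    )

    if in_transaction:
        if abort is not None:
            abort()
        # Keep any other labels, but never the transient one.
        result["errorLabels"] = [l for l in labels if l != TRANSIENT_LABEL]
    return result
```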

This is an “easy but good enough” solution for now that works around the fact that mongos/the router logic needs to cache shard responses before knowing what to do next. We expect it to be rare for people to hit this limit in practice.

If this becomes an issue in the future, ideally we will want to:

  • Allow mongod to spill to disk
    • This bounds memory usage on mongod/shards
  • Allow mongos/router to peek (w/o consuming) cursor responses from mongod
    • This will truly allow remote cursors on mongos for bulkWrite