[SERVER-74133] Spilling to TemporaryRecordStores in multi-doc transactions does not work as expected Created: 17/Feb/23  Updated: 13/Oct/23

Status: Backlog
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Louis Williams Assignee: Backlog - Query Execution
Resolution: Unresolved Votes: 0
Labels: query-product-scope-2, query-product-urgency-3, query-product-value-2
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-74085 Ensure queries that spill to Temporar... Backlog
related to WT-10576 Return EBUSY on forced drop if there ... In Code Review
is related to SERVER-81331 Spilling in SBE may lead to read on d... Closed
Assigned Teams:
Query Execution
Operating System: ALL

 Description   

Some query stages implement spilling by writing to a WT table. If this happens inside a user's multi-doc transaction, those writes become part of the same storage engine transaction. Because WiredTiger cannot evict uncommitted writes to disk, spilling fills WiredTiger's dirty cache with un-evictable data.

This has a few problems:

  • Spilling in a multi-doc transaction degrades performance and can stall concurrent write workloads, because it pollutes the WT cache with un-evictable data.
  • Because the spilled data never actually reaches disk, we are still effectively imposing a limit based on the size of the WT cache: user queries will fail if they spill enough data.
  • We're running into WiredTiger bugs because we drop the spill tables before the transaction commits or aborts (WT-10576).
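To make the second point concrete, here is a toy Python model. The classes are hypothetical and are not WiredTiger's or MongoDB's API. The model assumes that committed bytes can be evicted, while bytes written inside a still-open transaction stay pinned in cache. Under that assumption, a spill inside a multi-doc transaction is capped by cache capacity, while the same spill done outside one is not.

```python
class CacheFullError(Exception):
    """Raised when the toy cache is full of data that cannot be evicted."""


class ToyCache:
    """Toy model of a storage-engine cache (hypothetical; not WiredTiger's API).

    Committed bytes may be evicted to disk. Bytes written by a transaction
    that is still open stay pinned in cache until it commits or aborts.
    """

    def __init__(self, capacity_bytes: int) -> None:
        self.capacity = capacity_bytes
        self.committed = 0    # evictable
        self.uncommitted = 0  # pinned until the transaction ends

    def _fits(self, nbytes: int) -> bool:
        return self.committed + self.uncommitted + nbytes <= self.capacity

    def write(self, nbytes: int, in_open_txn: bool) -> None:
        if not self._fits(nbytes):
            # Eviction can only reclaim committed bytes.
            self.committed = 0
        if not self._fits(nbytes):
            raise CacheFullError("cache is full of un-evictable (uncommitted) data")
        if in_open_txn:
            self.uncommitted += nbytes
        else:
            self.committed += nbytes


def spill(cache: ToyCache, total_bytes: int, chunk: int, in_multi_doc_txn: bool) -> int:
    """Spill total_bytes in chunk-sized writes and return the bytes written.

    Outside a multi-doc transaction each chunk is modeled as committed right
    away, so it becomes evictable. Inside one, every chunk stays pinned.
    """
    written = 0
    while written < total_bytes:
        n = min(chunk, total_bytes - written)
        cache.write(n, in_open_txn=in_multi_doc_txn)
        written += n
    return written
```

With a 100-byte toy cache, `spill(ToyCache(100), 1000, 10, in_multi_doc_txn=False)` returns 1000. The same call with `in_multi_doc_txn=True` raises `CacheFullError` once 100 bytes are pinned. In this model the "spill" never relieves memory pressure at all.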

I see two solutions:

  • We can ban spilling in multi-doc transactions until we come up with a better solution
  • Spill in a separate storage transaction. We have to be extremely careful here because this can cause deadlocks (see SERVER-61116), and it would require close collaboration with Storage Execution. To make this work, we would need to do something similar to what we did for SERVER-62650. In the spilling transaction, we would always need to impose a timeout on how long it can block on cache eviction; if that timeout expires, we would have to force the user's transaction to roll back.
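A minimal Python sketch of the two proposed options follows. It assumes a hypothetical `cache_has_room()` probe and a hypothetical `write_chunk()` callback, and neither is a real MongoDB or WiredTiger interface. Option 1 raises an error instead of spilling inside a multi-doc transaction. Option 2 writes in its own storage transaction but waits only a bounded time for cache eviction. If that wait times out, it signals that the user's transaction must roll back. The sketch illustrates the policy only. It does not reflect how SERVER-62650 was implemented.

```python
import time
from typing import Callable


class SpillingNotAllowedInTransaction(Exception):
    """Hypothetical error for option 1: spilling is banned in multi-doc transactions."""


class UserTransactionMustAbort(Exception):
    """Hypothetical error for option 2: the user's transaction has to roll back."""


def check_spill_allowed(in_multi_doc_txn: bool) -> None:
    """Option 1: refuse to spill inside a multi-doc transaction."""
    if in_multi_doc_txn:
        raise SpillingNotAllowedInTransaction("spilling is disabled in multi-doc transactions")


def spill_in_separate_txn(
    write_chunk: Callable[[], None],
    cache_has_room: Callable[[], bool],
    timeout_s: float,
    poll_s: float = 0.01,
) -> None:
    """Option 2: write the spilled chunk in its own storage transaction.

    The spilling transaction may wait for cache eviction, but only until
    timeout_s elapses. Waiting without a bound risks the deadlocks noted
    in SERVER-61116.
    """
    deadline = time.monotonic() + timeout_s
    while not cache_has_room():
        if time.monotonic() >= deadline:
            raise UserTransactionMustAbort("cache eviction did not free space in time")
        time.sleep(poll_s)
    # Assumed (hypothetically) to commit independently of the user's transaction.
    write_chunk()
```

Polling against a deadline is just the simplest way to express a bounded wait in a sketch. A real implementation would presumably hook into the storage engine's own eviction mechanism instead.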


 Comments   
Comment by Garaudy Etienne [ 06/Apr/23 ]

From a product standpoint, we would love to be able to support larger transactions. The downsides of spilling to disk outlined above make it less ideal.

Comment by Chris Harris [ 04/Apr/23 ]

Thanks - I'm looking into what the expectations are for transactions in the future.

Comment by Kyle Suarez [ 04/Apr/23 ]

Sending a friendly ping to christopher.harris@mongodb.com and joe.sack@mongodb.com to please discuss this at your next product triage meeting.

Comment by David Storch [ 23/Feb/23 ]

kyle.suarez@mongodb.com sending this to Query Director Triage.

Generated at Thu Feb 08 06:26:35 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.