[SERVER-62748] spill_to_disk.js fails burn-in tests with "WT_CACHE_FULL: operation would overflow cache" error Created: 19/Jan/22  Updated: 09/Jun/23

Status: Backlog
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Minor - P4
Reporter: Irina Yatsenko (Inactive) Assignee: Backlog - Query Execution
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-61300 Investigate memory usage of exact-top... Closed
Assigned Teams:
Query Execution
Sprint: QE 2022-02-07, QE 2022-02-21, QE 2022-01-24
Participants:
Linked BF Score: 124

 Description   

The test creates up to 200 MB of test data, which seems to stress the WT cache when the test is run in repeat mode on the Enterprise RHEL 8.0 (inMemory) builder for the burn-in check.
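
For a sense of scale, a load of roughly this shape (a hypothetical sketch; the collection name, document count, and sizes are illustrative and not taken from the actual test) comes to ~200 MB of user data before index and cache overhead, and a burn-in repeat run rebuilds it on every execution:

// Hypothetical illustration of the data volume involved; not the actual test code.
const coll = db.spill_to_disk_scale_sketch;   // hypothetical collection name
coll.drop();
const padding = "x".repeat(1024 * 1024);      // ~1 MB per document
const bulk = coll.initializeUnorderedBulkOp();
for (let i = 0; i < 200; i++) {               // ~200 MB total
    bulk.insert({_id: i, padding: padding});
}
assert.commandWorked(bulk.execute());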

https://jira.mongodb.org/browse/SERVER-61300 ran into this as well, but the fix for that ticket hasn't fully addressed the problem, as it is still happening: https://spruce.mongodb.com/task/mongodb_mongo_master_enterprise_rhel_80_64_bit_inmem_required_display_burn_in_tests_patch_2a763ca4ea023341983ca8f5e89fa214a56e331c_61e74f570305b91e5b10596e_22_01_18_23_40_26/execution-tasks?execution=0&sorts=STATUS%3AASC

Affected suites:
aggregation_mongos_passthrough_0_enterprise-rhel-80-64-bit-inmem-required
aggregation_one_shard_sharded_collections_0_enterprise-rhel-80-64-bit-inmem-required

[j0:s0:prim] | 2022-01-19T06:47:56.628+00:00 W STORAGE 6148401 [JournalFlusher] "The JournalFlusher encountered an error attempting to flush data to disk","attr":{"JournalFlusherError":"ExceededMemoryLimit: -31807: WT_CACHE_FULL: operation would overflow cache"}

[js_test:spill_to_disk] assert: write failed with error: {
[js_test:spill_to_disk]     "nInserted" : 0,
[js_test:spill_to_disk]     "writeError" : {
[js_test:spill_to_disk]         "code" : 146,
[js_test:spill_to_disk]         "errmsg" : "WiredTigerIdIndex::_insert: index: _id_; uri: table:index-202--2892005430572051450 -31807: WT_CACHE_FULL: operation would overflow cache"
[js_test:spill_to_disk]     }
[js_test:spill_to_disk] }
[js_test:spill_to_disk] _getErrorWithCode@src/mongo/shell/utils.js:24:13
[js_test:spill_to_disk] doassert@src/mongo/shell/assert.js:18:14
[js_test:spill_to_disk] assert.writeOK@src/mongo/shell/assert.js:881:13
[js_test:spill_to_disk] _assertCommandWorked@src/mongo/shell/assert.js:722:17
[js_test:spill_to_disk] assert.commandWorked@src/mongo/shell/assert.js:829:16
[js_test:spill_to_disk] @jstests/aggregation/spill_to_disk.js:29:5
[js_test:spill_to_disk] @jstests/aggregation/spill_to_disk.js:14:2
[js_test:spill_to_disk] Error: write failed with error: {
[js_test:spill_to_disk]     "nInserted" : 0,
[js_test:spill_to_disk]     "writeError" : {
[js_test:spill_to_disk]         "code" : 146,
[js_test:spill_to_disk]         "errmsg" : "WiredTigerIdIndex::_insert: index: _id_; uri: table:index-202--2892005430572051450 -31807: WT_CACHE_FULL: operation would overflow cache"
[js_test:spill_to_disk]     }
[js_test:spill_to_disk] } :
[js_test:spill_to_disk] _getErrorWithCode@src/mongo/shell/utils.js:24:13
[js_test:spill_to_disk] doassert@src/mongo/shell/assert.js:18:14
[js_test:spill_to_disk] assert.writeOK@src/mongo/shell/assert.js:881:13
[js_test:spill_to_disk] _assertCommandWorked@src/mongo/shell/assert.js:722:17
[js_test:spill_to_disk] assert.commandWorked@src/mongo/shell/assert.js:829:16
[js_test:spill_to_disk] @jstests/aggregation/spill_to_disk.js:29:5
[js_test:spill_to_disk] @jstests/aggregation/spill_to_disk.js:14:2
[js_test:spill_to_disk] failed to load: jstests/aggregation/spill_to_disk.js
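
For context, the inMemory storage engine keeps all data and indexes inside the WT cache, so repeatedly recreating a couple hundred MB of test data can run up against the configured cache ceiling. A hedged diagnostic sketch (not part of the test) for checking how close a node is to that ceiling, using the standard serverStatus() WiredTiger cache statistics:

// Diagnostic sketch only: report WT cache usage on the node under test.
const cacheStats = db.serverStatus().wiredTiger.cache;
const maxBytes = cacheStats["maximum bytes configured"];
const usedBytes = cacheStats["bytes currently in the cache"];
print("WT cache: " + usedBytes + " of " + maxBytes + " bytes used (" +
      (100 * usedBytes / maxBytes).toFixed(1) + "%)");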



 Comments   
Comment by Mihai Andrei [ 19/Jan/22 ]

That sounds fine to me, but I wonder whether we should spend some time better understanding why the JournalFlusher is running out of memory so quickly. Also, this seems to be an issue only on the burn-in tests task, not when the test is run once as part of a regular suite.

Comment by Kyle Suarez [ 19/Jan/22 ]

mihai.andrei, looking at the linked patch build, it appears that cleaning every 20 executions still isn't enough. irina.yatsenko has suggested reducing the amount of memory used by the test to avoid stressing memory-constrained systems like the inMemory storage engine builder. Perhaps we could dial down both the document sizes and internalQuerySlotBasedExecutionHashAggApproxMemoryUseInBytesBeforeSpill to maybe half of their current values, or even further?
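
A hedged sketch of what lowering that knob could look like inside the test (the parameter name comes from the comment above; the halving and the save/restore pattern are illustrative, and in the sharded passthroughs the parameter would likely need to be set on the shards rather than on mongos):

// Illustrative only: lower the SBE hash-agg spill threshold so $group spills
// to disk with much less data resident in memory, then restore the original value.
const knob = "internalQuerySlotBasedExecutionHashAggApproxMemoryUseInBytesBeforeSpill";
const original =
    assert.commandWorked(db.adminCommand({getParameter: 1, [knob]: 1}))[knob];
assert.commandWorked(db.adminCommand({setParameter: 1, [knob]: Math.floor(original / 2)}));
try {
    // ... run the spill-heavy aggregations with smaller documents here ...
} finally {
    assert.commandWorked(db.adminCommand({setParameter: 1, [knob]: original}));
}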
