[SERVER-79434] Bug running spill to disk in mongosync passthroughs Created: 27/Jul/23 Updated: 16/Oct/23 Resolved: 16/Oct/23 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Rohan Sharan | Assignee: | Ivan Fefer |
| Resolution: | Won't Fix | Votes: | 0 |
| Labels: | greenerbuild | | |
| Remaining Estimate: | Not Specified | | |
| Time Spent: | Not Specified | | |
| Original Estimate: | Not Specified | | |
| Issue Links: | |
| Assigned Teams: | Query Execution |
| Operating System: | ALL |
| Sprint: | QE 2023-10-16, QE 2023-10-30 |
| Participants: | |
| Description |
|
I tried re-enabling spill_to_disk after
Essentially, it seems that at some point while the test is running, j3:cl0:s1:prim throws a QueryExceededMemoryLimitNoDiskUseAllowed error, and after that it never recovers and is never reachable by the other nodes. |
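For readers unfamiliar with the error, the failure mode described above can be sketched outside the server: a blocking stage (such as $group) accumulates state in memory until it hits its limit, then either spills to disk or throws when disk use is disallowed. A minimal Python sketch, with all names hypothetical and the spill reduced to a counter (this is not server code):

```python
# Hypothetical model of a memory-limited blocking stage. Not MongoDB server
# code; the exception name mirrors the server error for illustration only.
class QueryExceededMemoryLimitNoDiskUseAllowed(Exception):
    pass

def group_concat(docs, memory_limit_bytes, allow_disk_use):
    """Accumulate per-key string state; spill (modeled as a flush) or fail at the limit."""
    state = {}
    used = 0
    spills = 0
    for doc in docs:
        key, value = doc["groupBy"], doc["a"]
        state[key] = state.get(key, "") + value
        used += len(value)
        if used > memory_limit_bytes:
            if not allow_disk_use:
                raise QueryExceededMemoryLimitNoDiskUseAllowed(
                    "Exceeded memory limit, but didn't allow external spilling.")
            # Model spilling: flush the in-memory state and keep going.
            spills += 1
            state.clear()
            used = 0
    return spills

docs = [{"groupBy": 1, "a": "a" * 10} for _ in range(100)]
print(group_concat(docs, memory_limit_bytes=250, allow_disk_use=True))  # prints 3
```

With allow_disk_use=False the same input raises instead of spilling, which is the behavior the passthrough exercises when spill-to-disk is toggled.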
| Comments |
| Comment by Ivan Fefer [ 16/Oct/23 ] |
|
Unable to reproduce the issue. It looks like the problem is outside spill_to_disk.js test and the server. |
| Comment by Ivan Fefer [ 09/Oct/23 ] |
|
Also, the latest execution succeeded, but accumulator_js_size_limits.js took 220 seconds to complete, with a single query accounting for most of that time:

}},{"$unwind":"$a"},{"$group":{"_id":"$groupBy","count":{"$accumulator":{"init":{"$code":"function() {\n return \"\";\n }"},"accumulateArgs":[{"fieldName":"$a"}],"accumulate":{"$code":"function(state, args) {\n return state + \"a\";\n }"},"merge":{"$code":"function(state1, state2) {\n return state1 + state2;\n }"},"finalize":{"$code":"function(state) {\n return state.length;\n }"}}}}},{"$limit":10}],"cursor":{"batchSize":101},"let":{},"fromMongos":false,"$_generateV2ResumeTokens":false,"readConcern":{"level":"local","provenance":"implicitDefault"},"writeConcern":{"w":"majority","wtimeout":0,"provenance":"implicitDefault"},"shardVersion":{"t":{"$timestamp":{"t":0,"i":0}},"e":{"$oid":"000000000000000000000000"},"v":{"$timestamp": {"t":0,"i":0}}},"databaseVersion":{"uuid": {"$uuid":"5952481b-83b1-47af-b8d4-ba9c5bcf7ffc"},"timestamp":{"$timestamp":{"t":1696860872,"i":1}},"lastMod":1},"lsid":{"id": {"$uuid":"bd5169cf-043c-4e09-8049-0f8db356c9fd"},"uid":{"$binary": {"base64":"47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU=","subType":"0"}}},"$clusterTime":{"clusterTime":{"$timestamp":{"t":1696860879,"i":2}},"signature":{"hash":{"$binary":{"base64":"AAAAAAAAAAAAAAAAAAAAAAAAAAA=","subType":"0"}},"keyId":0}},"$configTime":{"$timestamp":{"t":1696860877,"i":1}},"$topologyTime":{"$timestamp":{"t":1696860862,"i":11}},"$client":{"application": {"name":"MongoDB Shell"},"driver":{"name":"MongoDB Internal Client","version":"6.0.8-43-g806d10c"},"os":{"type":"Linux","name":"Red Hat Enterprise Linux release 8.0 (Ootpa)","architecture":"x86_64","version":"Kernel 4.18.0-80.1.2.el8_0.x86_64"},"mongos":{"host":"ip-10-128-213-98.ec2.internal:20255","client":"127.0.0.1:57474","version":"6.0.8-43-g806d10c"}},"mayBypassWriteBlocking":false,"$db":"staging"},"planSummary":"COLLSCAN","numYields":0,"queryHash":"9B1F31E9","queryFramework":"classic","ok":0,"errMsg":"PlanExecutor error during aggregation :: caused by :: $accumulator arguments exceed max BSON size: 16777228","errName":"Location4545000","errCode":4545000,"reslen":396,"locks":{"FeatureCompatibilityVersion":{"acquireCount":{"r":2}},"Global":{"acquireCount":{"r":2}},"Mutex":{"acquireCount": {"r":3}}},"readConcern":{"level":"local","provenance":"implicitDefault"},"writeConcern":{"w":"majority","wtimeout":0,"provenance":"implicitDefault"},"storage":{},"remote":"127.0.0.1:47074","protocol":"op_msg","durationMillis":219626} |
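For context on the "max BSON size: 16777228" figure in errMsg: the test's $accumulator appends one character to its string state per accumulated document, so a large enough group eventually produces a state whose serialized form exceeds the 16 MB BSON limit. A small Python rendering of the JS functions quoted in the log above (illustrative only; the function names follow the log's init/accumulate/merge/finalize fields):

```python
# Python transcription of the JS accumulator from the slow-query log above.
# The state grows by exactly one byte per accumulated document.
def init():
    return ""

def accumulate(state, args):
    return state + "a"

def merge(state1, state2):
    return state1 + state2

def finalize(state):
    return len(state)

state = init()
for _ in range(1000):
    state = accumulate(state, {"fieldName": "$a"})
print(finalize(state))  # prints 1000
```

At roughly 16.7 million accumulated documents the state alone reaches the 16 MiB BSON ceiling, which matches the Location4545000 error and explains why this query is both slow and doomed to fail.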
| Comment by Ivan Fefer [ 09/Oct/23 ] |
|
I have restarted the task a couple of times and it has failed 3 times out of 6. The failures are similar to the original one; however, in these runs the spill_to_disk.js test finished successfully, but they hung on accumulator_js_size_limits.js. The common denominator of the two tests is that both spill data to disk, and according to the logs, the hang or death of one of the mongod processes happened after some data had been spilled. The lack of logs suggests the process is either stuck waiting for something or was killed with SIGKILL or similar; other failures usually leave at least "Segmentation Fault (core dumped)" in the worst case, and a full stack trace with an error message in the best case. I suspect the problem is on the environment side, not the mongod side: the disk ran out of inodes, the process exceeded its open-file-descriptor limit, or something along those lines. |
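The environment hypothesis above (inode exhaustion, open-file-descriptor limits) can be checked on a test host with a couple of stdlib calls. A sketch; the "/tmp" path is a placeholder, not the actual dbpath from the failing run:

```python
# Quick environment sanity checks for the resources suspected above.
# Unix-only (uses os.statvfs and the resource module).
import os
import resource

# Free inodes on the filesystem that would host the dbpath ("/tmp" is a
# placeholder for this sketch).
st = os.statvfs("/tmp")
free_inodes = st.f_favail  # inodes available to unprivileged processes

# Per-process open-file-descriptor limit (soft and hard).
soft_fds, hard_fds = resource.getrlimit(resource.RLIMIT_NOFILE)

print(f"free inodes: {free_inodes}")
print(f"fd limit: soft={soft_fds} hard={hard_fds}")
```

If either number is near zero during a run, it would account for a mongod process dying or hanging without leaving a stack trace, consistent with the silent failures described above.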