[SERVER-81018] Access violation error Created: 13/Sep/23  Updated: 29/Dec/23  Resolved: 29/Dec/23

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 5.0.19, 5.0.20, 5.0.21
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Fabian Eisinger Assignee: Alison Rhea Thorne
Resolution: Gone away Votes: 0
Labels: windows-10
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Windows 10 Enterprise 12H2


Attachments: Text File Error-output.txt     Text File startup_debug.txt    
Operating System: ALL
Steps To Reproduce:

I'm unable to reproduce this error in a minimal reproduction (MVP).

Participants:

 Description   

When running integration tests in a Node.js project that integrates with MongoDB via mongoose, I receive a timeout error after a short while because the MongoDB database started with mongodb-memory-server crashed with an access violation error.
This is a copy of the issue I created on the mongodb-memory-server project: https://github.com/nodkz/mongodb-memory-server/issues/798

It occurs with all v5 versions of MongoDB and is fixed in v6.

Versions

  • NodeJS: 16.20.2
  • mongodb-memory-server-*: 8.15.1
  • mongodb (the binary version): 5.0.20
  • mongoose: 7.4.3
  • system: Windows 10 Enterprise 12H2

package: mongodb-memory-server

What is the Problem?

Test cases time out after 2000ms. The debug output of mongodb-memory-server shows

Unhandled exception","attr":{"exceptionString":"(access violation)","addressString":"0x00007FF6215CC4D4

(full log below)

Mongod is started in mochaGlobalSetup with:

mongod = await MongoMemoryReplSet.create({
    replSet: {
        count: 3,
        args: ['--notablescan'],
    },
    binary: {
        version: '5.0.20',
    },
});

The error occurs consistently, though not always in the same test method. It can, however, be traced back to a particular line in the code under test:

await Promise.all(
    task.countingScopes.map(countingScope => this._shelfStockLockDao.findOneAndDelete({
        lock: countingScope.lock,
        requester: countingScope.lockRequester,
        outletId: countingScope.outletId
    }))
);

Commenting out this code snippet resolves the error.

Executing it in series also resolves the error:

for (const countingScope of task.countingScopes) {
    await this._shelfStockLockDao.findOneAndDelete({
        lock: countingScope.lock,
        requester: countingScope.lockRequester,
        outletId: countingScope.outletId
    });
}

Setting the option `writeConcern` to `1` also resolves the error but leads to other issues. As soon as `writeConcern` is `2`, `3`, or `majority`, the error occurs again.
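For reference, the per-operation write concern was passed along these lines (a sketch only; the exact option shape is my assumption and should be checked against the mongoose/driver docs):

```javascript
// Options fragments for a query method such as findOneAndDelete(); the
// writeConcern shape follows the MongoDB driver convention
// ({ w: <number | 'majority'> }).
const safeOpts = { writeConcern: { w: 1 } };              // no crash, but other issues
const crashingOpts = { writeConcern: { w: 'majority' } }; // w: 2, 3, or 'majority' reproduce the error
// e.g. await this._shelfStockLockDao.findOneAndDelete(filter, crashingOpts);
```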

`task.countingScopes` is currently an array of length 3. Reducing it to length 2 also resolves the issue.

Executing the tests against a local cluster of the same version started with `run-rs` also resolves the error.

The only thing even close to this error I've found is https://github.com/Automattic/mongoose/issues/5376, an issue from 2017 for mongoose v4 which recommends setting `useMongoClient`, an option that is now the default.

Debug Output
I anonymized the debug output. If that removes needed information, please let me know and I'll fix it.

The file startup_debug.txt shows the debug output of mongodb-memory-server on startup.
The file Error-output.txt shows the access violation error with the full stack trace.



 Comments   
Comment by Alison Rhea Thorne [ 29/Dec/23 ]

Hi @Fabian,

Thank you for the clarification. The ephemeralForTest engine was only intended for testing purposes, and has since been deprecated and removed in more recent versions of MongoDB (as you observed). With that in mind, I'm going to go ahead and close this ticket.

Comment by Fabian Eisinger [ 11/Dec/23 ]

Hello rhea.thorne@mongodb.com

No, I'm still not able to provide an MVP or similar. The error started happening after I added 5 new test cases. I tried adding them one by one but couldn't figure out which test case caused the issue because the error didn't happen.
It is occurring on 5.0.22, and I also checked 5.0.23, 5.3.2 and 6.0.9. I also wanted to check 6.3.2 but got the error `Cannot start server with an unknown storage engine: ephemeralForTest`, which is the default storage engine for MongoDBMemoryServer. After changing the storage engine to WiredTiger I could not reproduce the access violation. So I went back and tested all the above-mentioned versions with WiredTiger and was also not able to reproduce the access violation error.

This explains why the error never occurred when using my local MongoDB cluster and only with MongoDBMemoryServer.
I could also not reproduce the error with the ephemeralForTest engine on 5.0.22 and my up-to-date code base, so something changed between Oct. 24 and now. I can track it down, but wanted to ask first whether you still want to pursue it, given that ephemeralForTest is deprecated and already removed in the newest versions.
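In case it helps others, switching mongodb-memory-server off the ephemeralForTest default looked roughly like this for me (option names as I understand the MMS API; treat them as an assumption and verify against its docs):

```javascript
// Config fragment: select the storage engine for the in-memory replica set.
// 'ephemeralForTest' (the MMS default at the time) triggers the access
// violation; 'wiredTiger' does not reproduce it.
const replSetOptions = {
    replSet: {
        count: 3,
        storageEngine: 'wiredTiger', // instead of the 'ephemeralForTest' default
    },
    binary: {
        version: '5.0.22',
    },
};
// const mongod = await MongoMemoryReplSet.create(replSetOptions);
```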

Comment by Alison Rhea Thorne [ 04/Dec/23 ]

We still need additional information to diagnose the problem. If this is still an issue for you, could you answer the following:

  1. You mentioned that you haven't been able to reproduce this in a production environment, has that situation changed?
  2. You mentioned that this is occurring on all versions of mongod v5, however I did want to get confirmation that you have confirmed this behavior is occurring specifically on version 5.0.22.
  3. Is this a test that you have written? I've been unable to locate this within either the mongoose, mongodb-memory-server nor our own javascript tests. If this is a test that you have written, would you be willing to provide a full reproduction script for our testing? That would go a long way towards assisting us with triage.
Comment by Alison Rhea Thorne [ 13/Nov/23 ]

Hello Fabian,

Thank you for providing the logfiles and diagnostics as requested. Unfortunately, the FTDC that was collected doesn't contain enough data to be actionable in and of itself (likely due to the time period in which the mongod instance was running, as it is running during tests specifically). That said, I do have a few questions:

  1. You mentioned that you haven't been able to reproduce this in a production environment, has that situation changed?
  2. You mentioned that this is occurring on all versions of mongod v5, however I did want to get confirmation that you have confirmed this behavior is occurring specifically on version 5.0.22.
  3. Is this a test that you have written? I've been unable to locate this within either the mongoose, mongodb-memory-server nor our own javascript tests. If this is a test that you have written, would you be willing to provide a full reproduction script for our testing? That would go a long way towards assisting us with triage.

Thank you,
Rhea

Comment by Fabian Eisinger [ 24/Oct/23 ]

rhea.thorne@mongodb.com I managed to get the log files from the mongo instances and uploaded everything you requested as "mongod-logs-full.zip". Let me know if you need a higher log level. It should be easier for me now to achieve it.

hasezoey@gmail.com Thank you for putting me on the right path. That's exactly what I did, and with a few hacky patches to MMS I got the log files.

Comment by hasezoey N/A [ 17/Oct/23 ]

>  When adding --logpath <path> to the node cli options they wouldn't start and be terminated after 20s.

 

Using `--logpath` changes the log output instead of duplicating it to the logpath (log output to stdout gets disabled); from my quick investigation, `systemLog` (in the config file) does the same thing.

If this option was added as an argument when creating an instance with mongodb-memory-server (via `replSet.args`), then this is to be expected, because MMS relies on the stdout log to parse the mongod server state (though the default launch timeout is 10 seconds).

If this was not the case, what exactly do you mean by "node cli options"? (MMS does not provide a runnable binary, and I don't think there is a binary in the MongoDB Node driver.)

Comment by Alison Rhea Thorne [ 17/Oct/23 ]

Hello Fabian. Thank you for uploading the diagnostic data. In regards to your question: yes, the --logpath command line switch is the correct one to use. However, you may also use the configuration file to accomplish this. Here's a link to our doc on this:
 
systemLog Options

Comment by Fabian Eisinger [ 11/Oct/23 ]

rhea.thorne@mongodb.com I uploaded the diagnostic.data for each node as well as the log output from MongoDBMemoryServer again.
I wasn't able to collect the mongod logs. When I added --logpath <path> to the node cli options, the instances wouldn't start and were terminated after 20s.
Is that the correct way to collect those logs or is there another possibility?

Comment by Alison Rhea Thorne [ 06/Oct/23 ]

Thank you for your report!

I've created a secure upload portal for you. Files uploaded to this portal are hosted on Box, are visible only to MongoDB employees, and are routinely deleted after some time.

For each node in the replica set spanning a time period that includes the incident, would you please archive (tar or zip) and upload to that link:

  • the mongod logs
  • the $dbpath/diagnostic.data directory (the contents are described here)

 

Generated at Thu Feb 08 06:45:14 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.