[SERVER-6682] Can't write to ANY collection in the MongoDB instance Created: 01/Aug/12  Updated: 08/Mar/13  Resolved: 28/Sep/12

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 2.0.6
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Eugene Shustef Assignee: Ben Becker
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Operating System: ALL
Participants:

 Description   

We are trying to write to any collection in our database and are unable to do so. Log file shows no entries. Even trying to change log level to 2 does not work. The query just continues to run (active, but no result).

We've been using this database for a long time and have never encountered this issue.



 Comments   
Comment by Ben Becker [ 08/Sep/12 ]

Hi Eugene,

I just wanted to follow up to see if you have encountered any similar issues since upgrading to v2.0.7-rc1. Also, were you able to determine if any system events were logged when this was encountered (e.g. due to low disk space)?

Thanks,
Ben

Comment by Ben Becker [ 02/Aug/12 ]

Thanks, that makes sense and seems to line up with SERVER-5663. We've seen VirtualProtect fail under similar circumstances – usually a system warning message similar to the following will be present in the event viewer:

"Your system is low on virtual memory. Windows is increasing the size of your virtual memory paging file. During this process, memory requests for some applications may be denied"

I believe that in most cases the mongod.exe process will start responding again once there is more room in the commit pool, so I'm curious if there were any other related messages from the OS at the time of the VirtualProtect failure.

Comment by Eugene Shustef [ 02/Aug/12 ]

I think we figured out that the drive where the service resides had less than 1 GB left (actual mongo files are on a different drive). The page file size is the standard Windows Server size (just over 8 GB).

BTW, even if you run --nojournal (sp?) option, repair won't work. I don't remember the exact message.

Comment by Ben Becker [ 02/Aug/12 ]

Hi Eugene,

If you're referring to SERVER-4266, --repairpath should work if disable journaling (which is ok since the journal files were already removed. If importing to a new replica set is already in progress, then no need to worry about it (sorry I missed you're previous comment).

The system events around the time VirtualProtect failed would be very helpful to confirm the root cause. Any additional details about system settings (e.g. custom page file size, network attached storage, virtualization, etc.) would also be very helpful.

Comment by Eugene Shustef [ 02/Aug/12 ]

I didn't have enough space for the repair option on the same drive. Couldn't use --repairpath because we are using journaling and apparently you can't repair using a different drive when journaling is turned on. There was a jira case from Nov. 2011 talking about this.

Comment by Ben Becker [ 02/Aug/12 ]

Hi Eugene,

To get back up and running, I would recommend repairing the database. For future reference, it would be best not to manually remove the journal files.

If time permits, before running the repair, could you run the 'db.foo.validate(

{full:true}

)' command on each collection and paste the output here? Also, could you check in the event viewer for any system events that may have occurred around the time of this failure?

Thanks,
Ben

Comment by Eugene Shustef [ 02/Aug/12 ]

FYI... we had to take the old db offline. Created a new replica set (all members running 2.0.7 rc1) with a brand new database. Just going to migrate the data out of the old database since we can still read. Would still love to find out what happened.

Comment by Eugene Shustef [ 01/Aug/12 ]

So, the question is how to get our current database back up.

Comment by Tad Marshall [ 01/Aug/12 ]

The first problem you hit is that in version 2.0.6, mongod.exe consumes page file space when journaling is enabled (as it is by default for 64-bit version). A second problem is that when mongod.exe runs out of page file space it really should just shut down, since it can't safely maintain the journal any more, but version 2.0.6 keeps running anyway.

Both of these problems are fixed in 2.0.7, which is currently in release candidate state (2.0.7-rc1 is the current one a this moment). The first problem is documented in SERVER-5663 as Andy said; the second problem (shut down if we run out of page file space) is documented in SERVER-6132.

Comment by Eugene Shustef [ 01/Aug/12 ]

We have upgraded to 2.0.7. We believe the problem is writing to new records, because are able to update existing old records. This is most likely caused by corruption in the db file that was created earlier today by the database.
We have two 2-gig files that were created at the exact same time, right around the time that this issue began.
Is there a way to repair a file? Can we just delete the latest 2-gig files?

Comment by Andy Schwerin [ 01/Aug/12 ]

This looks like an instance of SERVER-5663, which is resolved in Mongo version 2.0.7.

Comment by Eugene Shustef [ 01/Aug/12 ]

It's version 2.0.6.

Here is the latest log content since restart:

Wed Aug 01 19:49:53 [initandlisten] MongoDB starting : pid=1632 port=27017 dbpath=d:\mongo 64-bit host=ip-0A32EB4B
Wed Aug 01 19:49:53 [initandlisten] db version v2.0.6, pdfile version 4.5
Wed Aug 01 19:49:53 [initandlisten] git version: e1c0cbc25863f6356aa4e31375add7bb49fb05bc
Wed Aug 01 19:49:53 [initandlisten] build info: windows sys.getwindowsversion(major=6, minor=1, build=7601, platform=2, service_pack='Service Pack 1') BOOST_LIB_VERSION=1_42
Wed Aug 01 19:49:53 [initandlisten] options:

{ dbpath: "d:\mongo", logpath: "d:\mongo\log\mongodb.log", service: true }

Wed Aug 01 19:49:53 [initandlisten] journal dir=d:/mongo/journal
Wed Aug 01 19:49:53 [initandlisten] recover : no journal files present, no recovery needed
Wed Aug 01 19:49:53 [initandlisten] waiting for connections on port 27017
Wed Aug 01 19:49:53 [websvr] admin web console waiting for connections on port 28017
Wed Aug 01 19:49:58 [initandlisten] connection accepted from 98.113.167.34:62402 #1
Wed Aug 01 19:50:01 [initandlisten] connection accepted from 127.0.0.1:49177 #2
Wed Aug 01 19:50:01 [initandlisten] connection accepted from 98.113.167.34:42947 #3
Wed Aug 01 19:50:02 [conn1] end connection 98.113.167.34:62402
Wed Aug 01 19:50:02 [conn3] end connection 98.113.167.34:42947

We also tried to delete the journal folder and re-install the MongoDB service. We are running on Windows Server 2008.

Comment by Andy Schwerin [ 01/Aug/12 ]

@Eugene, can you supply version information about the version of Mongo you're using? Attach mongo.log files?

Comment by Eugene Shustef [ 01/Aug/12 ]

Just noticed that 10-20 minutes prior to when we noticed the issue, we saw this in the log: VirtualProtect failed (mcw) D:/mongo/troidb.136 849284b09a0000 3660000 errno:1455 The paging file is too small for this operation to complete.

Generated at Thu Feb 08 03:12:24 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.