[SERVER-6682] Can't write to ANY collection in the MongoDB instance Created: 01/Aug/12 Updated: 08/Mar/13 Resolved: 28/Sep/12 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | 2.0.6 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Eugene Shustef | Assignee: | Ben Becker |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Operating System: | ALL |
| Participants: |
| Description |
|
We are trying to write to any collection in our database and are unable to do so. Log file shows no entries. Even trying to change log level to 2 does not work. The query just continues to run (active, but no result). We've been using this database for a long time and have never encountered this issue. |
| Comments |
| Comment by Ben Becker [ 08/Sep/12 ] |
|
Hi Eugene, I just wanted to follow up to see if you have encountered any similar issues since upgrading to v2.0.7-rc1. Also, were you able to determine if any system events were logged when this was encountered (e.g. due to low disk space)? Thanks, |
| Comment by Ben Becker [ 02/Aug/12 ] |
|
Thanks, that makes sense and seems to line up with "Your system is low on virtual memory. Windows is increasing the size of your virtual memory paging file. During this process, memory requests for some applications may be denied" I believe that in most cases the mongod.exe process will start responding again once there is more room in the commit pool, so I'm curious if there were any other related messages from the OS at the time of the VirtualProtect failure. |
| Comment by Eugene Shustef [ 02/Aug/12 ] |
|
I think we figured out that the drive where the service resides had less than 1 GB left (actual mongo files are on a different drive). The page file size is the standard Windows Server size (just over 8 GB). BTW, even if you run --nojournal (sp?) option, repair won't work. I don't remember the exact message. |
| Comment by Ben Becker [ 02/Aug/12 ] |
|
Hi Eugene, If you're referring to The system events around the time VirtualProtect failed would be very helpful to confirm the root cause. Any additional details about system settings (e.g. custom page file size, network attached storage, virtualization, etc.) would also be very helpful. |
| Comment by Eugene Shustef [ 02/Aug/12 ] |
|
I didn't have enough space for the repair option on the same drive. Couldn't use --repairpath because we are using journaling and apparently you can't repair using a different drive when journaling is turned on. There was a jira case from Nov. 2011 talking about this. |
| Comment by Ben Becker [ 02/Aug/12 ] |
|
Hi Eugene, To get back up and running, I would recommend repairing the database. For future reference, it would be best not to manually remove the journal files. If time permits, before running the repair, could you run the 'db.foo.validate( {full:true})' command on each collection and paste the output here? Also, could you check in the event viewer for any system events that may have occurred around the time of this failure? Thanks, |
| Comment by Eugene Shustef [ 02/Aug/12 ] |
|
FYI... we had to take the old db offline. Created a new replica set (all members running 2.0.7 rc1) with a brand new database. Just going to migrate the data out of the old database since we can still read. Would still love to find out what happened. |
| Comment by Eugene Shustef [ 01/Aug/12 ] |
|
So, the question is how to get our current database back up. |
| Comment by Tad Marshall [ 01/Aug/12 ] |
|
The first problem you hit is that in version 2.0.6, mongod.exe consumes page file space when journaling is enabled (as it is by default for 64-bit version). A second problem is that when mongod.exe runs out of page file space it really should just shut down, since it can't safely maintain the journal any more, but version 2.0.6 keeps running anyway. Both of these problems are fixed in 2.0.7, which is currently in release candidate state (2.0.7-rc1 is the current one a this moment). The first problem is documented in |
| Comment by Eugene Shustef [ 01/Aug/12 ] |
|
We have upgraded to 2.0.7. We believe the problem is writing to new records, because are able to update existing old records. This is most likely caused by corruption in the db file that was created earlier today by the database. |
| Comment by Andy Schwerin [ 01/Aug/12 ] |
|
This looks like an instance of |
| Comment by Eugene Shustef [ 01/Aug/12 ] |
|
It's version 2.0.6. Here is the latest log content since restart: Wed Aug 01 19:49:53 [initandlisten] MongoDB starting : pid=1632 port=27017 dbpath=d:\mongo 64-bit host=ip-0A32EB4B Wed Aug 01 19:49:53 [initandlisten] journal dir=d:/mongo/journal We also tried to delete the journal folder and re-install the MongoDB service. We are running on Windows Server 2008. |
| Comment by Andy Schwerin [ 01/Aug/12 ] |
|
@Eugene, can you supply version information about the version of Mongo you're using? Attach mongo.log files? |
| Comment by Eugene Shustef [ 01/Aug/12 ] |
|
Just noticed that 10-20 minutes prior to when we noticed the issue, we saw this in the log: VirtualProtect failed (mcw) D:/mongo/troidb.136 849284b09a0000 3660000 errno:1455 The paging file is too small for this operation to complete. |