[SERVER-3476] exception 13636 createPrivateMap failed when journal = true Created: 26/Jul/11 Updated: 13/Jul/16 Resolved: 27/Jul/11 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Storage |
| Affects Version/s: | 1.8.2 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Blocker - P1 |
| Reporter: | Kenn Ejima | Assignee: | Mathias Stearn |
| Resolution: | Done | Votes: | 1 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
64bit Debian 6 (2.6.39.1-x86_64) |
||
| Issue Links: |
|
||||||||||||
| Operating System: | Linux | ||||||||||||
| Participants: | |||||||||||||
| Description |
|
This morning we were swamped by the following error message: Tue Jul 26 11:28:22 [conn1899] insert pankia_production.articles exception 13636 createPrivateMap failed (look in log for error) 6ms It didn't crash, but some write operations failed consistently. We have now commented out the "journal = true" line in /etc/mongodb.conf, and everything seems to be back to normal. We've confirmed that if we set "journal = true" again, the above log messages start to appear again, so it's reproducible here. The server has 4GB RAM on 64bit Linux, and there are 6.4GB of files under /var/lib/mongodb/. If you need more info, let me know. |
| Comments |
| Comment by Eliot Horowitz (Inactive) [ 27/Jul/11 ] |
|
See |
| Comment by Kenn Ejima [ 27/Jul/11 ] |
|
Ok, I just set it to 1, and the immediate errors after restart are gone - we'll see. You should make it clear in the docs that journaling requires overcommit - I believe many DBAs have a habit of setting vm.overcommit_memory = 2 when setting up a new machine. I need to adjust to the new way. I'll update this ticket if the insertion error comes back - if I don't, assume things are going well. |
| Comment by Mathias Stearn [ 27/Jul/11 ] |
|
Well, journaling basically requires overcommit. We private-map all of your data, which counts as "committed" even if only ~100MB max is needed in RAM at once. As for the full crash recovery issue, luckily this is only important with journaling on, which means that you shouldn't need a full recovery, just a journal replay. |
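The point about private mappings counting as "committed" can be made concrete with the kernel's strict-overcommit arithmetic. The sketch below is not MongoDB code. It assumes the documented formula for vm.overcommit_memory = 2 (CommitLimit = swap + RAM * vm.overcommit_ratio / 100, ignoring huge pages), the kernel default ratio of 50, and a hypothetical 1 GiB of swap, since the reporter's swap size and ratio are not stated in this ticket. The function names are illustrative only.

```python
GIB = 1024 ** 3

def commit_limit(ram_bytes, swap_bytes, overcommit_ratio=50):
    """Commit limit under vm.overcommit_memory = 2 (huge pages ignored)."""
    return swap_bytes + ram_bytes * overcommit_ratio // 100

def private_map_fits(data_bytes, ram_bytes, swap_bytes,
                     already_committed=0, overcommit_ratio=50):
    """True if privately mapping data_bytes stays within the commit limit."""
    limit = commit_limit(ram_bytes, swap_bytes, overcommit_ratio)
    return already_committed + data_bytes <= limit

ram = 4 * GIB            # from the ticket: 4GB RAM
swap = 1 * GIB           # assumption: swap size is not given in the ticket
data = int(6.4 * GIB)    # from the ticket: 6.4GB of data files

print(commit_limit(ram, swap) / GIB)      # 3.0
print(private_map_fits(data, ram, swap))  # False
```

Under these assumptions the commit limit is well below the size of the data files, so a private map of all data would be refused in mode 2 no matter how little of it is resident at once.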
| Comment by Kenn Ejima [ 27/Jul/11 ] |
|
Are you sure it's a good idea? My thought was that, as MongoDB tries to fully mmap its RAM and there isn't much free memory, it's likely that the OOM killer wakes up when, say, another large process is started on the same machine (we have monit / munin running periodically, and run a Rails console from time to time, which consumes 100MB+ of memory per instance), and the OOM killer will choose to forcibly kill MongoDB because it allocates the largest chunk of memory. If that's the case, it's the worst-case scenario, as it requires full crash recovery, particularly because we have journaling turned off right now. I'd like to hear that my assumption here is wrong before changing the setting - it's our production system. |
| Comment by Mathias Stearn [ 26/Jul/11 ] |
|
try setting that to either 0 or 1. from man 5 proc: 0: heuristic overcommit (this is the default) |
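As a minimal sketch of the check being discussed, the snippet below reads the current overcommit mode and prints a short description. The descriptions for modes 1 and 2 are paraphrased from proc(5) and do not appear in this ticket. The helper names are hypothetical, and the file path is a parameter so the function can be pointed at a copy of the file.

```python
from pathlib import Path

OVERCOMMIT_PATH = "/proc/sys/vm/overcommit_memory"

# Descriptions paraphrased from proc(5); mode 0 is the kernel default.
OVERCOMMIT_MODES = {
    0: "heuristic overcommit (default)",
    1: "always overcommit, never check",
    2: "always check, never overcommit (strict accounting)",
}

def read_overcommit_mode(path=OVERCOMMIT_PATH):
    """Return the integer stored in the overcommit_memory file."""
    return int(Path(path).read_text().strip())

def describe_overcommit_mode(mode):
    """Map a mode number to a short human-readable description."""
    return OVERCOMMIT_MODES.get(mode, "unknown mode")

if __name__ == "__main__" and Path(OVERCOMMIT_PATH).exists():
    mode = read_overcommit_mode()
    print(mode, describe_overcommit_mode(mode))
```

On the reporter's machine this would report mode 2, the strict setting that the advice above is steering away from.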
| Comment by Kenn Ejima [ 26/Jul/11 ] |
|
2. Also, we've set the following in /etc/sysctl.conf: vm.swappiness = 0 |
| Comment by Mathias Stearn [ 26/Jul/11 ] |
|
what is the value of /proc/sys/vm/overcommit_memory? |
| Comment by Kenn Ejima [ 26/Jul/11 ] |
|
For the record, the errors that appeared immediately after restart were slightly different: Tue Jul 26 12:26:12 [conn10] assertion 13636 createPrivateMap failed (look in log for error) ns:pankia_production.system.namespaces query:{} So, it's not an insertion error as I originally reported - the immediate errors seem to happen while the Rails servers reconnect to MongoDB. It's possible they are separate issues (but hitting the same error). |
| Comment by Kenn Ejima [ 26/Jul/11 ] |
|
Sure. Before the error (without journal = true):
> db.serverStatus() [output not preserved] During the error (set journal = true, then restart):
> db.serverStatus() [output not preserved] |
| Comment by Scott Hernandez (Inactive) [ 26/Jul/11 ] |
|
Can you include the output of "free -ltm" and db.serverStatus() from right before and during this error, if you can reproduce it? |
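Alongside the requested output, the commit accounting fields in /proc/meminfo (CommitLimit and Committed_AS) are often useful when diagnosing mapping failures like this one. The sketch below is a supplementary illustration, not something requested in this ticket: it parses meminfo-style text and reports the remaining commit headroom. The sample numbers are invented for illustration and are not from the reporter's server.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field_name: integer value}."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines without a "name: value" shape
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])  # values are in kB where a unit is given
    return fields

def commit_headroom_kb(fields):
    """kB still available before the commit limit is reached."""
    return fields["CommitLimit"] - fields["Committed_AS"]

# Illustrative sample only; on a live system read /proc/meminfo instead.
sample = """MemTotal:        4194304 kB
SwapTotal:       1048576 kB
CommitLimit:     3145728 kB
Committed_AS:    2500000 kB
"""

info = parse_meminfo(sample)
print(commit_headroom_kb(info))  # 645728
```

On a real machine the same functions can be fed `open("/proc/meminfo").read()`. A small or negative headroom under vm.overcommit_memory = 2 would be consistent with new mappings being refused.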