[SERVER-3476] exception 13636 createPrivateMap failed when journal = true Created: 26/Jul/11  Updated: 13/Jul/16  Resolved: 27/Jul/11

Status: Closed
Project: Core Server
Component/s: Storage
Affects Version/s: 1.8.2
Fix Version/s: None

Type: Bug Priority: Blocker - P1
Reporter: Kenn Ejima Assignee: Mathias Stearn
Resolution: Done Votes: 1
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

64bit Debian 6 (2.6.39.1-x86_64)


Issue Links:
Duplicate
duplicates SERVER-3489 check for overcommit_memory settings Closed
Related
Operating System: Linux
Participants:

 Description   

This morning we've been swamped by the following error messages:

Tue Jul 26 11:28:22 [conn1899] insert pankia_production.articles exception 13636 createPrivateMap failed (look in log for error) 6ms
Tue Jul 26 11:28:22 [conn1892] ERROR: mmap private failed with out of memory. (64 bit build)
Tue Jul 26 11:28:22 [conn1892] Assertion: 13636:createPrivateMap failed (look in log for error)
0x55f5aa 0x629030 0x62999c 0x7826aa 0x769d0b 0x76a0aa 0x76a2a0 0x779aee 0x77b4cf 0x7595bd 0x75b23f 0x8a8fce 0x8bb630 0x7fab0e45e8ba 0x7fab0da1a02d
/usr/bin/mongod(_ZN5mongo11msgassertedEiPKc+0x12a) [0x55f5aa]
/usr/bin/mongod(_ZN5mongo8MongoMMF13finishOpeningEv+0x60) [0x629030]
/usr/bin/mongod(_ZN5mongo8MongoMMF6createESsRyb+0x6c) [0x62999c]
/usr/bin/mongod(_ZN5mongo13MongoDataFile4openEPKcib+0x13a) [0x7826aa]
/usr/bin/mongod(_ZN5mongo8Database7getFileEiib+0x17b) [0x769d0b]
/usr/bin/mongod(_ZN5mongo8Database12suitableFileEib+0x3a) [0x76a0aa]
/usr/bin/mongod(_ZN5mongo8Database11allocExtentEPKcib+0x70) [0x76a2a0]
/usr/bin/mongod(_ZN5mongo11DataFileMgr6insertEPKcPKvibRKNS_11BSONElementEb+0x98e) [0x779aee]
/usr/bin/mongod(_ZN5mongo11DataFileMgr16insertWithObjModEPKcRNS_7BSONObjEb+0x5f) [0x77b4cf]
/usr/bin/mongod(_ZN5mongo14receivedInsertERNS_7MessageERNS_5CurOpE+0x37d) [0x7595bd]
/usr/bin/mongod(_ZN5mongo16assembleResponseERNS_7MessageERNS_10DbResponseERKNS_8SockAddrE+0x192f) [0x75b23f]
/usr/bin/mongod(_ZN5mongo10connThreadEPNS_13MessagingPortE+0x21e) [0x8a8fce]
/usr/bin/mongod(thread_proxy+0x80) [0x8bb630]
/lib/libpthread.so.0(+0x68ba) [0x7fab0e45e8ba]
/lib/libc.so.6(clone+0x6d) [0x7fab0da1a02d]

It didn't crash, but some write operations failed consistently.

We then commented out the "journal = true" line in /etc/mongodb.conf, and everything seems to have gone back to normal. We've confirmed that if we set "journal = true" again, the log messages above start to appear again, so it's reproducible here.

The server has 4GB RAM on 64bit Linux, and there are 6.4GB of files under /var/lib/mongodb/.

If you need more info, let me know.



 Comments   
Comment by Eliot Horowitz (Inactive) [ 27/Jul/11 ]

See SERVER-3489

Comment by Kenn Ejima [ 27/Jul/11 ]

Ok, I just set it to 1, and the immediate errors after restart are gone - we'll see.

You should make it clear in the docs that journaling requires overcommit - I believe many DBAs have a habit of setting vm.overcommit_memory = 2 when setting up a new machine. I need to adjust to the new way myself.

I'll update this ticket if the insertion errors come back - if I don't, assume things are going well.

Comment by Mathias Stearn [ 27/Jul/11 ]

Well, journaling basically requires overcommit. We private-map all of your data, which counts as "committed" even though only ~100MB at most is needed in RAM at once.

As for the full crash recovery issue, luckily it only matters with journaling on, which means you shouldn't need a full recovery, just a journal replay.

Comment by Kenn Ejima [ 27/Jul/11 ]

Are you sure it's a good idea?

My thought was that, since MongoDB memory-maps its data and fills most of RAM, leaving little free memory, it's likely the OOM killer wakes up when, say, another large process is started on the same machine (we have monit / munin running periodically, and run a Rails console from time to time, which consumes 100MB+ of memory per instance), and the OOM killer will choose mongod to kill forcibly because it holds the largest chunk of memory.

If that's the case, it's the worst-case scenario, as it requires a full crash recovery - particularly because we've turned off journaling for now. I'd like to hear that my assumption is wrong before changing the setting - this is our production system.

Comment by Mathias Stearn [ 27/Jul/11 ]

Try setting that to either 0 or 1.

From man 5 proc:

0: heuristic overcommit (this is the default)
1: always overcommit, never check
2: always check, never overcommit

Comment by Kenn Ejima [ 26/Jul/11 ]

2.

Also, we've set the following in /etc/sysctl.conf:

vm.swappiness = 0
vm.overcommit_memory = 2
vm.overcommit_ratio = 99

Comment by Mathias Stearn [ 26/Jul/11 ]

What is the value of /proc/sys/vm/overcommit_memory?

Comment by Kenn Ejima [ 26/Jul/11 ]

For the record, the errors that immediately appeared after restart were slightly different:

Tue Jul 26 12:26:12 [conn10] assertion 13636 createPrivateMap failed (look in log for error) ns:pankia_production.system.namespaces query:{}
Tue Jul 26 12:26:12 [conn10] end connection 192.168.12.34:59324

So it's not an insertion error as I originally reported - the immediate errors seem to happen while the Rails servers reconnect to MongoDB. It's possible they are separate issues (but hitting the same error).

Comment by Kenn Ejima [ 26/Jul/11 ]

Sure.
Note that this problem immediately starts to occur after a reboot with the journal = true setting.

Before the error (without journal = true):

# free -ltm
                    total   used   free  shared  buffers  cached
Mem:                 3945   3914     31       0      135    3567
Low:                 3945   3914     31
High:                   0      0      0
-/+ buffers/cache:           211   3734
Swap:                 255      0    255
Total:               4201   3914    287

> db.serverStatus()
{
    "host" : "mongo",
    "version" : "1.8.2",
    "process" : "mongod",
    "uptime" : 1993,
    "uptimeEstimate" : 1978,
    "localTime" : ISODate("2011-07-26T12:22:51.212Z"),
    "globalLock" : {
        "totalTime" : 1992693214,
        "lockTime" : 5188938,
        "ratio" : 0.0026039823709662112,
        "currentQueue" : { "total" : 0, "readers" : 0, "writers" : 0 },
        "activeClients" : { "total" : 0, "readers" : 0, "writers" : 0 }
    },
    "mem" : { "bits" : 64, "resident" : 530, "virtual" : 4395, "supported" : true, "mapped" : 4047 },
    "connections" : { "current" : 18, "available" : 801 },
    "extra_info" : { "note" : "fields vary by platform", "heap_usage_bytes" : 2536512, "page_faults" : 949 },
    "indexCounters" : {
        "btree" : { "accesses" : 466, "hits" : 466, "misses" : 0, "resets" : 0, "missRatio" : 0 }
    },
    "backgroundFlushing" : { "flushes" : 33, "total_ms" : 168, "average_ms" : 5.090909090909091, "last_ms" : 5, "last_finished" : ISODate("2011-07-26T12:22:38.529Z") },
    "cursors" : { "totalOpen" : 0, "clientCursors_size" : 0, "timedOut" : 0 },
    "network" : { "bytesIn" : 25162841, "bytesOut" : 64681025, "numRequests" : 320346 },
    "opcounters" : { "insert" : 1494, "query" : 310784, "update" : 2707, "delete" : 406, "getmore" : 0, "command" : 9020 },
    "asserts" : { "regular" : 0, "warning" : 0, "msg" : 0, "user" : 0, "rollovers" : 0 },
    "writeBacksQueued" : false,
    "ok" : 1
}

During the error (set journal = true, then restart):

# free -ltm
                    total   used   free  shared  buffers  cached
Mem:                 3945   3902     42       0      135    3567
Low:                 3945   3902     42
High:                   0      0      0
-/+ buffers/cache:           199   3746
Swap:                 255      0    255
Total:               4201   3902    298

> db.serverStatus()
{
    "host" : "mongo",
    "version" : "1.8.2",
    "process" : "mongod",
    "uptime" : 133,
    "uptimeEstimate" : 131,
    "localTime" : ISODate("2011-07-26T12:27:47.479Z"),
    "globalLock" : {
        "totalTime" : 133022016,
        "lockTime" : 370294,
        "ratio" : 0.002783704616234353,
        "currentQueue" : { "total" : 0, "readers" : 0, "writers" : 0 },
        "activeClients" : { "total" : 0, "readers" : 0, "writers" : 0 }
    },
    "mem" : { "bits" : 64, "resident" : 4, "virtual" : 177, "supported" : true, "mapped" : 0 },
    "connections" : { "current" : 1, "available" : 818 },
    "extra_info" : { "note" : "fields vary by platform", "heap_usage_bytes" : 206176, "page_faults" : 0 },
    "indexCounters" : {
        "btree" : { "accesses" : 0, "hits" : 0, "misses" : 0, "resets" : 0, "missRatio" : 0 }
    },
    "backgroundFlushing" : { "flushes" : 2, "total_ms" : 0, "average_ms" : 0, "last_ms" : 0, "last_finished" : ISODate("2011-07-26T12:27:34.458Z") },
    "cursors" : { "totalOpen" : 0, "clientCursors_size" : 0, "timedOut" : 0 },
    "network" : { "bytesIn" : 3285, "bytesOut" : 4981, "numRequests" : 53 },
    "opcounters" : { "insert" : 0, "query" : 18, "update" : 0, "delete" : 0, "getmore" : 0, "command" : 37 },
    "asserts" : { "regular" : 0, "warning" : 17, "msg" : 0, "user" : 0, "rollovers" : 0 },
    "writeBacksQueued" : false,
    "dur" : {
        "commits" : 0,
        "journaledMB" : 0,
        "writeToDataFilesMB" : 0,
        "commitsInWriteLock" : 0,
        "earlyCommits" : 0,
        "timeMs" : { "dt" : 3006, "prepLogBuffer" : 0, "writeToJournal" : 0, "writeToDataFiles" : 0, "remapPrivateView" : 0 }
    },
    "ok" : 1
}

Comment by Scott Hernandez (Inactive) [ 26/Jul/11 ]

Can you include the output of "free -ltm" and db.serverStatus() from right before and during this error, if you can reproduce it?

Generated at Thu Feb 08 03:03:10 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.