[SERVER-19339] crash with this log Created: 09/Jul/15  Updated: 31/Jul/15  Resolved: 21/Jul/15

Status: Closed
Project: Core Server
Component/s: Replication, WiredTiger
Affects Version/s: 3.0.4
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: lnxcym Assignee: Sam Kleinman (Inactive)
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Red Hat Enterprise Linux 6.2


Issue Links:
Duplicate
duplicates SERVER-18829 Cache usage exceeds configured maximu... Closed
Related
related to SERVER-19673 Excessive memory allocated by WiredTi... Closed
Operating System: Linux
Steps To Reproduce:

we do nothing~

Participants:

 Description   

2015-07-09T11:33:38.896+0800 I CONTROL  [rsSync]
 0xf77369 0xf140f1 0xef99fa 0xd8a55c 0xd8a66a 0x908a2c 0xc3ea7a 0xcaabfd 0xcabdd8 0xca43af 0xfc58e4 0x3ae5e07851 0x3ae5ae767d
----- BEGIN BACKTRACE -----
{"backtrace":[{"b":"400000","o":"B77369"},{"b":"400000","o":"B140F1"},{"b":"400000","o":"AF99FA"},{"b":"400000","o":"98A55C"},{"b":"400000","o":"98A66A"},{"b":"400000","o":"508A2C"},{"b":"400000","o":"83EA7A"},{"b":"400000","o":"8AABFD"},{"b":"400000","o":"8ABDD8"},{"b":"400000","o":"8A43AF"},{"b":"400000","o":"BC58E4"},{"b":"3AE5E00000","o":"7851"},{"b":"3AE5A00000","o":"E767D"}],"processInfo":{ "mongodbVersion" : "3.0.4", "gitVersion" : "0481c958daeb2969800511e7475dc66986fa9ed5", "uname" : { "sysname" : "Linux", "release" : "2.6.32-279.el6.x86_64", "version" : "#1 SMP Fri Jun 22 12:19:21 UTC 2012", "machine" : "x86_64" }, "somap" : [ { "elfType" : 2, "b" : "400000", "buildId" : "7702B4740E91E1BA6F701DB2A42553968AC09562" }, { "b" : "7FFF87E7A000", "elfType" : 3, "buildId" : "801B9608DAA2CD5F7035AD415E9C7DD06EBDB0A2" }, { "path" : "/lib64/libpthread.so.0", "elfType" : 3, "buildId" : "48A9F8600F0A15F6418EDE25846C324EC8891DD4" }, { "path" : "/usr/lib64/libssl.so.10", "elfType" : 3, "buildId" : "B06F7B61A75BD941A6D9E36B2DC1CDCB4183D706" }, { "path" : "/usr/lib64/libcrypto.so.10", "elfType" : 3, "buildId" : "0E46D8ED406D53C9A553C20859CD4679928AE7C0" }, { "path" : "/lib64/librt.so.1", "elfType" : 3, "buildId" : "1574F9C2DDAECEE537C45143BB79E8D61BED98FE" }, { "path" : "/lib64/libdl.so.2", "elfType" : 3, "buildId" : "0B4FE52FE93C0B9894775AFDD53E2DF9D3C2839A" }, { "path" : "/usr/lib64/libstdc++.so.6", "elfType" : 3, "buildId" : "1A4BC78E7DA0FA025262D516D00E04AFD1B0F429" }, { "path" : "/lib64/libm.so.6", "elfType" : 3, "buildId" : "BF14593D7915402AA62C2573FCCDB252AEEBF754" }, { "path" : "/lib64/libgcc_s.so.1", "elfType" : 3, "buildId" : "CE152B8676517F23E7F54AD6408330979BE41443" }, { "path" : "/lib64/libc.so.6", "elfType" : 3, "buildId" : "4C37E276CD23F0877AF7587ED1F699AADB741EC9" }, { "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "42AEFAFC23375DC250C49C420C37EDC4515B9C02" }, { "path" : "/lib64/libgssapi_krb5.so.2", "elfType" : 3, "buildId" : "76A3DEEB6876CBED69A57D3EBC1E2AFBCA84EC76" }, { "path" : "/lib64/libkrb5.so.3", "elfType" : 3, "buildId" : "A4D98A7611F0CDBEA8A4FEDF36BDA25E772F387E" }, { "path" : "/lib64/libcom_err.so.2", "elfType" : 3, "buildId" : "4623A78918C882770E81AE7B5EE9DDF8DD2B6674" }, { "path" : "/lib64/libk5crypto.so.3", "elfType" : 3, "buildId" : "190D45F6743DEF9DF8169D65801D4575B01825BD" }, { "path" : "/lib64/libz.so.1", "elfType" : 3, "buildId" : "5FA8E5038EC04A774AF72A9BB62DC86E1049C4D6" }, { "path" : "/lib64/libkrb5support.so.0", "elfType" : 3, "buildId" : "DAE2A7E4E8B37D43EF6839FF5D8E012AFCF21A69" }, { "path" : "/lib64/libkeyutils.so.1", "elfType" : 3, "buildId" : "8A8734DC37305D8CC2EF8F8C3E5EA03171DB07EC" }, { "path" : "/lib64/libresolv.so.2", "elfType" : 3, "buildId" : "8217E68E5C9D964CDF500F488B2A183F870F36B2" }, { "path" : "/lib64/libselinux.so.1", "elfType" : 3, "buildId" : "BAD5C71361DADF259B6E306A49E6F47F24AEA3DC" } ] }}
 mongod(_ZN5mongo15printStackTraceERSo+0x29) [0xf77369]
 mongod(_ZN5mongo10logContextEPKc+0xE1) [0xf140f1]
 mongod(_ZN5mongo17invariantOKFailedEPKcRKNS_6StatusES1_j+0xDA) [0xef99fa]
 mongod(_ZN5mongo22WiredTigerRecoveryUnit9_txnCloseEb+0x2EC) [0xd8a55c]
 mongod(_ZN5mongo22WiredTigerRecoveryUnit7_commitEv+0x8A) [0xd8a66a]
 mongod(_ZN5mongo15WriteUnitOfWork6commitEv+0x1C) [0x908a2c]
 mongod(_ZN5mongo4repl15writeOpsToOplogEPNS_16OperationContextERKSt5dequeINS_7BSONObjESaIS4_EE+0x26A) [0xc3ea7a]
 mongod(_ZN5mongo4repl8SyncTail10multiApplyEPNS_16OperationContextERSt5dequeINS_7BSONObjESaIS5_EE+0x4DD) [0xcaabfd]
 mongod(_ZN5mongo4repl8SyncTail16oplogApplicationEv+0x118) [0xcabdd8]
 mongod(_ZN5mongo4repl13runSyncThreadEv+0x1EF) [0xca43af]
 mongod(+0xBC58E4) [0xfc58e4]
 libpthread.so.0(+0x7851) [0x3ae5e07851]
 libc.so.6(clone+0x6D) [0x3ae5ae767d]
-----  END BACKTRACE  -----
2015-07-09T11:33:38.896+0800 I -        [rsSync]
 
***aborting after invariant() failure



 Comments   
Comment by Michael Cahill (Inactive) [ 21/Jul/15 ]

On further investigation, I now believe this is the same issue as SERVER-18829. During index builds, WiredTiger could use more memory than the allocated cache size. That issue will be fixed by the upcoming 3.0.5 release.

The particular operation that is failing here is writing a batch of updates to the oplog. That has to allocate a buffer to serialize the journal record, which can fail with ENOMEM under the conditions that trigger SERVER-18829.

Again, I am sorry that you have hit this problem with MongoDB and it should be fixed by 3.0.5.

Comment by Sam Kleinman (Inactive) [ 13/Jul/15 ]

The log you sent is from the day after the crash. We're looking for a few minutes of logs from before 2015-07-09T11:33:36.808+0800. Do you still have access to these logs, and if so could you attach these logs to the ticket?

In general, could you please attach all information related to this ticket rather than sending emails: this will allow us to consult our colleges and maintain a record of the work we do here. Emails are hard for us to track and distribute. Thanks for understanding.

Regards,
sam

Comment by lnxcym [ 13/Jul/15 ]

Michael Cahill

Thank you for your help, I send the whole mongod.log to your email~~

Comment by Michael Cahill (Inactive) [ 13/Jul/15 ]

Thank you for the extra information. We have looked through what we have and unfortunately we still don't understand the root cause.

What we know is that the WiredTiger call to commit a transaction returned ENOMEM. We have analyzed the places in WiredTiger that we think could return ENOMEM and all of them should have generated a log message.

I notice that we have less than 1 second of log before the "Invariant failure" failure message. Can we please get logs from 20 seconds before that message? The lines we care about should have the [rsSync] tag.

If we can't see anything in the log and the problem doesn't reproduce, then I don't think we will be able to make progress. Any call that allocates memory can fail with ENOMEM so unless we can find evidence of a WiredTiger bug, I think we will have to put this down to an out-of-memory condition.

Comment by lnxcym [ 10/Jul/15 ]

Michael Cahill

I send a email to you about the crash time the system memory and mongodb status with MMS

this machine has 24G memory.

Comment by Michael Cahill (Inactive) [ 10/Jul/15 ]

The key piece of information is this one:

2015-07-09T11:33:36.808+0800 I - [rsSync] Invariant failure: s->commit_transaction(s, NULL) resulted in status UnknownError 12: Cannot allocate memory at src/mongo/db/storage/wiredtiger/wiredtiger_recovery_unit.cpp 305

I suspect that if we could get a few seconds more of the log, we could see where in WiredTiger this is coming from. The most likely candidate is that WiredTiger allocates memory to hold the log record as part of a commit, and the system may genuinely have been out of memory.

Is there any other information from the crash (apart from more lines from the log) that could confirm whether the amount of free memory was very low?

Comment by lnxcym [ 10/Jul/15 ]

5. .:ls -l for both the MongoDB dbPath and the journal directory:

like this and i only intercepted the front part of the content

total 660K
drwxr-xr-x 3 root root 656K Jul 10 10:09 data
drwxr-xr-x 2 root root 4.0K Jul 10 00:01 log
lrwxrwxrwx 1 root root   48 Jul  3 05:05 bin -> /usr/local/mongodb-linux-x86_64-rhel62-3.0.4/bin

./data:
total 76G

-rw-r--r-- 1 root root   25G Jul 10 10:09 collection-6-4655405084670776749.wt
-rw-r--r-- 1 root root   53M Jul 10 10:09 index-244204-4655405084670776749.wt
-rw-r--r-- 1 root root   86M Jul 10 10:09 index-39052--6656083566520963234.wt
-rw-r--r-- 1 root root  102M Jul 10 10:09 index-39437--6656083566520963234.wt
-rw-r--r-- 1 root root   52M Jul 10 10:09 index-157693-4655405084670776749.wt
-rw-r--r-- 1 root root   30M Jul 10 10:09 index-48-4655405084670776749.wt
-rw-r--r-- 1 root root   913 Jul 10 10:09 WiredTiger.turtle
-rw-r--r-- 1 root root   23M Jul 10 10:09 WiredTiger.wt
-rw-r--r-- 1 root root   36K Jul 10 10:09 collection-7-4655405084670776749.wt
-rw-r--r-- 1 root root   70M Jul 10 10:09 collection-2045-4655405084670776749.wt
-rw-r--r-- 1 root root  6.6M Jul 10 10:09 index-2046-4655405084670776749.wt
-rw-r--r-- 1 root root   17M Jul 10 10:09 index-6038-4655405084670776749.wt
-rw-r--r-- 1 root root  1.6M Jul 10 10:09 index-65440-4655405084670776749.wt
-rw-r--r-- 1 root root  4.8M Jul 10 10:09 index-65441-4655405084670776749.wt
-rw-r--r-- 1 root root  5.8M Jul 10 10:09 index-65442-4655405084670776749.wt
-rw-r--r-- 1 root root   28M Jul 10 10:09 index-118330-4655405084670776749.wt
-rw-r--r-- 1 root root   32M Jul 10 10:09 index-262099-4655405084670776749.wt
-rw-r--r-- 1 root root   22M Jul 10 10:09 index-173430--6656083566520963234.wt
-rw-r--r-- 1 root root  2.5M Jul 10 10:09 index-50328--4392599446128106349.wt
-rw-r--r-- 1 root root   28M Jul 10 10:09 index-31738-8457798153372257414.wt
-rw-r--r-- 1 root root   88K Jul 10 10:09 collection-143056-4655405084670776749.wt
-rw-r--r-- 1 root root   22M Jul 10 10:09 collection-136485--6656083566520963234.wt
-rw-r--r-- 1 root root  139M Jul 10 10:09 collection-157692-4655405084670776749.wt
-rw-r--r-- 1 root root   30M Jul 10 10:09 index-267383-4655405084670776749.wt
-rw-r--r-- 1 root root   19M Jul 10 10:09 collection-138097--6656083566520963234.wt
-rw-r--r-- 1 root root  308M Jul 10 10:09 collection-71475--6656083566520963234.wt
-rw-r--r-- 1 root root   98M Jul 10 10:09 index-71476--6656083566520963234.wt
-rw-r--r-- 1 root root  496M Jul 10 10:09 collection-154721-4655405084670776749.wt
-rw-r--r-- 1 root root  151M Jul 10 10:09 index-154722-4655405084670776749.wt
-rw-r--r-- 1 root root  147M Jul 10 10:09 index-155836-4655405084670776749.wt
-rw-r--r-- 1 root root   29M Jul 10 10:09 index-157484-4655405084670776749.wt
-rw-r--r-- 1 root root   90M Jul 10 10:09 index-157485-4655405084670776749.wt
-rw-r--r-- 1 root root   89M Jul 10 10:09 index-157486-4655405084670776749.wt
-rw-r--r-- 1 root root   88M Jul 10 10:09 index-204100-4655405084670776749.wt
-rw-r--r-- 1 root root  327M Jul 10 10:09 index-204101-4655405084670776749.wt
-rw-r--r-- 1 root root   98M Jul 10 10:09 collection-276269-4655405084670776749.wt
-rw-r--r-- 1 root root  1.4M Jul 10 10:09 index-276270-4655405084670776749.wt
-rw-r--r-- 1 root root  316K Jul 10 10:09 index-276271-4655405084670776749.wt
-rw-r--r-- 1 root root  1.1M Jul 10 10:09 index-276272-4655405084670776749.wt
-rw-r--r-- 1 root root  1.1M Jul 10 10:09 index-276273-4655405084670776749.wt
-rw-r--r-- 1 root root  2.2M Jul 10 10:09 index-276278-4655405084670776749.wt
-rw-r--r-- 1 root root  348K Jul 10 10:09 index-49939--4392599446128106349.wt
-rw-r--r-- 1 root root  996K Jul 10 10:09 index-49940--4392599446128106349.wt
-rw-r--r-- 1 root root  1.4M Jul 10 10:09 index-49941--4392599446128106349.wt
-rw-r--r-- 1 root root  1.9M Jul 10 10:09 index-49942--4392599446128106349.wt
-rw-r--r-- 1 root root  1.6M Jul 10 10:09 index-27068-8457798153372257414.wt
-rw-r--r-- 1 root root   55M Jul 10 10:09 collection-136483--6656083566520963234.wt
-rw-r--r-- 1 root root  2.6M Jul 10 10:09 index-136484--6656083566520963234.wt
-rw-r--r-- 1 root root  1.7M Jul 10 10:09 index-45499--4392599446128106349.wt
-rw-r--r-- 1 root root  2.5M Jul 10 10:09 index-81270--4392599446128106349.wt
-rw-r--r-- 1 root root  7.8M Jul 10 10:09 index-81271--4392599446128106349.wt
-rw-r--r-- 1 root root  3.0M Jul 10 10:09 index-81273--4392599446128106349.wt
-rw-r--r-- 1 root root  1.6M Jul 10 10:09 index-27830-8457798153372257414.wt
-rw-r--r-- 1 root root  2.6M Jul 10 10:09 index-27831-8457798153372257414.wt
-rw-r--r-- 1 root root  375M Jul 10 10:09 collection-47-4655405084670776749.wt
-rw-r--r-- 1 root root   21M Jul 10 10:09 index-65190-4655405084670776749.wt
-rw-r--r-- 1 root root   59M Jul 10 10:09 collection-367-4655405084670776749.wt
-rw-r--r-- 1 root root   13M Jul 10 10:09 index-368-4655405084670776749.wt
-rw-r--r-- 1 root root   23M Jul 10 10:09 index-5781-4655405084670776749.wt
-rw-r--r-- 1 root root  3.0M Jul 10 10:09 index-65203-4655405084670776749.wt
-rw-r--r-- 1 root root  9.6M Jul 10 10:09 index-65204-4655405084670776749.wt
-rw-r--r-- 1 root root  9.0M Jul 10 10:09 index-65205-4655405084670776749.wt
-rw-r--r-- 1 root root   16M Jul 10 10:09 collection-88857--4392599446128106349.wt
-rw-r--r-- 1 root root  2.8M Jul 10 10:09 index-88858--4392599446128106349.wt
-rw-r--r-- 1 root root  2.2M Jul 10 10:09 index-88859--4392599446128106349.wt
-rw-r--r-- 1 root root  2.8M Jul 10 10:09 index-89214--4392599446128106349.wt
-rw-r--r-- 1 root root   16M Jul 10 10:09 collection-88834--4392599446128106349.wt
-rw-r--r-- 1 root root  1.1M Jul 10 10:09 collection-547-4655405084670776749.wt
-rw-r--r-- 1 root root   49M Jul 10 10:09 collection-75927--4392599446128106349.wt
-rw-r--r-- 1 root root  100K Jul 10 10:09 index-65038-4655405084670776749.wt
-rw-r--r-- 1 root root  984K Jul 10 10:09 index-75928--4392599446128106349.wt
-rw-r--r-- 1 root root  2.3M Jul 10 10:09 index-75929--4392599446128106349.wt
-rw-r--r-- 1 root root  1.5M Jul 10 10:09 index-75970--4392599446128106349.wt
-rw-r--r-- 1 root root  2.3M Jul 10 10:09 index-18042-8457798153372257414.wt
-rw-r--r-- 1 root root   43M Jul 10 10:09 collection-136707--6656083566520963234.wt
-rw-r--r-- 1 root root  964K Jul 10 10:09 index-26168-8457798153372257414.wt
-rw-r--r-- 1 root root  3.1M Jul 10 10:09 index-136708--6656083566520963234.wt
-rw-r--r-- 1 root root  2.7M Jul 10 10:09 index-81279--4392599446128106349.wt
-rw-r--r-- 1 root root   60K Jul 10 10:09 collection-107719-4655405084670776749.wt
-rw-r--r-- 1 root root   40K Jul 10 10:09 collection-92145--6656083566520963234.wt
-rw-r--r-- 1 root root  700K Jul 10 10:09 collection-1101-4655405084670776749.wt
-rw-r--r-- 1 root root  120K Jul 10 10:09 index-65186-4655405084670776749.wt
-rw-r--r-- 1 root root   64K Jul 10 10:09 collection-1181-4655405084670776749.wt
-rw-r--r-- 1 root root   68K Jul 10 10:09 collection-74617--4392599446128106349.wt
-rw-r--r-- 1 root root  1.8G Jul 10 10:09 collection-17-4655405084670776749.wt
-rw-r--r-- 1 root root  116M Jul 10 10:09 index-18-4655405084670776749.wt
-rw-r--r-- 1 root root  162M Jul 10 10:09 index-5785-4655405084670776749.wt
-rw-r--r-- 1 root root   47M Jul 10 10:09 index-64397-4655405084670776749.wt
-rw-r--r-- 1 root root  117M Jul 10 10:09 index-64398-4655405084670776749.wt
-rw-r--r-- 1 root root  118M Jul 10 10:09 index-64399-4655405084670776749.wt
-rw-r--r-- 1 root root  120M Jul 10 10:09 index-64400-4655405084670776749.wt
-rw-r--r-- 1 root root   83M Jul 10 10:09 index-64402-4655405084670776749.wt
-rw-r--r-- 1 root root   48M Jul 10 10:09 index-64403-4655405084670776749.wt
-rw-r--r-- 1 root root   50M Jul 10 10:09 index-64404-4655405084670776749.wt
-rw-r--r-- 1 root root   49M Jul 10 10:09 index-64405-4655405084670776749.wt
-rw-r--r-- 1 root root   48M Jul 10 10:09 index-64406-4655405084670776749.wt
-rw-r--r-- 1 root root   48M Jul 10 10:09 index-64407-4655405084670776749.wt
-rw-r--r-- 1 root root   49M Jul 10 10:09 index-64408-4655405084670776749.wt
-rw-r--r-- 1 root root  114M Jul 10 10:09 index-96307-4655405084670776749.wt
-rw-r--r-- 1 root root   48M Jul 10 10:09 index-109587-4655405084670776749.wt
-rw-r--r-- 1 root root  7.2M Jul 10 10:09 collection-74606--4392599446128106349.wt
-rw-r--r-- 1 root root   27M Jul 10 10:09 collection-17336-8457798153372257414.wt
-rw-r--r-- 1 root root   36K Jul 10 10:09 collection-74599--4392599446128106349.wt
-rw-r--r-- 1 root root  1.6M Jul 10 10:09 index-17988-8457798153372257414.wt
-rw-r--r-- 1 root root  1.5M Jul 10 10:09 index-18225-8457798153372257414.wt
-rw-r--r-- 1 root root  3.6M Jul 10 10:09 index-27659-8457798153372257414.wt
-rw-r--r-- 1 root root   30M Jul 10 10:09 collection-74595--4392599446128106349.wt
-rw-r--r-- 1 root root  1.9M Jul 10 10:09 index-74596--4392599446128106349.wt
-rw-r--r-- 1 root root  2.2M Jul 10 10:09 index-74597--4392599446128106349.wt
-rw-r--r-- 1 root root  4.1M Jul 10 10:09 index-74598--4392599446128106349.wt
-rw-r--r-- 1 root root  3.0M Jul 10 10:09 index-75179--4392599446128106349.wt
-rw-r--r-- 1 root root   13M Jul 10 10:09 collection-88832--4392599446128106349.wt
-rw-r--r-- 1 root root  3.9M Jul 10 10:09 index-88833--4392599446128106349.wt
-rw-r--r-- 1 root root  4.0M Jul 10 10:09 index-89222--4392599446128106349.wt
-rw-r--r-- 1 root root   33M Jul 10 10:09 collection-56462--4392599446128106349.wt
-rw-r--r-- 1 root root 1004K Jul 10 10:09 index-136486--6656083566520963234.wt
-rw-r--r-- 1 root root  2.1M Jul 10 10:09 index-81286--4392599446128106349.wt
-rw-r--r-- 1 root root  2.4M Jul 10 10:09 index-28012-8457798153372257414.wt
-rw-r--r-- 1 root root 1020K Jul 10 10:09 index-138098--6656083566520963234.wt
-rw-r--r-- 1 root root  2.3M Jul 10 10:09 index-81284--4392599446128106349.wt
-rw-r--r-- 1 root root   28M Jul 10 10:09 collection-45061--4392599446128106349.wt
-rw-r--r-- 1 root root  1.7M Jul 10 10:09 index-94492--4392599446128106349.wt
-rw-r--r-- 1 root root  1.1M Jul 10 10:09 index-94494--4392599446128106349.wt
-rw-r--r-- 1 root root  119M Jul 10 10:09 collection-45063--4392599446128106349.wt
-rw-r--r-- 1 root root  6.5M Jul 10 10:09 index-45064--4392599446128106349.wt
-rw-r--r-- 1 root root  6.3M Jul 10 10:09 index-45658--4392599446128106349.wt
-rw-r--r-- 1 root root  5.0M Jul 10 10:09 index-58185--4392599446128106349.wt
-rw-r--r-- 1 root root   18M Jul 10 10:09 index-58186--4392599446128106349.wt
-rw-r--r-- 1 root root  4.4M Jul 10 10:09 index-58188--4392599446128106349.wt
-rw-r--r-- 1 root root  6.4M Jul 10 10:09 index-58189--4392599446128106349.wt
-rw-r--r-- 1 root root  4.8M Jul 10 10:09 index-7964-8457798153372257414.wt
-rw-r--r-- 1 root root   18M Jul 10 10:09 index-8359-8457798153372257414.wt
-rw-r--r-- 1 root root  292K Jul 10 10:09 index-88557--4392599446128106349.wt
-rw-r--r-- 1 root root  972K Jul 10 10:09 index-74607--4392599446128106349.wt
-rw-r--r-- 1 root root  1.2M Jul 10 10:09 index-74608--4392599446128106349.wt
-rw-r--r-- 1 root root  216K Jul 10 10:09 index-74609--4392599446128106349.wt
-rw-r--r-- 1 root root  216K Jul 10 10:09 index-74610--4392599446128106349.wt
-rw-r--r-- 1 root root  216K Jul 10 10:09 index-74611--4392599446128106349.wt
-rw-r--r-- 1 root root  776K Jul 10 10:09 index-75186--4392599446128106349.wt
-rw-r--r-- 1 root root  1.2M Jul 10 10:09 index-21019-8457798153372257414.wt
-rw-r--r-- 1 root root  972K Jul 10 10:09 index-26842-8457798153372257414.wt
-rw-r--r-- 1 root root  2.6M Jul 10 10:09 index-27848-8457798153372257414.wt
-rw-r--r-- 1 root root  960K Jul 10 10:09 index-88835--4392599446128106349.wt
-rw-r--r-- 1 root root  2.2M Jul 10 10:09 index-88836--4392599446128106349.wt
-rw-r--r-- 1 root root  1.4M Jul 10 10:09 index-89219--4392599446128106349.wt
-rw-r--r-- 1 root root  552K Jul 10 10:09 collection-148726--6656083566520963234.wt
-rw-r--r-- 1 root root   80K Jul 10 10:09 index-148727--6656083566520963234.wt
-rw-r--r-- 1 root root   76K Jul 10 10:09 index-153822--6656083566520963234.wt
-rw-r--r-- 1 root root   68K Jul 10 10:09 index-153823--6656083566520963234.wt
-rw-r--r-- 1 root root  5.2M Jul 10 10:09 collection-207918--6656083566520963234.wt

Comment by lnxcym [ 10/Jul/15 ]

Dear Sam Kleinman
1. it only one member of the secondaries
2. mongod's Journaling is enable.
3. disk free space is enough and the disk is SSD (not samsung)

/dev/sda1             459G   76G  360G  18% /ssd_volume

4. more logs:

2015-07-09T11:33:35.901+0800 I NETWORK  [initandlisten] connection accepted from 10.0.0.32:41316 #66074 (1836 connections now open)
2015-07-09T11:33:36.121+0800 I NETWORK  [initandlisten] connection accepted from 10.0.0.32:41321 #66075 (1837 connections now open)
2015-07-09T11:33:36.178+0800 I NETWORK  [initandlisten] connection accepted from 10.0.0.31:59455 #66076 (1838 connections now open)
2015-07-09T11:33:36.187+0800 I NETWORK  [initandlisten] connection accepted from 10.0.0.32:41328 #66077 (1839 connections now open)
2015-07-09T11:33:36.301+0800 I COMMAND  [conn66077] command admin.$cmd command: isMaster { isMaster: 1 } ntoreturn:1 keyUpdates:0 writeConflicts:0 numYields:0 reslen:363 locks:{} 113ms
2015-07-09T11:33:36.443+0800 I NETWORK  [initandlisten] connection accepted from 10.0.0.30:34022 #66078 (1840 connections now open)
2015-07-09T11:33:36.616+0800 I NETWORK  [initandlisten] connection accepted from 10.0.0.30:34039 #66079 (1841 connections now open)
2015-07-09T11:33:36.625+0800 I NETWORK  [initandlisten] connection accepted from 10.0.0.30:34040 #66080 (1842 connections now open)
2015-07-09T11:33:36.626+0800 I NETWORK  [initandlisten] connection accepted from 10.0.0.31:59480 #66081 (1843 connections now open)
2015-07-09T11:33:36.808+0800 I -        [rsSync] Invariant failure: s->commit_transaction(s, NULL) resulted in status UnknownError 12: Cannot allocate memory at src/mongo/db/storage/wiredtiger/wiredtiger_recovery_unit.cpp 305
2015-07-09T11:33:36.821+0800 I NETWORK  [initandlisten] connection accepted from 10.0.0.30:34044 #66082 (1844 connections now open)
2015-07-09T11:33:36.864+0800 I NETWORK  [initandlisten] connection accepted from 10.0.0.31:59492 #66083 (1845 connections now open)
2015-07-09T11:33:36.899+0800 I NETWORK  [initandlisten] connection accepted from 10.0.0.30:34049 #66084 (1846 connections now open)
2015-07-09T11:33:36.906+0800 I NETWORK  [initandlisten] connection accepted from 10.0.0.30:34051 #66085 (1847 connections now open)
2015-07-09T11:33:37.218+0800 I NETWORK  [initandlisten] connection accepted from 10.0.0.30:34058 #66086 (1848 connections now open)
2015-07-09T11:33:37.272+0800 I NETWORK  [initandlisten] connection accepted from 10.0.0.32:41357 #66087 (1849 connections now open)
2015-07-09T11:33:37.369+0800 I NETWORK  [initandlisten] connection accepted from 10.0.0.31:59505 #66088 (1850 connections now open)
2015-07-09T11:33:37.413+0800 I NETWORK  [initandlisten] connection accepted from 10.0.0.32:41361 #66089 (1851 connections now open)
2015-07-09T11:33:37.429+0800 I NETWORK  [initandlisten] connection accepted from 10.0.0.31:59514 #66090 (1852 connections now open)
2015-07-09T11:33:37.433+0800 I NETWORK  [initandlisten] connection accepted from 10.0.0.30:34071 #66091 (1853 connections now open)
2015-07-09T11:33:37.434+0800 I NETWORK  [initandlisten] connection accepted from 10.0.0.30:34072 #66092 (1854 connections now open)
2015-07-09T11:33:37.636+0800 I COMMAND  [conn66090] command admin.$cmd command: isMaster { isMaster: 1 } ntoreturn:1 keyUpdates:0 writeConflicts:0 numYields:0 reslen:363 locks:{} 104ms
2015-07-09T11:33:37.636+0800 I COMMAND  [conn66089] command admin.$cmd command: isMaster { isMaster: 1 } ntoreturn:1 keyUpdates:0 writeConflicts:0 numYields:0 reslen:363 locks:{} 104ms
2015-07-09T11:33:37.636+0800 I COMMAND  [conn66092] command admin.$cmd command: isMaster { isMaster: 1 } ntoreturn:1 keyUpdates:0 writeConflicts:0 numYields:0 reslen:363 locks:{} 104ms
2015-07-09T11:33:37.668+0800 I NETWORK  [initandlisten] connection accepted from 10.0.0.30:34077 #66093 (1855 connections now open)
2015-07-09T11:33:37.770+0800 I NETWORK  [initandlisten] connection accepted from 10.0.0.32:41370 #66094 (1856 connections now open)
2015-07-09T11:33:38.450+0800 I NETWORK  [initandlisten] connection accepted from 10.0.0.30:34082 #66095 (1857 connections now open)
2015-07-09T11:33:38.616+0800 I NETWORK  [initandlisten] connection accepted from 10.0.0.32:41376 #66096 (1858 connections now open)
2015-07-09T11:33:38.632+0800 I NETWORK  [initandlisten] connection accepted from 10.0.0.32:41377 #66097 (1859 connections now open)
2015-07-09T11:33:38.828+0800 I NETWORK  [initandlisten] connection accepted from 10.0.0.31:59552 #66098 (1860 connections now open)
2015-07-09T11:33:38.829+0800 I NETWORK  [initandlisten] connection accepted from 10.0.0.31:59553 #66099 (1861 connections now open)
2015-07-09T11:33:38.896+0800 I CONTROL  [rsSync] 
 0xf77369 0xf140f1 0xef99fa 0xd8a55c 0xd8a66a 0x908a2c 0xc3ea7a 0xcaabfd 0xcabdd8 0xca43af 0xfc58e4 0x3ae5e07851 0x3ae5ae767d
----- BEGIN BACKTRACE -----
{"backtrace":[{"b":"400000","o":"B77369"},{"b":"400000","o":"B140F1"},{"b":"400000","o":"AF99FA"},{"b":"400000","o":"98A55C"},{"b":"400000","o":"98A66A"},{"b":"400000","o":"508A2C"},{"b":"400000","o":"83EA7A"},{"b":"400000","o":"8AABFD"},{"b":"400000","o":"8ABDD8"},{"b":"400000","o":"8A43AF"},{"b":"400000","o":"BC58E4"},{"b":"3AE5E00000","o":"7851"},{"b":"3AE5A00000","o":"E767D"}],"processInfo":{ "mongodbVersion" : "3.0.4", "gitVersion" : "0481c958daeb2969800511e7475dc66986fa9ed5", "uname" : { "sysname" : "Linux", "release" : "2.6.32-279.el6.x86_64", "version" : "#1 SMP Fri Jun 22 12:19:21 UTC 2012", "machine" : "x86_64" }, "somap" : [ { "elfType" : 2, "b" : "400000", "buildId" : "7702B4740E91E1BA6F701DB2A42553968AC09562" }, { "b" : "7FFF87E7A000", "elfType" : 3, "buildId" : "801B9608DAA2CD5F7035AD415E9C7DD06EBDB0A2" }, { "path" : "/lib64/libpthread.so.0", "elfType" : 3, "buildId" : "48A9F8600F0A15F6418EDE25846C324EC8891DD4" }, { "path" : "/usr/lib64/libssl.so.10", "elfType" : 3, "buildId" : "B06F7B61A75BD941A6D9E36B2DC1CDCB4183D706" }, { "path" : "/usr/lib64/libcrypto.so.10", "elfType" : 3, "buildId" : "0E46D8ED406D53C9A553C20859CD4679928AE7C0" }, { "path" : "/lib64/librt.so.1", "elfType" : 3, "buildId" : "1574F9C2DDAECEE537C45143BB79E8D61BED98FE" }, { "path" : "/lib64/libdl.so.2", "elfType" : 3, "buildId" : "0B4FE52FE93C0B9894775AFDD53E2DF9D3C2839A" }, { "path" : "/usr/lib64/libstdc++.so.6", "elfType" : 3, "buildId" : "1A4BC78E7DA0FA025262D516D00E04AFD1B0F429" }, { "path" : "/lib64/libm.so.6", "elfType" : 3, "buildId" : "BF14593D7915402AA62C2573FCCDB252AEEBF754" }, { "path" : "/lib64/libgcc_s.so.1", "elfType" : 3, "buildId" : "CE152B8676517F23E7F54AD6408330979BE41443" }, { "path" : "/lib64/libc.so.6", "elfType" : 3, "buildId" : "4C37E276CD23F0877AF7587ED1F699AADB741EC9" }, { "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "42AEFAFC23375DC250C49C420C37EDC4515B9C02" }, { "path" : "/lib64/libgssapi_krb5.so.2", "elfType" : 3, "buildId" : "76A3DEEB6876CBED69A57D3EBC1E2AFBCA84EC76" }, { "path" : "/lib64/libkrb5.so.3", "elfType" : 3, "buildId" : "A4D98A7611F0CDBEA8A4FEDF36BDA25E772F387E" }, { "path" : "/lib64/libcom_err.so.2", "elfType" : 3, "buildId" : "4623A78918C882770E81AE7B5EE9DDF8DD2B6674" }, { "path" : "/lib64/libk5crypto.so.3", "elfType" : 3, "buildId" : "190D45F6743DEF9DF8169D65801D4575B01825BD" }, { "path" : "/lib64/libz.so.1", "elfType" : 3, "buildId" : "5FA8E5038EC04A774AF72A9BB62DC86E1049C4D6" }, { "path" : "/lib64/libkrb5support.so.0", "elfType" : 3, "buildId" : "DAE2A7E4E8B37D43EF6839FF5D8E012AFCF21A69" }, { "path" : "/lib64/libkeyutils.so.1", "elfType" : 3, "buildId" : "8A8734DC37305D8CC2EF8F8C3E5EA03171DB07EC" }, { "path" : "/lib64/libresolv.so.2", "elfType" : 3, "buildId" : "8217E68E5C9D964CDF500F488B2A183F870F36B2" }, { "path" : "/lib64/libselinux.so.1", "elfType" : 3, "buildId" : "BAD5C71361DADF259B6E306A49E6F47F24AEA3DC" } ] }}
 mongod(_ZN5mongo15printStackTraceERSo+0x29) [0xf77369]
 mongod(_ZN5mongo10logContextEPKc+0xE1) [0xf140f1]
 mongod(_ZN5mongo17invariantOKFailedEPKcRKNS_6StatusES1_j+0xDA) [0xef99fa]
 mongod(_ZN5mongo22WiredTigerRecoveryUnit9_txnCloseEb+0x2EC) [0xd8a55c]
 mongod(_ZN5mongo22WiredTigerRecoveryUnit7_commitEv+0x8A) [0xd8a66a]
 mongod(_ZN5mongo15WriteUnitOfWork6commitEv+0x1C) [0x908a2c]
 mongod(_ZN5mongo4repl15writeOpsToOplogEPNS_16OperationContextERKSt5dequeINS_7BSONObjESaIS4_EE+0x26A) [0xc3ea7a]
 mongod(_ZN5mongo4repl8SyncTail10multiApplyEPNS_16OperationContextERSt5dequeINS_7BSONObjESaIS5_EE+0x4DD) [0xcaabfd]
 mongod(_ZN5mongo4repl8SyncTail16oplogApplicationEv+0x118) [0xcabdd8]
 mongod(_ZN5mongo4repl13runSyncThreadEv+0x1EF) [0xca43af]
 mongod(+0xBC58E4) [0xfc58e4]
 libpthread.so.0(+0x7851) [0x3ae5e07851]
 libc.so.6(clone+0x6D) [0x3ae5ae767d]
-----  END BACKTRACE  -----
2015-07-09T11:33:38.896+0800 I -        [rsSync] 
 
***aborting after invariant() failure
 
 
2015-07-09T12:03:57.496+0800 I CONTROL  ***** SERVER RESTARTED *****

Comment by Geert Bosch [ 09/Jul/15 ]

The most likely cause of an error during commit of a transaction is running out of space on the device with the journal (WT log file).

Comment by Geert Bosch [ 09/Jul/15 ]

The failed invariant (including the actual error return) should have been logged and is crucial to finding out what the problem is.

Comment by Sam Kleinman (Inactive) [ 09/Jul/15 ]

I would like to ask for some additional information that will help us understand the problem a bit better:

  1. I would like to confirm that your mongod instance has Journaling enabled. This is the default, and I just want to make sure that you're specifically disabling journalling for this instance.
  2. Could you provide the output of ls -l for both the MongoDB dbPath and the journal directory?
  3. Could you please report on the amount of free space on the volume, or volumes, where MongoDB is writing data?

As above, the logging from before the crash would also be very helpful.

Thanks,
sam

Comment by Sam Kleinman (Inactive) [ 09/Jul/15 ]

Sorry that you hit this issue, and thanks so much for the report. I have a couple of questions.

  1. Did all the secondaries in the replica set encounter and log this message, or was it only one member?
  2. Can you provide some of the logging before the stack trace? 20 Lines should be enough, but whatever you have is fine.

Thanks,
sam

Generated at Thu Feb 08 03:50:38 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.