[SERVER-23370] sync producer exception: Resource temporarily unavailable Created: 28/Mar/16  Updated: 03/Jun/16  Resolved: 03/Jun/16

Status: Closed
Project: Core Server
Component/s: Replication, WiredTiger
Affects Version/s: 3.2.4
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Boris POZDNYAKOV Assignee: Kelsey Schubert
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: Text File mongodbreplica1.log     Text File mongodbreplica2.log     Text File mongodbreplica3.log    
Issue Links:
Depends
depends on SERVER-23902 Failing to create a thread should fai... Closed
Operating System: ALL
Sprint: Platforms 15 (06/03/16)
Participants:

 Description   

Hello
Our replica set had been working correctly, then started failing with odd behaviour and the following log:

NETWORK  [conn652] SocketException handling request, closing client connection: 9001 socket exception [SEND_ERROR] server [127.0.0.1:47801] 
2016-03-28T11:57:42.330+0000 I NETWORK  [conn602] SocketException handling request, closing client connection: 9001 socket exception [SEND_ERROR] server [xxx.xxx.xxx.xxx:46213] 
2016-03-28T11:57:43.784+0000 I COMMAND  [conn680] command admin.$cmd command: isMaster { ismaster: true } keyUpdates:0 writeConflicts:0 numYields:0 reslen:370 locks:{} protocol:op_query 1609ms
2016-03-28T11:57:43.869+0000 F REPL     [rsBackgroundSync] sync producer exception: Resource temporarily unavailable
2016-03-28T11:57:43.899+0000 I -        [rsBackgroundSync] Fatal Assertion 28546
2016-03-28T11:57:43.900+0000 I -        [rsBackgroundSync] 
***aborting after fassert() failure
 
 
2016-03-28T11:57:56.249+0000 F -        [rsBackgroundSync] Got signal: 6 (Aborted).
 
 0x12f3502 0x12f2659 0x12f2e62 0x7f4ba82ee340 0x7f4ba7f4fcc9 0x7f4ba7f530d8 0x127d9d2 0xe44646 0x7f4ba8ac9a60 0x7f4ba82e6182 0x7f4ba801347d
----- BEGIN BACKTRACE -----
{"backtrace":[{"b":"400000","o":"EF3502","s":"_ZN5mongo15printStackTraceERSo"},{"b":"400000","o":"EF2659"},{"b":"400000","o":"EF2E62"},{"b":"7F4BA82DE000","o":"10340"},{"b":"7F4BA7F19000","o":"36CC9","s":"gsignal"},{"b":"7F4BA7F19000","o":"3A0D8","s":"abort"},{"b":"400000","o":"E7D9D2","s":"_ZN5mongo13fassertFailedEi"},{"b":"400000","o":"A44646","s":"_ZN5mongo4repl14BackgroundSync14producerThreadEv"},{"b":"7F4BA8A18000","o":"B1A60"},{"b":"7F4BA82DE000","o":"8182"},{"b":"7F4BA7F19000","o":"FA47D","s":"clone"}],"processInfo":{ "mongodbVersion" : "3.2.4", "gitVersion" : "e2ee9ffcf9f5a94fad76802e28cc978718bb7a30", "compiledModules" : [], "uname" : { "sysname" : "Linux", "release" : "3.13.0-74-generic", "version" : "#118-Ubuntu SMP Thu Dec 17 22:52:10 UTC 2015", "machine" : "x86_64" }, "somap" : [ { "elfType" : 2, "b" : "400000", "buildId" : "EF46210F8976780D45B811C3540FECB9E734EABE" }, { "b" : "7FFDA8435000", "elfType" : 3, "buildId" : "DC075B751E9FB361F14CD59BD81300A6BB5CB377" }, { "b" : "7F4BA9504000", "path" : "/lib/x86_64-linux-gnu/libssl.so.1.0.0", "elfType" : 3, "buildId" : "E21720F2804EF30440F2B39CD409252C26F58F73" }, { "b" : "7F4BA9128000", "path" : "/lib/x86_64-linux-gnu/libcrypto.so.1.0.0", "elfType" : 3, "buildId" : "9BC22F9457E3D7E9CF8DDC135C0DAC8F7742135D" }, { "b" : "7F4BA8F20000", "path" : "/lib/x86_64-linux-gnu/librt.so.1", "elfType" : 3, "buildId" : "B376100CAB1EAC4E5DE066EACFC282BF7C0B54F3" }, { "b" : "7F4BA8D1C000", "path" : "/lib/x86_64-linux-gnu/libdl.so.2", "elfType" : 3, "buildId" : "67699FFDA9FD2A552032E0652A242E82D65AA10D" }, { "b" : "7F4BA8A18000", "path" : "/usr/lib/x86_64-linux-gnu/libstdc++.so.6", "elfType" : 3, "buildId" : "D0E735DBECD63462DA114BD3F76E6EC7BB1FACCC" }, { "b" : "7F4BA8712000", "path" : "/lib/x86_64-linux-gnu/libm.so.6", "elfType" : 3, "buildId" : "EF3F6DFFA1FBE48436EC6F45CD3AABA157064BB4" }, { "b" : "7F4BA84FC000", "path" : "/lib/x86_64-linux-gnu/libgcc_s.so.1", "elfType" : 3, "buildId" : "36311B4457710AE5578C4BF00791DED7359DBB92" }, { "b" : "7F4BA82DE000", "path" : "/lib/x86_64-linux-gnu/libpthread.so.0", "elfType" : 3, "buildId" : "AF06068681750736E0524DF17D5A86CB2C3F765C" }, { "b" : "7F4BA7F19000", "path" : "/lib/x86_64-linux-gnu/libc.so.6", "elfType" : 3, "buildId" : "5382058B69031CAA9B9996C11061CD164C9398FF" }, { "b" : "7F4BA9763000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "2A816C3EBBA4E12813FBD34B06FBD25BC892A67F" } ] }}
 mongod(_ZN5mongo15printStackTraceERSo+0x32) [0x12f3502]
 mongod(+0xEF2659) [0x12f2659]
 mongod(+0xEF2E62) [0x12f2e62]
 libpthread.so.0(+0x10340) [0x7f4ba82ee340]
 libc.so.6(gsignal+0x39) [0x7f4ba7f4fcc9]
 libc.so.6(abort+0x148) [0x7f4ba7f530d8]
 mongod(_ZN5mongo13fassertFailedEi+0x82) [0x127d9d2]
 mongod(_ZN5mongo4repl14BackgroundSync14producerThreadEv+0x116) [0xe44646]
 libstdc++.so.6(+0xB1A60) [0x7f4ba8ac9a60]
 libpthread.so.0(+0x8182) [0x7f4ba82e6182]
 libc.so.6(clone+0x6D) [0x7f4ba801347d]
-----  END BACKTRACE  -----



 Comments   
Comment by Ramon Fernandez Marina [ 03/Jun/16 ]

As per SERVER-19135, MongoDB 3.2 increases the default WiredTiger cache size from 50% to 60% of RAM. Since you're running multiple mongod servers on the same box, this small increase may have put your system over the red line.
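For a rough sense of the arithmetic on the 8 GB machine described in the comments below, running three mongod instances and using the 3.2 default of 60% of RAM minus 1 GB per instance (a sketch, not an exact account of WiredTiger's reservation logic):

# 3.2 default: 60% of RAM minus 1 GB, per mongod (8 GB machine assumed)
awk 'BEGIN { per = 0.6 * 8 - 1; printf "per mongod: %.1f GB, x3: %.1f GB\n", per, 3 * per }'
# per mongod: 3.8 GB, x3: 11.4 GB -- well past 8 GB of physical RAM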

Comment by David Lynch [ 03/Jun/16 ]

Thanks anonymous.user, I think you're right, and I wanted to add some closing thoughts for anyone who finds this issue.

After a lot of digging, I believe the underlying issue boils down to a change in the 3.2.x cache-size calculation.

New Configuration YAML

storage:
  dbPath: "/var/lib/mongodb-replica/mongodbreplica1"
  wiredTiger:
    engineConfig:
      cacheSizeGB: 1   # cap the cache; the default assumes one mongod per machine
systemLog:
  destination: file
  path: "/var/log/mongodb-replica/mongodbreplica1.log"
  logAppend: true
net:
  port: 27017
  http:
    enabled: true
    RESTInterfaceEnabled: true
processManagement:
  pidFilePath: "/var/run/mongodb-replica/mongodbreplica1.pid"
replication:
  oplogSizeMB: 128
  replSetName: "mongodbreplica"
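If it helps anyone verifying the change, the effective limit can be read back from serverStatus. A sketch, assuming the three members listen on ports 27015-27017 as in the attached logs (add credentials if auth is enabled):

for port in 27015 27016 27017; do
  mongo --port "$port" --quiet --eval 'print(db.serverStatus().wiredTiger.cache["maximum bytes configured"])'
done
# expect 1073741824 (1 GB) on each member after the change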

In my case, the culprit parameter was cacheSizeGB, because I'm running three mongod instances on a single machine. I know this isn't an optimal production configuration, but my main goal is the oplog (for Meteor optimization); redundancy and fail-over are minor concerns.

The problem is that the WiredTiger memory defaults don't anticipate three mongod instances on a single machine; in fact, the documentation calls this out. By default each mongod takes 60% of the RAM minus 1 GB, and that same calculation happens three times: once for each member of the replica set.

https://docs.mongodb.com/manual/reference/configuration-options/#storage.wiredTiger.engineConfig.cacheSizeGB

Note this comment:

The default WiredTiger cache size value assumes that there is a single mongod instance per machine. If a single machine contains multiple MongoDB instances, then you should decrease the setting to accommodate the other mongod instances.

In this case, the defaults caused the combined cache size to exceed the physical memory of my machine.

As shown in the YAML, I lowered the cache size to 1 GB for each mongod instance. The three instances now use about 3 GB of resident memory (roughly 6 GB of virtual memory) on a machine with 8 GB total.

Since changing this parameter the problem has not recurred, so I'm optimistic that this was the root cause. Note that the problem never surfaced on MongoDB 3.0.x, so from the outset I had a hunch the 3.2.x upgrade was somehow involved.
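A quick way to keep an eye on resident and virtual memory across the three instances (RSS and VSZ are reported in KB):

ps -C mongod -o pid,rss,vsz,args
# with cacheSizeGB: 1 on each member, RSS should settle somewhere above 1 GB per mongod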

Comment by Kelsey Schubert [ 03/Jun/16 ]

Hi 0x42 and david@sotaenterprises.com,

We have completed our investigation of this issue and do not see anything to indicate a bug in MongoDB. The behavior you are observing suggests that your system may be under-provisioned for its current configuration, as the operating system is rejecting requests to create new threads.
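To check whether thread creation is the bottleneck, a few numbers worth comparing (a sketch; the relevant caps depend on your kernel and ulimit settings):

cat /proc/sys/kernel/threads-max   # system-wide thread cap
ulimit -u                          # max processes/threads for the current user
ps -eLf | wc -l                    # rough count of threads currently running

Memory matters too: each new thread needs stack space, so a box under memory pressure can fail thread creation well below these caps.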

For MongoDB-related support discussion please post on the mongodb-users group or Stack Overflow with the mongodb tag. A question about system provisioning or system configuration would involve more discussion and would be best posted on the mongodb-users group.

Thank you,
Thomas

Comment by David Lynch [ 02/May/16 ]

These are the three log files, one from each of my databases; all three are in a replica set, and they show two occurrences of the issue. I'm not 100% sure this problem is actually caused by MongoDB or if it is a symptom of something else. In both cases, whatever occurred also prevented me from connecting to the system via PuTTY.

Comment by David Lynch [ 02/May/16 ]

BTW, here is the threads-max on my system:

ubuntu@ip-172-31-6-169:~$ cat /proc/sys/kernel/threads-max
60070

Comment by David Lynch [ 02/May/16 ]

I'm seeing something very similar. I'm using replication on MongoDB 3.2.5, and twice I've gotten Fatal Assertion 28546. I have a replica set with three mongod instances on one machine.

2016-04-18T02:16:45.549+0000 I COMMAND  [ftdc] serverStatus was very slow: { after basic: 80, after asserts: 90, after connections: 100, after extra_info: 280, after globalLock: 290, after locks: 380, after network: 380, after opcounters: 380, after opcountersRepl: 380, after repl: 670, after storageEngine: 810, after tcmalloc: 900, after wiredTiger: 1230, at end: 1340 }
2016-04-18T02:16:46.537+0000 I COMMAND  [conn423] command admin.$cmd command: replSetHeartbeat { replSetHeartbeat: "mongodbreplica", configVersion: 2, from: "localhost:27015", fromId: 2, term: 37 } keyUpdates:0 writeConflicts:0 numYields:0 reslen:425 locks:{} protocol:op_command 396ms
2016-04-18T02:16:47.561+0000 I COMMAND  [conn425] command admin.$cmd command: replSetHeartbeat { replSetHeartbeat: "mongodbreplica", configVersion: 2, from: "localhost:27016", fromId: 1, term: 37 } keyUpdates:0 writeConflicts:0 numYields:0 reslen:425 locks:{} protocol:op_command 714ms
2016-04-18T02:16:48.285+0000 F REPL     [rsBackgroundSync] sync producer exception: Resource temporarily unavailable
2016-04-18T02:16:48.475+0000 I -        [rsBackgroundSync] Fatal Assertion 28546
2016-04-18T02:16:48.504+0000 I -        [rsBackgroundSync] 
 
***aborting after fassert() failure
 
 
2016-04-18T02:16:54.597+0000 I COMMAND  [conn425] command admin.$cmd command: replSetHeartbeat { replSetHeartbeat: "mongodbreplica", configVersion: 2, from: "localhost:27016", fromId: 1, term: 37 } keyUpdates:0 writeConflicts:0 numYields:0 reslen:425 locks:{} protocol:op_command 542ms
2016-04-18T02:16:54.601+0000 I REPL     [ReplicationExecutor] Canceling priority takeover callback
2016-04-18T02:16:55.363+0000 I NETWORK  [initandlisten] connection accepted from 127.0.0.1:48748 #430 (8 connections now open)
2016-04-18T02:16:55.928+0000 I REPL     [ReplicationExecutor] Starting an election for a priority takeover
2016-04-18T02:16:56.312+0000 I REPL     [ReplicationExecutor] conducting a dry run election to see if we could be elected
2016-04-18T02:16:56.312+0000 I COMMAND  [ftdc] serverStatus was very slow: { after basic: 290, after asserts: 480, after connections: 810, after extra_info: 1830, after globalLock: 2310, after locks: 2450, after network: 2450, after opcounters: 2450, after opcountersRepl: 2450, after repl: 2820, after storageEngine: 3330, after tcmalloc: 4280, after wiredTiger: 5310, at end: 5773 }
2016-04-18T02:16:56.312+0000 F -        [rsBackgroundSync] Got signal: 6 (Aborted).
 
 0x12f3502 0x12f2659 0x12f2e62 0x7fb037374340 0x7fb036fd5cc9 0x7fb036fd90d8 0x127d9d2 0xe44646 0x7fb037b4fa40 0x7fb03736c182 0x7fb03709947d
----- BEGIN BACKTRACE -----
{"backtrace":[{"b":"400000","o":"EF3502","s":"_ZN5mongo15printStackTraceERSo"},{"b":"400000","o":"EF2659"},{"b":"400000","o":"EF2E62"},{"b":"7FB037364000","o":"10340"},{"b":"7FB036F9F000","o":"36CC9","s":"gsignal"},{"b":"7FB036F9F000","o":"3A0D8","s":"abort"},{"b":"400000","o":"E7D9D2","s":"_ZN5mongo13fassertFailedEi"},{"b":"400000","o":"A44646","s":"_ZN5mongo4repl14BackgroundSync14producerThreadEv"},{"b":"7FB037A9E000","o":"B1A40"},{"b":"7FB037364000","o":"8182"},{"b":"7FB036F9F000","o":"FA47D","s":"clone"}],"processInfo":{ "mongodbVersion" : "3.2.4", "gitVersion" : "e2ee9ffcf9f5a94fad76802e28cc978718bb7a30", "compiledModules" : [], "uname" : { "sysname" : "Linux", "release" : "3.13.0-74-generic", "version" : "#118-Ubuntu SMP Thu Dec 17 22:52:10 UTC 2015", "machine" : "x86_64" }, "somap" : [ { "elfType" : 2, "b" : "400000", "buildId" : "EF46210F8976780D45B811C3540FECB9E734EABE" }, { "b" : "7FFFE7AAC000", "elfType" : 3, "buildId" : "DC075B751E9FB361F14CD59BD81300A6BB5CB377" }, { "b" : "7FB038589000", "path" : "/lib/x86_64-linux-gnu/libssl.so.1.0.0", "elfType" : 3, "buildId" : "D08DD65F97859C71BB2CBBF1043BD968EFE18AAD" }, { "b" : "7FB0381AE000", "path" : "/lib/x86_64-linux-gnu/libcrypto.so.1.0.0", "elfType" : 3, "buildId" : "F86FA9FB4ECEB4E06B40DBDF761A4172B70A4229" }, { "b" : "7FB037FA6000", "path" : "/lib/x86_64-linux-gnu/librt.so.1", "elfType" : 3, "buildId" : "92FCF41EFE012D6186E31A59AD05BDBB487769AB" }, { "b" : "7FB037DA2000", "path" : "/lib/x86_64-linux-gnu/libdl.so.2", "elfType" : 3, "buildId" : "C1AE4CB7195D337A77A3C689051DABAA3980CA0C" }, { "b" : "7FB037A9E000", "path" : "/usr/lib/x86_64-linux-gnu/libstdc++.so.6", "elfType" : 3, "buildId" : "4BF6F7ADD8244AD86008E6BF40D90F8873892197" }, { "b" : "7FB037798000", "path" : "/lib/x86_64-linux-gnu/libm.so.6", "elfType" : 3, "buildId" : "1D76B71E905CB867B27CEF230FCB20F01A3178F5" }, { "b" : "7FB037582000", "path" : "/lib/x86_64-linux-gnu/libgcc_s.so.1", "elfType" : 3, "buildId" : "36311B4457710AE5578C4BF00791DED7359DBB92" }, { "b" : "7FB037364000", "path" : "/lib/x86_64-linux-gnu/libpthread.so.0", "elfType" : 3, "buildId" : "9318E8AF0BFBE444731BB0461202EF57F7C39542" }, { "b" : "7FB036F9F000", "path" : "/lib/x86_64-linux-gnu/libc.so.6", "elfType" : 3, "buildId" : "30C94DC66A1FE95180C3D68D2B89E576D5AE213C" }, { "b" : "7FB0387E8000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "9F00581AB3C73E3AEA35995A0C50D24D59A01D47" } ] }}
 mongod(_ZN5mongo15printStackTraceERSo+0x32) [0x12f3502]
 mongod(+0xEF2659) [0x12f2659]
 mongod(+0xEF2E62) [0x12f2e62]
 libpthread.so.0(+0x10340) [0x7fb037374340]
 libc.so.6(gsignal+0x39) [0x7fb036fd5cc9]
 libc.so.6(abort+0x148) [0x7fb036fd90d8]
 mongod(_ZN5mongo13fassertFailedEi+0x82) [0x127d9d2]
 mongod(_ZN5mongo4repl14BackgroundSync14producerThreadEv+0x116) [0xe44646]
 libstdc++.so.6(+0xB1A40) [0x7fb037b4fa40]
 libpthread.so.0(+0x8182) [0x7fb03736c182]
 libc.so.6(clone+0x6D) [0x7fb03709947d]
-----  END BACKTRACE  -----
2016-04-19T00:08:55.406+0000 I CONTROL  [main] ***** SERVER RESTARTED *****

And a second occurrence today, 2016-05-02:

2016-05-02T06:29:43.430+0000 I COMMAND  [conn1946] command admin.$cmd command: replSetHeartbeat { replSetHeartbeat: "mongodbreplica", configVersion: 2, from: "localhost:27016", fromId: 1, term: 46 } keyUpdates:0 writeConflicts:0 numYields:0 reslen:394 locks:{} protocol:op_command 771ms
2016-05-02T06:29:43.430+0000 I COMMAND  [conn1947] command admin.$cmd command: replSetHeartbeat { replSetHeartbeat: "mongodbreplica", configVersion: 2, from: "localhost:27015", fromId: 2, term: 46 } keyUpdates:0 writeConflicts:0 numYields:0 reslen:394 locks:{} protocol:op_command 771ms
2016-05-02T06:29:43.835+0000 I REPL     [ReplicationExecutor] syncing from: localhost:27016
2016-05-02T06:29:45.663+0000 I ACCESS   [conn1958] Successfully authenticated as principal oplogger on admin
2016-05-02T06:29:46.262+0000 F REPL     [rsBackgroundSync] sync producer exception: Resource temporarily unavailable
2016-05-02T06:29:46.361+0000 I REPL     [ReplicationExecutor] Canceling priority takeover callback
2016-05-02T06:29:47.073+0000 I -        [rsBackgroundSync] Fatal Assertion 28546
2016-05-02T06:29:47.075+0000 I COMMAND  [conn1958] command admin.system.users command: saslContinue { saslContinue: 1, conversationId: 1, payload: BinData(0, ) } keyUpdates:0 writeConflicts:0 numYields:0 reslen:78 locks:{ Global: { acquireCount: { r: 2 } }, Database: { acquireCount: { r: 1 } }, Collection: { acquireCount: { r: 1 } } } protocol:op_query 6704ms
2016-05-02T06:29:47.155+0000 I REPL     [ReplicationExecutor] Starting an election for a priority takeover
2016-05-02T06:29:47.162+0000 I REPL     [ReplicationExecutor] conducting a dry run election to see if we could be elected
2016-05-02T06:29:47.250+0000 I -        [rsBackgroundSync] 
 
***aborting after fassert() failure
 
 
2016-05-02T06:29:48.601+0000 I COMMAND  [conn1946] command admin.$cmd command: replSetHeartbeat { replSetHeartbeat: "mongodbreplica", configVersion: 2, from: "localhost:27016", fromId: 1, term: 46 } keyUpdates:0 writeConflicts:0 numYields:0 reslen:425 locks:{} protocol:op_command 281ms
2016-05-02T06:29:49.309+0000 I COMMAND  [ftdc] serverStatus was very slow: { after basic: 70, after asserts: 140, after connections: 140, after extra_info: 590, after globalLock: 790, after locks: 890, after network: 980, after opcounters: 980, after opcountersRepl: 980, after repl: 2980, after storageEngine: 2980, after tcmalloc: 3080, after wiredTiger: 3080, at end: 3620 }
2016-05-02T06:29:49.549+0000 I REPL     [ReplicationExecutor] VoteRequester: Got no vote from localhost:27016 because: candidate's data is staler than mine, resp:{ term: 46, voteGranted: false, reason: "candidate's data is staler than mine", ok: 1.0 }
2016-05-02T06:29:51.012+0000 I COMMAND  [conn1947] command admin.$cmd command: replSetHeartbeat { replSetHeartbeat: "mongodbreplica", configVersion: 2, from: "localhost:27015", fromId: 2, term: 46 } keyUpdates:0 writeConflicts:0 numYields:0 reslen:425 locks:{} protocol:op_command 635ms
2016-05-02T06:29:52.360+0000 I COMMAND  [conn1946] command admin.$cmd command: replSetHeartbeat { replSetHeartbeat: "mongodbreplica", configVersion: 2, from: "localhost:27016", fromId: 1, term: 46 } keyUpdates:0 writeConflicts:0 numYields:0 reslen:425 locks:{} protocol:op_command 726ms
2016-05-02T06:29:58.242+0000 I ASIO     [NetworkInterfaceASIO-Replication-0] Successfully connected to localhost:27015
2016-05-02T06:29:58.244+0000 I REPL     [ReplicationExecutor] Scheduling priority takeover at 2016-05-02T06:30:08.240+0000
2016-05-02T06:29:58.900+0000 F -        [rsBackgroundSync] Got signal: 6 (Aborted).
 
 0x1315022 0x1314179 0x1314982 0x7fc7e7ebe340 0x7fc7e7b1fcc9 0x7fc7e7b230d8 0x129f4f2 0xe64886 0x1b2b8d0 0x7fc7e7eb6182 0x7fc7e7be347d
----- BEGIN BACKTRACE -----
{"backtrace":[{"b":"400000","o":"F15022","s":"_ZN5mongo15printStackTraceERSo"},{"b":"400000","o":"F14179"},{"b":"400000","o":"F14982"},{"b":"7FC7E7EAE000","o":"10340"},{"b":"7FC7E7AE9000","o":"36CC9","s":"gsignal"},{"b":"7FC7E7AE9000","o":"3A0D8","s":"abort"},{"b":"400000","o":"E9F4F2","s":"_ZN5mongo13fassertFailedEi"},{"b":"400000","o":"A64886","s":"_ZN5mongo4repl14BackgroundSync14producerThreadEv"},{"b":"400000","o":"172B8D0","s":"execute_native_thread_routine"},{"b":"7FC7E7EAE000","o":"8182"},{"b":"7FC7E7AE9000","o":"FA47D","s":"clone"}],"processInfo":{ "mongodbVersion" : "3.2.5", "gitVersion" : "34e65e5383f7ea1726332cb175b73077ec4a1b02", "compiledModules" : [], "uname" : { "sysname" : "Linux", "release" : "3.13.0-74-generic", "version" : "#118-Ubuntu SMP Thu Dec 17 22:52:10 UTC 2015", "machine" : "x86_64" }, "somap" : [ { "elfType" : 2, "b" : "400000", "buildId" : "8BD0E2ADD4592C91BBADCA1EEBC2B002DF5555A6" }, { "b" : "7FFC812EE000", "elfType" : 3, "buildId" : "DC075B751E9FB361F14CD59BD81300A6BB5CB377" }, { "b" : "7FC7E8DD0000", "path" : "/lib/x86_64-linux-gnu/libssl.so.1.0.0", "elfType" : 3, "buildId" : "E21720F2804EF30440F2B39CD409252C26F58F73" }, { "b" : "7FC7E89F4000", "path" : "/lib/x86_64-linux-gnu/libcrypto.so.1.0.0", "elfType" : 3, "buildId" : "9BC22F9457E3D7E9CF8DDC135C0DAC8F7742135D" }, { "b" : "7FC7E87EC000", "path" : "/lib/x86_64-linux-gnu/librt.so.1", "elfType" : 3, "buildId" : "B376100CAB1EAC4E5DE066EACFC282BF7C0B54F3" }, { "b" : "7FC7E85E8000", "path" : "/lib/x86_64-linux-gnu/libdl.so.2", "elfType" : 3, "buildId" : "67699FFDA9FD2A552032E0652A242E82D65AA10D" }, { "b" : "7FC7E82E2000", "path" : "/lib/x86_64-linux-gnu/libm.so.6", "elfType" : 3, "buildId" : "EF3F6DFFA1FBE48436EC6F45CD3AABA157064BB4" }, { "b" : "7FC7E80CC000", "path" : "/lib/x86_64-linux-gnu/libgcc_s.so.1", "elfType" : 3, "buildId" : "36311B4457710AE5578C4BF00791DED7359DBB92" }, { "b" : "7FC7E7EAE000", "path" : "/lib/x86_64-linux-gnu/libpthread.so.0", "elfType" : 3, "buildId" : "AF06068681750736E0524DF17D5A86CB2C3F765C" }, { "b" : "7FC7E7AE9000", "path" : "/lib/x86_64-linux-gnu/libc.so.6", "elfType" : 3, "buildId" : "5382058B69031CAA9B9996C11061CD164C9398FF" }, { "b" : "7FC7E902F000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "2A816C3EBBA4E12813FBD34B06FBD25BC892A67F" } ] }}
 mongod(_ZN5mongo15printStackTraceERSo+0x32) [0x1315022]
 mongod(+0xF14179) [0x1314179]
 mongod(+0xF14982) [0x1314982]
 libpthread.so.0(+0x10340) [0x7fc7e7ebe340]
 libc.so.6(gsignal+0x39) [0x7fc7e7b1fcc9]
 libc.so.6(abort+0x148) [0x7fc7e7b230d8]
 mongod(_ZN5mongo13fassertFailedEi+0x82) [0x129f4f2]
 mongod(_ZN5mongo4repl14BackgroundSync14producerThreadEv+0x116) [0xe64886]
 mongod(execute_native_thread_routine+0x20) [0x1b2b8d0]
 libpthread.so.0(+0x8182) [0x7fc7e7eb6182]
 libc.so.6(clone+0x6D) [0x7fc7e7be347d]
-----  END BACKTRACE  -----
2016-05-02T21:24:30.747+0000 I CONTROL  [main] ***** SERVER RESTARTED *****

Comment by Boris POZDNYAKOV [ 26/Apr/16 ]

Hi guys,
1) # cat /proc/sys/kernel/threads-max
63070
2) This problem has occurred only once.
3) Unfortunately, I don't have the full log.

The Linux server had the default configuration; after this error I changed the settings as described in this article: https://docs.mongodb.org/manual/reference/ulimit/. Maybe it will help.
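For anyone checking the same thing, the effective limits of a running mongod can be read from /proc. A sketch; with several mongod instances on one box, substitute the PID of the member you care about:

pid=$(pgrep -x mongod | head -n1)
grep -E 'Max processes|Max open files' /proc/$pid/limits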

Comment by Kelsey Schubert [ 25/Apr/16 ]

Hi 0x42,

We still need the information Eric requested to diagnose the problem. If this is still an issue for you, can you please answer Eric's questions?

Thank you,
Thomas

Comment by Eric Milkie [ 06/Apr/16 ]

Hi Boris,
It's possible you've hit the limit on the number of processes on your system.
1. Can you report the output of "cat /proc/sys/kernel/threads-max" from the failing node here?
2. Have you experienced this problem frequently or only once?
3. Can you upload the entire mongod log starting from when the process was started, up to and including the exception messages?
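As background on the logged message: on Linux, a failed thread creation typically surfaces as errno EAGAIN, whose error string is exactly the "Resource temporarily unavailable" seen in the fatal log line. A one-liner to confirm the mapping (works under Python 2 or 3):

python -c 'import errno, os; print(os.strerror(errno.EAGAIN))'
# Resource temporarily unavailable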
