[SERVER-36161] pthread_create failed: Resource temporarily unavailable in sharding cluster Created: 17/Jul/18  Updated: 20/Jul/18  Resolved: 20/Jul/18

Status: Closed
Project: Core Server
Component/s: Sharding, WiredTiger
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Roberto Rodriguez Assignee: Nick Brewer
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File diagnostic.data.tar.gz    
Operating System: ALL
Participants:

 Description   

Hi,

We have a sharded cluster spanning 8 servers, organized into 4 replica sets with the following structure:

  • replicaset1: server01a / server01b
  • replicaset2: server02a / server02b
  • replicaset3: server03a / server03b
  • replicaset4: server04a / server04b

The servers are physical machines, each with SSDs, 32 hardware threads, and 256 GB of RAM.

The MongoDB configuration on each node is similar to the following:

 

storage:
  dbPath: /var/lib/mongodb
  journal:
    enabled: true
  wiredTiger:
    engineConfig:
      configString: "session_max=102400"
      cacheSizeGB: 200
setParameter:
  cursorTimeoutMillis: 120000
operationProfiling:
  mode: slowOp
  slowOpThresholdMs: 300
systemLog:
  destination: file
  logAppend: true
  path: /var/log/mongodb/mongod.log
net:
  port: 27017
  bindIp: 0.0.0.0
  maxIncomingConnections: 102400
replication:
  replSetName: rsmmhad03
sharding:
  clusterRole: shardsvr
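
For reference, the effect of maxIncomingConnections can be watched on a running node; a minimal check from the shell (standard serverStatus output, not taken from the original report):

# connections currently open vs. still available under the configured cap
mongo --quiet --eval 'printjson(db.serverStatus().connections)'
# e.g. { "current" : 32627, "available" : 69773, "totalCreated" : ... }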
 

sysctl file:

 

net.ipv4.ip_local_port_range = 1024 65535
kernel.shmmax = 1073741824
fs.file-max=5000000
vm.swappiness = 1
vm.dirty_ratio = 15
vm.dirty_background_ratio = 5
net.core.somaxconn = 4096
net.ipv4.tcp_fin_timeout = 30
net.ipv4.tcp_keepalive_intvl = 30
net.ipv4.tcp_keepalive_time = 120
net.ipv4.tcp_max_syn_backlog = 4096
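
One thing to note: vm.max_map_count is not set in this file, so it remains at the kernel default of 65530; as the comments below show, this turned out to be the limiting factor. A quick way to inspect it against what mongod actually uses (standard tools, not from the original report):

# per-process cap on memory mappings (kernel default: 65530)
sysctl vm.max_map_count
# mappings currently held by the running mongod
wc -l /proc/$(pidof mongod)/maps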

 

/etc/security/limits.d/mongod.conf

mongod soft nproc 128000
mongod hard nproc 128000
mongod soft nofile 128000
mongod hard nofile 128000
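
Because mongod is started by systemd here, these pam_limits values are not what the service actually runs with; the Limit* directives in the unit file below are. A quick way to confirm the limits the live process received (a sketch, assuming a single mongod process):

cat /proc/$(pidof mongod)/limits | egrep 'Max processes|Max open files'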

/lib/systemd/system/mongod.service

 

[Unit]
Description=High-performance, schema-free document-oriented database
After=network.target
Documentation=https://docs.mongodb.org/manual

[Service]
User=mongodb
Group=mongodb
ExecStart=/usr/bin/numactl --interleave=all /usr/bin/mongod --config /etc/mongod.conf
PIDFile=/var/run/mongodb/mongod.pid
# file size
LimitFSIZE=infinity
# cpu time
LimitCPU=infinity
# virtual memory size
LimitAS=infinity
# open files
LimitNOFILE=128000
# processes/threads
LimitNPROC=128000
# locked memory
LimitMEMLOCK=infinity
# total threads (user+kernel)
TasksMax=infinity
TasksAccounting=false
# Recommended limits for mongod as specified in
# http://docs.mongodb.org/manual/reference/ulimit/#recommended-settings

[Install]
WantedBy=multi-user.target
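
After changing the unit file, the effective values can be verified (standard systemctl usage, not part of the original report):

systemctl daemon-reload && systemctl restart mongod
systemctl show mongod -p LimitNOFILE -p LimitNPROC -p TasksMax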

 

The cluster holds millions of documents and serves millions of queries (more than 100,000,000 per day).

The problem is that, at random times, we get an error like the following:

 

2018-07-17T15:57:17.978+0200 I - [thread1] pthread_create failed: Resource temporarily unavailable
2018-07-17T15:57:17.978+0200 I - [thread1] failed to create service entry worker thread for 10.3.16.1:56153
2018-07-17T15:57:17.978+0200 I COMMAND [conn16910] command had.hadCompressed command: find { find: "hadCompressed", filter: { chkin: "2018-08-10", n: 4, occ: "1::3-0/", nid: { $in: [ 0, 30115 ] }, rtype: { $in: [ 1, null ] }, hid: { $in: [ 435179, 231562, 38468, 330644, 307226, 359353, 352215, 88059, 321458, 307181, 85590, 87268, 385303, 252432, 242030, 231596, 307182, 172732, 577889, 38743, 38621, 199946, 435167, 149852, 244963, 391702, 260891, 150236, 307227, 307202, 38730, 156100, 297051, 257466, 498152, 174201, 174250, 577903, 424804, 435152, 197357, 242026, 385251, 205997, 330638, 154974, 37600, 38021, 160751, 435137, 86520, 37217, 363892, 375650, 244960, 252441, 261988, 432659, 609717, 156152, 363893, 149696, 149490, 232726, 87413, 252958, 315863, 219739, 231563, 388212, 412850, 501130, 388772, 231607, 369178, 164246, 38029, 330636, 260877, 38156, 236389, 38068, 257418, 282221, 307186, 299255, 199164, 231575, 88191, 199162, 80373, 200283, 246961, 195476, 424809, 286709, 193058, 208323, 435142, 318242 ] }, lchg: { $gte: new Date(1531749437000) } }, shardVersion: [ Timestamp 22129000|0, ObjectId('5af1c64abeee30df3be9f7db') ] } planSummary: IXSCAN { chkin: 1, n: 1, occ: 1, nid: 1, rtype: 1, hid: 1 } keysExamined:117 docsExamined:41 cursorExhausted:1 numYields:1 nreturned:0 reslen:202 locks:{ Global: { acquireCount: { r: 4 } }, Database: { acquireCount: { r: 2 } }, Collection: { acquireCount: { r: 2 } } } protocol:op_command 547ms
2018-07-17T15:57:17.978+0200 I NETWORK [thread1] connection accepted from 10.3.102.1:53260 #42127 (32627 connections now open)
2018-07-17T15:57:17.978+0200 I - [thread1] pthread_create failed: Resource temporarily unavailable
2018-07-17T15:57:17.978+0200 I - [thread1] failed to create service entry worker thread for 10.3.102.1:53260
2018-07-17T15:57:17.978+0200 I NETWORK [thread1] connection accepted from 10.3.9.1:47587 #42128 (32627 connections now open)
2018-07-17T15:57:17.978+0200 F - [conn14595] Got signal: 6 (Aborted).
0x562cd6379171 0x562cd6378389 0x562cd637886d 0x7f49ce038890 0x7f49cdcb3067 0x7f49cdcb4448 0x562cd561a341 0x562cd607e01b 0x562cd607edf0 0x562cd607b18d 0x562cd607bccd 0x562cd607bf30 0x562cd6056ef7 0x562cd5a64478 0x562cd5994b68 0x562cd599508f 0x562cd59a55c3 0x562cd5983d0e 0x562cd59a55c3 0x562cd59b56e7 0x562cd59a55c3 0x562cd5977338 0x562cd5cae7a2 0x562cd5cb0b48 0x562cd5cb17fc 0x562cd5c6ac42 0x562cd5c6b79b 0x562cd58917a0 0x562cd58689af 0x562cd586a0aa 0x562cd5e85480 0x562cd5a89540 0x562cd568a97d 0x562cd568b2ad 0x562cd62df0d1 0x7f49ce031064 0x7f49cdd6662d
----- BEGIN BACKTRACE -----
{"backtrace":[{"b":"562CD4DFE000","o":"157B171","s":"_ZN5mongo15printStackTraceERSo"},{"b":"562CD4DFE000","o":"157A389"},{"b":"562CD4DFE000","o":"157A86D"},{"b":"7F49CE029000","o":"F890"},{"b":"7F49CDC7E000","o":"35067","s":"gsignal"},{"b":"7F49CDC7E000","o":"36448","s":"abort"},{"b":"562CD4DFE000","o":"81C341","s":"_ZN5mongo25fassertFailedWithLocationEiPKcj"},{"b":"562CD4DFE000","o":"128001B","s":"_ZN5mongo17WiredTigerSessionC1EP15__wt_connectionPNS_22WiredTigerSessionCacheEmm"},{"b":"562CD4DFE000","o":"1280DF0","s":"_ZN5mongo22WiredTigerSessionCache10getSessionEv"},{"b":"562CD4DFE000","o":"127D18D"},{"b":"562CD4DFE000","o":"127DCCD","s":"_ZN5mongo22WiredTigerRecoveryUnit8_txnOpenEPNS_16OperationContextE"},{"b":"562CD4DFE000","o":"127DF30","s":"_ZN5mongo16WiredTigerCursorC1ERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEmbPNS_16OperationContextE"},{"b":"562CD4DFE000","o":"1258EF7","s":"_ZNK5mongo23WiredTigerIndexStandard9newCursorEPNS_16OperationContextEb"},{"b":"562CD4DFE000","o":"C66478","s":"_ZNK5mongo17IndexAccessMethod9newCursorEPNS_16OperationContextEb"},{"b":"562CD4DFE000","o":"B96B68","s":"_ZN5mongo9IndexScan13initIndexScanEv"},{"b":"562CD4DFE000","o":"B9708F","s":"_ZN5mongo9IndexScan6doWorkEPm"},{"b":"562CD4DFE000","o":"BA75C3","s":"_ZN5mongo9PlanStage4workEPm"},{"b":"562CD4DFE000","o":"B85D0E","s":"_ZN5mongo10FetchStage6doWorkEPm"},{"b":"562CD4DFE000","o":"BA75C3","s":"_ZN5mongo9PlanStage4workEPm"},{"b":"562CD4DFE000","o":"BB76E7","s":"_ZN5mongo16ShardFilterStage6doWorkEPm"},{"b":"562CD4DFE000","o":"BA75C3","s":"_ZN5mongo9PlanStage4workEPm"},{"b":"562CD4DFE000","o":"B79338","s":"_ZN5mongo15CachedPlanStage12pickBestPlanEPNS_15PlanYieldPolicyE"},{"b":"562CD4DFE000","o":"EB07A2","s":"_ZN5mongo12PlanExecutor12pickBestPlanENS0_11YieldPolicyEPKNS_10CollectionE"},{"b":"562CD4DFE000","o":"EB2B48","s":"_ZN5mongo12PlanExecutor4makeEPNS_16OperationContextESt10unique_ptrINS_10WorkingSetESt14default_deleteIS4_EES3_INS_9PlanStageES5_IS8_EES3_INS_13QuerySolutionES5_ISB_EES3_INS_14CanonicalQueryES5_ISE_EEPKNS_10CollectionERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEENS0_11YieldPolicyE"},{"b":"562CD4DFE000","o":"EB37FC","s":"_ZN5mongo12PlanExecutor4makeEPNS_16OperationContextESt10unique_ptrINS_10WorkingSetESt14default_deleteIS4_EES3_INS_9PlanStageES5_IS8_EES3_INS_13QuerySolutionES5_ISB_EES3_INS_14CanonicalQueryES5_ISE_EEPKNS_10CollectionENS0_11YieldPolicyE"},{"b":"562CD4DFE000","o":"E6CC42","s":"_ZN5mongo11getExecutorEPNS_16OperationContextEPNS_10CollectionESt10unique_ptrINS_14CanonicalQueryESt14default_deleteIS5_EENS_12PlanExecutor11YieldPolicyEm"},{"b":"562CD4DFE000","o":"E6D79B","s":"_ZN5mongo15getExecutorFindEPNS_16OperationContextEPNS_10CollectionERKNS_15NamespaceStringESt10unique_ptrINS_14CanonicalQueryESt14default_deleteIS8_EENS_12PlanExecutor11YieldPolicyE"},{"b":"562CD4DFE000","o":"A937A0","s":"_ZN5mongo7FindCmd3runEPNS_16OperationContextERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERNS_7BSONObjEiRS8_RNS_14BSONObjBuilderE"},{"b":"562CD4DFE000","o":"A6A9AF","s":"_ZN5mongo7Command3runEPNS_16OperationContextERKNS_3rpc16RequestInterfaceEPNS3_21ReplyBuilderInterfaceE"},{"b":"562CD4DFE000","o":"A6C0AA","s":"_ZN5mongo7Command11execCommandEPNS_16OperationContextEPS0_RKNS_3rpc16RequestInterfaceEPNS4_21ReplyBuilderInterfaceE"},{"b":"562CD4DFE000","o":"1087480","s":"_ZN5mongo11runCommandsEPNS_16OperationContextERKNS_3rpc16RequestInterfaceEPNS2_21ReplyBuilderInterfaceE"},{"b":"562CD4DFE000","o":"C8B540","s":"_ZN5mongo16assembleResponseEPNS_16OperationContextERNS_7MessageERNS
_10DbResponseERKNS_11HostAndPortE"},{"b":"562CD4DFE000","o":"88C97D","s":"_ZN5mongo23ServiceEntryPointMongod12_sessionLoopERKSt10shared_ptrINS_9transport7SessionEE"},{"b":"562CD4DFE000","o":"88D2AD"},{"b":"562CD4DFE000","o":"14E10D1"},{"b":"7F49CE029000","o":"8064"},{"b":"7F49CDC7E000","o":"E862D","s":"clone"}],"processInfo":{ "mongodbVersion" : "3.4.16", "gitVersion" : "0d6a9242c11b99ddadcfb6e86a850b6ba487530a", "compiledModules" : [], "uname" : { "sysname" : "Linux", "release" : "3.16.0-6-amd64", "version" : "#1 SMP Debian 3.16.56-1+deb8u1 (2018-05-08)", "machine" : "x86_64" }, "somap" : [ { "b" : "562CD4DFE000", "elfType" : 3, "buildId" : "36452F27FE7A41D0E57DDE38A17B3FAE9980B0BE" }, { "b" : "7FFD853E8000", "path" : "linux-vdso.so.1", "elfType" : 3, "buildId" : "90F495E259305E7C4F498541D91C9E1240057F52" }, { "b" : "7F49CEF66000", "path" : "/usr/lib/x86_64-linux-gnu/libssl.so.1.0.0", "elfType" : 3, "buildId" : "EDE40F0BC2115063088BF442E0F2ED84BF76B11E" }, { "b" : "7F49CEB69000", "path" : "/usr/lib/x86_64-linux-gnu/libcrypto.so.1.0.0", "elfType" : 3, "buildId" : "0C9DA403601A5EEA627AF96E1EB63DD22B8DC28B" }, { "b" : "7F49CE961000", "path" : "/lib/x86_64-linux-gnu/librt.so.1", "elfType" : 3, "buildId" : "A63C95FB33CCA970E141D2E13774B997C1CF0565" }, { "b" : "7F49CE75D000", "path" : "/lib/x86_64-linux-gnu/libdl.so.2", "elfType" : 3, "buildId" : "D70B531D672A34D71DB42EB32B68E63F2DCC5B6A" }, { "b" : "7F49CE45C000", "path" : "/lib/x86_64-linux-gnu/libm.so.6", "elfType" : 3, "buildId" : "152C93BA3E8590F7ED0BCDDF868600D55EC4DD6F" }, { "b" : "7F49CE246000", "path" : "/lib/x86_64-linux-gnu/libgcc_s.so.1", "elfType" : 3, "buildId" : "BAC839560495859598E8515CBAED73C7799AE1FF" }, { "b" : "7F49CE029000", "path" : "/lib/x86_64-linux-gnu/libpthread.so.0", "elfType" : 3, "buildId" : "9DA9387A60FFC196AEDB9526275552AFEF499C44" }, { "b" : "7F49CDC7E000", "path" : "/lib/x86_64-linux-gnu/libc.so.6", "elfType" : 3, "buildId" : "48C48BC6ABB794461B8A558DD76B29876A0551F0" }, { "b" : "7F49CF1C7000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "1D98D41FBB1EABA7EC05D0FD7624B85D6F51C03C" } ] }}
 mongod(_ZN5mongo15printStackTraceERSo+0x41) [0x562cd6379171]
 mongod(+0x157A389) [0x562cd6378389]
 mongod(+0x157A86D) [0x562cd637886d]
 libpthread.so.0(+0xF890) [0x7f49ce038890]
 libc.so.6(gsignal+0x37) [0x7f49cdcb3067]
 libc.so.6(abort+0x148) [0x7f49cdcb4448]
 mongod(_ZN5mongo25fassertFailedWithLocationEiPKcj+0x0) [0x562cd561a341]
 mongod(_ZN5mongo17WiredTigerSessionC1EP15__wt_connectionPNS_22WiredTigerSessionCacheEmm+0xBB) [0x562cd607e01b]
 mongod(_ZN5mongo22WiredTigerSessionCache10getSessionEv+0xE0) [0x562cd607edf0]
 mongod(+0x127D18D) [0x562cd607b18d]
 mongod(_ZN5mongo22WiredTigerRecoveryUnit8_txnOpenEPNS_16OperationContextE+0x19D) [0x562cd607bccd]
 mongod(_ZN5mongo16WiredTigerCursorC1ERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEmbPNS_16OperationContextE+0x90) [0x562cd607bf30]
 mongod(_ZNK5mongo23WiredTigerIndexStandard9newCursorEPNS_16OperationContextEb+0x157) [0x562cd6056ef7]
 mongod(_ZNK5mongo17IndexAccessMethod9newCursorEPNS_16OperationContextEb+0x28) [0x562cd5a64478]
 mongod(_ZN5mongo9IndexScan13initIndexScanEv+0x58) [0x562cd5994b68]
 mongod(_ZN5mongo9IndexScan6doWorkEPm+0x14F) [0x562cd599508f]
 mongod(_ZN5mongo9PlanStage4workEPm+0x63) [0x562cd59a55c3]
 mongod(_ZN5mongo10FetchStage6doWorkEPm+0x29E) [0x562cd5983d0e]
 mongod(_ZN5mongo9PlanStage4workEPm+0x63) [0x562cd59a55c3]
 mongod(_ZN5mongo16ShardFilterStage6doWorkEPm+0x77) [0x562cd59b56e7]
 mongod(_ZN5mongo9PlanStage4workEPm+0x63) [0x562cd59a55c3]
 mongod(_ZN5mongo15CachedPlanStage12pickBestPlanEPNS_15PlanYieldPolicyE+0x198) [0x562cd5977338]
 mongod(_ZN5mongo12PlanExecutor12pickBestPlanENS0_11YieldPolicyEPKNS_10CollectionE+0xF2) [0x562cd5cae7a2]
 mongod(_ZN5mongo12PlanExecutor4makeEPNS_16OperationContextESt10unique_ptrINS_10WorkingSetESt14default_deleteIS4_EES3_INS_9PlanStageES5_IS8_EES3_INS_13QuerySolutionES5_ISB_EES3_INS_14CanonicalQueryES5_ISE_EEPKNS_10CollectionERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEENS0_11YieldPolicyE+0x2D8) [0x562cd5cb0b48]
 mongod(_ZN5mongo12PlanExecutor4makeEPNS_16OperationContextESt10unique_ptrINS_10WorkingSetESt14default_deleteIS4_EES3_INS_9PlanStageES5_IS8_EES3_INS_13QuerySolutionES5_ISB_EES3_INS_14CanonicalQueryES5_ISE_EEPKNS_10CollectionENS0_11YieldPolicyE+0xEC) [0x562cd5cb17fc]
 mongod(_ZN5mongo11getExecutorEPNS_16OperationContextEPNS_10CollectionESt10unique_ptrINS_14CanonicalQueryESt14default_deleteIS5_EENS_12PlanExecutor11YieldPolicyEm+0x132) [0x562cd5c6ac42]
 mongod(_ZN5mongo15getExecutorFindEPNS_16OperationContextEPNS_10CollectionERKNS_15NamespaceStringESt10unique_ptrINS_14CanonicalQueryESt14default_deleteIS8_EENS_12PlanExecutor11YieldPolicyE+0x8B) [0x562cd5c6b79b]
 mongod(_ZN5mongo7FindCmd3runEPNS_16OperationContextERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERNS_7BSONObjEiRS8_RNS_14BSONObjBuilderE+0xC90) [0x562cd58917a0]
 mongod(_ZN5mongo7Command3runEPNS_16OperationContextERKNS_3rpc16RequestInterfaceEPNS3_21ReplyBuilderInterfaceE+0x4FF) [0x562cd58689af]
 mongod(_ZN5mongo7Command11execCommandEPNS_16OperationContextEPS0_RKNS_3rpc16RequestInterfaceEPNS4_21ReplyBuilderInterfaceE+0xF6A) [0x562cd586a0aa]
 mongod(_ZN5mongo11runCommandsEPNS_16OperationContextERKNS_3rpc16RequestInterfaceEPNS2_21ReplyBuilderInterfaceE+0x240) [0x562cd5e85480]
 mongod(_ZN5mongo16assembleResponseEPNS_16OperationContextERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortE+0xD30) [0x562cd5a89540]
 mongod(_ZN5mongo23ServiceEntryPointMongod12_sessionLoopERKSt10shared_ptrINS_9transport7SessionEE+0x1FD) [0x562cd568a97d]
 mongod(+0x88D2AD) [0x562cd568b2ad]
 mongod(+0x14E10D1) [0x562cd62df0d1]
 libpthread.so.0(+0x8064) [0x7f49ce031064]
 libc.so.6(clone+0x6D) [0x7f49cdd6662d]
----- END BACKTRACE -----
2018-07-17T15:57:17.978+0200 I - [thread1] pthread_create failed: Resource temporarily unavailable
2018-07-17T15:57:17.978+0200 I - [thread1] failed to create service entry worker thread for 10.3.9.1:47587

 
In syslog we see the following:

Jul 17 15:57:15 mmhad03b kernel: [78725.202597] TCP: TCP: Possible SYN flooding on port 27017. Sending cookies.  Check SNMP counters.
Jul 17 15:57:40 mmhad03b systemd[1]: mongod.service: main process exited, code=killed, status=6/ABRT
Jul 17 15:57:40 mmhad03b systemd[1]: Unit mongod.service entered failed state.

Randomly, we also get this error in syslog:

Jul 17 16:17:25 mmhad03b numactl[20402]: src/third_party/gperftools-2.5/src/central_freelist.cc:333] tcmalloc: allocation failed 8192
Jul 17 16:17:25 mmhad03b numactl[20402]: src/third_party/gperftools-2.5/src/central_freelist.cc:333] tcmalloc: allocation failed 8192
Jul 17 16:17:25 mmhad03b numactl[20402]: src/third_party/gperftools-2.5/src/central_freelist.cc:333] tcmalloc: allocation failed 12288
Jul 17 16:17:25 mmhad03b numactl[20402]: src/third_party/gperftools-2.5/src/central_freelist.cc:333] tcmalloc: allocation failed 8192
Jul 17 16:17:25 mmhad03b numactl[20402]: src/third_party/gperftools-2.5/src/central_freelist.cc:333] tcmalloc: allocation failed 12288
Jul 17 16:17:25 mmhad03b numactl[20402]: src/third_party/gperftools-2.5/src/central_freelist.cc:333] tcmalloc: allocation failed 8192
Jul 17 16:17:25 mmhad03b numactl[20402]: src/third_party/gperftools-2.5/src/central_freelist.cc:333] tcmalloc: allocation failed 8192

We have raised all the server limits and applied them, but there doesn't appear to be any improvement.

The MongoDB version is 3.4.16 on the shard servers and also on the mongos routers.
I'm attaching the diagnostic data as well.



 Comments   
Comment by Nick Brewer [ 20/Jul/18 ]

roberds, glad to hear that fixed it.

-Nick

Comment by Roberto Rodriguez [ 20/Jul/18 ]

Hi again,

After adjusting the vm.max_map_count value there have been no more crashes, so I think this was the problem.

Sorry for the false bug report, and thanks.

Comment by Roberto Rodriguez [ 19/Jul/18 ]

Ok, I'll test it and tell you the result in a few hours.

Thanks

Comment by Nick Brewer [ 18/Jul/18 ]

roberds,

Could you try increasing vm.max_map_count to at least 128000, per the production checklist?

Thanks,

Nick
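
For reference, that suggestion could be applied as follows (a sketch; the drop-in file name is illustrative). The sizing also lines up with the logs above: each thread stack typically costs two virtual memory areas, so the kernel default vm.max_map_count of 65530 caps a process near 32,700 threads, almost exactly the ~32,600 open connections at the moment pthread_create began failing.

# apply immediately (as root)
sysctl -w vm.max_map_count=128000
# persist across reboots
echo 'vm.max_map_count = 128000' > /etc/sysctl.d/90-mongod.conf
sysctl --system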

Comment by Roberto Rodriguez [ 18/Jul/18 ]

The problem continues; this is the latest error:

2018-07-18T11:49:48.149+0200 I WRITE    [conn15873] update had.recentsearches query: { checkIn: "2018-07-23", geounitId: 33182, nights: 7, occupancy: "1::2-0/", customerNationality: 30115, rateType: "PUBLIC", priceTypes: "-" } planSummary: IXSCAN { checkIn: 1, geoUnitId: 1, nights: 1, occupancy: 1, customerNationality: 1, rateType: 1, priceTypes: 1 } update: { $set: { checkIn: "2018-07-23", geounitId: 33182, nights: 7, occupancy: "1::2-0/", customerNationality: 30115, rateType: "PUBLIC", priceTypes: "-" }, $setOnInsert: { expireAt: new Date(1531909175000), createdAt: new Date(1531907375000) } } keysExamined:252 docsExamined:247 nMatched:1 nModified:0 numYields:2 locks:{ Global: { acquireCount: { r: 3, w: 3 } }, Database: { acquireCount: { w: 3 } }, Collection: { acquireCount: { w: 3 } } } 12746ms
2018-07-18T11:49:48.150+0200 I NETWORK  [thread1] connection accepted from 10.3.41.1:6103 #52708 (32626 connections now open)
2018-07-18T11:49:48.151+0200 I -        [thread1] pthread_create failed: Resource temporarily unavailable
2018-07-18T11:49:48.151+0200 I -        [thread1] failed to create service entry worker thread for 10.3.41.1:6103
2018-07-18T11:49:48.151+0200 I NETWORK  [thread1] connection accepted from 10.3.23.1:22450 #52709 (32626 connections now open)
2018-07-18T11:49:48.151+0200 I -        [thread1] pthread_create failed: Resource temporarily unavailable
2018-07-18T11:49:48.151+0200 I -        [thread1] failed to create service entry worker thread for 10.3.23.1:22450
2018-07-18T11:49:48.151+0200 I NETWORK  [thread1] connection accepted from 172.16.108.110:34375 #52710 (32626 connections now open)
2018-07-18T11:49:48.151+0200 I -        [thread1] pthread_create failed: Resource temporarily unavailable
2018-07-18T11:49:48.151+0200 I -        [thread1] failed to create service entry worker thread for 172.16.108.110:34375
2018-07-18T11:49:48.151+0200 I NETWORK  [thread1] connection accepted from 172.16.108.20:12675 #52711 (32626 connections now open)
2018-07-18T11:49:48.151+0200 I -        [thread1] pthread_create failed: Resource temporarily unavailable
2018-07-18T11:49:48.151+0200 I -        [thread1] failed to create service entry worker thread for 172.16.108.20:12675
2018-07-18T11:49:48.152+0200 F -        [conn22973] out of memory.
 
 0x55d22b7a8171 0x55d22b7a77a4 0x55d22b7137a1 0x55d22aab6bde 0x55d22b54c175 0x55d22ac97561 0x55d22ac990aa 0x55d22b2b4480 0x55d22aeb8540 0x55d22aab997d 0x55d22aaba2ad 0x55d22b70e0d1 0x7fbfda315064 0x7fbfda04a62d
----- BEGIN BACKTRACE -----
{"backtrace":[{"b":"55D22A22D000","o":"157B171","s":"_ZN5mongo15printStackTraceERSo"},{"b":"55D22A22D000","o":"157A7A4","s":"_ZN5mongo29reportOutOfMemoryErrorAndExitEv"},{"b":"55D22A22D000","o":"14E67A1",
"s":"_ZN5mongo12mongoReallocEPvm"},{"b":"55D22A22D000","o":"889BDE","s":"_ZN5mongo11_BufBuilderINS_21SharedBufferAllocatorEE15grow_reallocateEi"},{"b":"55D22A22D000","o":"131F175","s":"_ZN5mongo3rpc19Comm
andReplyBuilder22getInPlaceReplyBuilderEm"},{"b":"55D22A22D000","o":"A6A561","s":"_ZN5mongo7Command3runEPNS_16OperationContextERKNS_3rpc16RequestInterfaceEPNS3_21ReplyBuilderInterfaceE"},{"b":"55D22A22D00
0","o":"A6C0AA","s":"_ZN5mongo7Command11execCommandEPNS_16OperationContextEPS0_RKNS_3rpc16RequestInterfaceEPNS4_21ReplyBuilderInterfaceE"},{"b":"55D22A22D000","o":"1087480","s":"_ZN5mongo11runCommandsEPNS
_16OperationContextERKNS_3rpc16RequestInterfaceEPNS2_21ReplyBuilderInterfaceE"},{"b":"55D22A22D000","o":"C8B540","s":"_ZN5mongo16assembleResponseEPNS_16OperationContextERNS_7MessageERNS_10DbResponseERKNS_
11HostAndPortE"},{"b":"55D22A22D000","o":"88C97D","s":"_ZN5mongo23ServiceEntryPointMongod12_sessionLoopERKSt10shared_ptrINS_9transport7SessionEE"},{"b":"55D22A22D000","o":"88D2AD"},{"b":"55D22A22D000","o"
:"14E10D1"},{"b":"7FBFDA30D000","o":"8064"},{"b":"7FBFD9F62000","o":"E862D","s":"clone"}],"processInfo":{ "mongodbVersion" : "3.4.16", "gitVersion" : "0d6a9242c11b99ddadcfb6e86a850b6ba487530a", "compiledM
odules" : [], "uname" : { "sysname" : "Linux", "release" : "3.16.0-6-amd64", "version" : "#1 SMP Debian 3.16.56-1+deb8u1 (2018-05-08)", "machine" : "x86_64" }, "somap" : [ { "b" : "55D22A22D000", "elfType
" : 3, "buildId" : "36452F27FE7A41D0E57DDE38A17B3FAE9980B0BE" }, { "b" : "7FFC78AD2000", "path" : "linux-vdso.so.1", "elfType" : 3, "buildId" : "90F495E259305E7C4F498541D91C9E1240057F52" }, { "b" : "7FBFD
B24A000", "path" : "/usr/lib/x86_64-linux-gnu/libssl.so.1.0.0", "elfType" : 3, "buildId" : "EDE40F0BC2115063088BF442E0F2ED84BF76B11E" }, { "b" : "7FBFDAE4D000", "path" : "/usr/lib/x86_64-linux-gnu/libcryp
to.so.1.0.0", "elfType" : 3, "buildId" : "0C9DA403601A5EEA627AF96E1EB63DD22B8DC28B" }, { "b" : "7FBFDAC45000", "path" : "/lib/x86_64-linux-gnu/librt.so.1", "elfType" : 3, "buildId" : "A63C95FB33CCA970E141
D2E13774B997C1CF0565" }, { "b" : "7FBFDAA41000", "path" : "/lib/x86_64-linux-gnu/libdl.so.2", "elfType" : 3, "buildId" : "D70B531D672A34D71DB42EB32B68E63F2DCC5B6A" }, { "b" : "7FBFDA740000", "path" : "/li
b/x86_64-linux-gnu/libm.so.6", "elfType" : 3, "buildId" : "152C93BA3E8590F7ED0BCDDF868600D55EC4DD6F" }, { "b" : "7FBFDA52A000", "path" : "/lib/x86_64-linux-gnu/libgcc_s.so.1", "elfType" : 3, "buildId" : "
BAC839560495859598E8515CBAED73C7799AE1FF" }, { "b" : "7FBFDA30D000", "path" : "/lib/x86_64-linux-gnu/libpthread.so.0", "elfType" : 3, "buildId" : "9DA9387A60FFC196AEDB9526275552AFEF499C44" }, { "b" : "7FB
FD9F62000", "path" : "/lib/x86_64-linux-gnu/libc.so.6", "elfType" : 3, "buildId" : "48C48BC6ABB794461B8A558DD76B29876A0551F0" }, { "b" : "7FBFDB4AB000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" :
 3, "buildId" : "1D98D41FBB1EABA7EC05D0FD7624B85D6F51C03C" } ] }}
 mongod(_ZN5mongo15printStackTraceERSo+0x41) [0x55d22b7a8171]
 mongod(_ZN5mongo29reportOutOfMemoryErrorAndExitEv+0x84) [0x55d22b7a77a4]
 mongod(_ZN5mongo12mongoReallocEPvm+0x21) [0x55d22b7137a1]
 mongod(_ZN5mongo11_BufBuilderINS_21SharedBufferAllocatorEE15grow_reallocateEi+0x5E) [0x55d22aab6bde]
 mongod(_ZN5mongo3rpc19CommandReplyBuilder22getInPlaceReplyBuilderEm+0x35) [0x55d22b54c175]
 mongod(_ZN5mongo7Command3runEPNS_16OperationContextERKNS_3rpc16RequestInterfaceEPNS3_21ReplyBuilderInterfaceE+0xB1) [0x55d22ac97561]
 mongod(_ZN5mongo7Command11execCommandEPNS_16OperationContextEPS0_RKNS_3rpc16RequestInterfaceEPNS4_21ReplyBuilderInterfaceE+0xF6A) [0x55d22ac990aa]
 mongod(_ZN5mongo11runCommandsEPNS_16OperationContextERKNS_3rpc16RequestInterfaceEPNS2_21ReplyBuilderInterfaceE+0x240) [0x55d22b2b4480]
 mongod(_ZN5mongo16assembleResponseEPNS_16OperationContextERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortE+0xD30) [0x55d22aeb8540]
 mongod(_ZN5mongo23ServiceEntryPointMongod12_sessionLoopERKSt10shared_ptrINS_9transport7SessionEE+0x1FD) [0x55d22aab997d]
 mongod(+0x88D2AD) [0x55d22aaba2ad]
 mongod(+0x14E10D1) [0x55d22b70e0d1]
 libpthread.so.0(+0x8064) [0x7fbfda315064]
 libc.so.6(clone+0x6D) [0x7fbfda04a62d]
-----  END BACKTRACE  -----

In syslog there are only these two lines:

Jul 18 11:50:04 mmhad03a systemd[1]: mongod.service: main process exited, code=exited, status=14/n/a
Jul 18 11:50:04 mmhad03a systemd[1]: Unit mongod.service entered failed state.

Comment by Roberto Rodriguez [ 18/Jul/18 ]

I hadn't disabled zone_reclaim_mode, but I did it last night, and the server had another crash a few hours afterwards. After this crash I rebooted the affected node; I'll keep you updated.

I can provide any information you need for this issue.

Thanks in advance

Comment by Nick Brewer [ 17/Jul/18 ]

roberds

Thanks for your detailed report. While we're looking into this, can you confirm that you've disabled zone reclaim via the steps in our documentation?

Thanks in advance, 

Nick
