Core Server / SERVER-36161

pthread_create failed: Resource temporarily unavailable in sharding cluster

    • Type: Bug
    • Resolution: Done
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: Sharding, WiredTiger
    • Labels: None

      Hi,

      We have a sharded cluster of 8 servers, organized into 4 replica sets with this structure:

      • replicaset1: server01a / server01b
      • replicaset2: server02a / server02b
      • replicaset3: server03a / server03b
      • replicaset4: server04a / server04b

      The servers are physical machines with SSDs, 32 CPU threads and 256 GB of RAM.

      The MongoDB config on each node is similar to this one:

       

      storage:
        dbPath: /var/lib/mongodb
        journal:
          enabled: true
        wiredTiger:
          engineConfig:
            configString: "session_max=102400"
            cacheSizeGB: 200
      setParameter:
        cursorTimeoutMillis: 120000
      operationProfiling:
        mode: slowOp
        slowOpThresholdMs: 300
      systemLog:
        destination: file
        logAppend: true
        path: /var/log/mongodb/mongod.log
      net:
        port: 27017
        bindIp: 0.0.0.0
        maxIncomingConnections: 102400
      replication:
        replSetName: rsmmhad03
      sharding:
        clusterRole: shardsvr
      
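      For reference, one way to see how close a node gets to these limits at runtime is via serverStatus (a minimal sketch; the port matches this config, and the exact field layout of the WiredTiger section may vary by version):

      # Current vs. available incoming connections on this shard member
      mongo --port 27017 --eval 'printjson(db.serverStatus().connections)'
      # Open WiredTiger sessions, which are bounded by session_max
      mongo --port 27017 --eval 'printjson(db.serverStatus().wiredTiger.session)'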

      sysctl file:

       

      net.ipv4.ip_local_port_range = 1024 65535
      kernel.shmmax = 1073741824
      fs.file-max = 5000000
      vm.swappiness = 1
      vm.dirty_ratio = 15
      vm.dirty_background_ratio = 5
      net.core.somaxconn = 4096
      net.ipv4.tcp_fin_timeout = 30
      net.ipv4.tcp_keepalive_intvl = 30
      net.ipv4.tcp_keepalive_time = 120
      net.ipv4.tcp_max_syn_backlog = 4096
      
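      A minimal way to apply and verify these values (the sysctl file path below is an assumption; adjust it to wherever these lines actually live):

      # Reload the settings from the file
      sysctl -p /etc/sysctl.d/90-mongodb.conf
      # Confirm the values the kernel is actually using
      sysctl net.core.somaxconn net.ipv4.tcp_max_syn_backlog net.ipv4.ip_local_port_range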

       

      /etc/security/limits.d/mongod.conf

      mongod soft nproc 128000
      mongod hard nproc 128000
      mongod soft nofile 128000
      mongod hard nofile 128000
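      As a sanity check, the limits actually applied to the running process can be read from /proc (a sketch, assuming a single mongod process on the host):

      # "Max processes" and "Max open files" should both show 128000
      cat /proc/$(pidof mongod)/limits | grep -E 'Max processes|Max open files'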

      /lib/systemd/system/mongod.service

       

      [Unit]
      Description=High-performance, schema-free document-oriented database
      After=network.target
      Documentation=https://docs.mongodb.org/manual

      [Service]
      User=mongodb
      Group=mongodb
      ExecStart=/usr/bin/numactl --interleave=all /usr/bin/mongod --config /etc/mongod.conf
      PIDFile=/var/run/mongodb/mongod.pid
      # file size
      LimitFSIZE=infinity
      # cpu time
      LimitCPU=infinity
      # virtual memory size
      LimitAS=infinity
      # open files
      LimitNOFILE=128000
      # processes/threads
      LimitNPROC=128000
      # locked memory
      LimitMEMLOCK=infinity
      # total threads (user+kernel)
      TasksMax=infinity
      TasksAccounting=false
      # Recommended limits for mongod as specified in http://docs.mongodb.org/manual/reference/ulimit/#recommended-settings

      [Install]
      WantedBy=multi-user.target
      
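      The values systemd actually applies to the unit can be checked as below (TasksMax is only reported on systemd versions that support that directive):

      systemctl show mongod.service -p LimitNOFILE -p LimitNPROC -p TasksMax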

       

      The cluster holds millions of documents and serves millions of queries (more than 100,000,000 queries per day).

      The problem is that, at random, we receive an error like the following:

       

      2018-07-17T15:57:17.978+0200 I - [thread1] pthread_create failed: Resource temporarily unavailable
      2018-07-17T15:57:17.978+0200 I - [thread1] failed to create service entry worker thread for 10.3.16.1:56153
      2018-07-17T15:57:17.978+0200 I COMMAND [conn16910] command had.hadCompressed command: find { find: "hadCompressed", filter: { chkin: "2018-08-10", n: 4, occ: "1::3-0/", nid: { $in: [ 0, 30115 ] }, rtype: { $in: [ 1, null ] }, hid: { $in: [ 435179, 231562, 38468, 330644, 307226, 359353, 352215, 88059, 321458, 307181, 85590, 87268, 385303, 252432, 242030, 231596, 307182, 172732, 577889, 38743, 38621, 199946, 435167, 149852, 244963, 391702, 260891, 150236, 307227, 307202, 38730, 156100, 297051, 257466, 498152, 174201, 174250, 577903, 424804, 435152, 197357, 242026, 385251, 205997, 330638, 154974, 37600, 38021, 160751, 435137, 86520, 37217, 363892, 375650, 244960, 252441, 261988, 432659, 609717, 156152, 363893, 149696, 149490, 232726, 87413, 252958, 315863, 219739, 231563, 388212, 412850, 501130, 388772, 231607, 369178, 164246, 38029, 330636, 260877, 38156, 236389, 38068, 257418, 282221, 307186, 299255, 199164, 231575, 88191, 199162, 80373, 200283, 246961, 195476, 424809, 286709, 193058, 208323, 435142, 318242 ] }, lchg: { $gte: new Date(1531749437000) } }, shardVersion: [ Timestamp 22129000|0, ObjectId('5af1c64abeee30df3be9f7db') ] } planSummary: IXSCAN { chkin: 1, n: 1, occ: 1, nid: 1, rtype: 1, hid: 1 } keysExamined:117 docsExamined:41 cursorExhausted:1 numYields:1 nreturned:0 reslen:202 locks:{ Global: { acquireCount: { r: 4 } }, Database: { acquireCount: { r: 2 } }, Collection: { acquireCount: { r: 2 } } } protocol:op_command 547ms
      2018-07-17T15:57:17.978+0200 I NETWORK [thread1] connection accepted from 10.3.102.1:53260 #42127 (32627 connections now open)
      2018-07-17T15:57:17.978+0200 I - [thread1] pthread_create failed: Resource temporarily unavailable
      2018-07-17T15:57:17.978+0200 I - [thread1] failed to create service entry worker thread for 10.3.102.1:53260
      2018-07-17T15:57:17.978+0200 I NETWORK [thread1] connection accepted from 10.3.9.1:47587 #42128 (32627 connections now open)
      2018-07-17T15:57:17.978+0200 F - [conn14595] Got signal: 6 (Aborted).
      0x562cd6379171 0x562cd6378389 0x562cd637886d 0x7f49ce038890 0x7f49cdcb3067 0x7f49cdcb4448 0x562cd561a341 0x562cd607e01b 0x562cd607edf0 0x562cd607b18d 0x562cd607bccd 0x562cd607bf30 0x562cd6056ef7 0x562cd5a64478 0x562cd5994b68 0x562cd599508f 0x562cd59a55c3 0x562cd5983d0e 0x562cd59a55c3 0x562cd59b56e7 0x562cd59a55c3 0x562cd5977338 0x562cd5cae7a2 0x562cd5cb0b48 0x562cd5cb17fc 0x562cd5c6ac42 0x562cd5c6b79b 0x562cd58917a0 0x562cd58689af 0x562cd586a0aa 0x562cd5e85480 0x562cd5a89540 0x562cd568a97d 0x562cd568b2ad 0x562cd62df0d1 0x7f49ce031064 0x7f49cdd6662d
      ----- BEGIN BACKTRACE -----
      {"backtrace":[{"b":"562CD4DFE000","o":"157B171","s":"_ZN5mongo15printStackTraceERSo"},{"b":"562CD4DFE000","o":"157A389"},{"b":"562CD4DFE000","o":"157A86D"},{"b":"7F49CE029000","o":"F890"},{"b":"7F49CDC7E000","o":"35067","s":"gsignal"},{"b":"7F49CDC7E000","o":"36448","s":"abort"},{"b":"562CD4DFE000","o":"81C341","s":"_ZN5mongo25fassertFailedWithLocationEiPKcj"},{"b":"562CD4DFE000","o":"128001B","s":"_ZN5mongo17WiredTigerSessionC1EP15__wt_connectionPNS_22WiredTigerSessionCacheEmm"},{"b":"562CD4DFE000","o":"1280DF0","s":"_ZN5mongo22WiredTigerSessionCache10getSessionEv"},{"b":"562CD4DFE000","o":"127D18D"},{"b":"562CD4DFE000","o":"127DCCD","s":"_ZN5mongo22WiredTigerRecoveryUnit8_txnOpenEPNS_16OperationContextE"},{"b":"562CD4DFE000","o":"127DF30","s":"_ZN5mongo16WiredTigerCursorC1ERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEmbPNS_16OperationContextE"},{"b":"562CD4DFE000","o":"1258EF7","s":"_ZNK5mongo23WiredTigerIndexStandard9newCursorEPNS_16OperationContextEb"},{"b":"562CD4DFE000","o":"C66478","s":"_ZNK5mongo17IndexAccessMethod9newCursorEPNS_16OperationContextEb"},{"b":"562CD4DFE000","o":"B96B68","s":"_ZN5mongo9IndexScan13initIndexScanEv"},{"b":"562CD4DFE000","o":"B9708F","s":"_ZN5mongo9IndexScan6doWorkEPm"},{"b":"562CD4DFE000","o":"BA75C3","s":"_ZN5mongo9PlanStage4workEPm"},{"b":"562CD4DFE000","o":"B85D0E","s":"_ZN5mongo10FetchStage6doWorkEPm"},{"b":"562CD4DFE000","o":"BA75C3","s":"_ZN5mongo9PlanStage4workEPm"},{"b":"562CD4DFE000","o":"BB76E7","s":"_ZN5mongo16ShardFilterStage6doWorkEPm"},{"b":"562CD4DFE000","o":"BA75C3","s":"_ZN5mongo9PlanStage4workEPm"},{"b":"562CD4DFE000","o":"B79338","s":"_ZN5mongo15CachedPlanStage12pickBestPlanEPNS_15PlanYieldPolicyE"},{"b":"562CD4DFE000","o":"EB07A2","s":"_ZN5mongo12PlanExecutor12pickBestPlanENS0_11YieldPolicyEPKNS_10CollectionE"},{"b":"562CD4DFE000","o":"EB2B48","s":"_ZN5mongo12PlanExecutor4makeEPNS_16OperationContextESt10unique_ptrINS_10WorkingSetESt14default_deleteIS4_EES3_INS_9PlanStageES5_IS8_EES3_INS_13QuerySolutionES5_ISB_EES3_INS_14CanonicalQueryES5_ISE_EEPKNS_10CollectionERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEENS0_11YieldPolicyE"},{"b":"562CD4DFE000","o":"EB37FC","s":"_ZN5mongo12PlanExecutor4makeEPNS_16OperationContextESt10unique_ptrINS_10WorkingSetESt14default_deleteIS4_EES3_INS_9PlanStageES5_IS8_EES3_INS_13QuerySolutionES5_ISB_EES3_INS_14CanonicalQueryES5_ISE_EEPKNS_10CollectionENS0_11YieldPolicyE"},{"b":"562CD4DFE000","o":"E6CC42","s":"_ZN5mongo11getExecutorEPNS_16OperationContextEPNS_10CollectionESt10unique_ptrINS_14CanonicalQueryESt14default_deleteIS5_EENS_12PlanExecutor11YieldPolicyEm"},{"b":"562CD4DFE000","o":"E6D79B","s":"_ZN5mongo15getExecutorFindEPNS_16OperationContextEPNS_10CollectionERKNS_15NamespaceStringESt10unique_ptrINS_14CanonicalQueryESt14default_deleteIS8_EENS_12PlanExecutor11YieldPolicyE"},{"b":"562CD4DFE000","o":"A937A0","s":"_ZN5mongo7FindCmd3runEPNS_16OperationContextERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERNS_7BSONObjEiRS8_RNS_14BSONObjBuilderE"},{"b":"562CD4DFE000","o":"A6A9AF","s":"_ZN5mongo7Command3runEPNS_16OperationContextERKNS_3rpc16RequestInterfaceEPNS3_21ReplyBuilderInterfaceE"},{"b":"562CD4DFE000","o":"A6C0AA","s":"_ZN5mongo7Command11execCommandEPNS_16OperationContextEPS0_RKNS_3rpc16RequestInterfaceEPNS4_21ReplyBuilderInterfaceE"},{"b":"562CD4DFE000","o":"1087480","s":"_ZN5mongo11runCommandsEPNS_16OperationContextERKNS_3rpc16RequestInterfaceEPNS2_21ReplyBuilderInterfaceE"},{"b":"562CD4DFE000","o":"C8B540","s":"_ZN5mongo16assembleResponseEPNS_16OperationContextERNS_7Messa
geERNS_10DbResponseERKNS_11HostAndPortE"},{"b":"562CD4DFE000","o":"88C97D","s":"_ZN5mongo23ServiceEntryPointMongod12_sessionLoopERKSt10shared_ptrINS_9transport7SessionEE"},{"b":"562CD4DFE000","o":"88D2AD"},{"b":"562CD4DFE000","o":"14E10D1"},{"b":"7F49CE029000","o":"8064"},{"b":"7F49CDC7E000","o":"E862D","s":"clone"}],"processInfo":{ "mongodbVersion" : "3.4.16", "gitVersion" : "0d6a9242c11b99ddadcfb6e86a850b6ba487530a", "compiledModules" : [], "uname" : { "sysname" : "Linux", "release" : "3.16.0-6-amd64", "version" : "#1 SMP Debian 3.16.56-1+deb8u1 (2018-05-08)", "machine" : "x86_64" }, "somap" : [ { "b" : "562CD4DFE000", "elfType" : 3, "buildId" : "36452F27FE7A41D0E57DDE38A17B3FAE9980B0BE" }, { "b" : "7FFD853E8000", "path" : "linux-vdso.so.1", "elfType" : 3, "buildId" : "90F495E259305E7C4F498541D91C9E1240057F52" }, { "b" : "7F49CEF66000", "path" : "/usr/lib/x86_64-linux-gnu/libssl.so.1.0.0", "elfType" : 3, "buildId" : "EDE40F0BC2115063088BF442E0F2ED84BF76B11E" }, { "b" : "7F49CEB69000", "path" : "/usr/lib/x86_64-linux-gnu/libcrypto.so.1.0.0", "elfType" : 3, "buildId" : "0C9DA403601A5EEA627AF96E1EB63DD22B8DC28B" }, { "b" : "7F49CE961000", "path" : "/lib/x86_64-linux-gnu/librt.so.1", "elfType" : 3, "buildId" : "A63C95FB33CCA970E141D2E13774B997C1CF0565" }, { "b" : "7F49CE75D000", "path" : "/lib/x86_64-linux-gnu/libdl.so.2", "elfType" : 3, "buildId" : "D70B531D672A34D71DB42EB32B68E63F2DCC5B6A" }, { "b" : "7F49CE45C000", "path" : "/lib/x86_64-linux-gnu/libm.so.6", "elfType" : 3, "buildId" : "152C93BA3E8590F7ED0BCDDF868600D55EC4DD6F" }, { "b" : "7F49CE246000", "path" : "/lib/x86_64-linux-gnu/libgcc_s.so.1", "elfType" : 3, "buildId" : "BAC839560495859598E8515CBAED73C7799AE1FF" }, { "b" : "7F49CE029000", "path" : "/lib/x86_64-linux-gnu/libpthread.so.0", "elfType" : 3, "buildId" : "9DA9387A60FFC196AEDB9526275552AFEF499C44" }, { "b" : "7F49CDC7E000", "path" : "/lib/x86_64-linux-gnu/libc.so.6", "elfType" : 3, "buildId" : "48C48BC6ABB794461B8A558DD76B29876A0551F0" }, { "b" : "7F49CF1C7000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "1D98D41FBB1EABA7EC05D0FD7624B85D6F51C03C" } ] }}
       mongod(_ZN5mongo15printStackTraceERSo+0x41) [0x562cd6379171]
       mongod(+0x157A389) [0x562cd6378389]
       mongod(+0x157A86D) [0x562cd637886d]
       libpthread.so.0(+0xF890) [0x7f49ce038890]
       libc.so.6(gsignal+0x37) [0x7f49cdcb3067]
       libc.so.6(abort+0x148) [0x7f49cdcb4448]
       mongod(_ZN5mongo25fassertFailedWithLocationEiPKcj+0x0) [0x562cd561a341]
       mongod(_ZN5mongo17WiredTigerSessionC1EP15__wt_connectionPNS_22WiredTigerSessionCacheEmm+0xBB) [0x562cd607e01b]
       mongod(_ZN5mongo22WiredTigerSessionCache10getSessionEv+0xE0) [0x562cd607edf0]
       mongod(+0x127D18D) [0x562cd607b18d]
       mongod(_ZN5mongo22WiredTigerRecoveryUnit8_txnOpenEPNS_16OperationContextE+0x19D) [0x562cd607bccd]
       mongod(_ZN5mongo16WiredTigerCursorC1ERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEmbPNS_16OperationContextE+0x90) [0x562cd607bf30]
       mongod(_ZNK5mongo23WiredTigerIndexStandard9newCursorEPNS_16OperationContextEb+0x157) [0x562cd6056ef7]
       mongod(_ZNK5mongo17IndexAccessMethod9newCursorEPNS_16OperationContextEb+0x28) [0x562cd5a64478]
       mongod(_ZN5mongo9IndexScan13initIndexScanEv+0x58) [0x562cd5994b68]
       mongod(_ZN5mongo9IndexScan6doWorkEPm+0x14F) [0x562cd599508f]
       mongod(_ZN5mongo9PlanStage4workEPm+0x63) [0x562cd59a55c3]
       mongod(_ZN5mongo10FetchStage6doWorkEPm+0x29E) [0x562cd5983d0e]
       mongod(_ZN5mongo9PlanStage4workEPm+0x63) [0x562cd59a55c3]
       mongod(_ZN5mongo16ShardFilterStage6doWorkEPm+0x77) [0x562cd59b56e7]
       mongod(_ZN5mongo9PlanStage4workEPm+0x63) [0x562cd59a55c3]
       mongod(_ZN5mongo15CachedPlanStage12pickBestPlanEPNS_15PlanYieldPolicyE+0x198) [0x562cd5977338]
       mongod(_ZN5mongo12PlanExecutor12pickBestPlanENS0_11YieldPolicyEPKNS_10CollectionE+0xF2) [0x562cd5cae7a2]
       mongod(_ZN5mongo12PlanExecutor4makeEPNS_16OperationContextESt10unique_ptrINS_10WorkingSetESt14default_deleteIS4_EES3_INS_9PlanStageES5_IS8_EES3_INS_13QuerySolutionES5_ISB_EES3_INS_14CanonicalQueryES5_ISE_EEPKNS_10CollectionERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEENS0_11YieldPolicyE+0x2D8) [0x562cd5cb0b48]
       mongod(_ZN5mongo12PlanExecutor4makeEPNS_16OperationContextESt10unique_ptrINS_10WorkingSetESt14default_deleteIS4_EES3_INS_9PlanStageES5_IS8_EES3_INS_13QuerySolutionES5_ISB_EES3_INS_14CanonicalQueryES5_ISE_EEPKNS_10CollectionENS0_11YieldPolicyE+0xEC) [0x562cd5cb17fc]
       mongod(_ZN5mongo11getExecutorEPNS_16OperationContextEPNS_10CollectionESt10unique_ptrINS_14CanonicalQueryESt14default_deleteIS5_EENS_12PlanExecutor11YieldPolicyEm+0x132) [0x562cd5c6ac42]
       mongod(_ZN5mongo15getExecutorFindEPNS_16OperationContextEPNS_10CollectionERKNS_15NamespaceStringESt10unique_ptrINS_14CanonicalQueryESt14default_deleteIS8_EENS_12PlanExecutor11YieldPolicyE+0x8B) [0x562cd5c6b79b]
       mongod(_ZN5mongo7FindCmd3runEPNS_16OperationContextERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERNS_7BSONObjEiRS8_RNS_14BSONObjBuilderE+0xC90) [0x562cd58917a0]
       mongod(_ZN5mongo7Command3runEPNS_16OperationContextERKNS_3rpc16RequestInterfaceEPNS3_21ReplyBuilderInterfaceE+0x4FF) [0x562cd58689af]
       mongod(_ZN5mongo7Command11execCommandEPNS_16OperationContextEPS0_RKNS_3rpc16RequestInterfaceEPNS4_21ReplyBuilderInterfaceE+0xF6A) [0x562cd586a0aa]
       mongod(_ZN5mongo11runCommandsEPNS_16OperationContextERKNS_3rpc16RequestInterfaceEPNS2_21ReplyBuilderInterfaceE+0x240) [0x562cd5e85480]
       mongod(_ZN5mongo16assembleResponseEPNS_16OperationContextERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortE+0xD30) [0x562cd5a89540]
       mongod(_ZN5mongo23ServiceEntryPointMongod12_sessionLoopERKSt10shared_ptrINS_9transport7SessionEE+0x1FD) [0x562cd568a97d]
       mongod(+0x88D2AD) [0x562cd568b2ad]
       mongod(+0x14E10D1) [0x562cd62df0d1]
       libpthread.so.0(+0x8064) [0x7f49ce031064]
       libc.so.6(clone+0x6D) [0x7f49cdd6662d]
      ----- END BACKTRACE -----
      2018-07-17T15:57:17.978+0200 I - [thread1] pthread_create failed: Resource temporarily unavailable
      2018-07-17T15:57:17.978+0200 I - [thread1] failed to create service entry worker thread for 10.3.9.1:47587
      
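      Since pthread_create fails while roughly 32k connections are open, it may help to capture the thread count at the OS level when this happens (a sketch, assuming a single mongod process running as the mongodb user, as in the unit file above):

      # Threads inside the mongod process
      ps -o nlwp= -p $(pidof mongod)
      # All threads owned by the mongodb user, which is what the nproc limit counts
      ps -eLf | awk '$1 == "mongodb"' | wc -l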

       
      In syslog we see the following:

      Jul 17 15:57:15 mmhad03b kernel: [78725.202597] TCP: TCP: Possible SYN flooding on port 27017. Sending cookies.  Check SNMP counters.
      Jul 17 15:57:40 mmhad03b systemd[1]: mongod.service: main process exited, code=killed, status=6/ABRT
      Jul 17 15:57:40 mmhad03b systemd[1]: Unit mongod.service entered failed state.
      
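      The SNMP counters that the kernel message points to can be dumped as follows (netstat from net-tools; the exact counter names vary by kernel):

      netstat -s | grep -i -E 'syn|listen'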

      Randomly, we also get this error in syslog:

      Jul 17 16:17:25 mmhad03b numactl[20402]: src/third_party/gperftools-2.5/src/central_freelist.cc:333] tcmalloc: allocation failed 8192
      Jul 17 16:17:25 mmhad03b numactl[20402]: src/third_party/gperftools-2.5/src/central_freelist.cc:333] tcmalloc: allocation failed 8192
      Jul 17 16:17:25 mmhad03b numactl[20402]: src/third_party/gperftools-2.5/src/central_freelist.cc:333] tcmalloc: allocation failed 12288
      Jul 17 16:17:25 mmhad03b numactl[20402]: src/third_party/gperftools-2.5/src/central_freelist.cc:333] tcmalloc: allocation failed 8192
      Jul 17 16:17:25 mmhad03b numactl[20402]: src/third_party/gperftools-2.5/src/central_freelist.cc:333] tcmalloc: allocation failed 12288
      Jul 17 16:17:25 mmhad03b numactl[20402]: src/third_party/gperftools-2.5/src/central_freelist.cc:333] tcmalloc: allocation failed 8192
      Jul 17 16:17:25 mmhad03b numactl[20402]: src/third_party/gperftools-2.5/src/central_freelist.cc:333] tcmalloc: allocation failed 8192
      
      
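      When these allocation failures appear, a snapshot of mongod's memory and thread usage from /proc may be useful alongside the diagnostic data (again assuming a single mongod process):

      grep -E 'VmSize|VmRSS|Threads' /proc/$(pidof mongod)/status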

      We have raised all of the server limits and applied them, but there appears to be no improvement.

      The MongoDB version is 3.4.16 on the shard servers and also on the mongos instances.
      I'm attaching diagnostic data as well.

            Assignee: Nick Brewer (nick.brewer)
            Reporter: Roberto Rodriguez (roberds)
            Votes: 0
            Watchers: 7