-
Type: Bug
-
Resolution: Done
-
Priority: Major - P3
-
None
-
Affects Version/s: None
-
Component/s: None
-
None
-
ALL
-
(copied to CRM)
Hi Team,
Since migrating to MongoDB 4.4, we have been encountering a recurring issue of client queries timing out frequently. Upon monitoring the CPU usage during these occurrences, we have noticed that mongod's CPU usage is unusually high, despite there not being a significant increase in operations.
This erratic CPU usage has been observed intermittently, and our application has a monitoring thread that checks the replicStatus and reports NO_PRIMARY during such times.
Mongo- Server Version: 4.4.18
Mongo Client Java Driver version: 3.12.9
Standalone or Replica : 4 data bearing + 1 arbiter
Server Option:
root 52039 161 2.5 12649240 7229872 ? S<Ll Apr06 4415:45 /usr/bin/mongod --ipv6 --slowms 500 --storageEngine wiredTiger --wiredTigerCacheSizeGB 22 --enableMajorityReadConcern false --bind_ip_all --keyFile /root/.dbkey --port 27717 --dbpath=/var/data/sessions.1/h --replSet set01h --fork --pidfilepath /var/run/sessionmgr-27717.pid --oplogSize 5120 --logpath /var/log/mongodb-27717.log --logappend --quiet
Storage Engine : WT
Application Monitors the replicaStatus periodically reporting, TYPE=UNKNOWN for PRIMARY intermittently.
2023-04-07 23:02:30,944 [pool-2-thread-1] replicaset status: ReplicaSetStatus{name=set01h, cluster=ClusterDescription{type=REPLICA_SET, connectionMode=MULTIPLE, serverDescriptions=[ServerDescription{address=nd2bwa3psm16vb:27717, type=REPLICA_SET_SECONDARY, state=CONNECTED, ok=true, version=ServerVersion
{versionList=[4, 4, 18]}, minWireVersion=0, maxWireVersion=9, maxDocumentSize=16777216, logicalSessionTimeoutMinutes=30, roundTripTimeNanos=795436, setName='set01h', canonicalAddress=nd2bwa3psm16vb:27717, hosts=[nd2bwa3psm16va:27717, nd2bwa3psm15va:27717, nd2bwa3psm16vb:27717, nd2bwa3psm15vb:27717], passives=[], arbiters=[], primary='null', tagSet=TagSet{[]}, electionId=null, setVersion=27149688, lastWriteDate=Fri Apr 07 23:02:30 UTC 2023, lastUpdateTimeNanos=116414267859844}, ServerDescription{address=nd2bwa3psm15va:27717, type=UNKNOWN, state=CONNECTING}, ServerDescription{address=nd2bwa3psm16va:27717, type=REPLICA_SET_SECONDARY, state=CONNECTED, ok=true, version=ServerVersion{versionList=[4, 4, 18]}, minWireVersion=0, maxWireVersion=9, maxDocumentSize=16777216, logicalSessionTimeoutMinutes=30, roundTripTimeNanos=1152849, setName='set01h', canonicalAddress=nd2bwa3psm16va:27717, hosts=[nd2bwa3psm16va:27717, nd2bwa3psm15va:27717, nd2bwa3psm16vb:27717, nd2bwa3psm15vb:27717], passives=[], arbiters=[], primary='nd2bwa3psm15va:27717', tagSet=TagSet{[]}, electionId=null, setVersion=27149688, lastWriteDate=Fri Apr 07 23:02:30 UTC 2023, lastUpdateTimeNanos=116414649210810}, ServerDescription{address=nd2bwa3psm15vb:27717, type=REPLICA_SET_SECONDARY, state=CONNECTED, ok=true, version=ServerVersion
{versionList=[4, 4, 18]}, minWireVersion=0, maxWireVersion=9, maxDocumentSize=16777216, logicalSessionTimeoutMinutes=30, roundTripTimeNanos=943462, setName='set01h', canonicalAddress=nd2bwa3psm15vb:27717, hosts=[nd2bwa3psm16va:27717, nd2bwa3psm15va:27717, nd2bwa3psm16vb:27717, nd2bwa3psm15vb:27717], passives=[], arbiters=[], primary='nd2bwa3psm15va:27717', tagSet=TagSet{[]}, electionId=null, setVersion=27149688, lastWriteDate=Fri Apr 07 23:02:30 UTC 2023, lastUpdateTimeNanos=116414619113856}]}}Immediately in the next sec, the replicaSet detects PRIMARY
2023-04-07 23:02:31,260 [pool-2-thread-1] replicaset status: ReplicaSetStatus{name=set01h, cluster=ClusterDescription{type=REPLICA_SET, connectionMode=MULTIPLE, serverDescriptions=[ServerDescription{address=nd2bwa3psm16vb:27717, type=REPLICA_SET_SECONDARY, state=CONNECTED, ok=true, version=ServerVersion{versionList=[4, 4, 18]}
, minWireVersion=0, maxWireVersion=9, maxDocumentSize=16777216, logicalSessionTimeoutMinutes=30, roundTripTimeNanos=772997, setName='set01h', canonicalAddress=nd2bwa3psm16vb:27717, hosts=[nd2bwa3psm16va:27717, nd2bwa3psm15va:27717, nd2bwa3psm16vb:27717, nd2bwa3psm15vb:27717], passives=[], arbiters=[], primary='null', tagSet=TagSet{[]}, electionId=null, setVersion=27149688, lastWriteDate=Fri Apr 07 23:02:31 UTC 2023, lastUpdateTimeNanos=116414767866623}, ServerDescription{address=nd2bwa3psm15va:27717, type=REPLICA_SET_PRIMARY, state=CONNECTED, ok=true, version=ServerVersion{versionList=[4, 4, 18]}, minWireVersion=0, maxWireVersion=9, maxDocumentSize=16777216, logicalSessionTimeoutMinutes=30, roundTripTimeNanos=16803136, setName='set01h', canonicalAddress=nd2bwa3psm15va:27717, hosts=[nd2bwa3psm16va:27717, nd2bwa3psm15va:27717, nd2bwa3psm16vb:27717, nd2bwa3psm15vb:27717], passives=[], arbiters=[], primary='nd2bwa3psm15va:27717', tagSet=TagSet{[]}, electionId=7fffffff0000000000000080, setVersion=27149688, lastWriteDate=Fri Apr 07 23:02:31 UTC 2023, lastUpdateTimeNanos=116414870305431}, ServerDescription{address=nd2bwa3psm16va:27717, type=REPLICA_SET_SECONDARY, state=CONNECTED, ok=true, version=ServerVersion
{versionList=[4, 4, 18]}, minWireVersion=0, maxWireVersion=9, maxDocumentSize=16777216, logicalSessionTimeoutMinutes=30, roundTripTimeNanos=1152849, setName='set01h', canonicalAddress=nd2bwa3psm16va:27717, hosts=[nd2bwa3psm16va:27717, nd2bwa3psm15va:27717, nd2bwa3psm16vb:27717, nd2bwa3psm15vb:27717], passives=[], arbiters=[], primary='nd2bwa3psm15va:27717', tagSet=TagSet{[]}, electionId=null, setVersion=27149688, lastWriteDate=Fri Apr 07 23:02:30 UTC 2023, lastUpdateTimeNanos=116414649210810}, ServerDescription{address=nd2bwa3psm15vb:27717, type=REPLICA_SET_SECONDARY, state=CONNECTED, ok=true, version=ServerVersion{versionList=[4, 4, 18]}, minWireVersion=0, maxWireVersion=9, maxDocumentSize=16777216, logicalSessionTimeoutMinutes=30, roundTripTimeNanos=943462, setName='set01h', canonicalAddress=nd2bwa3psm15vb:27717, hosts=[nd2bwa3psm16va:27717, nd2bwa3psm15va:27717, nd2bwa3psm16vb:27717, nd2bwa3psm15vb:27717], passives=[], arbiters=[], primary='nd2bwa3psm15va:27717', tagSet=TagSet{[]}, electionId=null, setVersion=27149688, lastWriteDate=Fri Apr 07 23:02:30 UTC 2023, lastUpdateTimeNanos=116414619113856}]}}
But again moves to UKNOWN and later to a proper state
2023-04-07 23:02:31,544 [pool-2-thread-1] replicaset status: ReplicaSetStatus{name=set01h, cluster=ClusterDescription{type=REPLICA_SET, connectionMode=MULTIPLE, serverDescriptions=[ServerDescription{address=nd2bwa3psm16vb:27717, type=REPLICA_SET_SECONDARY, state=CONNECTED, ok=true, version=ServerVersion
{versionList=[4, 4, 18]}, minWireVersion=0, maxWireVersion=9, maxDocumentSize=16777216, logicalSessionTimeoutMinutes=30, roundTripTimeNanos=729934, setName='set01h', canonicalAddress=nd2bwa3psm16vb:27717, hosts=[nd2bwa3psm16va:27717, nd2bwa3psm15va:27717, nd2bwa3psm16vb:27717, nd2bwa3psm15vb:27717], passives=[], arbiters=[], primary='null', tagSet=TagSet{[]}, electionId=null, setVersion=27149688, lastWriteDate=Fri Apr 07 23:02:31 UTC 2023, lastUpdateTimeNanos=116415267757469}, ServerDescription{address=nd2bwa3psm15va:27717, type=UNKNOWN, state=CONNECTING}, ServerDescription{address=nd2bwa3psm16va:27717, type=REPLICA_SET_SECONDARY, state=CONNECTED, ok=true, version=ServerVersion{versionList=[4, 4, 18]}, minWireVersion=0, maxWireVersion=9, maxDocumentSize=16777216, logicalSessionTimeoutMinutes=30, roundTripTimeNanos=1059673, setName='set01h', canonicalAddress=nd2bwa3psm16va:27717, hosts=[nd2bwa3psm16va:27717, nd2bwa3psm15va:27717, nd2bwa3psm16vb:27717, nd2bwa3psm15vb:27717], passives=[], arbiters=[], primary='nd2bwa3psm15va:27717', tagSet=TagSet{[]}, electionId=null, setVersion=27149688, lastWriteDate=Fri Apr 07 23:02:31 UTC 2023, lastUpdateTimeNanos=116415149889939}, ServerDescription{address=nd2bwa3psm15vb:27717, type=REPLICA_SET_SECONDARY, state=CONNECTED, ok=true, version=ServerVersion
{versionList=[4, 4, 18]}, minWireVersion=0, maxWireVersion=9, maxDocumentSize=16777216, logicalSessionTimeoutMinutes=30, roundTripTimeNanos=941532, setName='set01h', canonicalAddress=nd2bwa3psm15vb:27717, hosts=[nd2bwa3psm16va:27717, nd2bwa3psm15va:27717, nd2bwa3psm16vb:27717, nd2bwa3psm15vb:27717], passives=[], arbiters=[], primary='nd2bwa3psm15va:27717', tagSet=TagSet{[]}, electionId=null, setVersion=27149688, lastWriteDate=Fri Apr 07 23:02:31 UTC 2023, lastUpdateTimeNanos=116415120136591}]}}2023-04-07 23:02:31,764 [pool-3-thread-1] replicaset status: ReplicaSetStatus{name=set01h, cluster=ClusterDescription{type=REPLICA_SET, connectionMode=MULTIPLE, serverDescriptions=[ServerDescription{address=nd2bwa3psm16vb:27717, type=REPLICA_SET_SECONDARY, state=CONNECTED, ok=true, version=ServerVersion{versionList=[4, 4, 18]}
, minWireVersion=0, maxWireVersion=9, maxDocumentSize=16777216, logicalSessionTimeoutMinutes=30, roundTripTimeNanos=729934, setName='set01h', canonicalAddress=nd2bwa3psm16vb:27717, hosts=[nd2bwa3psm16va:27717, nd2bwa3psm15va:27717, nd2bwa3psm16vb:27717, nd2bwa3psm15vb:27717], passives=[], arbiters=[], primary='null', tagSet=TagSet{[]}, electionId=null, setVersion=27149688, lastWriteDate=Fri Apr 07 23:02:31 UTC 2023, lastUpdateTimeNanos=116415267757469}, ServerDescription{address=nd2bwa3psm15va:27717, type=REPLICA_SET_PRIMARY, state=CONNECTED, ok=true, version=ServerVersion{versionList=[4, 4, 18]}, minWireVersion=0, maxWireVersion=9, maxDocumentSize=16777216, logicalSessionTimeoutMinutes=30, roundTripTimeNanos=13549634, setName='set01h', canonicalAddress=nd2bwa3psm15va:27717, hosts=[nd2bwa3psm16va:27717, nd2bwa3psm15va:27717, nd2bwa3psm16vb:27717, nd2bwa3psm15vb:27717], passives=[], arbiters=[], primary='nd2bwa3psm15va:27717', tagSet=TagSet{[]}, electionId=7fffffff0000000000000080, setVersion=27149688, lastWriteDate=Fri Apr 07 23:02:31 UTC 2023, lastUpdateTimeNanos=116415370985974}, ServerDescription{address=nd2bwa3psm16va:27717, type=REPLICA_SET_SECONDARY, state=CONNECTED, ok=true, version=ServerVersion
{versionList=[4, 4, 18]}, minWireVersion=0, maxWireVersion=9, maxDocumentSize=16777216, logicalSessionTimeoutMinutes=30, roundTripTimeNanos=1059673, setName='set01h', canonicalAddress=nd2bwa3psm16va:27717, hosts=[nd2bwa3psm16va:27717, nd2bwa3psm15va:27717, nd2bwa3psm16vb:27717, nd2bwa3psm15vb:27717], passives=[], arbiters=[], primary='nd2bwa3psm15va:27717', tagSet=TagSet{[]}, electionId=null, setVersion=27149688, lastWriteDate=Fri Apr 07 23:02:31 UTC 2023, lastUpdateTimeNanos=116415149889939}, ServerDescription{address=nd2bwa3psm15vb:27717, type=REPLICA_SET_SECONDARY, state=CONNECTED, ok=true, version=ServerVersion{versionList=[4, 4, 18]}, minWireVersion=0, maxWireVersion=9, maxDocumentSize=16777216, logicalSessionTimeoutMinutes=30, roundTripTimeNanos=941532, setName='set01h', canonicalAddress=nd2bwa3psm15vb:27717, hosts=[nd2bwa3psm16va:27717, nd2bwa3psm15va:27717, nd2bwa3psm16vb:27717, nd2bwa3psm15vb:27717], passives=[], arbiters=[], primary='nd2bwa3psm15va:27717', tagSet=TagSet{[]}, electionId=null, setVersion=27149688, lastWriteDate=Fri Apr 07 23:02:31 UTC 2023, lastUpdateTimeNanos=116415120136591}]}}
top - 23:02:27 up 1 day, 7:17, 0 users, load average: 21.07, 23.62, 26.89
Tasks: 303 total, 1 running, 301 sleeping, 0 stopped, 1 zombie
%Cpu(s): 56.5 us, 15.2 sy, 0.0 ni, 22.7 id, 0.0 wa, 2.1 hi, 3.5 si, 0.0 st
MiB Mem : 273809.9 total, 204977.1 free, 30793.4 used, 38039.4 buff/cache
MiB Swap: 4096.0 total, 4096.0 free, 0.0 used. 205453.0 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
52039 root 5 -15 12.0g 7.2g 36920 S 237.7 2.7 2977:27 mongod
–
top - 23:02:30 up 1 day, 7:17, 0 users, load average: 21.07, 23.62, 26.89
Tasks: 301 total, 2 running, 299 sleeping, 0 stopped, 0 zombie
%Cpu(s): 87.8 us, 4.8 sy, 0.0 ni, 5.3 id, 0.0 wa, 0.9 hi, 1.2 si, 0.0 st
MiB Mem : 273809.9 total, 204977.1 free, 30793.4 used, 38039.4 buff/cache
MiB Swap: 4096.0 total, 4096.0 free, 0.0 used. 205453.0 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
52039 root 5 -15 12.1g 7.3g 36920 S 999.9 2.7 2977:48 mongod
–
PID : 52039 mongostats
insert query update delete getmore command dirty used flushes vsize res qrw arw net_in net_out conn set repl time
65 1449 969 68 701 821|0 1.9% 30.2% 0 12.0G 7.19G 0|1 1|1 5.79m 14.8m 741 set01h PRI Apr 7 23:02:21.449
71 1572 1063 76 870 959|0 1.7% 30.2% 0 12.0G 7.20G 0|0 2|0 6.45m 16.2m 744 set01h PRI Apr 7 23:02:22.426
69 1581 1064 74 841 988|0 1.8% 30.2% 0 12.0G 7.20G 0|0 1|0 6.53m 16.2m 744 set01h PRI Apr 7 23:02:23.423
88 1602 1090 77 793 870|0 1.8% 30.3% 0 12.0G 7.20G 0|0 1|1 6.55m 16.5m 741 set01h PRI Apr 7 23:02:24.431
67 1507 1026 78 954 1013|0 1.9% 30.1% 0 12.0G 7.20G 0|0 1|1 6.44m 15.8m 741 set01h PRI Apr 7 23:02:25.427
59 1601 1122 86 778 817|0 1.9% 30.1% 0 12.0G 7.19G 0|1 1|0 6.48m 16.6m 741 set01h PRI Apr 7 23:02:26.419
81 1609 1068 73 779 908|0 2.0% 30.2% 0 12.0G 7.19G 0|1 1|0 6.46m 16.3m 741 set01h PRI Apr 7 23:02:27.435
15 417 189 32 225 364|0 1.9% 30.1% 0 11.9G 7.31G 0|0 2|8 1.44m 2.93m 612 set01h PRI Apr 7 23:02:30.352
*0 184 *0 9 18 2643|0 1.9% 30.2% 0 11.8G 7.32G 0|0 1|1 318k 2.89m 413 set01h PRI Apr 7 23:02:30.564
140 1499 809 94 364 6105|0 2.0% 30.2% 0 11.6G 7.31G 0|0 1|5 5.90m 17.7m 292 set01h PRI Apr 7 23:02:31.422
insert query update delete getmore command dirty used flushes vsize res qrw arw net_in net_out conn set repl time
The response time goes to minutes
ns total read write 2023-04-07T23:02:27Z
session_cache_2.session 321ms 86ms 235ms
session_cache.session 289ms 63ms 225ms
local.oplog.rs 274ms 274ms 0ms
session_cache_4.session 251ms 67ms 184ms
session_cache_3.session 223ms 58ms 165ms
admin.system.keys 0ms 0ms 0ms
admin.system.roles 0ms 0ms 0ms
admin.system.users 0ms 0ms 0ms
admin.system.version 0ms 0ms 0ms
config.image_collection 0ms 0ms 0ms
ns total read write 2023-04-07T23:02:29Z
local.oplog.rs 147ms 147ms 0ms
session_cache_2.session 96ms 27ms 68ms
session_cache_3.session 73ms 20ms 53ms
session_cache.session 71ms 17ms 53ms
session_cache_4.session 63ms 14ms 48ms
admin.system.keys 0ms 0ms 0ms
admin.system.roles 0ms 0ms 0ms
admin.system.users 0ms 0ms 0ms
admin.system.version 0ms 0ms 0ms
config.image_collection 0ms 0ms 0ms
ns total read write 2023-04-07T23:02:30Z
session_cache_4.session 187974ms 126810ms 61164ms
session_cache.session 159628ms 116939ms 42689ms
session_cache_2.session 157818ms 114013ms 43804ms
session_cache_3.session 132527ms 88974ms 43552ms
local.oplog.rs 4639ms 4639ms 0ms
admin.system.keys 0ms 0ms 0ms
admin.system.roles 0ms 0ms 0ms
admin.system.users 0ms 0ms 0ms
admin.system.version 0ms 0ms 0ms
config.image_collection 0ms 0ms 0ms
ns total read write 2023-04-07T23:02:31Z
session_cache.session 1541ms 374ms 1166ms
session_cache_3.session 1306ms 294ms 1011ms
session_cache_2.session 1266ms 247ms 1018ms
session_cache_4.session 1064ms 232ms 831ms
local.oplog.rs 338ms 338ms 0ms
admin.system.keys 0ms 0ms 0ms
admin.system.roles 0ms 0ms 0ms
admin.system.users 0ms 0ms 0ms
admin.system.version 0ms 0ms 0ms
config.image_collection 0ms 0ms 0ms
ns total read write 2023-04-07T23:02:32Z
session_cache_4.session 1453ms 178ms 1274ms
session_cache.session 1367ms 188ms 1178ms
session_cache_3.session 1303ms 95ms 1207ms
session_cache_2.session 1210ms 175ms 1034ms
local.oplog.rs 273ms 273ms 0ms
admin.system.keys 0ms 0ms 0ms
admin.system.roles 0ms 0ms 0ms
admin.system.users 0ms 0ms 0ms
admin.system.version 0ms 0ms 0ms
config.image_collection 0ms 0ms 0ms
ns total read write 2023-04-07T23:02:33Z
session_cache_2.session 1407ms 116ms 1291ms
session_cache_4.session 1188ms 302ms 886ms
session_cache_3.session 1147ms 79ms 1068ms
session_cache.session 839ms 92ms 746ms
local.oplog.rs 280ms 280ms 0ms
admin.system.keys 0ms 0ms 0ms
admin.system.roles 0ms 0ms 0ms
admin.system.users 0ms 0ms 0ms
admin.system.version 0ms 0ms 0ms
config.image_collection 0ms 0ms 0ms
ns total read write 2023-04-07T23:02:34Z
session_cache_2.session 1230ms 87ms 1142ms
session_cache_3.session 1208ms 136ms 1071ms
session_cache.session 1133ms 56ms 1076ms
session_cache_4.session 1118ms 65ms 1053ms
local.oplog.rs 347ms 347ms 0ms
admin.system.keys 0ms 0ms 0ms
admin.system.roles 0ms 0ms 0ms
admin.system.users 0ms 0ms 0ms
admin.system.version 0ms 0ms 0ms
config.image_collection 0ms 0ms 0ms
CPU idle% goes to 0.0
Linux 4.18.0-372.26.1.el8_6.x86_64 (nd2bwa3psm15va) 04/07/2023 x86_64 (10 CPU)
11:02:09 PM CPU %user %nice %system %iowait %steal %idle
11:02:10 PM all 69.90 0.00 21.21 0.00 0.00 8.89
11:02:11 PM all 65.22 0.00 22.38 0.00 0.00 12.40
11:02:12 PM all 54.65 0.00 20.63 0.00 0.00 24.72
11:02:13 PM all 53.48 0.00 21.00 0.00 0.00 25.51
11:02:14 PM all 50.76 0.00 23.45 0.00 0.00 25.79
11:02:15 PM all 56.42 0.00 21.44 0.00 0.00 22.14
11:02:16 PM all 56.44 0.00 22.49 0.00 0.00 21.06
11:02:17 PM all 55.44 0.00 21.15 0.00 0.00 23.41
11:02:18 PM all 52.41 0.00 20.16 0.00 0.00 27.43
11:02:19 PM all 54.22 0.00 21.67 0.00 0.00 24.11
11:02:20 PM all 52.24 0.00 20.88 0.00 0.00 26.88
11:02:21 PM all 61.88 0.00 21.44 0.00 0.00 16.68
11:02:22 PM all 62.61 0.00 21.78 0.00 0.00 15.60
11:02:23 PM all 50.36 0.00 19.16 0.00 0.00 30.48
11:02:24 PM all 54.06 0.00 22.72 0.00 0.00 23.23
11:02:25 PM all 55.25 0.00 20.39 0.00 0.00 24.36
11:02:26 PM all 50.36 0.00 18.04 0.00 0.00 31.60
11:02:27 PM all 52.86 0.00 19.29 0.00 0.00 27.86
11:02:28 PM all 67.58 0.00 15.15 0.00 0.00 17.27
11:02:29 PM all 97.20 0.00 2.80 0.00 0.00 0.00
11:02:30 PM all 88.59 0.00 9.11 0.00 0.00 2.30
11:02:31 PM all 59.90 0.00 26.40 0.00 0.00 13.71
11:02:32 PM all 63.96 0.00 23.49 0.00 0.00 12.55
11:02:33 PM all 70.56 0.00 21.88 0.00 0.00 7.56
11:02:34 PM all 65.70 0.00 23.00 0.00 0.00 11.30
Linux 4.18.0-372.26.1.el8_6.x86_64 (nd2bwa3psm15va) 04/07/2023 x86_64 (10 CPU)
11:02:09 PM runq-sz plist-sz ldavg-1 ldavg-5 ldavg-15 blocked
11:02:10 PM 28 2985 22.28 23.99 27.08 0
11:02:11 PM 17 2994 22.28 23.99 27.08 0
11:02:12 PM 24 3025 22.28 23.99 27.08 0
11:02:13 PM 10 3024 22.58 24.02 27.07 0
11:02:14 PM 6 2956 22.58 24.02 27.07 0
11:02:15 PM 25 2959 22.58 24.02 27.07 0
11:02:16 PM 37 2960 22.58 24.02 27.07 0
11:02:17 PM 7 2964 22.58 24.02 27.07 0
11:02:18 PM 10 2968 21.89 23.86 27.00 0
11:02:19 PM 21 2968 21.89 23.86 27.00 0
11:02:20 PM 25 3001 21.89 23.86 27.00 0
11:02:21 PM 35 2999 21.89 23.86 27.00 0
11:02:22 PM 20 3037 21.89 23.86 27.00 0
11:02:23 PM 23 3037 21.42 23.73 26.94 0
11:02:24 PM 29 2983 21.42 23.73 26.94 0
11:02:25 PM 11 2981 21.42 23.73 26.94 0
11:02:26 PM 18 2971 21.42 23.73 26.94 0
11:02:27 PM 13 2972 21.42 23.73 26.94 0
11:02:28 PM 300 2985 21.07 23.62 26.89 0
11:02:29 PM 741 3160 21.07 23.62 26.89 0
11:02:30 PM 115 2612 21.07 23.62 26.89 0
SM16va: SECONDARY member mongostat
insert query update delete getmore command dirty used flushes vsize res qrw arw net_in net_out conn set repl time
*69 18 *1018 *83 0 120|0 0.9% 44.1% 0 13.0G 10.7G 0|0 1|0 25.2k 170k 153 set01h SEC Apr 7 23:02:15.245
*67 23 *935 *79 0 134|0 1.0% 44.1% 0 13.0G 10.7G 0|0 1|0 29.4k 167k 153 set01h SEC Apr 7 23:02:16.231
*71 24 *984 *67 0 132|0 1.0% 44.1% 0 13.0G 10.7G 0|0 1|0 29.5k 229k 156 set01h SEC Apr 7 23:02:17.227
*67 27 *1061 *72 0 136|0 1.0% 44.0% 0 13.0G 10.7G 0|0 1|0 31.2k 263k 153 set01h SEC Apr 7 23:02:18.229
*76 24 *1051 *65 0 136|0 1.0% 44.0% 0 13.0G 10.7G 0|0 1|0 31.0k 170k 156 set01h SEC Apr 7 23:02:19.226
*59 28 *1028 *65 0 128|0 1.0% 44.0% 0 13.0G 10.7G 0|0 1|0 29.7k 166k 156 set01h SEC Apr 7 23:02:20.225
*67 27 *934 *65 0 123|0 1.1% 43.9% 0 13.0G 10.7G 0|0 1|0 28.7k 157k 153 set01h SEC Apr 7 23:02:21.232
*64 28 *1079 *72 0 125|0 1.1% 43.9% 0 13.0G 10.7G 0|0 2|0 29.2k 166k 153 set01h SEC Apr 7 23:02:22.226
*70 22 *900 *60 0 118|0 1.1% 43.9% 0 13.0G 10.7G 0|0 1|0 26.1k 158k 153 set01h SEC Apr 7 23:02:23.249
*83 29 *1140 *69 0 140|0 1.1% 43.9% 0 13.0G 10.7G 0|1 1|0 32.8k 180k 153 set01h SEC Apr 7 23:02:24.235
insert query update delete getmore command dirty used flushes vsize res qrw arw net_in net_out conn set repl time
*80 35 *1039 *74 0 121|0 1.1% 43.8% 0 13.0G 10.7G 0|0 1|0 30.9k 169k 153 set01h SEC Apr 7 23:02:25.240
*50 28 *993 *78 0 127|0 1.1% 43.8% 0 13.0G 10.7G 0|0 1|0 29.5k 163k 153 set01h SEC Apr 7 23:02:26.227
*85 33 *1113 *81 0 129|0 1.1% 43.8% 0 13.0G 10.7G 0|1 1|0 31.6k 162k 156 set01h SEC Apr 7 23:02:27.251
*66 16 *653 *74 0 129|0 1.1% 43.8% 0 13.0G 10.7G 0|0 1|0 26.3k 212k 156 set01h SEC Apr 7 23:02:28.223
*0 5 *0 *0 0 149|0 1.1% 43.8% 0 13.0G 10.7G 0|0 1|0 27.9k 310k 156 set01h SEC Apr 7 23:02:29.229
*0 261 *0 *0 0 126|0 1.1% 43.8% 0 13.0G 10.7G 0|0 1|0 123k 1.23m 159 set01h SEC Apr 7 23:02:30.227
*64 947 *564 *96 0 175|0 1.1% 43.8% 0 13.0G 10.7G 0|0 1|0 400k 3.81m 171 set01h SEC Apr 7 23:02:31.238
*87 251 *933 *82 0 115|0 1.1% 43.8% 0 13.0G 10.7G 0|0 1|0 116k 1.13m 170 set01h SEC Apr 7 23:02:32.231
*124 41 *1562 *155 0 128|0 1.2% 43.6% 0 13.0G 10.7G 0|0 1|0 34.0k 198k 170 set01h SEC Apr 7 23:02:33.227
*119 33 *1553 *71 0 124|0 1.2% 43.6% 0 13.0G 10.7G 0|0 1|0 31.2k 179k 170 set01h SEC Apr 7 23:02:34.236
insert query update delete getmore command dirty used flushes vsize res qrw arw net_in net_out conn set repl time
15vb:SECONDARY member mongostat
insert query update delete getmore command dirty used flushes vsize res qrw arw net_in net_out conn set repl time
*61 29 *952 *77 434 532|0 0.9% 42.7% 0 13.1G 9.96G 0|0 1|0 733k 4.49m 158 set01h SEC Apr 7 23:02:15.917
*74 20 *978 *63 369 413|0 0.9% 42.7% 0 13.1G 9.96G 0|0 1|0 540k 7.30m 161 set01h SEC Apr 7 23:02:16.909
*64 24 *1010 *77 342 465|0 0.9% 42.8% 0 13.1G 9.96G 0|0 1|0 582k 4.76m 161 set01h SEC Apr 7 23:02:17.925
*76 18 *1094 *69 389 483|0 1.0% 42.7% 0 13.1G 9.96G 0|1 1|0 640k 5.01m 161 set01h SEC Apr 7 23:02:18.916
*62 24 *962 *60 247 351|0 1.0% 42.5% 0 13.1G 9.96G 0|0 1|0 426k 4.30m 161 set01h SEC Apr 7 23:02:19.939
*67 26 *1046 *61 260 356|0 1.0% 42.5% 0 13.1G 9.96G 0|0 1|0 429k 4.66m 158 set01h SEC Apr 7 23:02:20.925
*63 22 *1004 *72 317 423|0 1.0% 42.5% 0 13.1G 9.96G 0|0 1|0 543k 4.90m 158 set01h SEC Apr 7 23:02:21.912
*73 22 *963 *68 286 374|0 0.9% 42.5% 0 13.1G 9.96G 0|0 1|0 470k 4.37m 158 set01h SEC Apr 7 23:02:22.924
*75 33 *1097 *67 221 308|0 1.0% 42.5% 0 13.1G 9.96G 0|0 1|0 358k 4.81m 158 set01h SEC Apr 7 23:02:23.926
*86 31 *1079 *63 361 487|0 1.0% 42.6% 0 13.1G 9.96G 0|2 1|0 640k 5.02m 158 set01h SEC Apr 7 23:02:24.912
insert query update delete getmore command dirty used flushes vsize res qrw arw net_in net_out conn set repl time
*48 19 *948 *90 245 328|0 1.1% 42.6% 0 13.1G 9.96G 0|0 1|0 394k 4.18m 158 set01h SEC Apr 7 23:02:25.934
*78 41 *1095 *72 255 328|0 1.1% 42.7% 0 13.1G 9.96G 0|0 1|0 397k 7.23m 158 set01h SEC Apr 7 23:02:26.913
*80 32 *1015 *82 442 521|0 1.2% 42.6% 0 13.1G 9.96G 0|0 1|0 712k 4.91m 161 set01h SEC Apr 7 23:02:27.924
*10 4 *172 *13 91 217|0 1.1% 42.6% 0 13.1G 9.96G 0|0 1|0 162k 1.06m 158 set01h SEC Apr 7 23:02:28.907
*0 48 *0 *0 0 201|0 1.1% 42.6% 0 13.1G 9.96G 0|0 1|0 53.3k 408k 150 set01h SEC Apr 7 23:02:29.910
*37 895 *333 *86 49 438|0 1.1% 42.6% 0 13.1G 9.96G 0|0 1|0 507k 5.37m 166 set01h SEC Apr 7 23:02:30.914
*98 464 *750 *76 253 359|0 1.2% 42.6% 0 13.1G 9.96G 0|0 1|0 609k 5.84m 167 set01h SEC Apr 7 23:02:31.908
*122 36 *1411 *149 346 453|0 1.2% 42.6% 0 13.1G 9.96G 0|0 1|0 604k 6.59m 167 set01h SEC Apr 7 23:02:32.912
*123 44 *1739 *80 311 411|0 1.2% 42.6% 0 13.1G 9.96G 0|1 1|0 534k 7.54m 167 set01h SEC Apr 7 23:02:33.917
*84 45 *1324 *72 325 444|0 1.2% 42.4% 0 13.1G 9.96G 1|0 1|0 573k 5.95m 168 set01h SEC Apr 7 23:02:34.911
insert query update delete getmore command dirty used flushes vsize res qrw arw net_in net_out conn set repl time
16vb:SECONDARY member mongostat
nsert query update delete getmore command dirty used flushes vsize res qrw arw net_in net_out conn set repl time
*75 18 *1046 *66 0 135|0 0.9% 43.4% 0 13.0G 10.7G 0|0 1|0 28.8k 160k 166 set01h SEC Apr 7 23:02:18.807
*65 31 *1004 *66 0 131|0 0.9% 43.4% 0 13.0G 10.7G 0|0 1|0 31.6k 177k 166 set01h SEC Apr 7 23:02:19.784
*67 37 *1038 *56 0 128|0 0.7% 42.8% 0 13.0G 10.7G 0|1 1|0 33.1k 172k 163 set01h SEC Apr 7 23:02:20.785
*59 28 *973 *69 0 122|0 0.8% 42.8% 0 13.0G 10.7G 0|1 1|0 29.1k 157k 163 set01h SEC Apr 7 23:02:21.794
*76 30 *991 *73 0 127|0 0.9% 42.8% 0 13.0G 10.7G 0|0 1|0 30.0k 166k 163 set01h SEC Apr 7 23:02:22.792
*70 26 *1110 *70 0 124|0 0.9% 42.9% 0 13.0G 10.7G 0|0 1|0 28.5k 156k 163 set01h SEC Apr 7 23:02:23.783
*85 22 *1031 *63 0 122|0 1.0% 42.9% 0 13.0G 10.7G 0|0 2|0 26.8k 175k 163 set01h SEC Apr 7 23:02:24.794
*58 21 *958 *87 0 132|0 1.1% 43.0% 0 13.0G 10.7G 0|1 1|0 29.3k 209k 166 set01h SEC Apr 7 23:02:25.787
*66 31 *1079 *68 0 134|0 1.1% 43.0% 0 13.0G 10.7G 0|0 1|0 32.4k 569k 164 set01h SEC Apr 7 23:02:26.796
*85 44 *1058 *81 0 152|0 1.2% 43.1% 0 13.0G 10.7G 0|0 1|0 40.0k 235k 166 set01h SEC Apr 7 23:02:27.785
insert query update delete getmore command dirty used flushes vsize res qrw arw net_in net_out conn set repl time
*19 5 *215 *26 0 126|0 1.2% 43.1% 0 13.0G 10.7G 0|0 1|0 21.6k 162k 166 set01h SEC Apr 7 23:02:28.784
*0 31 *0 *0 0 125|0 1.2% 43.1% 0 13.0G 10.7G 0|0 2|0 32.7k 274k 161 set01h SEC Apr 7 23:02:29.788
*13 808 *214 *76 0 148|0 1.2% 43.1% 0 13.0G 10.7G 0|0 1|0 341k 3.32m 170 set01h SEC Apr 7 23:02:30.788
*108 592 *792 *72 0 115|0 1.3% 43.1% 0 13.0G 10.7G 0|1 1|0 251k 2.52m 172 set01h SEC Apr 7 23:02:31.787
*124 42 *1368 *154 0 128|0 1.4% 43.2% 0 13.0G 10.7G 0|1 1|0 34.7k 430k 172 set01h SEC Apr 7 23:02:32.784
*130 47 *1759 *80 0 119|0 1.5% 43.3% 0 13.0G 10.7G 0|1 1|0 34.7k 169k 172 set01h SEC Apr 7 23:02:33.789
*74 48 *1345 *74 0 127|0 1.6% 43.3% 0 13.0G 10.7G 0|1 1|0 35.9k 176k 172 set01h SEC Apr 7 23:02:34.787
*80 43 *1274 *80 0 130|0 1.6% 43.4% 0 13.0G 10.7G 0|0 1|0 35.9k 210k 175 set01h SEC Apr 7 23:02:35.786
*92 29 *1167 *85 0 129|0 1.7% 43.4% 0 13.0G 10.7G 0|0 1|0 30.5k 209k 172 set01h SEC Apr 7 23:02:36.786
*75 26 *930 *58 0 150|0 1.7% 43.4% 0 13.0G 10.7G 0|0 1|0 33.9k 227k 175 set01h SEC Apr 7 23:02:37.783
Since the PRIMARY seems unavailable, infact NO primaries are seen for 1-2 sec, the application(client) queries are getting timedout resulting in bad customer experience.
We tried to reduce the load(query) to the mongo but it looks the same. We use /etc/hosts for name resolution and do not use DNS at all.
We need to know why time to time, mongod hogs.
Please provide a secure portal, I can share the diagnostics.data and complete mongodb log during this period.