-
Type: Bug
-
Resolution: Gone away
-
Priority: Major - P3
-
None
-
Affects Version/s: None
-
Component/s: None
-
Environment: linux 2.6.32-504.30.3.el6.x86_64 #1 SMP Wed Jul 15 10:13:09 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
nodejs v6.9.4
-
This is very similar to the closed/abandoned issue in the Java driver: https://jira.mongodb.org/browse/JAVA-1972
There does not seem to be a `2.6.12` or `2.6` version option for your bugs.
I have been suffering from this same issue for a number of months; it happens every couple of weeks. The CPU on the master (in a 3-server cluster) spikes and the system becomes slow to respond (e.g. SSH to the box takes a couple of minutes).
The logs constantly spew entries similar to those in the issue above:
2017-05-30T14:44:17.137+0100 [conn78072] query <redacted>.use query: { jsonWebToken: ObjectId('592d75b8709e022f21c1cf0f') } planSummary: COLLSCAN ntoreturn:1000 ntoskip:0 nscanned:11609884 nscannedObjects:11609884 keyUpdates:0 numYields:14629 locks(micros) r:609927364 nreturned:0 reslen:20 375938ms
2017-05-30T14:44:17.245+0100 [conn78072] SocketException handling request, closing client connection: 9001 socket exception [SEND_ERROR] server [10.182.0.39:53706]
2017-05-30T14:44:17.204+0100 [conn78071] query <redacted>.use query: { jsonWebToken: ObjectId('592d75b8709e022f21c1cf0d') } planSummary: COLLSCAN ntoreturn:1000 ntoskip:0 nscanned:11609884 nscannedObjects:11609884 keyUpdates:0 numYields:14440 locks(micros) r:612202392 nreturned:0 reslen:20 375993ms
2017-05-30T14:44:17.245+0100 [conn78071] SocketException handling request, closing client connection: 9001 socket exception [SEND_ERROR] server [10.182.0.39:53705]
2017-05-30T14:44:17.209+0100 [conn78074] query <redacted>.use query: { jsonWebToken: ObjectId('592d75b8709e022f21c1cf11') } planSummary: COLLSCAN ntoreturn:1000 ntoskip:0 nscanned:11609884 nscannedObjects:11609884 keyUpdates:0 numYields:14393 locks(micros) r:608454980 nreturned:0 reslen:20 376007ms
2017-05-30T14:44:17.246+0100 [conn78074] SocketException handling request, closing client connection: 9001 socket exception [SEND_ERROR] server [10.182.0.39:53708]
2017-05-30T14:44:17.213+0100 [conn78073] query <redacted>.use query: { jsonWebToken: ObjectId('592d75b8709e022f21c1cf12') } planSummary: COLLSCAN ntoreturn:1000 ntoskip:0 nscanned:11609884 nscannedObjects:11609884 keyUpdates:0 numYields:14715 locks(micros) r:607931769 nreturned:0 reslen:20 376001ms
2017-05-30T14:44:17.260+0100 [conn78073] SocketException handling request, closing client connection: 9001 socket exception [SEND_ERROR] server [10.182.0.39:53707]
2017-05-30T14:44:17.222+0100 [conn78070] query <redacted>.use query: { jsonWebToken: ObjectId('592d75b8709e022f21c1cf0c') } planSummary: COLLSCAN ntoreturn:1000 ntoskip:0 nscanned:11609884 nscannedObjects:11609884 keyUpdates:0 numYields:14319 locks(micros) r:609798832 nreturned:0 reslen:20 376024ms
2017-05-30T14:44:17.301+0100 [conn78070] SocketException handling request, closing client connection: 9001 socket exception [SEND_ERROR] server [10.182.0.39:53704]
2017-05-30T14:44:21.404+0100 [initandlisten] connection accepted from 10.182.0.39:53916 #78205 (127 connections now open)
2017-05-30T14:44:21.405+0100 [initandlisten] connection accepted from 10.182.0.39:53917 #78206 (128 connections now open)
2017-05-30T14:44:21.406+0100 [initandlisten] connection accepted from 10.182.0.39:53918 #78207 (129 connections now open)
2017-05-30T14:44:21.407+0100 [initandlisten] connection accepted from 10.182.0.39:53919 #78208 (130 connections now open)
2017-05-30T14:44:21.408+0100 [initandlisten] connection accepted from 10.182.0.39:53920 #78209 (131 connections now open)
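Every one of those slow queries is an equality lookup on `jsonWebToken` against the `<redacted>.use` collection, planned as a COLLSCAN over ~11.6 million documents and returning nothing, so each one holds the read lock for roughly six minutes. For reference, a sketch of the index that would presumably avoid that collection scan (collection and field names are taken from the log above; this assumes the lookups really are single-field equality matches):

```javascript
// Assumption: all the offending queries are equality lookups on jsonWebToken,
// so a single-field index should turn the COLLSCAN into an index scan.
// ensureIndex is used here because we are still on 2.6.x.
db.use.ensureIndex({ jsonWebToken: 1 }, { background: true });

// Re-check the plan afterwards; on 2.6, explain() should then report a
// BtreeCursor on jsonWebToken_1 instead of a BasicCursor (full scan).
db.use.find({ jsonWebToken: ObjectId("592d75b8709e022f21c1cf0f") }).explain();
```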
A selection of mongostat output during the issue:
insert query update delete getmore command flushes mapped vsize res faults locked db idx miss % qr|qw ar|aw netIn netOut conn set repl time
*0 *0 *0 *0 0 11|0 0 26.1g 53g 316m 304 <redacted>:0.0% 0 0|0 39|0 629b 7k 85 <redacted-set> PRI 14:49:22
*0 *0 *0 *0 2 5|0 0 26.1g 53g 317m 260 <redacted>:0.1% 0 0|0 39|0 671b 4k 85 <redacted-set> PRI 14:49:23
*0 *0 *0 *0 0 1|0 0 26.1g 53g 317m 265 <redacted>:0.0% 0 0|0 39|0 62b 3k 85 <redacted-set> PRI 14:49:24
*0 *0 *0 *0 0 5|0 0 26.1g 53g 315m 300 <redacted>:0.0% 0 0|0 39|0 478b 4k 85 <redacted-set> PRI 14:49:25
*0 *0 *0 *0 0 2|0 0 26.1g 53g 319m 282 <redacted>:0.0% 0 0|0 39|0 117b 4k 85 <redacted-set> PRI 14:49:26
*0 *0 *0 *0 0 3|0 0 26.1g 53g 316m 326 <redacted>:0.0% 0 0|0 39|0 368b 3k 85 <redacted-set> PRI 14:49:27
*0 *0 *0 *0 2 2|0 0 26.1g 53g 318m 277 <redacted>:0.0% 0 0|0 39|0 211b 4k 85 <redacted-set> PRI 14:49:28
*0 *0 *0 *0 0 3|0 0 26.1g 53g 319m 257 <redacted>:0.0% 0 0|0 39|0 368b 3k 85 <redacted-set> PRI 14:49:29
*0 *0 *0 *0 0 12|0 0 26.1g 53g 319m 250 <redacted>:0.0% 0 0|0 39|0 690b 8k 85 <redacted-set> PRI 14:49:30
*0 *0 *0 *0 0 7|0 0 26.1g 53g 318m 294 <redacted>:0.0% 0 0|0 39|0 592b 5k 85 <redacted-set> PRI 14:49:31
insert query update delete getmore command flushes mapped vsize res faults locked db idx miss % qr|qw ar|aw netIn netOut conn set repl time
7 *0 7 *0 20 35|0 0 26.1g 53g 315m 258 <redacted>:0.5% 0 6|0 2|0 11k 21k 84 <redacted-set> PRI 14:49:32
8 5 2 *0 10 13|0 0 26.1g 53g 305m 166 <redacted>:0.3% 0 0|0 39|0 6k 12k 84 <redacted-set> PRI 14:49:33
6 2 2 *0 9 11|0 0 26.1g 53g 258m 199 <redacted>:0.7% 0 23|0 0|0 5k 10k 84 <redacted-set> PRI 14:49:34
*0 *0 8 *0 12 19|0 0 26.1g 53g 284m 114 <redacted>:0.3% 0 20|0 1|0 6k 12k 84 <redacted-set> PRI 14:49:35
6 3 3 *0 8 12|0 0 26.1g 53g 250m 163 <redacted>:0.8% 0 0|0 39|0 4k 10k 89 <redacted-set> PRI 14:49:36
6 3 2 *0 8 12|0 0 26.1g 53g 278m 211 <redacted>:0.8% 0 1|0 2|0 4k 9k 89 <redacted-set> PRI 14:49:37
4 2 3 *0 4 17|0 0 26.1g 53g 246m 182 <redacted>:0.8% 0 0|0 39|0 4k 9k 89 <redacted-set> PRI 14:49:38
*0 *0 2 *0 0 10|0 0 26.1g 53g 282m 199 <redacted>:0.0% 0 16|0 17|0 818b 5k 89 <redacted-set> PRI 14:49:39
10 2 1 *0 12 31|0 0 26.1g 53g 281m 182 <redacted>:0.3% 0 0|0 39|0 10k 21k 89 <redacted-set> PRI 14:49:41
*0 15 9 *0 9 17|0 0 26.1g 53g 297m 198 <redacted>:0.2% 0 0|0 39|0 8k 16k 89 <redacted-set> PRI 14:49:42
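The qr column (queued readers) spiking to 16-23 while ar sits at 39 lines up with those scanning queries piling up. For reference, a sketch of how the long-running operations could be located and, if necessary, killed from the mongo shell while a spike is in progress (generic shell helpers, nothing specific to this ticket):

```javascript
// List operations that have been running for more than 60 seconds.
db.currentOp({ secs_running: { $gt: 60 } }).inprog.forEach(function (op) {
  print(op.opid + "  " + op.ns + "  " + tojson(op.query));
});

// A specific operation can then be terminated by its opid, e.g.:
// db.killOp(12345);   // opid value is illustrative only
```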
My previous digging into the problem led me to SERVER-5793, but as we are running 2.6.12 it was a long shot.
It happened again today and I tried the connection pool flush (on the master) from SERVER-5632, without success.
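To clarify what I mean by the connection pool flush: my assumption of the SERVER-5632 workaround is the connPoolSync admin command, run on the primary from the mongo shell:

```javascript
// Assumption: this is the command the SERVER-5632 workaround refers to;
// it asks mongod to flush/refresh its outgoing connection pools.
db.adminCommand({ connPoolSync: 1 });
```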
After restarting the master, a new master is elected, and within a matter of minutes the new master starts to exhibit the same problem.
A number of similar threads on Stack Overflow and elsewhere point to a ulimit being reached, but that does not seem to be the case: lsof for the mongodb process showed only around 300 open files.
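For completeness, the connection counters can also be cross-checked from the server side (a quick sketch in the mongo shell, not output from this incident):

```javascript
// Compare connections.current against connections.available from serverStatus;
// if a connection/ulimit cap were being hit, available would be near zero.
printjson(db.serverStatus().connections);
```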
Let me know if I can provide any other info.
- backports
-
NODE-1439 SocketException on mongo nodes
- Closed