Node.js Driver / NODE-1013

SocketException handling request, closing client connection: 9001 socket exception [SEND_ERROR]

    • Type: Bug
    • Resolution: Gone away
    • Priority: Major - P3
    • Fix Version/s: None
    • Affects Version/s: None
    • Component/s: None
    • Environment:
      linux 2.6.32-504.30.3.el6.x86_64 #1 SMP Wed Jul 15 10:13:09 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

      nodejs v6.9.4

      This is very similar to the closed/abandoned issue in the Java driver: https://jira.mongodb.org/browse/JAVA-1972

      There does not seem to be a `2.6.12` or `2.6` version option for your bugs.

      I have been suffering with this same issue for a number of months; it happens every couple of weeks. CPU on the master (in a 3-server cluster) spikes and the system becomes slow to respond (e.g. ssh to the box takes a couple of minutes).

      The logs constantly spew errors like the one in the summary:

      2017-05-30T14:44:17.137+0100 [conn78072] query <redacted>.use query: { jsonWebToken: ObjectId('592d75b8709e022f21c1cf0f') } planSummary: COLLSCAN ntoreturn:1000 ntoskip:0 nscanned:11609884 nscannedObjects:11609884 keyUpdates:0 numYields:14629 locks(micros) r:609927364 nreturned:0 reslen:20 375938ms
      2017-05-30T14:44:17.245+0100 [conn78072] SocketException handling request, closing client connection: 9001 socket exception [SEND_ERROR] server [10.182.0.39:53706] 
      2017-05-30T14:44:17.204+0100 [conn78071] query <redacted>.use query: { jsonWebToken: ObjectId('592d75b8709e022f21c1cf0d') } planSummary: COLLSCAN ntoreturn:1000 ntoskip:0 nscanned:11609884 nscannedObjects:11609884 keyUpdates:0 numYields:14440 locks(micros) r:612202392 nreturned:0 reslen:20 375993ms
      2017-05-30T14:44:17.245+0100 [conn78071] SocketException handling request, closing client connection: 9001 socket exception [SEND_ERROR] server [10.182.0.39:53705] 
      2017-05-30T14:44:17.209+0100 [conn78074] query <redacted>.use query: { jsonWebToken: ObjectId('592d75b8709e022f21c1cf11') } planSummary: COLLSCAN ntoreturn:1000 ntoskip:0 nscanned:11609884 nscannedObjects:11609884 keyUpdates:0 numYields:14393 locks(micros) r:608454980 nreturned:0 reslen:20 376007ms
      2017-05-30T14:44:17.246+0100 [conn78074] SocketException handling request, closing client connection: 9001 socket exception [SEND_ERROR] server [10.182.0.39:53708] 
      2017-05-30T14:44:17.213+0100 [conn78073] query <redacted>.use query: { jsonWebToken: ObjectId('592d75b8709e022f21c1cf12') } planSummary: COLLSCAN ntoreturn:1000 ntoskip:0 nscanned:11609884 nscannedObjects:11609884 keyUpdates:0 numYields:14715 locks(micros) r:607931769 nreturned:0 reslen:20 376001ms
      2017-05-30T14:44:17.260+0100 [conn78073] SocketException handling request, closing client connection: 9001 socket exception [SEND_ERROR] server [10.182.0.39:53707] 
      2017-05-30T14:44:17.222+0100 [conn78070] query <redacted>.use query: { jsonWebToken: ObjectId('592d75b8709e022f21c1cf0c') } planSummary: COLLSCAN ntoreturn:1000 ntoskip:0 nscanned:11609884 nscannedObjects:11609884 keyUpdates:0 numYields:14319 locks(micros) r:609798832 nreturned:0 reslen:20 376024ms
      2017-05-30T14:44:17.301+0100 [conn78070] SocketException handling request, closing client connection: 9001 socket exception [SEND_ERROR] server [10.182.0.39:53704] 
      2017-05-30T14:44:21.404+0100 [initandlisten] connection accepted from 10.182.0.39:53916 #78205 (127 connections now open)
      2017-05-30T14:44:21.405+0100 [initandlisten] connection accepted from 10.182.0.39:53917 #78206 (128 connections now open)
      2017-05-30T14:44:21.406+0100 [initandlisten] connection accepted from 10.182.0.39:53918 #78207 (129 connections now open)
      2017-05-30T14:44:21.407+0100 [initandlisten] connection accepted from 10.182.0.39:53919 #78208 (130 connections now open)
      2017-05-30T14:44:21.408+0100 [initandlisten] connection accepted from 10.182.0.39:53920 #78209 (131 connections now open)
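
      The COLLSCAN entries above (nscanned:11609884 for lookups that return nothing) suggest the `use` collection has no index on `jsonWebToken`. Below is a minimal sketch of adding one with the 2.x Node.js driver; the connection string and database name are placeholders, and the collection name `use` is taken from the log:

      const MongoClient = require('mongodb').MongoClient;

      // Placeholder URL and database name; point this at the actual primary.
      MongoClient.connect('mongodb://localhost:27017/mydb', function (err, db) {
        if (err) throw err;
        // An index on jsonWebToken lets the lookup use an IXSCAN instead of
        // scanning ~11.6M documents on every query.
        db.collection('use').createIndex({ jsonWebToken: 1 }, function (err, indexName) {
          if (err) throw err;
          console.log('created index', indexName);
          db.close();
        });
      });

      Once such an index exists, the query in the log should show planSummary: IXSCAN and return in milliseconds rather than ~376s.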
      

      A selection of mongostat output captured during the issue:

      insert  query update delete getmore command flushes mapped  vsize    res faults           locked db idx miss %     qr|qw   ar|aw  netIn netOut  conn              set repl       time 
          *0     *0     *0     *0       0    11|0       0  26.1g    53g   316m    304 <redacted>:0.0%          0       0|0    39|0   629b     7k    85 <redacted-set>  PRI   14:49:22 
          *0     *0     *0     *0       2     5|0       0  26.1g    53g   317m    260 <redacted>:0.1%          0       0|0    39|0   671b     4k    85 <redacted-set>  PRI   14:49:23 
          *0     *0     *0     *0       0     1|0       0  26.1g    53g   317m    265 <redacted>:0.0%          0       0|0    39|0    62b     3k    85 <redacted-set>  PRI   14:49:24 
          *0     *0     *0     *0       0     5|0       0  26.1g    53g   315m    300 <redacted>:0.0%          0       0|0    39|0   478b     4k    85 <redacted-set>  PRI   14:49:25 
          *0     *0     *0     *0       0     2|0       0  26.1g    53g   319m    282 <redacted>:0.0%          0       0|0    39|0   117b     4k    85 <redacted-set>  PRI   14:49:26 
          *0     *0     *0     *0       0     3|0       0  26.1g    53g   316m    326 <redacted>:0.0%          0       0|0    39|0   368b     3k    85 <redacted-set>  PRI   14:49:27 
          *0     *0     *0     *0       2     2|0       0  26.1g    53g   318m    277 <redacted>:0.0%          0       0|0    39|0   211b     4k    85 <redacted-set>  PRI   14:49:28 
          *0     *0     *0     *0       0     3|0       0  26.1g    53g   319m    257 <redacted>:0.0%          0       0|0    39|0   368b     3k    85 <redacted-set>  PRI   14:49:29 
          *0     *0     *0     *0       0    12|0       0  26.1g    53g   319m    250 <redacted>:0.0%          0       0|0    39|0   690b     8k    85 <redacted-set>  PRI   14:49:30 
          *0     *0     *0     *0       0     7|0       0  26.1g    53g   318m    294 <redacted>:0.0%          0       0|0    39|0   592b     5k    85 <redacted-set>  PRI   14:49:31 
      insert  query update delete getmore command flushes mapped  vsize    res faults           locked db idx miss %     qr|qw   ar|aw  netIn netOut  conn              set repl       time 
           7     *0      7     *0      20    35|0       0  26.1g    53g   315m    258 <redacted>:0.5%          0       6|0     2|0    11k    21k    84 <redacted-set>  PRI   14:49:32 
           8      5      2     *0      10    13|0       0  26.1g    53g   305m    166 <redacted>:0.3%          0       0|0    39|0     6k    12k    84 <redacted-set>  PRI   14:49:33 
           6      2      2     *0       9    11|0       0  26.1g    53g   258m    199 <redacted>:0.7%          0      23|0     0|0     5k    10k    84 <redacted-set>  PRI   14:49:34 
          *0     *0      8     *0      12    19|0       0  26.1g    53g   284m    114 <redacted>:0.3%          0      20|0     1|0     6k    12k    84 <redacted-set>  PRI   14:49:35 
           6      3      3     *0       8    12|0       0  26.1g    53g   250m    163 <redacted>:0.8%          0       0|0    39|0     4k    10k    89 <redacted-set>  PRI   14:49:36 
           6      3      2     *0       8    12|0       0  26.1g    53g   278m    211 <redacted>:0.8%          0       1|0     2|0     4k     9k    89 <redacted-set>  PRI   14:49:37 
           4      2      3     *0       4    17|0       0  26.1g    53g   246m    182 <redacted>:0.8%          0       0|0    39|0     4k     9k    89 <redacted-set>  PRI   14:49:38 
          *0     *0      2     *0       0    10|0       0  26.1g    53g   282m    199 <redacted>:0.0%          0      16|0    17|0   818b     5k    89 <redacted-set>  PRI   14:49:39 
          10      2      1     *0      12    31|0       0  26.1g    53g   281m    182 <redacted>:0.3%          0       0|0    39|0    10k    21k    89 <redacted-set>  PRI   14:49:41 
          *0     15      9     *0       9    17|0       0  26.1g    53g   297m    198 <redacted>:0.2%          0       0|0    39|0     8k    16k    89 <redacted-set>  PRI   14:49:42
      

      My previous digging into the problem led me to SERVER-5793, but as we are running 2.6.12 it was a long shot.

      It happened again today and I tried the connection pool flush (on the master) from SERVER-5632, without success.

      After restarting the master a new master is elected, and within a matter of minutes the new master starts to have the same problem.

      A number of similar threads on SO and elsewhere point to the ulimit being reached, but that does not seem to be the case here: lsof for the mongod process showed only around 300 open files.
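
      To cross-check the ulimit theory from the server side, serverStatus reports mongod's own connection counts, which can be compared against `ulimit -n` and the lsof count. A sketch using the same driver (placeholder URL):

      const MongoClient = require('mongodb').MongoClient;

      // Placeholder URL; serverStatus().connections reports current/available counts.
      MongoClient.connect('mongodb://localhost:27017/admin', function (err, db) {
        if (err) throw err;
        db.command({ serverStatus: 1 }, function (err, status) {
          if (err) throw err;
          console.log(status.connections); // e.g. { current: 127, available: ..., totalCreated: ... }
          db.close();
        });
      });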

      Let me know if I can provide any other info.
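
      One more observation that may help triage: each failing query in the log runs for roughly 376,000 ms, just over 360,000 ms, which is a common default for the Node.js driver's socketTimeoutMS. If the driver times out and closes the socket before the collection scan finishes, the server would log exactly this SEND_ERROR when it finally tries to send the reply. A sketch of raising the timeout via the connection string (placeholder host and database) while the indexing problem is addressed; note this only hides the symptom:

      const MongoClient = require('mongodb').MongoClient;

      // socketTimeoutMS=0 disables the per-socket inactivity timeout. This does
      // not fix the underlying COLLSCAN; it only stops the driver from dropping
      // the connection while a slow query is still running.
      const url = 'mongodb://localhost:27017/mydb?socketTimeoutMS=0';

      MongoClient.connect(url, function (err, db) {
        if (err) throw err;
        // ... run the jsonWebToken lookups as before ...
        db.close();
      });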

            Assignee: Unassigned
            Reporter: dogmatic69 carl
            Votes: 0
            Watchers: 2
