Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-16998

rc5 mongod stops reading client TCP socket

    • Type: Icon: Bug Bug
    • Resolution: Incomplete
    • Priority: Icon: Critical - P2 Critical - P2
    • None
    • Affects Version/s: 2.8.0-rc5
    • Component/s: Concurrency, Networking, Storage
    • Labels:
    • ALL
    • Hide

      Longevity workload (at least 12 hours?) with sysbench or YCSB.
      Both of these use the Java 2.12.4 but symptoms also show with mongo.

      Show
      Longevity workload (at least 12 hours?) with sysbench or YCSB. Both of these use the Java 2.12.4 but symptoms also show with mongo.

      Seen with sysbench and YCSB workloads during standalone, single-node longevity testing, the mongod process can enter a persistent error condition where it appears to stop reading sockets from client connections. sysbench and YCSB use the 2.12.4 java driver. Client and server are on the same machine.

      Nothing useful logged on server log. Tried db.runCommand(

      { setParameter: 1, logLevel: 5 }

      ) from mongo client but still nothing.

      mongo client can run "show collections" but, after displaying the collection anems, hangs.

      netstat shows evidence of waiting data in a socket for mongod:

      [qconner@slave-4.perf.ny.cbi ~/apps/mongo]$ netstat -n|grep ESTAB
      tcp        0      0 172.30.200.12:60219         184.106.28.85:443           ESTABLISHED
      tcp        0      0 172.30.200.12:46284         54.209.155.229:3232         ESTABLISHED
      tcp   221981      0 127.0.0.1:27017             127.0.0.1:40262             ESTABLISHED
      tcp   221981      0 127.0.0.1:27017             127.0.0.1:40256             ESTABLISHED
      tcp        0      0 172.30.200.12:22            172.30.200.8:61008          ESTABLISHED
      tcp        0      0 127.0.0.1:27017             127.0.0.1:40261             ESTABLISHED
      tcp        0      0 127.0.0.1:27017             127.0.0.1:40253             ESTABLISHED
      tcp        0      0 127.0.0.1:27017             127.0.0.1:40257             ESTABLISHED
      tcp        0      0 127.0.0.1:27017             127.0.0.1:40259             ESTABLISHED
      tcp        0      0 127.0.0.1:27017             127.0.0.1:40258             ESTABLISHED
      tcp        0      0 127.0.0.1:27017             127.0.0.1:40255             ESTABLISHED
      tcp        0      0 172.30.200.12:56842         68.203.14.106:4506          ESTABLISHED
      tcp        0      0 172.30.200.12:45664         68.203.14.106:4505          ESTABLISHED
      tcp        0      0 172.30.200.12:57205         54.209.155.229:3233         ESTABLISHED
      tcp        0      0 127.0.0.1:27017             127.0.0.1:40254             ESTABLISHED
      tcp        0      0 172.30.200.12:58762         68.203.14.106:4506          ESTABLISHED
      

      strace of mongod shows only calls to poll: or select():

      [qconner@slave-4.perf.ny.cbi ~/apps/mongo]$ strace -p 26297
      Process 26297 attached - interrupt to quit
      select(8, [6 7], NULL, NULL, {0, 5891}) = 0 (Timeout)
      select(8, [6 7], NULL, NULL, {0, 10000}) = 0 (Timeout)
      select(8, [6 7], NULL, NULL, {0, 10000}) = 0 (Timeout)
      select(8, [6 7], NULL, NULL, {0, 10000}) = 0 (Timeout)
      select(8, [6 7], NULL, NULL, {0, 10000}) = 0 (Timeout)
      select(8, [6 7], NULL, NULL, {0, 10000}) = 0 (Timeout)
      

      Bottom of mongod log file is unremarkable:

      2015-01-21T22:50:13.239+0000 I COMMAND  [conn5] command sbtest.$cmd command: insert { insert: "sbtest6", ordered: true, documents: 1000 } keyUpdates:0 writeConflicts:0 n
      umYields:0 reslen:40 1294ms
      2015-01-21T22:50:17.213+0000 I WRITE    [conn5] insert sbtest.sbtest6 query: { _id: 17816202, k: 47394, c: "38386020571-30959444150-14559542433-02413298585-47740660800-2
      7829143017-40155805950-73693273374-88438959896-98232277051", pad: "25264851037-22995234305-81217374636-99284513390-23152612121" } ninserted:1 keyUpdates:0 writeConflicts
      :0 numYields:0 143ms
      2015-01-21T22:50:17.242+0000 I COMMAND  [conn5] command sbtest.$cmd command: insert { insert: "sbtest6", ordered: true, documents: 1000 } keyUpdates:0 writeConflicts:0 numYields:0 reslen:40 179ms
      2015-01-22T16:32:57.907+0000 I NETWORK  [initandlisten] connection accepted from 127.0.0.1:42321 #10 (10 connections now open)
      2015-01-22T16:33:07.237+0000 I NETWORK  [conn10] end connection 127.0.0.1:42321 (9 connections now open)
      2015-01-22T16:33:08.497+0000 I NETWORK  [initandlisten] connection accepted from 127.0.0.1:42322 #11 (10 connections now open)
      2015-01-22T16:34:13.257+0000 I NETWORK  [initandlisten] connection accepted from 127.0.0.1:42324 #12 (11 connections now open)
      2015-01-22T16:34:13.258+0000 I NETWORK  [conn12] end connection 127.0.0.1:42324 (10 connections now open)
      2015-01-22T16:34:49.896+0000 I NETWORK  [initandlisten] connection accepted from 127.0.0.1:42326 #13 (11 connections now open)
      2015-01-22T16:50:15.766+0000 I NETWORK  [initandlisten] connection accepted from 127.0.0.1:42345 #14 (12 connections now open)
      2015-01-22T16:50:15.766+0000 I NETWORK  [conn14] end connection 127.0.0.1:42345 (11 connections now open)
      2015-01-22T16:52:55.395+0000 I NETWORK  [initandlisten] connection accepted from 127.0.0.1:42375 #15 (12 connections now open)
      2015-01-22T16:53:10.491+0000 I NETWORK  [initandlisten] connection accepted from 127.0.0.1:42376 #16 (13 connections now open)
      2015-01-22T16:53:10.492+0000 I NETWORK  [conn16] end connection 127.0.0.1:42376 (12 connections now open)
      2015-01-22T16:55:48.084+0000 I NETWORK  [initandlisten] connection accepted from 127.0.0.1:42380 #17 (13 connections now open)
      

      Sorry I don't have more to go on. Running it again with symbols so gdb attach can be leveraged.

            Assignee:
            Unassigned Unassigned
            Reporter:
            quentin.conner Quentin Conner
            Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

              Created:
              Updated:
              Resolved: