Details
-
Bug
-
Resolution: Incomplete
-
Critical - P2
-
None
-
2.8.0-rc5
-
ALL
-
Description
Seen with sysbench and YCSB workloads during standalone, single-node longevity testing, the mongod process can enter a persistent error condition where it appears to stop reading sockets from client connections. sysbench and YCSB use the 2.12.4 java driver. Client and server are on the same machine.
Nothing useful logged on server log. Tried db.runCommand(
{ setParameter: 1, logLevel: 5 }) from mongo client but still nothing.
mongo client can run "show collections" but, after displaying the collection anems, hangs.
netstat shows evidence of waiting data in a socket for mongod:
[qconner@slave-4.perf.ny.cbi ~/apps/mongo]$ netstat -n|grep ESTAB
|
tcp 0 0 172.30.200.12:60219 184.106.28.85:443 ESTABLISHED
|
tcp 0 0 172.30.200.12:46284 54.209.155.229:3232 ESTABLISHED
|
tcp 221981 0 127.0.0.1:27017 127.0.0.1:40262 ESTABLISHED
|
tcp 221981 0 127.0.0.1:27017 127.0.0.1:40256 ESTABLISHED
|
tcp 0 0 172.30.200.12:22 172.30.200.8:61008 ESTABLISHED
|
tcp 0 0 127.0.0.1:27017 127.0.0.1:40261 ESTABLISHED
|
tcp 0 0 127.0.0.1:27017 127.0.0.1:40253 ESTABLISHED
|
tcp 0 0 127.0.0.1:27017 127.0.0.1:40257 ESTABLISHED
|
tcp 0 0 127.0.0.1:27017 127.0.0.1:40259 ESTABLISHED
|
tcp 0 0 127.0.0.1:27017 127.0.0.1:40258 ESTABLISHED
|
tcp 0 0 127.0.0.1:27017 127.0.0.1:40255 ESTABLISHED
|
tcp 0 0 172.30.200.12:56842 68.203.14.106:4506 ESTABLISHED
|
tcp 0 0 172.30.200.12:45664 68.203.14.106:4505 ESTABLISHED
|
tcp 0 0 172.30.200.12:57205 54.209.155.229:3233 ESTABLISHED
|
tcp 0 0 127.0.0.1:27017 127.0.0.1:40254 ESTABLISHED
|
tcp 0 0 172.30.200.12:58762 68.203.14.106:4506 ESTABLISHED
|
strace of mongod shows only calls to poll: or select():
[qconner@slave-4.perf.ny.cbi ~/apps/mongo]$ strace -p 26297
|
Process 26297 attached - interrupt to quit
|
select(8, [6 7], NULL, NULL, {0, 5891}) = 0 (Timeout)
|
select(8, [6 7], NULL, NULL, {0, 10000}) = 0 (Timeout)
|
select(8, [6 7], NULL, NULL, {0, 10000}) = 0 (Timeout)
|
select(8, [6 7], NULL, NULL, {0, 10000}) = 0 (Timeout)
|
select(8, [6 7], NULL, NULL, {0, 10000}) = 0 (Timeout)
|
select(8, [6 7], NULL, NULL, {0, 10000}) = 0 (Timeout)
|
Bottom of mongod log file is unremarkable:
2015-01-21T22:50:13.239+0000 I COMMAND [conn5] command sbtest.$cmd command: insert { insert: "sbtest6", ordered: true, documents: 1000 } keyUpdates:0 writeConflicts:0 n
|
umYields:0 reslen:40 1294ms
|
2015-01-21T22:50:17.213+0000 I WRITE [conn5] insert sbtest.sbtest6 query: { _id: 17816202, k: 47394, c: "38386020571-30959444150-14559542433-02413298585-47740660800-2
|
7829143017-40155805950-73693273374-88438959896-98232277051", pad: "25264851037-22995234305-81217374636-99284513390-23152612121" } ninserted:1 keyUpdates:0 writeConflicts
|
:0 numYields:0 143ms
|
2015-01-21T22:50:17.242+0000 I COMMAND [conn5] command sbtest.$cmd command: insert { insert: "sbtest6", ordered: true, documents: 1000 } keyUpdates:0 writeConflicts:0 numYields:0 reslen:40 179ms
|
2015-01-22T16:32:57.907+0000 I NETWORK [initandlisten] connection accepted from 127.0.0.1:42321 #10 (10 connections now open)
|
2015-01-22T16:33:07.237+0000 I NETWORK [conn10] end connection 127.0.0.1:42321 (9 connections now open)
|
2015-01-22T16:33:08.497+0000 I NETWORK [initandlisten] connection accepted from 127.0.0.1:42322 #11 (10 connections now open)
|
2015-01-22T16:34:13.257+0000 I NETWORK [initandlisten] connection accepted from 127.0.0.1:42324 #12 (11 connections now open)
|
2015-01-22T16:34:13.258+0000 I NETWORK [conn12] end connection 127.0.0.1:42324 (10 connections now open)
|
2015-01-22T16:34:49.896+0000 I NETWORK [initandlisten] connection accepted from 127.0.0.1:42326 #13 (11 connections now open)
|
2015-01-22T16:50:15.766+0000 I NETWORK [initandlisten] connection accepted from 127.0.0.1:42345 #14 (12 connections now open)
|
2015-01-22T16:50:15.766+0000 I NETWORK [conn14] end connection 127.0.0.1:42345 (11 connections now open)
|
2015-01-22T16:52:55.395+0000 I NETWORK [initandlisten] connection accepted from 127.0.0.1:42375 #15 (12 connections now open)
|
2015-01-22T16:53:10.491+0000 I NETWORK [initandlisten] connection accepted from 127.0.0.1:42376 #16 (13 connections now open)
|
2015-01-22T16:53:10.492+0000 I NETWORK [conn16] end connection 127.0.0.1:42376 (12 connections now open)
|
2015-01-22T16:55:48.084+0000 I NETWORK [initandlisten] connection accepted from 127.0.0.1:42380 #17 (13 connections now open)
|
Sorry I don't have more to go on. Running it again with symbols so gdb attach can be leveraged.