[SERVER-11743] Add debugging to socket pending data check (was: auth.js and authConnectionHook.js intermittently failing linux32 debug) Created: 16/Nov/13  Updated: 10/Dec/14  Resolved: 09/Jan/14

Status: Closed
Project: Core Server
Component/s: Build
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Daniel Pasette (Inactive) Assignee: Eric Milkie
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Operating System: ALL
Participants:

 Description   

There are a couple more examples, but this is the earliest I saw.

https://mci.10gen.com/ui/task/mongodb_mongo_master_linux_32_debug_103da34ddbc32d8345655471fc46afc6352a6624_13_11_14_14_28_21_sharding_linux_32

https://mci.10gen.com/ui/task/mongodb_mongo_master_linux_32_debug_58ed9775c96676a8c6dab2a83ea4ee55f1dcaa38_13_11_14_15_55_13_sharding_linux_32

 m31100| 2013-11-16T01:09:57.803-0500 [initandlisten] connection accepted from 10.116.42.47:49186 #58 (23 connections now open)
 m31100| 2013-11-16T01:09:57.811-0500 [conn58]  authenticate db: local { authenticate: 1, nonce: "xxx", user: "__system", key: "xxx" }
 m31100| 2013-11-16T01:09:59.034-0500 [ReplicaSetMonitorWatcher] Socket recv() timeout  10.116.42.47:31100
 m31100| 2013-11-16T01:09:59.034-0500 [ReplicaSetMonitorWatcher] SocketException: remote: 10.116.42.47:31100 error: 9001 socket exception [RECV_TIMEOUT] server [10.116.42.47:31100] 
 m31100| 2013-11-16T01:09:59.034-0500 [ReplicaSetMonitorWatcher] DBClientCursor::init call() failed
 m31100| 2013-11-16T01:09:59.034-0500 [ReplicaSetMonitorWatcher] dev: lastError==0 won't report:DBClientBase::findN: transport error: ip-10-116-42-47:31100 ns: admin.$cmd query: { ismaster: 1 }
 
 ==> m31100| 2013-11-16T01:10:00.092-0500 [ReplicaSetMonitorWatcher] ERROR: Socket found pending data during connectivity check (idle 7 secs, remote host 10.116.42.47:31100)
 
 m31100| 2013-11-16T01:10:00.092-0500 [ReplicaSetMonitorWatcher] Fatal Assertion 16199
 m31100| 2013-11-16T01:10:00.101-0500 [ReplicaSetMonitorWatcher] 0x8e1d41d 0x8dc0c26 0x8d9ee51 0x8ddd2f0 0x83b919b 0x83c2a72 0x83d011f 0x83d381f 0x83d3a6d 0x83e840b 0x8da32c0 0x8da5d50 0x8e6782e 0x97c542 0x1e5b6e 
 m31100|  /data/mci/git@github.commongodb/mongo.git/master/mongod(_ZN5mongo15printStackTraceERSo+0x2d) [0x8e1d41d]
 m31100|  /data/mci/git@github.commongodb/mongo.git/master/mongod(_ZN5mongo10logContextEPKc+0x166) [0x8dc0c26]
 m31100|  /data/mci/git@github.commongodb/mongo.git/master/mongod(_ZN5mongo13fassertFailedEi+0xf1) [0x8d9ee51]
 m31100|  /data/mci/git@github.commongodb/mongo.git/master/mongod(_ZN5mongo6Socket16isStillConnectedEv+0x390) [0x8ddd2f0]
 m31100|  /data/mci/git@github.commongodb/mongo.git/master/mongod(_ZN5mongo18DBClientConnection16isStillConnectedEv+0x2b) [0x83b919b]
 m31100|  /data/mci/git@github.commongodb/mongo.git/master/mongod(_ZN5mongo19_getConnWithRefreshERNS_17ReplicaSetMonitor4NodeE+0x32) [0x83c2a72]
 m31100|  /data/mci/git@github.commongodb/mongo.git/master/mongod(_ZN5mongo17ReplicaSetMonitor6_checkEv+0xbf) [0x83d011f]
 m31100|  /data/mci/git@github.commongodb/mongo.git/master/mongod(_ZN5mongo17ReplicaSetMonitor5checkEv+0x1df) [0x83d381f]
 m31100|  /data/mci/git@github.commongodb/mongo.git/master/mongod(_ZN5mongo17ReplicaSetMonitor8checkAllEv+0x1ed) [0x83d3a6d]
 m31100|  /data/mci/git@github.commongodb/mongo.git/master/mongod(_ZN5mongo24ReplicaSetMonitorWatcher3runEv+0xeb) [0x83e840b]
 m31100|  /data/mci/git@github.commongodb/mongo.git/master/mongod(_ZN5mongo13BackgroundJob7jobBodyEv+0xe0) [0x8da32c0]
 m31100|  /data/mci/git@github.commongodb/mongo.git/master/mongod(_ZN5boost6detail11thread_dataINS_3_bi6bind_tIvNS_4_mfi3mf0IvN5mongo13BackgroundJobEEENS2_5list1INS2_5valueIPS7_EEEEEEE3runEv+0x30) [0x8da5d50]
 m31100|  /data/mci/git@github.commongodb/mongo.git/master/mongod [0x8e6782e]
 m31100|  /lib/i686/nosegneg/libpthread.so.0 [0x97c542]
 m31100|  /lib/i686/nosegneg/libc.so.6(clone+0x5e) [0x1e5b6e]
 m31100| 2013-11-16T01:10:00.101-0500 [ReplicaSetMonitorWatcher] 
 m31100| 
 m31100| ***aborting after fassert() failure
 m31100| 
 m31100| 
 m31100| Got signal: 6 (Aborted).



 Comments   
Comment by Daniel Pasette (Inactive) [ 09/Jan/14 ]

duplicate: SERVER-11730

Comment by Daniel Pasette (Inactive) [ 18/Nov/13 ]

Should leave the dassert and add logging in 2.5.5

Comment by Eric Milkie [ 17/Nov/13 ]

This is failing on a dassert that Greg and I added as part of the stale connection network checking. It's discovering that there is unsolicited data on the socket when it's doing the poll to check that the socket is still alive. We have plans to write code in 2.5.5 that will dump the unsolicited data when it happens, so that we can track down which command is responding incorrectly. I don't think this should block the 2.5.4 release – it's not a regression.
Should we disable the dassert for now?

Generated at Thu Feb 08 03:26:39 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.