[SERVER-20578] stale_clustered.js in noPassthroughWithMongod_WT fails with "waiting for state indicator state for 300000ms" Created: 14/Sep/15  Updated: 07/Oct/15  Resolved: 25/Sep/15

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: 3.1.9

Type: Bug
Priority: Critical - P2
Reporter: Charlie Swanson
Assignee: Kaloian Manassiev
Resolution: Done
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: Sharding A (10/09/15)
Participants:

 Description   

stale_clustered.js in noPassthroughWithMongod_WT fails with "waiting for state indicator state for 300000ms". Failures observed on ASIO SSL Windows 2008R2, OS X 10.8, and SSL OS X 10.8 variants. See example task, example logs.

Excerpt:

[js_test:stale_clustered] 2015-09-14T05:33:48.188+0000 Status  : 5  target state : 3
[js_test:stale_clustered] 2015-09-14T05:33:48.436+0000  m20011| 2015-09-14T05:33:48.436+0000 I NETWORK  [conn39] end connection 10.168.85.54:64817 (9 connections now open)
[js_test:stale_clustered] 2015-09-14T05:33:51.602+0000 assert.soon failed, msg:waiting for state indicator state for 300000ms
[js_test:stale_clustered] 2015-09-14T05:33:51.602+0000 doassert@src/mongo/shell/assert.js:15:14
[js_test:stale_clustered] 2015-09-14T05:33:51.602+0000 assert.soon@src/mongo/shell/assert.js:194:13
[js_test:stale_clustered] 2015-09-14T05:33:51.602+0000 ReplSetTest.prototype.waitForIndicator@src/mongo/shell/replsettest.js:987:1
[js_test:stale_clustered] 2015-09-14T05:33:51.602+0000 ReplSetTest.prototype.waitForState@src/mongo/shell/replsettest.js:944:5
[js_test:stale_clustered] 2015-09-14T05:33:51.602+0000 ReplSetTest.prototype.overflow@src/mongo/shell/replsettest.js:1101:5
[js_test:stale_clustered] 2015-09-14T05:33:51.602+0000 @jstests\noPassthroughWithMongod\stale_clustered.js:77:1



 Comments   
Comment by Kaloian Manassiev [ 25/Sep/15 ]

This appears to be a different failure, in the server selection logic. I have opened SERVER-20646 to track it.

Comment by J Rassi [ 25/Sep/15 ]

Re-opening, as this test is still failing in master.

Failures from the past 24 hours:

The test appears to still fail on the same line (line 83), but now with a "node is recovering" message instead of a "waiting for state" message:

[js_test:stale_clustered] 2015-09-25T08:32:02.727+0000 2015-09-25T08:32:02.726+0000 E QUERY    [thread1] Error: error: { "$err" : "node is recovering", "code" : 13436 } :
[js_test:stale_clustered] 2015-09-25T08:32:02.727+0000 _getErrorWithCode@src/mongo/shell/utils.js:23:13
[js_test:stale_clustered] 2015-09-25T08:32:02.727+0000 DBQuery.prototype.next@src/mongo/shell/query.js:278:1
[js_test:stale_clustered] 2015-09-25T08:32:02.727+0000 DBQuery.prototype.itcount@src/mongo/shell/query.js:372:9
[js_test:stale_clustered] 2015-09-25T08:32:02.727+0000 @jstests/noPassthroughWithMongod/stale_clustered.js:83:35
[js_test:stale_clustered] 2015-09-25T08:32:02.727+0000 
[js_test:stale_clustered] 2015-09-25T08:32:02.727+0000 failed to load: jstests/noPassthroughWithMongod/stale_clustered.js

Kal, please investigate.

Comment by Githook User [ 24/Sep/15 ]

Author: Kaloian Manassiev (kaloianm) <kaloian.manassiev@mongodb.com>

Message: SERVER-20578 ReplSetTest.overflow should wait for replication

Also reduce the ShardingTest oplog size in order to make tests run faster.

In addition, this reverts commit eee325e63005939199f6081b1899f1c2863b0530.
Branch: master
https://github.com/mongodb/mongo/commit/6d62d7f7bc0841ab48ae6b3f6fc69fa11682e2e9
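The commit message above says that ReplSetTest.overflow should wait for replication. As a rough illustration of that idea only (this is not the contents of the commit and does not use the real ReplSetTest API), the sketch below polls until every secondary has caught up to the primary, or throws after a timeout. The names pollUntil, waitForReplication, makeFakeClock and the lastApplied field are all hypothetical, and the replica set is modelled as plain objects with an injected clock so the example runs standalone.

```javascript
// Hypothetical "poll until true or time out" helper, loosely in the spirit of
// a shell-style assert.soon. Not the real mongo shell implementation.
function pollUntil(predicate, msg, timeoutMs, sleepFn, nowFn) {
    const start = nowFn();
    while (true) {
        if (predicate()) {
            return;
        }
        if (nowFn() - start >= timeoutMs) {
            throw new Error("poll failed, msg:" + msg + " for " + timeoutMs + "ms");
        }
        sleepFn();  // caller decides how to wait between checks
    }
}

// Hypothetical model: each node is an object whose lastApplied field is the
// index of the last oplog entry it has applied.
function waitForReplication(primary, secondaries, timeoutMs, sleepFn, nowFn) {
    const target = primary.lastApplied;
    pollUntil(
        () => secondaries.every((s) => s.lastApplied >= target),
        "waiting for replication",
        timeoutMs,
        sleepFn,
        nowFn);
}

// Fake clock so the example is deterministic and does not really sleep.
function makeFakeClock() {
    let t = 0;
    return {now: () => t, advance: (ms) => { t += ms; }};
}

// Usage: the secondary applies one entry per simulated 100ms tick.
const clock = makeFakeClock();
const primary = {lastApplied: 5};
const secondaries = [{lastApplied: 2}];
waitForReplication(
    primary,
    secondaries,
    1000,
    () => { clock.advance(100); secondaries[0].lastApplied += 1; },
    clock.now);
```

In this model, a test that reads from a secondary immediately after a large write would call waitForReplication first, so the read does not race with a node that is still catching up.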

Comment by David Storch [ 23/Sep/15 ]

kaloian.manassiev, as part of re-enabling this test, please remove the useClusterClientCursor setParameter at the beginning:

https://github.com/mongodb/mongo/blob/063d9b2dcc46a7b42ceea7d9596b96e5e7080225/jstests/noPassthroughWithMongod/stale_clustered.js#L19-L21

Rassi and I think that this was added unnecessarily in b1982bb7fb610.

Comment by Githook User [ 23/Sep/15 ]

Author: Jason Rassi (jrassi) <rassi@10gen.com>

Message: SERVER-20578 Temporarily disable stale_clustered.js
Branch: master
https://github.com/mongodb/mongo/commit/eee325e63005939199f6081b1899f1c2863b0530

Comment by J Rassi [ 23/Sep/15 ]

Three failures observed on OS X in the past 48 hours, one failure observed on SSL OS X 10.8 in the past 48 hours.

Bumping to P2. kaloian.manassiev, are you the right assignee for this ticket? If so, please work on this today, or point me towards someone else more appropriate.

Comment by Charlie Swanson [ 16/Sep/15 ]

Lowering to P4 since I haven't seen this very often. Still don't know why it's happening though.

Comment by Charlie Swanson [ 14/Sep/15 ]

spencer, any idea what might be happening? Or who might be able to answer that?
