[SERVER-4037] mongos: writeback failed because of stale config
Created: 07/Oct/11  Updated: 11/Jul/16  Resolved: 08/Oct/11

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 2.0.0-rc2
Fix Version/s: None

Type: Bug
Priority: Major - P3
Reporter: Y. Wayne Huang
Assignee: Unassigned
Resolution: Done
Votes: 0
Labels: mongos
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: Ubuntu 10.04 x86_64

Operating System: Linux
Participants:

 Description   

We are seeing mongos log about 300 errors/sec like the following:
Fri Oct 7 14:15:12 [WriteBackListener-10.x:27021] writeback failed because of stale config, retrying attempts: 58673730
Fri Oct 7 14:15:12 [WriteBackListener-10.x:27021] writeback failed because of stale config, retrying attempts: 58673731
...
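
To put a number on the error rate from a saved mongos log, something like the following could be used. This is a minimal sketch, not a MongoDB tool: it assumes the line layout shown in the two samples above (timestamp, `[WriteBackListener-host:port]`, then the message), and the function name `writeback_failures_per_second` is made up for illustration.

```python
import re
from collections import Counter

# Matches the mongos log lines quoted in this ticket. The timestamp layout
# ("Fri Oct 7 14:15:12") is an assumption taken from those samples.
WRITEBACK_RE = re.compile(
    r"^(?P<ts>\w{3} \w{3} +\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"\[WriteBackListener-[^\]]+\] writeback failed because of stale config, "
    r"retrying attempts: (?P<attempts>\d+)"
)


def writeback_failures_per_second(lines):
    """Return a Counter mapping each one-second timestamp to its failure count."""
    counts = Counter()
    for line in lines:
        match = WRITEBACK_RE.match(line)
        if match:
            counts[match.group("ts")] += 1
    return counts


sample = [
    "Fri Oct 7 14:15:12 [WriteBackListener-10.x:27021] writeback failed because of stale config, retrying attempts: 58673730",
    "Fri Oct 7 14:15:12 [WriteBackListener-10.x:27021] writeback failed because of stale config, retrying attempts: 58673731",
    "Fri Oct 7 14:15:13 [conn1] some unrelated line",
]
per_second = writeback_failures_per_second(sample)
```

For a real log, the same function can be fed an open file handle instead of the `sample` list; the busiest seconds are then available via `per_second.most_common()`.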

The retries appear to be going to a shard that previously had a similar problem, logging a large number of errors like the following:
Sat Sep 24 14:44:24 [conn175294] Assertion: 13388:[discovery.items] shard version not ok in Client::Context: client in sharded mode, but doesn't have version set for this collection: discovery.items myVersion: 1|98

This is with the mongos nightly build as of Oct 6.

We are also seeing sporadic "db assertion failure" errors when running map/reduce jobs, similar to SERVER-3081:
Wed Sep 28 03:27:08 [conn11550] ERROR: sharded m/r failed on shard:
shard1/10.x:27017,10.y:27017 error:

{ assertion: "[mydb.mycoll] shard version not ok in Client::Context: client in sharded mode, but doesn't have version set for this collection...", assertionCode: 13388, errmsg: "db assertion failure", ok: 0.0 }


Thu Sep 29 02:12:13 [conn52340] ERROR: sharded m/r failed on shard:
shard4/10.x:27017,10.y:27017 error:

{ assertion: "[mydb.mycoll] shard version not ok in Client::Context: client in sharded mode, but doesn't have version set for this collection...", assertionCode: 13388, errmsg: "db assertion failure", ok: 0.0 }


Wed Oct 5 14:50:03 [conn360741] ERROR: sharded m/r failed on shard:
shard1/10.x:27017,10.y:27017 error:

{ assertion: "[mydb.mycoll] shard version not ok in Client::Context: client in sharded mode, but doesn't have version set for this collection...", assertionCode: 13388, errmsg: "db assertion failure", ok: 0.0 }


Fri Oct 7 02:07:41 [conn26339] ERROR: sharded m/r failed on shard:
shard2/10.x:27017,10.y:27017 error:

{ assertion: "[mydb.mycoll] shard version not ok in Client::Context: client in sharded mode, but doesn't have version set for this collection: revinate_p...", assertionCode: 13388, errmsg: "db assertion failure", ok: 0.0 }
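
To see whether the map/reduce failures cluster on particular shards, the log excerpts can be tallied per shard. The snippet below is a rough sketch under the assumption that each failure appears as `sharded m/r failed on shard:` followed (on the same or the next line) by `<shard name>/<hosts>`, as in the excerpts above; `mr_failures_by_shard` is a hypothetical helper name.

```python
import re
from collections import Counter

# The shard name is the text before the first "/" that follows the marker.
# "\s+" lets the name sit on the same line or on the following one.
MR_FAILURE_RE = re.compile(r"sharded m/r failed on shard:\s+(?P<shard>[^/\s]+)/")


def mr_failures_by_shard(log_text):
    """Count 'sharded m/r failed' entries per shard name in a block of log text."""
    return Counter(m.group("shard") for m in MR_FAILURE_RE.finditer(log_text))


log_text = """\
Wed Sep 28 03:27:08 [conn11550] ERROR: sharded m/r failed on shard:
shard1/10.x:27017,10.y:27017 error:
Thu Sep 29 02:12:13 [conn52340] ERROR: sharded m/r failed on shard:
shard4/10.x:27017,10.y:27017 error:
Wed Oct 5 14:50:03 [conn360741] ERROR: sharded m/r failed on shard:
shard1/10.x:27017,10.y:27017 error:
Fri Oct 7 02:07:41 [conn26339] ERROR: sharded m/r failed on shard:
shard2/10.x:27017,10.y:27017 error:
"""
failures = mr_failures_by_shard(log_text)
```

On the four excerpts quoted here, the failures are spread over shard1, shard2 and shard4 rather than confined to a single shard.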



 Comments   
Comment by Y. Wayne Huang [ 21/Oct/11 ]

SERVER-4118

It appears mongos no longer floods the log, but it is still getting into a tight loop of connecting and issuing queries for some reason.

Comment by Eliot Horowitz (Inactive) [ 21/Oct/11 ]

Can you open a new ticket with those logs, etc.?

Comment by Y. Wayne Huang [ 20/Oct/11 ]

Eliot, we upgraded to the nightly build the day of your comment and hadn't seen the problem for a while, but it returned this morning. This time it didn't seem to generate log messages on mongos, but it created upwards of 1-2k command ops/sec on two of our shards. Bouncing mongos fixed the issue, so I don't believe the fix works in all cases. Are there subsequent fixes in the 2.0.1 rc that are also related to this problem? If not, we should re-open this.

Comment by Eliot Horowitz (Inactive) [ 10/Oct/11 ]

The fix was not in that version but is in the current nightly.

Comment by Y. Wayne Huang [ 10/Oct/11 ]

Mon Oct 10 13:39:52 ./mongos db version v2.0.1-pre-, pdfile version 4.5 starting (--help for usage)
Mon Oct 10 13:39:52 git version: 671479a616924e405d25cb79581f27b934c7fcff
Mon Oct 10 13:39:52 build info: Linux bs-linux64.10gen.cc 2.6.21.7-2.ec2.v1.2.fc8xen #1 SMP Fri Nov 20 17:48:28 EST 2009 x86_64 BOOST_LIB_VERSION=1_41

Comment by Eliot Horowitz (Inactive) [ 10/Oct/11 ]

What git hash?
Nightly is built from head, but may be delayed.

Comment by Y. Wayne Huang [ 10/Oct/11 ]

Hi Eliot, you mentioned these issues are fixed in 2.0.0 and 2.0 head, respectively. We are running the 2.0 nightly (we assume that comes from 2.0 head) and we still see this issue as of this morning. We can try a new mongos from last night, but since you indicated the issues were already fixed and we have the nightly from Friday, it seems it would not help. Also, can you link the two issues you're referring to?

Comment by Eliot Horowitz (Inactive) [ 08/Oct/11 ]

There were two cases for this.
One is fixed in 2.0.0.
The other is fixed in 2.0 head for 2.0.1.

Comment by Y. Wayne Huang [ 07/Oct/11 ]

Restarting mongos stopped the infinite retry, which was causing 300-400 command ops/sec on one shard and increasing the write lock % from < 1% to 8%. This was effectively DoS'ing one of our shards.

Generated at Thu Feb 08 03:04:46 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.