[SERVER-3967] could not initialize cursor across all shards Created: 28/Sep/11  Updated: 30/Mar/12  Resolved: 02/Nov/11

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 2.0.0
Fix Version/s: None

Type: Bug
Priority: Major - P3
Reporter: Y. Wayne Huang
Assignee: Greg Studer
Resolution: Duplicate
Votes: 1
Labels: mongos, sharding
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: linux x86_64 ubuntu 10.04.3 10gen stable package


Attachments: mongos-1.log.gz
Issue Links:
Duplicate
duplicates SERVER-3889 Possible for setShardVersion to never... Closed
Operating System: Linux
Participants:

 Description   

after updating from 1.8.3 to 2.0.0, we are seeing numerous 'could not initialize cursor across all shards' errors reported by mongos. we have 4 shards, each a replica set of two replicas and an arbiter. all nodes (mongos, config, replicas, and arbiters) were updated to 2.0.0.

Tue Sep 27 21:19:26 [conn900] ns: my_db.my_coll could not initialize cursor across all shards because : ns: my_db.my_coll ClusteredCursor::query @ shard3/10.x.x.46:27017,10.x.x.47:27017 attempt: 5
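
To quantify how often the error appears and which namespace and shard it names, the mongos log can be tallied with a short script. This is a minimal sketch, not part of the original report: the regular expression is modeled only on the single log line quoted above, so other mongos versions may need a different pattern, and the function name `count_cursor_errors` is hypothetical.

```python
import re
from collections import Counter

# Pattern modeled on the one log line quoted in this ticket (an assumption;
# other mongos versions may word or order the fields differently).
ERROR_RE = re.compile(
    r"ns: (?P<ns>\S+) could not initialize cursor across all shards because : "
    r".*?ClusteredCursor::query @ (?P<shard>\S+) attempt: (?P<attempt>\d+)"
)


def count_cursor_errors(lines):
    """Count matching error lines per (namespace, shard) pair."""
    counts = Counter()
    for line in lines:
        match = ERROR_RE.search(line)
        if match:
            counts[(match.group("ns"), match.group("shard"))] += 1
    return counts


# The log line from the description, used as sample input.
sample = [
    "Tue Sep 27 21:19:26 [conn900] ns: my_db.my_coll could not initialize "
    "cursor across all shards because : ns: my_db.my_coll "
    "ClusteredCursor::query @ shard3/10.x.x.46:27017,10.x.x.47:27017 attempt: 5",
]

for (ns, shard), n in count_cursor_errors(sample).items():
    print(ns, shard, n)
```

To run it against a real log, pass an open file object instead of `sample`, for example `count_cursor_errors(gzip.open("mongos-1.log.gz", "rt"))` after `import gzip`.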



 Comments   
Comment by Greg Studer [ 15/Oct/11 ]

This is the same issue as SERVER-3889, manifesting in a query across multiple shards. Fix is in rc and testing.

Comment by Alan Shang [ 14/Oct/11 ]

We have the same problem. We noticed exactly the same error after running for 5 days. After one mongos process was restarted, the error went away on that process, but the mongos process that was not restarted still produced the same error. This stops anyone from using shards in production with 2.0. A quick fix is critical.

Comment by Y. Wayne Huang [ 29/Sep/11 ]

it happened about 60 times in 24 hours--most of them reported the same namespace/collection & shard.

Comment by Y. Wayne Huang [ 28/Sep/11 ]

it happens occasionally. i'll monitor the next 24 hours and report the frequency, but it seems to be about once every couple of hours. assuming the same error would also have been reported in 1.8.x, we had 0 instances of it from the time this mongos instance was created until the update; only after updating to 2.0.0 did we start to see it.

Comment by Eliot Horowitz (Inactive) [ 28/Sep/11 ]

Is the 'could not initialize' error happening consistently or occasionally?

Comment by Y. Wayne Huang [ 28/Sep/11 ]

yes, all shards are healthy (i can post rs.status() if you'd like). each shard has 3 members in the states PRIMARY, SECONDARY and ARBITER. there was some flipping of primary/secondary during the update and in a couple instances, we asked the new primary to step down (we're not using slaveOk so secondaries are more or less cold). we noticed that on step down, both non-arbiter nodes would become secondary for several seconds until one node was elected primary. i'm not sure if this is relevant. this is the same behavior we typically observe when calling rs.stepDown(), even in 1.8.x.
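
The health check described above (one PRIMARY, one SECONDARY and one ARBITER per shard) could be scripted roughly as follows. This is a sketch under stated assumptions: it works on a dictionary shaped like `rs.status()` output, relying only on the `members`, `name` and `stateStr` fields; the helper names and the sample host names are hypothetical, and fetching the status from a live replica set is left out.

```python
def members_in_state(status, state):
    """Return the names of members whose stateStr equals `state`."""
    return [
        member["name"]
        for member in status.get("members", [])
        if member.get("stateStr") == state
    ]


def has_primary(status):
    """True when exactly one member reports itself as PRIMARY."""
    return len(members_in_state(status, "PRIMARY")) == 1


# Hypothetical status document for one shard; the host names are made up.
sample_status = {
    "set": "shard3",
    "members": [
        {"name": "host-a.example:27017", "stateStr": "PRIMARY"},
        {"name": "host-b.example:27017", "stateStr": "SECONDARY"},
        {"name": "host-c.example:27017", "stateStr": "ARBITER"},
    ],
}

print(has_primary(sample_status))
```

During the step-down window mentioned above, when both data-bearing nodes are briefly SECONDARY, `has_primary` would return False for that shard.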

Comment by Eliot Horowitz (Inactive) [ 28/Sep/11 ]

Are all shards healthy?
Looks like 1 shard doesn't have a primary.

Comment by Eliot Horowitz (Inactive) [ 28/Sep/11 ]

Can you send the full mongos log?
