[SERVER-3967] could not initialize cursor across all shards Created: 28/Sep/11 Updated: 30/Mar/12 Resolved: 02/Nov/11 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | 2.0.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Y. Wayne Huang | Assignee: | Greg Studer |
| Resolution: | Duplicate | Votes: | 1 |
| Labels: | mongos, sharding |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Environment: | Linux x86_64, Ubuntu 10.04.3, 10gen stable package |
| Operating System: | Linux |
| Description |
|
After updating from 1.8.3 to 2.0.0, we are seeing numerous 'could not initialize cursor across all shards' errors reported by mongos. We have 4 shards, each a replica set of two replicas and an arbiter. All nodes (mongos, config, replicas, and arbiters) were updated to 2.0.0.

Tue Sep 27 21:19:26 [conn900] ns: my_db.my_coll could not initialize cursor across all shards because : ns: my_db.my_coll ClusteredCursor::query @ shard3/10.x.x.46:27017,10.x.x.47:27017 attempt: 5 |
| Comments |
| Comment by Greg Studer [ 15/Oct/11 ] |
|
This is the same issue as |
| Comment by Alan Shang [ 14/Oct/11 ] |
|
We hit the same problem and saw exactly the same error after running for 5 days. After one mongos process was restarted, the error went away on that process, but a mongos process that was not restarted still produced the same error. This stops anyone from using shards in production with 2.0. A quick fix is critical. |
| Comment by Y. Wayne Huang [ 29/Sep/11 ] |
|
it happened about 60 times in 24 hours--most of them reported the same namespace/collection & shard. |
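
For anyone wanting to reproduce this kind of count from their own mongos log, here is a minimal sketch in Python. It assumes the log lines look exactly like the one quoted in the description; the regular expression, the `tally_cursor_errors` helper, and the `SAMPLE` constant are illustrative names, not part of MongoDB or its tooling.

```python
import re
from collections import Counter

# Assumed line shape, taken from the single log line quoted in the description.
# Other mongos versions may format this message differently.
PATTERN = re.compile(
    r"ns: (?P<ns>\S+) could not initialize cursor across all shards"
    r" because : .*? @ (?P<shard>[^/\s]+)/(?P<hosts>\S+)"
    r" attempt: (?P<attempt>\d+)"
)

SAMPLE = (
    "Tue Sep 27 21:19:26 [conn900] ns: my_db.my_coll could not initialize "
    "cursor across all shards because : ns: my_db.my_coll "
    "ClusteredCursor::query @ shard3/10.x.x.46:27017,10.x.x.47:27017 attempt: 5"
)


def tally_cursor_errors(lines):
    """Count matching errors per (namespace, shard) pair."""
    counts = Counter()
    for line in lines:
        match = PATTERN.search(line)
        if match:
            counts[(match.group("ns"), match.group("shard"))] += 1
    return counts


if __name__ == "__main__":
    # Replace [SAMPLE] with an open log file to tally a real log.
    for (ns, shard), n in tally_cursor_errors([SAMPLE]).most_common():
        print(ns, shard, n)
```

Passing an open file object instead of the sample list tallies a whole log, and `most_common()` puts the namespace and shard that account for most of the errors first.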
| Comment by Y. Wayne Huang [ 28/Sep/11 ] |
|
it happens occasionally, roughly once every couple of hours. i'll monitor the next 24 hours and report the frequency. assuming 1.8.x would have logged the same error, we had 0 instances of it from the time this mongos instance was created; we only started to see it after updating to 2.0.0. |
| Comment by Eliot Horowitz (Inactive) [ 28/Sep/11 ] |
|
Is the "could not initialize" error happening consistently or occasionally? |
| Comment by Y. Wayne Huang [ 28/Sep/11 ] |
|
yes, all shards are healthy (i can post rs.status() if you'd like). each shard has 3 members in the states PRIMARY, SECONDARY and ARBITER. there was some flipping of primary/secondary during the update and in a couple instances, we asked the new primary to step down (we're not using slaveOk so secondaries are more or less cold). we noticed that on step down, both non-arbiter nodes would become secondary for several seconds until one node was elected primary. i'm not sure if this is relevant. this is the same behavior we typically observe when calling rs.stepDown(), even in 1.8.x. |
| Comment by Eliot Horowitz (Inactive) [ 28/Sep/11 ] |
|
Are all shards healthy? |
| Comment by Eliot Horowitz (Inactive) [ 28/Sep/11 ] |
|
Can you send the full mongos log? |