[SERVER-5797] Uncaught exception in count_slaveok.js Created: 09/May/12 Updated: 11/Jul/16 Resolved: 07/Jun/12 |
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication, Sharding |
| Affects Version/s: | None |
| Fix Version/s: | 2.1.2 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Ian Whalen (Inactive) | Assignee: | Randolph Tan |
| Resolution: | Done | Votes: | 0 |
| Labels: | 212push, buildbot |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Environment: | OS X 10.5 32-bit |
| Operating System: | ALL |
| Description |
http://buildbot.mongodb.org/builders/OS%20X%2010.5%2032-bit/builds/3697/steps/test_9/logs/stdio |
| Comments |
| Comment by auto [ 08/Jun/12 ] |
Author: Randolph Tan (randolph@10gen.com)
Message:
| Comment by auto [ 07/Jun/12 ] |
Author: Randolph Tan (randolph@10gen.com)
Message: Make ReplicaSetMonitor fail fast when no usable master is available.
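One way to read "fail fast" is modelled in the Python sketch below. It is a hypothetical illustration, not the committed change: `FailFastMonitor`, `NoMasterError`, `pick_host` and the node-state strings are all assumptions. The monitor reports a missing master immediately instead of retrying, and a caller that only needs a secondary (a slaveOk read) can catch that error and continue.

```python
class NoMasterError(Exception):
    """Hypothetical error: the monitor's view contains no usable primary."""


class FailFastMonitor:
    """Illustrative stand-in for a replica set monitor (not MongoDB's class)."""

    def __init__(self, nodes):
        # nodes maps host -> "PRIMARY", "SECONDARY" or "DOWN", as last observed.
        self.nodes = dict(nodes)

    def get_master(self):
        for host, state in self.nodes.items():
            if state == "PRIMARY":
                return host
        # Fail fast: report the missing master right away instead of
        # blocking or retrying inside the monitor.
        raise NoMasterError("no usable master available")

    def get_secondary(self):
        for host, state in self.nodes.items():
            if state == "SECONDARY":
                return host
        raise LookupError("no usable secondary available")


def pick_host(monitor, slave_ok):
    """Prefer the master; fall back to a secondary only for slaveOk operations."""
    try:
        return monitor.get_master()
    except NoMasterError:
        if not slave_ok:
            raise
        return monitor.get_secondary()
```

In this sketch the monitor stays strict (no master means an error), and the decision about whether a secondary is acceptable lives in the caller.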
| Comment by Randolph Tan [ 01/Jun/12 ] |
Similar problem, triggered through a different code path. To make it 100% reproducible, comment out the code that spawns the ReplicaSetMonitorWatcher.
| Comment by Ian Whalen (Inactive) [ 01/Jun/12 ] |
The same failure just showed up again: http://buildbot.mongodb.org/builders/Windows%2064-bit%202008%2B/builds/366/steps/test_9/logs/stdio
| Comment by auto [ 22/May/12 ] |
Author: Randolph Tan (randolph@10gen.com)
Message: Rewrote the count command in mongos to use ShardStrategy::commandOP and push handling of StaleConfigException to the caller. Also modified ParallelSortClusteredCursor to handle SyncClusterConnection not allowing the call method to be used on commands.
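The "push handling of StaleConfigException to the caller" pattern can be illustrated with a small Python sketch. Everything here is hypothetical: `StaleConfigError` stands in for mongos's StaleConfigException, `run_count_on_shards` and `count_with_retry` are invented names, and shards are plain dicts carrying a config version and a document count. The command operation raises on a version mismatch instead of retrying internally; the caller reloads the config version and re-runs the whole operation.

```python
class StaleConfigError(Exception):
    """Hypothetical stand-in for mongos's StaleConfigException."""


def run_count_on_shards(shards, config_version):
    # The command op does not retry: a version mismatch is raised to the caller.
    total = 0
    for shard in shards:
        if shard["version"] != config_version:
            raise StaleConfigError(shard["name"])
        total += shard["count"]
    return total


def count_with_retry(shards, load_config_version, max_attempts=3):
    # The caller owns the retry policy: on a stale config, reload the
    # config version and re-run the entire count.
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        version = load_config_version()
        try:
            return run_count_on_shards(shards, version)
        except StaleConfigError:
            if attempt == max_attempts - 1:
                raise
```

Keeping the retry in one place means the count path and any other command path can share the same refresh-and-retry policy rather than each hiding its own.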
| Comment by Randolph Tan [ 10/May/12 ] |
Cause: The primary of a two-member replica set is down (this is part of the test), and checkShardVersion calls ReplicaSetMonitor::getMaster, which asserts because there is no master. The failure is intermittent because it depends on whether the ReplicaSetMonitorWatcher has already noticed that the primary is down. To make this reproduce easily, insert a sleep in the test:
Because of this bug, you can't run queries or commands against a replica set shard that has no master... |
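The race described above can be modelled in a few lines of Python. This is a hypothetical sketch, not mongos code: `MonitorModel`, `watcher_refresh`, `check_shard_version` and `slave_ok_count` are illustrative names. It assumes, as a simplification, that the version check asks for the master unconditionally while the read itself only needs a secondary, and that a stale cached primary lets the check pass.

```python
class NoMasterError(Exception):
    """Hypothetical error raised when the cached view has no primary."""


class MonitorModel:
    """Illustrative cached view of a replica set; may lag behind reality."""

    def __init__(self, primary, secondaries):
        self.primary = primary  # cached value, possibly stale
        self.secondaries = list(secondaries)

    def watcher_refresh(self, primary_is_up):
        # Models the background watcher: the cache only learns the primary
        # is down when this runs.
        if not primary_is_up:
            self.primary = None

    def get_master(self):
        if self.primary is None:
            raise NoMasterError("no master found for replica set")
        return self.primary


def check_shard_version(monitor):
    # Buggy path in this model: always asks for the master.
    return monitor.get_master()


def slave_ok_count(monitor, counts):
    check_shard_version(monitor)  # may raise before any read happens
    # The read itself only needs a secondary.
    return counts[monitor.secondaries[0]]
```

Whether `slave_ok_count` succeeds depends only on whether `watcher_refresh` has run yet, which mirrors the intermittent behaviour: the same operation passes while the cache is stale and fails once the watcher catches up.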