[SERVER-6684] multi_mongos2.js SCE thrown in 2.0 mongod/2.2 mongos test Created: 01/Aug/12 Updated: 11/Jul/16 Resolved: 08/Aug/12 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | None |
| Fix Version/s: | 2.2.0-rc1 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Greg Studer | Assignee: | Spencer Brody (Inactive) |
| Resolution: | Done | Votes: | 0 |
| Labels: | buildbot | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Attachments: |
|
| Operating System: | ALL |
| Participants: |
| Description |
|
Not sure what is causing this. |
| Comments |
| Comment by auto [ 09/Aug/12 ] |
|
Author: Spencer T Brody <spencer@10gen.com> Date: 2012-08-09T08:25:33-07:00 Message: Make handling of stale configs in ParallelSortClusteredCursor safer |
| Comment by auto [ 08/Aug/12 ] |
|
Author: Spencer T Brody <spencer@10gen.com> Date: 2012-08-07T12:52:51-07:00 Message: Check result object of command for stale config in ParallelSortClusteredCursor. |
| Comment by Spencer Brody (Inactive) [ 07/Aug/12 ] |
|
The problem appears to be in how the count command handles StaleConfigExceptions. The same test passes when both mongos and mongod run 2.0.6. The logs from that purely 2.0 run show the same StaleConfigException being thrown at the point where the 2.2 mongos fails (during a count command); the 2.0 mongos recovers from it, while the 2.2 mongos does not. In 2.0, the mongos implementation of count had dedicated logic for retrying on stale config exceptions, including sending an authoritative setShardVersion after 3 failed attempts. The 2.2 mongos instead uses SHARDED->commandOp, which uses a ParallelSortClusteredCursor. I suspect that ParallelSortClusteredCursor does not handle StaleConfigExceptions the way the 2.0 count command did, and that this is what causes the failure. |
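For readers unfamiliar with the retry pattern described above, here is a minimal Python sketch of it. It is not the mongos code: `StaleConfigError`, `run_with_stale_config_retry`, and the `refresh` hook are hypothetical names, and the `authoritative` flag only models the authoritative setShardVersion that the 2.0 count path is described as sending after 3 failed attempts.

```python
class StaleConfigError(Exception):
    """Hypothetical stand-in for the StaleConfigException seen in the logs."""


def run_with_stale_config_retry(op, refresh, max_attempts=3):
    """Run op(), retrying whenever it raises StaleConfigError.

    refresh(authoritative=...) is a hypothetical hook that reloads routing
    metadata. It is called with authoritative=True only after the last
    ordinary attempt fails, modelling the authoritative setShardVersion
    described in the comment above.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return op()
        except StaleConfigError:
            # Escalate to an authoritative refresh once ordinary attempts run out.
            refresh(authoritative=(attempt == max_attempts))
    # One final attempt after the authoritative refresh; any error propagates.
    return op()


if __name__ == "__main__":
    state = {"calls": 0}
    refreshes = []

    def flaky_count():
        # Fails with a stale config three times, then succeeds.
        state["calls"] += 1
        if state["calls"] <= 3:
            raise StaleConfigError("stale config")
        return 42

    result = run_with_stale_config_retry(
        flaky_count, lambda authoritative: refreshes.append(authoritative)
    )
    print(result, refreshes)
```

A caller with no equivalent of this loop would surface the first stale config error directly, which matches the behavior suspected of the 2.2 path here.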
| Comment by Greg Studer [ 01/Aug/12 ] |
|
Could be due to better detection of dropped collections, but unsure why it wouldn't fail in 2.0 tests also. |
| Comment by Greg Studer [ 01/Aug/12 ] |
|
bouncing_count.js may have a similar intermittent failure. |