[SERVER-33207] geo_borders.js fails in 2 shards sharded collections passthrough Created: 08/Feb/18 Updated: 29/Oct/23 Resolved: 12/Feb/18 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | 3.7.2 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Charlie Swanson | Assignee: | David Storch |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Backwards Compatibility: | Fully Compatible | ||||
| Operating System: | ALL | ||||
| Sprint: | Query 2018-02-12, Query 2018-02-26 | ||||
| Participants: | |||||
| Linked BF Score: | 0 | ||||
| Description |
|
The index build at geo_borders.js:24, which is expected to fail, may fail on only one of the shards. This can happen by chance if the out-of-bounds points that cause the failure all get assigned to the same shard, leaving one shard that has no illegal points. |
| Comments |
| Comment by Max Hirschhorn [ 12/Feb/18 ] |
Sounds good to me. |
| Comment by Charlie Swanson [ 12/Feb/18 ] |
|
Yep. I'm on the same page - I don't see a reason to add another passthrough suite. |
| Comment by David Storch [ 12/Feb/18 ] |
I think it's unlikely that, aside from the aggregation system, there would be a routing bug that manifests in the single-shard case, but not the multi-shard or unsharded cases.
I'm not sure I'd characterize it that way. In my view, the problem is that the index build behavior presented to a client for the sharded case does not match the behavior presented to a client in the standalone case. The sharding team needs to do additional work in order to make failed index builds clean up properly in a sharded cluster in the way that they do on a standalone. In a way, this is much like having to blacklist a test for command x from the sharded collections passthrough because it does not function correctly against a sharded collection.
I guess we could audit tests in jstests/core/ looking for those that make assertions about failed index builds? I would propose holding off on such an audit unless we start seeing more failures like this one, however. It sounds like we're converging on leaving the sharded passthrough testing as is, i.e. not adding a new single-shard jsCore passthrough suite. max.hirschhorn charlie.swanson, can we consider this thread closed? |
| Comment by Githook User [ 12/Feb/18 ] |
|
Author: {'email': 'david.storch@10gen.com', 'name': 'David Storch', 'username': 'dstorch'} Message: |
| Comment by Max Hirschhorn [ 12/Feb/18 ] |
I'll reaffirm that I still don't know the codepath for these commands in mongos very well, and would defer to either of you or the Sharding team on whether it seems likely that the routing logic could have a bug in the single-shard vs unsharded case. The failure observed with the geo_borders.js test seems to me more an issue of our JavaScript tests making assertions that depend on the chunk distribution among the shards. (Prior to the changes from |
| Comment by Charlie Swanson [ 09/Feb/18 ] |
|
Would you care to make an argument? This failure doesn't convince me that we're missing coverage. To me, this looks like a success of replacing that suite, since we figured out something that does not work when the collection is sharded, but does when it's unsharded. I don't think there's much value in providing guarantees/coverage of things that work when your collection is sharded but only lives on one shard, since that more or less describes the unsharded configuration. I think a motivation for a suite with one shard would look different. A bug that only manifested when all the data lived on a single shard, but did not manifest with two shards, would be more motivating. |
| Comment by David Storch [ 09/Feb/18 ] |
|
charlie.swanson max.hirschhorn, my planned fix is to blacklist geo_borders.js from the sharded_collections_jscore_passthrough suite. This makes me wonder whether or not it would be wise to reintroduce a variant of sharded_collections_jscore_passthrough that uses a single shard (in other words, a jsCore variant of aggregation_one_shard_sharded_collections). Do we still believe that this wouldn't add valuable coverage beyond sharding_jscore_passthrough? |
| Comment by David Storch [ 09/Feb/18 ] |
|
I spoke with kaloian.manassiev, and he confirmed that this falls within a known category of issues where we don't clean up properly on failure. Many related improvements are planned as future work. For now, we should change our testing to work around the problem. Since the sharded_collections_jscore_passthrough implicitly shards by {_id: "hashed"}, there isn't a good way to guarantee that both shards have an invalid out-of-bounds point. |
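
To illustrate why the failure is intermittent, here is a rough sketch of the odds. It assumes, purely for illustration, that the hashed shard key places each out-of-bounds document on one of the two shards independently and with equal probability; the count k of out-of-bounds documents is a hypothetical parameter, and the function names are made up for this sketch rather than taken from the test or the server.

```javascript
// Sketch only: models hashed placement as a fair coin flip per document.
// `k` is a hypothetical number of out-of-bounds points in the collection.

// Closed form: chance that one of the two shards ends up with no
// out-of-bounds point, so the index build succeeds on that shard.
function probOneShardClean(k) {
    if (k === 0) {
        return 1;  // No illegal points anywhere, so every shard is "clean".
    }
    // All k points on shard 0, or all k points on shard 1.
    return Math.pow(2, 1 - k);
}

// Cross-check by enumerating every assignment of k points to two shards.
// Bit i of `mask` is the shard that point i lands on.
function probOneShardCleanByEnumeration(k) {
    const total = 1 << k;
    let hits = 0;
    for (let mask = 0; mask < total; mask++) {
        if (mask === 0 || mask === total - 1) {
            hits++;  // Every point landed on the same shard.
        }
    }
    return hits / total;
}

for (const k of [1, 2, 4, 8]) {
    console.log(k, probOneShardClean(k), probOneShardCleanByEnumeration(k));
}
```

Under that assumption the chance drops by half with each additional out-of-bounds point, so a test with only a few such points would be expected to hit the one-sided failure fairly often, while one with many would hit it rarely.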