[SERVER-25038] Comparisons made by $bucket at parse time must respect the collation Created: 13/Jul/16 Updated: 19/Nov/16 Resolved: 04/Oct/16 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Aggregation Framework, Querying |
| Affects Version/s: | None |
| Fix Version/s: | 3.4.0-rc1 |
| Type: | Bug | Priority: | Critical - P2 |
| Reporter: | David Storch | Assignee: | David Storch |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||
| Backwards Compatibility: | Fully Compatible | ||||
| Sprint: | Query 18 (08/05/16), Query 2016-08-29, Query 2016-09-19, Query 2016-10-10 | ||||
| Participants: | |||||
| Linked BF Score: | 0 | ||||
| Description |
|
The difficulty is that parsing of the aggregation pipeline is done before lock acquisition, but the final collation is not always known until after the collection lock has been acquired (since we may need to read the collection's metadata describing the collection default collation). max.hirschhorn and myself brainstormed a few potential solutions:
|
| Comments |
| Comment by Githook User [ 04/Oct/16 ] | |
|
Author: {u'username': u'dstorch', u'name': u'David Storch', u'email': u'david.storch@10gen.com'}Message: | |
| Comment by Githook User [ 03/Oct/16 ] | |
|
Author: {u'username': u'dstorch', u'name': u'David Storch', u'email': u'david.storch@10gen.com'}Message: This provides a way to do pre-parse validity checks. Full | |
| Comment by Charlie Swanson [ 10/Aug/16 ] | |
|
Code review url: https://mongodbcr.appspot.com/86280001/ | |
| Comment by Max Hirschhorn [ 21/Jul/16 ] | |
|
david.storch, I discussed this ticket a bit more with Charlie and he suggested adding a non-documented, internal option to ExpressionSwitch that verifies its case statements are sorted and returns an error if they aren't. This proposal is more akin to #3 in the list in the ticket's description, but at the level of the expression and not the stage. The argument against doing #4 is that it is currently possible to run a non-optimized pipeline and that would no longer be the case if a DocumentSourceBucket instance was added to the pipeline - there'd be no reason to give DocumentSourceBucket::getNext() a real implementation because it is expected to replace itself with a DocumentSourceGroup and a DocumentSourceSort. There is a separate consideration that we should give for work on this ticket related to when expressions specified in $bucket are optimized. In the current implementation in the specification in
While it is currently the case that we don't resolve the $cond expression to an ExpressionConstant unless all three of its operands can be resolved to constants, we may in the future want to optimize cases like the following to simply be the "then" or "else" expression.
However, we would need to know the correct collation to compare "foo" and "FOO" under in order to do so. asya, are we willing to change the specification for $bucket to only accept constants and not expressions that resolve to constants in order to simplify some of the integration with collation in $bucket? Otherwise if we go with Charlie's proposal, we'd want to defer those checks to $switch as well. |