[SERVER-49828] Call "MatchExpression::optimize()" on collection validator expressions Created: 23/Jul/20 Updated: 06/Dec/22 |
|
| Status: | Backlog |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Task | Priority: | Major - P3 |
| Reporter: | Mihai Andrei | Assignee: | Backlog - Query Optimization |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | qopt-team | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Assigned Teams: |
Query Optimization
|
| Participants: |
| Description |
|
Currently, collection validator expressions are not optimized. In theory, it would not be a breaking change to optimize these expressions as MatchExpression::optimize() produces a semantically equivalent expression. However, some work would need to be done to ensure that validation error generation works properly when optimize() gets called. More precisely, we would need to store two copies of the MatchExpression: the original and the optimized one. While the optimized one would be used for matching, the unoptimized one would be used to produce detailed validation errors. This extra copy of the original is necessary because the optimized expression is not guaranteed to match the original shape of the user-specified validator expression and would potentially produce an incorrect error. |
| Comments |
| Comment by Mihai Andrei [ 18/Sep/20 ] |
|
Well, the code which generates descriptive error messages runs under the assumption that the document failed to validate against the expression, so if the document does match the original validator expression, you will get a bogus error if you run the error generating code. The correct thing to do would be to add a check in the error generating code that verifies that the document fails to match against the original expression before generating an error. In the case that the document does match, you wouldn't provide any details, but this doesn't seem ideal at all since we should be generating descriptive errors for every failure. Given that it's theoretically possible to have different results for the optimized version and the original, I'm skeptical of pursuing this change. |
| Comment by David Percy [ 17/Sep/20 ] |
|
What should happen in this scenario? |
| Comment by David Storch [ 17/Sep/20 ] |
|
To elaborate on what Mihai said, simply calling optimize() on the validator would break the code we've developed for producing detailed doc validation errors. However, doing so could hypothetically have a performance benefit. I doubt this work is high priority, but it seems worth at least keeping on the backlog. |
| Comment by Mihai Andrei [ 17/Sep/20 ] |
|
james.wahlin I'm going to update the description in a bit, but we will have to do something like what david.percy suggested where we have to store the optimized expression for matching and the unoptimized for producing errors because the optimized expression is not guaranteed to match the original shape of the user-specified validator expression. |
| Comment by James Wahlin [ 17/Sep/20 ] |
|
david.storch / mihai.andrei can you provide an update on where this stands and update the description if needed? Is it the case that error reporting has been improved and expected to be sufficient if we start optimize the validator expression? |
| Comment by David Storch [ 17/Sep/20 ] |
|
mihai.andrei I'm going to move this out of the doc validation project and into the triage queue. james.wahlin charlie.swanson is this something that should be triaged by the QO team? |
| Comment by David Percy [ 20/Aug/20 ] |
|
Another strategy would be to keep both the optimized and the original MatchExpression. If the optimized one fails, you can rerun the original to get the explanation. The problem with this idea is that sometimes optimize() does change the behavior of an expression. (For one example, it assumes $add is associative.) So in theory the original and optimized expressions could disagree: true vs false. |