[SERVER-18500] Resolve ambiguity of positional projections with multiple implicit array traversal predicates in certain cases Created: 15/May/15  Updated: 06/Dec/22

Status: Backlog
Project: Core Server
Component/s: Querying
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: J Rassi Assignee: Backlog - Query Optimization
Resolution: Unresolved Votes: 3
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
is duplicated by SERVER-30998 Update and $ operator mismatch with m... Closed
is duplicated by SERVER-53896 positional $ operator when there are ... Closed
is duplicated by SERVER-54687 `$pull`ed index should not override q... Closed
is duplicated by SERVER-21794 Array field limitation in query docum... Closed
is duplicated by SERVER-19866 Allow more than one array in query wh... Closed
Related
related to SERVER-27738 Positional operator ($) can choose wr... Closed
related to SERVER-14662 Positional projection queries (and po... Closed
Assigned Teams:
Query Optimization
Participants:

 Description   

In general, if there are multiple query predicates that require implicit array traversal, the behavior of the positional projection operator is undefined. For instance, should the following projection return the zeroth or first array element?

db.collection.drop();
db.collection.insert({a:[{b:1},{c:1}]});
db.collection.find({"a.b":1,"a.c":1},{"a.$":1}); // Return "a.0" element, or "a.1" element?

SERVER-14662 tracks work to make such queries return an error to the user.

However, there are cases in which this ambiguity can be resolved. Consider the following projection:

db.collection.drop();
db.collection.insert({a:[{b:1},{c:1}],d:[{e:1},{f:1}]});
db.collection.find({"a.b":1,"d.f":1}, {"a.$":1});

In this case, we can resolve the ambiguity because "a.$" matches the prefix of path "a.b" but does not match a prefix of path "d.f".



 Comments   
Comment by Asya Kamsky [ 11/Aug/17 ]

Please note that with SERVER-831 and SERVER-1243 resolved, there is new syntax available to updates which completely avoids the types of ambiguities mentioned in the comments of this ticket for updates.

This ticket remains open because it's tracking the ability to return a single array value with positional element in projection only.

Comment by Tess Avitabile (Inactive) [ 17/Nov/16 ]

SERVER-14662, which caused positional projections/updates to error if multiple arrays were encountered when matching the filter, will be reverted in SERVER-27088. We are currently working on defining the semantics we would like for array updates.

Comment by Tess Avitabile (Inactive) [ 06/Apr/16 ]

Thanks for your suggestion on how to define positional operator semantics when there are multiple array traversals. We are considering what semantics are best to implement.

That is correct that currently we do not allow redundant predicates if they result in multiple array traversals. We may resolve this problem separately from the larger problem of ambiguous positional projections/updates. Stripping of redundant query predicates is tracked in SERVER-22857.

Comment by NOVALUE Mitar [ 06/Apr/16 ]

Oh, I missed that this ticket is just about projections, not also updates. My comments were about updates.

But for my proposal above I do not see any ambiguity. The core of the proposal is that you should allow updates when all conditions match. This is not true at the moment. So the semantics would be that $ corresponds to the first document which matched all the predicates, and if multiple predicates match, you pick the $ with highest value.

So:

db.coll.update({$and: [{a: 1}, {a: 2}]}, {$set: {'a.$': 3}})

Would update document to:

{a: [1, 3], b: [3, 4], c: 5}

Also for me the important case is to allow such query:

db.test.update({ 'nested.required._id': 'phitZKwpTqiH76BWS',
  nested: 
   { '$elemMatch': 
      { 'required._id': 'phitZKwpTqiH76BWS',
        'required.field2': { '$ne': 'Field 2 - 2' } } } }, { '$set': 
   { 'nested.$.required.field2': 'Field 2 - 2' } })

Which simply has only redundant predicates, but it is otherwise completely unambiguous.

Comment by Tess Avitabile (Inactive) [ 23/Mar/16 ]

mitar, thanks for your question. We encountered some design uncertainties, and have temporarily put this work on hold. As of SERVER-14662, we only allow positional projections in queries with one implicit array traversal. For example, if we have the document

{a: [1, 2], b: [3, 4], c: 5}

then we allow the query

db.coll.find({a: 1, c: 5}, {'a.$': 1})

because we only traverse one array while matching the document. We do not allow the query

db.coll.find({a: 1, b: 1}, {'a.$': 1})

because we traverse two arrays while matching the document. The same requirements hold for positional updates.

Our intention in this ticket is to allow all queries with positional projections, where the positional projection matches the filter once. For example, if we have the document

{a: [1, 2], b: [3, 4], c: 5}

then we allow the query

db.coll.find({a: 1, b: 4}, {'a.$': 1})

because the positional projection matches the filter once. We do not allow the query

db.coll.find({$and: [{a: 1}, {a: 2}]}, {'a.$': 1})

because the positional projection matches the filter twice. This is consistent with our requirement that the positional projection match the filter at least once, and that there be at most one positional projection.

However, positional update operators do not behave in this way. Positional update operators are not required to match the filter, and you are allowed to have multiple positional update operators. Our design for projections thus will not work for updates. However, we do not wish to remove this functionality from updates, since it allows you to update one array based on the index of another, e.g.

db.coll.update({a: 1}, {$set: {'a.$': 2, 'b.$': 2}})

This behavior may be important to some customers. We're trying to find a design that will be consistent between projections and updates, while preserving this update functionality, so we have put this work on hold for now.

Comment by NOVALUE Mitar [ 07/Mar/16 ]

What are plans to address this? To my understanding, latest changes are simply preventing completely any queries of this nature? And this ticket is to allow some special cases? What about my comment above? Will ambiguity resolving be implemented how I described? Or are there other plans?

Comment by NOVALUE Mitar [ 15/May/15 ]

I will repeat what I wrote in the SERVER-14662. I think there is pretty clear and expected semantics for many more cases than those described above.

As I reported in SERVER-17999, I think the ambiguity comes from the fact that sometimes there are redundant conditions which make $ misbehave and that is surprising. In the case above, $exists which is not really needed. And in my case:

db.test.update({ 'nested.required._id': 'phitZKwpTqiH76BWS',
  nested: 
   { '$elemMatch': 
      { 'required._id': 'phitZKwpTqiH76BWS',
        'required.field2': { '$ne': 'Field 2 - 2' } } } }, { '$set': 
   { 'nested.$.required.field2': 'Field 2 - 2' } })

The first nested.required._id is duplicated by one in $elemMatch. The issue is that if such queries are made automatically (like they are in my case) then it is hard to detect such things in advance.

Wouldn't a simple semantic always be that the highest position should win? So the highest index number is the one $ uses? Because if you are traversal the array the later are you in the array the more specific you are in some way. If one predicate can determine the value by just looking at the first value of the array, and another by looking at 100 values, then the second one wins.

As I understand, you already ask all predicates for their $ value. So you should just add code which picks the largest number between two. Not the second number.

So semantics could be: $ corresponds to the first document which matched all the predicates, which mean the location with the highest index number, because if there was a match earlier, then some other highest index number would be the $ position. I think this is pretty like one expect and it does not surprise.

Also, in SERVER-17999 I reported that the number of affected documents is being returned wrongly for such queries. They are returned always non-zero, because match succeeded, but there was no real modification necessary. I think that should be fixed somehow. (Maybe the number of affected documents should be equal to the number of modified documents and not do the number of found documents.)

$ should be returning the position of an element which satisfied all conditions, not an arbitrary condition. Because also selecting is done based on all conditions (when there is AND between predicates). I think the fact that you internally split combined AND predicate into smaller predicates is not a reason to not then compute what is the position of an overall AND predicate. You want $ to represent the first match of all predicates. Currently you are not returning that. So this is a bug?

And luckily for arrays it seems there is an easy way to compute that position. You just take max between all sub-predicates.

Even with OR query there is a reasonable position to return if $or matches in order, like in most programming languages does. Then the first predicate with matches also has an associated position. But I think that in Mongo you do not have a defined order for $or and short-circuiting?

Moreover, it is not that multiple predicated contradict. But they specify their own conditions. It is like in any boolean logic AND clauses. A AND B does not mean that A and B contradict, but that both A and B has to be true. If multiple predicates would contradict, then there would be a no matching document to begin with. Even when A is stricter (matches less) than B, that is not a contradiction. They obviously do not contradict because there is a non-empty set of documents (with the first one in that set) of where they overlap.

You maybe mean that there is multiple values they return as a position and those values are not all the same. But that is normal, because each sub-predicate returns the value for themselves. But for the predicate as a whole then those values have to be combined. And it is so that the stricter one is the one which is the result of an AND top-level predicate. In arrays it is easy to know which is the stricter, the latter.

Generated at Thu Feb 08 03:47:52 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.