[SERVER-14662] Positional projection queries (and positional update ops) should fail with error if multiple arrays encountered Created: 23/Jul/14  Updated: 03/Jun/19  Resolved: 28/Jul/18

Status: Closed
Project: Core Server
Component/s: Querying
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: João Acabado Assignee: Tess Avitabile (Inactive)
Resolution: Won't Fix Votes: 5
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File match_expr.png     PNG File repro.png    
Issue Links:
Duplicate
is duplicated by SERVER-17159 $ Positional Operator Only Updating F... Closed
is duplicated by SERVER-1013 positional $ operator field mismatch Closed
is duplicated by SERVER-11537 Incosistent behavior with $or and $el... Closed
is duplicated by SERVER-14837 Positional Operator matching wrong ne... Closed
is duplicated by SERVER-16409 Updates against paths composed with a... Closed
is duplicated by SERVER-17999 Regression between MongoDB 2.4 and 2.6 Closed
is duplicated by SERVER-41382 Update in nested array based on Date ... Closed
Related
related to DOCS-12773 Explicitly refer to "Single Field Mat... Closed
related to SERVER-27088 Revert changes to positional projecti... Closed
related to DOCS-3874 Add warning about undefined behavior ... Closed
is related to SERVER-16888 $ positional operator with $elemMatch... Closed
is related to SERVER-11537 Incosistent behavior with $or and $el... Closed
is related to SERVER-21794 Array field limitation in query docum... Closed
is related to SERVER-18500 Resolve ambiguity of positional proje... Backlog
Sprint: Query 10 (02/22/16)
Participants:
Case:

 Description   

Running the following commands on 2.6 and 2.4 gives different output when running following commands:

db.createCollection("PositionalOperatorRepro")
db.PositionalOperatorRepro.insert({"_id": "test1",  "Items": [ { "Id": 1 }, { "Id": 2 }  ] })
db.PositionalOperatorRepro.find({ "_id" : "test1", "Items" : { "$exists" : true, "$elemMatch" : { "Id" : 2 } } } , { "Items.$" : 1 })

// 2.4 returns
{.. Items: [{Id:2}]}
// 2.6 returns
{.. Items: [{Id:1}]}
// 2.6 without $exists
{.. Items: [{Id:2}]}

If we remove the "$exists" : true criteria the output will be the same.

Is this change documented anywhere? There could be compatibility issues when upgrading.

I'm attaching a screen shot where I reproduced this behavior.



 Comments   
Comment by Ramon Fernandez Marina [ 28/Jul/18 ]

I'm updating the metadata on this ticket to close it as "Won't Fix", since the changes in it were later reverted and never made it into a stable MongoDB version.

SERVER-18500 remains open to discuss and implement improvements in this area. See also the documentation on using arrayFilters for more information.

Regards,
Ramón.

Comment by Githook User [ 17/Nov/16 ]

Author:

{u'username': u'tessavitabile', u'name': u'Tess Avitabile', u'email': u'tess.avitabile@mongodb.com'}

Message: Revert "SERVER-14662 Reject ambiguous positional projections and updates"

This reverts commit 2eea3f09aea9c92df9aa0d4e47840869bf04d7b8.
Branch: v3.4
https://github.com/mongodb/mongo/commit/5c5fe01994749cdd9ef35d0bf13e0fd67abf7e31

Comment by Githook User [ 17/Nov/16 ]

Author:

{u'username': u'tessavitabile', u'name': u'Tess Avitabile', u'email': u'tess.avitabile@mongodb.com'}

Message: Revert "SERVER-14662 fix duplicate uassert error code"

This reverts commit 0df1eabb730f54c72dd545fc2b5f5c7f630e7f3c.
Branch: v3.4
https://github.com/mongodb/mongo/commit/b9c8db05bbdb8001eca3139dc76dbc9bcfe7470e

Comment by Githook User [ 17/Nov/16 ]

Author:

{u'username': u'tessavitabile', u'name': u'Tess Avitabile', u'email': u'tess.avitabile@mongodb.com'}

Message: Revert "SERVER-14662 Reject ambiguous positional projections and updates"

This reverts commit 2eea3f09aea9c92df9aa0d4e47840869bf04d7b8.
Branch: master
https://github.com/mongodb/mongo/commit/5c3f83de911a2d7a512954349925b3c6056b886e

Comment by Githook User [ 17/Nov/16 ]

Author:

{u'username': u'tessavitabile', u'name': u'Tess Avitabile', u'email': u'tess.avitabile@mongodb.com'}

Message: Revert "SERVER-14662 fix duplicate uassert error code"

This reverts commit 0df1eabb730f54c72dd545fc2b5f5c7f630e7f3c.
Branch: master
https://github.com/mongodb/mongo/commit/d9c598ac6d5f79cde4d5439eb21f773b025e917f

Comment by Tess Avitabile (Inactive) [ 17/Nov/16 ]

This change produced undesirable effects, and will be reverted in SERVER-27088. We are currently working on defining the semantics we would like for array updates.

Comment by Tess Avitabile (Inactive) [ 10/Feb/16 ]

If a positional operator appears in a projection or update, and there were multiple array matches against the document, we will issue an error.

Example:

db.coll.insert({a: [1, 2], b: [3, 4]});
db.coll.find({a: 1, b: 4}, {_id: 0, 'a.$': 1});

Previously:

> WriteResult({ "nInserted" : 1 })
> { "a" : [ 2 ] }

Now:

> WriteResult({ "nInserted" : 1 })
> Error: error: {
	"waitedMS" : NumberLong(0),
	"ok" : 0,
	"errmsg" : "Executor error during find command: InternalError: ambiguous positional projection",
	"code" : 96
}

Example:

db.coll.insert({a: [1, 2], b: [3, 4]});
db.coll.update({a: 1, b: 4}, {$set: {'a.$': 5}});
db.coll.find({}, {_id: 0})

Previously:

> WriteResult({ "nInserted" : 1 })
> WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })
> { "a" : [ 1, 5 ], "b" : [ 3, 4 ] }

Now:

> WriteResult({ "nInserted" : 1 })
> WriteResult({
	"nMatched" : 0,
	"nUpserted" : 0,
	"nModified" : 0,
	"writeError" : {
		"code" : 34412,
		"errmsg" : "ambiguous positional update operation"
	}
})
> { "a" : [ 1, 2 ], "b" : [ 3, 4 ] }

Comment by Githook User [ 10/Feb/16 ]

Author:

{u'username': u'tessavitabile', u'name': u'Tess Avitabile', u'email': u'tess.avitabile@mongodb.com'}

Message: SERVER-14662 fix duplicate uassert error code
Branch: master
https://github.com/mongodb/mongo/commit/0df1eabb730f54c72dd545fc2b5f5c7f630e7f3c

Comment by Githook User [ 10/Feb/16 ]

Author:

{u'username': u'tessavitabile', u'name': u'Tess Avitabile', u'email': u'tess.avitabile@mongodb.com'}

Message: SERVER-14662 Reject ambiguous positional projections and updates
Branch: master
https://github.com/mongodb/mongo/commit/2eea3f09aea9c92df9aa0d4e47840869bf04d7b8

Comment by J Rassi [ 15/May/15 ]

Example of a case where the ambiguity can't be resolved:

db.collection.drop();
db.collection.insert({a:[{b:1},{c:1}]});
db.collection.find({"a.b":1,"a.c":1},{"a.$":1}); // Return "a.0" element, or "a.1" element?

I'm not convinced that the positional operator can be made to have intuitive semantics for the example quoted above.

Comment by NOVALUE Mitar [ 15/May/15 ]

But why then even do this, if we can simply fix this for all cases and have pretty simple and expected semantics?

Comment by J Rassi [ 15/May/15 ]

mitar: improving semantics for these operations is being tracked at SERVER-18500.

Comment by NOVALUE Mitar [ 15/May/15 ]

Why fail? There is clear semantics which could be done here?

Comment by NOVALUE Mitar [ 11/Apr/15 ]

Moreover, it is not that multiple predicated contradict. But they specify their own conditions. It is like in any boolean logic AND clauses. A AND B does not mean that A and B contradict, but that both A and B has to be true. If multiple predicates would contradict, then there would be a no matching document to begin with. Even when A is stricter (matches less) than B, that is not a contradiction. They obviously do not contradict because there is a non-empty set of documents (with the first one in that set) of where they overlap.

You maybe mean that there is multiple values they return as a position and those values are not all the same. But that is normal, because each sub-predicate returns the value for themselves. But for the predicate as a whole then those values have to be combined. And it is so that the stricter one is the one which is the result of an AND top-level predicate. In arrays it is easy to know which is the stricter, the latter.

Comment by NOVALUE Mitar [ 11/Apr/15 ]

After more thought I think this is a bug, not a feature request. $ should be returning the position of an element which satisfied all conditions, not an arbitrary condition. Because also selecting is done based on all conditions (when there is AND between predicates). I think the fact that you internally split combined AND predicate into smaller predicates is not a reason to not then compute what is the position of an overall AND predicate. You want $ to represent the first match of all predicates. Currently you are not returning that. So this is a bug?

And luckily for arrays it seems there is an easy way to compute that position. You just take max between all sub-predicates.

Even with OR query there is a reasonable position to return if $or matches in order, like in most programming languages does. Then the first predicate with matches also has an associated position. But I think that in Mongo you do not have a defined order for $or and short-circuiting?

Comment by NOVALUE Mitar [ 11/Apr/15 ]

As I reported in SERVER-17999, I think the ambiguity comes from the fact that sometimes there are redundant conditions which make $ misbehave and that is surprising. In the case above, $exists which is not really needed. And in my case:

db.test.update({ 'nested.required._id': 'phitZKwpTqiH76BWS',
  nested: 
   { '$elemMatch': 
      { 'required._id': 'phitZKwpTqiH76BWS',
        'required.field2': { '$ne': 'Field 2 - 2' } } } }, { '$set': 
   { 'nested.$.required.field2': 'Field 2 - 2' } })

The first nested.required._id is duplicated by one in $elemMatch. The issue is that if such queries are made automatically (like they are in my case) then it is hard to detect such things in advance.

Wouldn't a simple semantic always be that the highest position should win? So the highest index number is the one $ uses? Because if you are traversal the array the later are you in the array the more specific you are in some way. If one predicate can determine the value by just looking at the first value of the array, and another by looking at 100 values, then the second one wins.

As I understand, you already ask all predicates for their $ value. So you should just add code which picks the largest number between two. Not the second number.

So semantics could be: $ corresponds to the first document which matched all the predicates, which mean the location with the highest index number, because if there was a match earlier, then some other highest index number would be the $ position. I think this is pretty like one expect and it does not surprise.

Also, in SERVER-17999 I reported that the number of affected documents is being returned wrongly for such queries. They are returned always non-zero, because match succeeded, but there was no real modification necessary. I think that should be fixed somehow. (Maybe the number of affected documents should be equal to the number of modified documents and not do the number of found documents.)

Comment by J Rassi [ 16/Jan/15 ]

To reiterate Thomas above: when multiple expressions in the query predicate perform implicit array matches, the value of the positional operator is not specified. This is documented in the reference page for the positional operator, but users that have not read the reference page will understandably be surprised when the element they are intending to project/update does not match with the element actually projected/updated.

The most straightforward way to address this issue in the server is to fail these operations with an error. As a more ambitious approach, we could actually define the semantics for these operations. For example, we could determine which match expression should be generating the positional operator's value by using prefix matching on the path containing the positional operator. It's not always possible to disambiguate in these cases, but we could generate an error for cases when the ambiguity can't be resolved.

Example of a case where the ambiguity can be resolved:

db.collection.drop();
db.collection.insert({a:[{b:1},{c:1}],d:[{e:1},{f:1}]});
db.collection.find({"a.b":1,"d.f":1},{"a.$":1}); // The "a.0" element should be returned.

Example of a case where the ambiguity can't be resolved:

db.collection.drop();
db.collection.insert({a:[{b:1},{c:1}]});
db.collection.find({"a.b":1,"a.c":1},{"a.$":1}); // Return "a.0" element, or "a.1" element?

Comment by João Acabado [ 20/Nov/14 ]

rc2.8 still has this behavior. Migrations from 2.4 to rc2.8 will suffer from this.

Comment by David Storch [ 08/Aug/14 ]

As described in SERVER-11537, positional update has the same undefined semantics for contradictory predicates as positional projection. For instance,

> db.test.insert({nested: [{a: 1, slug: 'aaa'}, {a: 1, slug: 'aaa'}, {a: 2, slug: 'aaa'}]})
> db.test.find()
{ "_id" : ObjectId("52763c5cb6786ce4ff406cbf"), "nested" : [ { "a" : 1, "slug" : "aaa" }, { "a" : 1, "slug" : "aaa" }, { "a" : 2, "slug" : "aaa" } ] }
> db.test.update({'nested.a': 1, $or: [{'nested': {$elemMatch: {a: 1, slug: {$exists: true}}}}]}, {$unset: {'nested.$.slug': ''}})
> db.test.find()
{ "_id" : ObjectId("52763c5cb6786ce4ff406cbf"), "nested" : [ { "a" : 1 }, { "a" : 1, "slug" : "aaa" }, { "a" : 2, "slug" : "aaa" } ] }
> db.test.update({'nested.a': 1, $or: [{'nested': {$elemMatch: {a: 1, slug: {$exists: true}}}}]}, {$unset: {'nested.$.slug': ''}})
> db.test.find()
{ "_id" : ObjectId("52763c5cb6786ce4ff406cbf"), "nested" : [ { "a" : 1 }, { "a" : 1, "slug" : "aaa" }, { "a" : 2, "slug" : "aaa" } ] }

One might expect the second update() above to unset the slug field from the second array element. However, there are contradictory predicates over "nested.a", causing the update to be applied to the first array element rather than the second.

Comment by Thomas Rueckstiess [ 06/Aug/14 ]

Hi João,

Thanks, we're following your suggestion and are going to update the documentation to make the interaction of $elemMatch and positional projection clearer.

A good starting point in the source code for the implementation of projection is here: https://github.com/mongodb/mongo/blob/35f827aef4ddfcf9acb9e4b90cb200ff29183b7c/src/mongo/db/exec/projection_exec.cpp#L237

We're rephrasing this ticket to track the fact that the positional operator semantics are not well-defined and keep it open as a feature request.

Regards,
Thomas

Comment by João Acabado [ 04/Aug/14 ]

Thanks a lot for your explanation, we went with the little change.

From a user point of view I would suggest to clarify the documentation regarding the criteria on the find command. I would have avoided this had I known if the $elemMatch implies the $exists but from the little I read on the $ operator and the find I thought it would return me some element that matched all the criteria.

For curiosity, could point me to the source code where this is implemented? I fiddled around your solution but didn't manage to get to it.

Comment by Thomas Rueckstiess [ 01/Aug/14 ]

Hi João,

Thanks for reporting this. We have re-written the query engine in 2.6 and there have been some changes. You are correct in that the two versions return different results; this is due to the fact that the projection language in such a case is not well-defined, because the two predicates provide contradicting behavior. This has been the case for both 2.4 and 2.6. They each return somewhat arbitrarily one of the array elements, they just happen to be different in both versions due to implementation changes. I'd expect there to be many other similar cases where the results in both versions happen to be the same.

I'd like to explain some more what is going on (specifically in the new implementation in 2.6)

When the query engine gets a projection query like the one you propose

db.PositionalOperatorRepro.find({ "_id" : "test1", "Items" : { "$exists" : true, "$elemMatch" : { "Id" : 2 } } } , { "Items.$" : 1 })

it turns the query into a match expression tree, that would look like this:

The tree always gets sorted into a canonical form, and the EXISTS node happens to be at the end. After evaluating the match expression, the projection stage queries each of the leaves (the yellow nodes in the graph: EQ, EQ inside ELEMMATCH, EXISTS) for the position of the match (the value of $). It just happens that the EXISTS is asked at the end, and because the existence of "Items" can be determined after looking at the first element ($ = 0) of the array, it returns 0 and overwrites all previous $ values, including $=1 from the EQ inside the ELEMMATCH.

We are discussing internally if this special case of providing contradicting information is something the query planner should be able to handle and if there is a "correct" solution. For now, I would suggest the workaround is to not provide conflicting predicates in terms of the position operator. You can for example just remove the {$exists: true} clause from your query, because the $elemMatch clause implies that the element exists. This would return the result that I think you're expecting:

> db.PositionalOperatorRepro.find({ "Items" : { "$elemMatch" : { "Id" : 2 } } } , { "Items.$" : 1 })
{ "_id" : "test1", "Items" : [ { "Id" : 2 } ] }
>

We'll update this ticket when we made a decision whether the query planner should break the tie when two predicates contradict in their positional $ value.

Regards,
Thomas

Generated at Thu Feb 08 03:35:34 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.