[SERVER-37182] Different values when referencing whole object vs. a field of that object after $arrayToObject Created: 18/Sep/18  Updated: 29/Oct/23  Resolved: 10/Oct/18

Status: Closed
Project: Core Server
Component/s: Aggregation Framework
Affects Version/s: 4.0.1
Fix Version/s: 3.4.19, 3.6.10, 4.0.5, 4.1.4

Type: Bug Priority: Major - P3
Reporter: Jakub Szypulka Assignee: Ian Boros
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File 屏幕快照 2018-09-19 18.01.51.png     PNG File 屏幕快照 2018-09-19 18.18.32.png    
Issue Links:
Backports
Documented
is documented by DOCS-12144 Docs for SERVER-37182: Different valu... Closed
Duplicate
is duplicated by SERVER-37198 Add tests for $arrayToObject behaviou... Closed
Related
Backwards Compatibility: Minor Change
Operating System: ALL
Backport Requested:
v4.0, v3.6, v3.4
Steps To Reproduce:

Collection

[
  {
    "array": [
      {
        "k": "field",
        "v": 1
      },
      {
        "k": "field",
        "v": 2
      }
    ]
  }
]

Query

db.collection.aggregate([
  {
    $project: {
      object: {
        $arrayToObject: "$array"
      }
    }
  },
  {
    $project: {
      "object": "$object",
      "field": "$object.field"
    }
  }
])

Expected output

[
  {
    "field": 1,
    "object": {
      "field": 1
    }
  }
]

Actual output

[
  {
    "field": 1,
    "object": {
      "field": 2
    }
  }
]

Sprint: Query 2018-10-22
Participants:
Case:

 Description   

Using $arrayToObject on an array of {k: ..., v: ...} objects in which some k values repeat produces an object that behaves inconsistently.

Referencing this object as a whole using "$object" outputs an object whose field value differs from the value obtained by referencing the field directly using "$object.field".



 Comments   
Comment by Githook User [ 20/Nov/18 ]

Author:

{'name': 'Ian Boros', 'email': 'ian.boros@10gen.com'}

Message: SERVER-37182 Correctly handle duplicate fields in $arrayToObject

(cherry picked from commit 5145d0a4f4df8216cc5dadb550d79bc65b981fb4)
Branch: v4.0
https://github.com/mongodb/mongo/commit/ad258ca487abd8143aea4577998b70d8dc39ea76

Comment by Githook User [ 19/Nov/18 ]

Author:

{'name': 'Ian Boros', 'email': 'ian.boros@10gen.com'}

Message: SERVER-37182 Correctly handle duplicate fields in $arrayToObject
Branch: v3.6
https://github.com/mongodb/mongo/commit/4ca136a772111d80c6667209574d9c43146ba2e7

Comment by Githook User [ 19/Nov/18 ]

Author:

{'name': 'Ian Boros', 'email': 'ian.boros@10gen.com'}

Message: SERVER-37182 Correctly handle duplicate fields in $arrayToObject
Branch: v3.4
https://github.com/mongodb/mongo/commit/bff975230455d579071cae624fd7c8720b75a57d

Comment by Githook User [ 10/Oct/18 ]

Author:

{'name': 'Ian Boros', 'email': 'ian.boros@10gen.com'}

Message: SERVER-37182 Correctly handle duplicate fields in $arrayToObject
Branch: master
https://github.com/mongodb/mongo/commit/5145d0a4f4df8216cc5dadb550d79bc65b981fb4

Comment by Jakub Szypulka [ 20/Sep/18 ]

Thanks for the quick response, Charlie.

I urge you not to raise an error, as EAV database models rely on being able to "overwrite" a key when doing an $arrayToObject operation.

Whether your team chooses the first or last key, however, does not matter, as one can sort the array (e.g. using a timestamp) to one's liking before passing it to $arrayToObject.
By that I mean: as long as either the first or the last key is chosen and the operation does not raise an error, EAV database models will be fine, so feel free to decide whichever way you consider more logical. I personally prefer the last key, as it reflects the real world, where things change over time and the most recent state is the one most likely relevant to the database user. However, according to the documentation, MongoDB has already been using the first key, so it may be better to keep that so that no existing MongoDB installations running EAV models get broken.

I would be glad if you could bring up these arguments during your query team meeting, should the team consider changing the behaviour to raise an error.

Thanks a lot.
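
To illustrate the point about sorting, here is a minimal sketch in plain JavaScript. It does not call MongoDB; `arrayToObjectFirstWins`, `arrayToObjectLastWins`, the sample rows and the `ts` timestamp field are all hypothetical names made up for this example. It only shows that, under either policy, the caller can decide which value survives by ordering the array first.

```javascript
// Hypothetical EAV rows; the `ts` field is an assumption for illustration.
const rows = [
  { k: "status", v: "open", ts: 1 },
  { k: "status", v: "closed", ts: 2 },
  { k: "owner", v: "alice", ts: 1 },
];

// Policy A: the first occurrence of a key wins, later ones are ignored.
function arrayToObjectFirstWins(pairs) {
  const out = {};
  for (const { k, v } of pairs) {
    if (!Object.prototype.hasOwnProperty.call(out, k)) {
      out[k] = v;
    }
  }
  return out;
}

// Policy B: the last occurrence of a key wins, overwriting earlier ones.
function arrayToObjectLastWins(pairs) {
  const out = {};
  for (const { k, v } of pairs) {
    out[k] = v;
  }
  return out;
}

// Copy before sorting so the original array is left untouched.
const newestFirst = [...rows].sort((a, b) => b.ts - a.ts);
const oldestFirst = [...rows].sort((a, b) => a.ts - b.ts);

// Both calls keep the most recent status, "closed".
console.log(arrayToObjectFirstWins(newestFirst).status);
console.log(arrayToObjectLastWins(oldestFirst).status);
```

Either policy supports the EAV use case as long as no error is raised; only the required sort direction changes.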

Comment by Charlie Swanson [ 20/Sep/18 ]

I haven't personally verified, but the C++ unit tests should be able to test this. I'm pretty sure there are mechanisms to detect if a field is repeated within a Document. The bigger question for me is whether the first or the last iteration should be chosen, or if we should raise an error. The query team will discuss this in an upcoming meeting, likely next Friday if I had to guess.
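
As a rough idea of what such a check could look like, here is a sketch in plain JavaScript rather than the server's C++ test framework. The representation of a document as an ordered list of [name, value] pairs and the helper name `duplicateFieldNames` are assumptions for illustration only, not the server's actual Document API.

```javascript
// Hypothetical representation: a document as an ordered list of [name, value]
// pairs, which (unlike a plain JavaScript object) can hold repeated names.
function duplicateFieldNames(pairs) {
  const seen = new Set();
  const duplicates = new Set();
  for (const [name] of pairs) {
    if (seen.has(name)) {
      duplicates.add(name);
    }
    seen.add(name);
  }
  return [...duplicates];
}

console.log(duplicateFieldNames([["field", 1], ["field", 2]])); // [ 'field' ]
console.log(duplicateFieldNames([["a", 1], ["b", 2]])); // []
```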

Comment by Jakub Szypulka [ 20/Sep/18 ]

A follow-up question: after the bug is fixed, how is it going to be tested? As far as I know, there are low-level C++ unit tests, which cannot cover this bug, and there are JS integration tests, but they use the mongo shell, so they do not cover the bug either.

Comment by Nick Brewer [ 19/Sep/18 ]

jaksz I was able to reproduce what you're seeing using PyMongo. I'm passing this along to our Query team for further evaluation.

-Nick

Comment by Charlie Swanson [ 19/Sep/18 ]

nick.brewer upon a glance at the code, it looks like the $arrayToObject code is mistakenly generating a document with duplicate field names. The behavior in this case is probably undefined by the drivers. If you know of an easy way to confirm this from a driver - that would be useful. Otherwise I'm pretty confident we could do it from C++ in a unit test.

If it does reproduce, or if there's no easy way to do so, please just adjust the title and put it in 'Needs Scheduling' for the query team to take a look at.
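
A small model of this hypothesis, in plain JavaScript and without MongoDB, may help show how duplicate field names could produce the reported output. The pair-list representation and the helpers `getFieldFirstMatch` and `decodeToPlainObject` are hypothetical; in particular, that a driver's decoder keeps the last value is an assumption chosen to match the reported result, not documented driver behavior.

```javascript
// Hypothetical model, not MongoDB server code: the result of $arrayToObject
// as an ordered list of [name, value] pairs containing a duplicate name.
const objectWithDuplicates = [
  ["field", 1],
  ["field", 2],
];

// A path lookup such as "$object.field" that stops at the first matching name.
function getFieldFirstMatch(pairs, name) {
  const hit = pairs.find(([k]) => k === name);
  return hit === undefined ? undefined : hit[1];
}

// A decoder that builds a plain object: later assignments overwrite earlier ones.
function decodeToPlainObject(pairs) {
  const out = {};
  for (const [k, v] of pairs) {
    out[k] = v;
  }
  return out;
}

const field = getFieldFirstMatch(objectWithDuplicates, "field");
const object = decodeToPlainObject(objectWithDuplicates);
console.log({ field, object }); // { field: 1, object: { field: 2 } }
```

Under these assumptions the two references disagree in the same way as the "Actual output" in the report (1 vs. 2), even though both read the same underlying data.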

Comment by Jakub Szypulka [ 19/Sep/18 ]

Hi Nick,

Python driver: pymongo 3.5.1

node.js driver: mongodb 3.1.6

Comment by Nick Brewer [ 19/Sep/18 ]

jaksz Thanks for the details. I'll attempt to reproduce with the Python driver - can you confirm the Python driver version you're using?

-Nick

Comment by Jakub Szypulka [ 19/Sep/18 ]

Nick, sorry for the confusion: the test I added in SERVER-37198 is not related to the bug here; it just increases test coverage for a behaviour that works but has no test. I have not yet written a test pull request for this bug, as I am waiting for you to be able to reproduce it first.

I can confirm that using the mongo shell, everything works correctly.

However, I just tested it again using Python, and it gave me the same erroneous result as reported in this bug.

Mongoplayground, which also gives erroneous results, is written in Go and as such uses the Go MongoDB driver.

I also just tested it using node.js, and got the same erroneous results.

As the Python, Go and node.js drivers all exhibit the bug, and the mongo shell doesn't, the bug can be traced to the differences between external drivers and the mongo shell.

A wild guess, but perhaps using the mongo shell - which I assume is more of a debugging environment - disables some kind of optimisation that is applied during normal use? That would explain why it works in the shell, and doesn't with external drivers, which are made for real usage which includes optimisations.

Could you try to reproduce the bug using an external driver?

Comment by Nick Brewer [ 19/Sep/18 ]

jaksz I've run the test you added in SERVER-37198, and it passes. I also haven't been able to recreate what you're describing manually, via:

> db.test.insertOne({"array": [ {"k": "field", "v": 1},{"k": "field", "v": 2} ]})
{
	"acknowledged" : true,
	"insertedId" : ObjectId("5ba268d2b31848e2d119ca25")
}

Then:

> db.test.aggregate([
...   {
...     $project: {
...       object: {
...         $arrayToObject: "$array"
...       }
...     }
...   },
...   {
...     $project: {
...       "object": "$object",
...       "field": "$object.field"
...     }
...   }
... ])
{ "_id" : ObjectId("5ba268d2b31848e2d119ca25"), "object" : { "field" : 1 }, "field" : 1 }

Comment by Jakub Szypulka [ 18/Sep/18 ]

Regarding your point that $arrayToObject only uses the first field when duplicate fields are present:

This is correct, but has nothing to do with the inconsistency between the field of $object and $object.field. Either both values are the first occurrence of the key when traversing the array (the design MongoDB chose, as you pointed out), or both values are the last occurrence (the other option, which MongoDB did not choose), but they should never differ.

Yet they appear to differ (1 vs. 2).

Comment by Jakub Szypulka [ 18/Sep/18 ]

Nick, would this mongoplayground help you? They are running 4.0.1, and I also ran into this problem using a 3.6 installation on mlab.com.

Comment by Nick Brewer [ 18/Sep/18 ]

jaksz Can you provide a reproduction that is similar to actual field names / values you're using? I have not managed to reproduce this so far. Note that $arrayToObject only uses the first field when duplicate fields are present.

-Nick

Generated at Thu Feb 08 04:45:15 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.