[SERVER-37182] Different values when referencing whole object vs. a field of that object after $arrayToObject Created: 18/Sep/18 Updated: 29/Oct/23 Resolved: 10/Oct/18 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Aggregation Framework |
| Affects Version/s: | 4.0.1 |
| Fix Version/s: | 3.4.19, 3.6.10, 4.0.5, 4.1.4 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Jakub Szypulka | Assignee: | Ian Boros |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Attachments: | |
| Issue Links: | |
| Backwards Compatibility: | Minor Change |
| Operating System: | ALL |
| Backport Requested: | v4.0, v3.6, v3.4 |
| Steps To Reproduce: | Collection
Query
Expected output
Actual output
|
| Sprint: | Query 2018-10-22 |
| Participants: | |
| Case: | (copied to CRM) |
| Description |
|
Using $arrayToObject on an array of {k: ..., v: ...} objects in which some k values repeat produces an object that behaves inconsistently. Referencing the object as a whole with "$object" yields a field value that differs from the value obtained by referencing the field directly with "$object.field". |
| Comments |
| Comment by Githook User [ 20/Nov/18 ] |
|
Author: {'name': 'Ian Boros', 'email': 'ian.boros@10gen.com'} Message: (cherry picked from commit 5145d0a4f4df8216cc5dadb550d79bc65b981fb4) |
| Comment by Githook User [ 19/Nov/18 ] |
|
Author: {'name': 'Ian Boros', 'email': 'ian.boros@10gen.com'} Message: |
| Comment by Githook User [ 19/Nov/18 ] |
|
Author: {'name': 'Ian Boros', 'email': 'ian.boros@10gen.com'} Message: |
| Comment by Githook User [ 10/Oct/18 ] |
|
Author: {'name': 'Ian Boros', 'email': 'ian.boros@10gen.com'} Message: |
| Comment by Jakub Szypulka [ 20/Sep/18 ] |
|
Thanks for the quick response, Charlie. I urge you not to raise an error, as EAV database models rely on being able to "overwrite" a key when doing an $arrayToObject operation. Whether your team chooses the first or the last key does not matter, however, as one can sort the array (e.g. by a timestamp) to one's liking before passing it to $arrayToObject. I would be glad if you could bring up these arguments at your query team meeting, should the team consider changing the behaviour to raise an error. Thanks a lot. |
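The sorting argument above can be sketched in plain Python. This is only an illustration under stated assumptions: `array_to_object` is a hypothetical stand-in for $arrayToObject with a selectable duplicate rule, and the `ts` field and the sample rows are invented for the example.

```python
from operator import itemgetter


def array_to_object(pairs, keep="last"):
    """Hypothetical stand-in for $arrayToObject with a selectable duplicate rule."""
    result = {}
    for item in pairs:
        if keep == "first" and item["k"] in result:
            continue  # first-wins: ignore later duplicates of this key
        result[item["k"]] = item["v"]  # last-wins: later duplicates overwrite
    return result


# Invented EAV-style rows; "ts" is an assumed timestamp field.
rows = [
    {"k": "color", "v": "red", "ts": 1},
    {"k": "size", "v": "M", "ts": 2},
    {"k": "color", "v": "blue", "ts": 3},
]

# Under a last-wins rule, sort ascending so the newest value comes last.
newest_last = sorted(rows, key=itemgetter("ts"))
# Under a first-wins rule, sort descending so the newest value comes first.
newest_first = sorted(rows, key=itemgetter("ts"), reverse=True)

latest_via_last = array_to_object(newest_last, keep="last")
latest_via_first = array_to_object(newest_first, keep="first")
```

Either rule yields the most recent value per key once the input is sorted accordingly, which is why the choice between first and last matters less than not raising an error.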
| Comment by Charlie Swanson [ 20/Sep/18 ] |
|
I haven't personally verified, but the C++ unit tests should be able to test this. I'm pretty sure there are mechanisms to detect if a field is repeated within a Document. The bigger question for me is whether the first or the last iteration should be chosen, or if we should raise an error. The query team will discuss this in an upcoming meeting, likely next Friday if I had to guess. |
| Comment by Jakub Szypulka [ 20/Sep/18 ] |
|
A follow-up question: after the bug is fixed, how is the fix going to be tested? As far as I know, there are low-level C++ unit tests, which cannot cover this bug, and there are JS integration tests, but they use the mongo shell, so they do not cover the bug either. |
| Comment by Nick Brewer [ 19/Sep/18 ] |
|
jaksz I was able to reproduce what you're seeing using PyMongo. I'm passing this along to our Query team for further evaluation. -Nick |
| Comment by Charlie Swanson [ 19/Sep/18 ] |
|
nick.brewer upon a glance at the code, it looks like the $arrayToObject code is mistakenly generating a document with duplicate field names. The behavior in this case is probably undefined by the drivers. If you know of an easy way to confirm this from a driver - that would be useful. Otherwise I'm pretty confident we could do it from C++ in a unit test. If it does reproduce, or if there's no easy way to do so, please just adjust the title and put it in 'Needs Scheduling' for the query team to take a look at. |
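To illustrate how a document with duplicate field names can give two different answers, here is a minimal Python sketch. It is not the server's or any driver's code: the list of pairs merely stands in for a BSON document, and `lookup_first` and `decode_to_dict` are hypothetical helpers. The assumption is that one reader scans for the first matching field, while another builds a dictionary in which later duplicates overwrite earlier ones.

```python
# An ordered list of (name, value) pairs stands in for a BSON document,
# which can physically contain the same field name more than once.
doc_with_duplicates = [("a", 1), ("a", 2)]


def lookup_first(pairs, key):
    """Return the value of the first pair whose name matches (linear scan)."""
    for name, value in pairs:
        if name == key:
            return value
    return None


def decode_to_dict(pairs):
    """Decode into a dict, as a dict-based client might: later duplicates win."""
    return dict(pairs)


field_value = lookup_first(doc_with_duplicates, "a")   # first occurrence
whole_object = decode_to_dict(doc_with_duplicates)     # last occurrence survives
```

Under these assumptions the direct field lookup sees 1 while the decoded object holds 2, so the two reads disagree even though both are internally consistent. Which reader picks which value in a real deployment may differ by driver.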
| Comment by Jakub Szypulka [ 19/Sep/18 ] |
|
Hi Nick, Python driver: pymongo 3.5.1; node.js driver: mongodb 3.1.6 |
| Comment by Nick Brewer [ 19/Sep/18 ] |
|
jaksz Thanks for the details. I'll attempt to reproduce with the Python driver - can you confirm the Python driver version you're using? -Nick |
| Comment by Jakub Szypulka [ 19/Sep/18 ] |
|
Nick, sorry for the confusion, the test I added in — I can confirm that using the mongo shell, everything works correctly. However, I just tested it again using Python, and it gave me the same erroneous result as reported in this bug: Mongoplayground, which also gives erroneous results, is written in Go and as such uses the Go MongoDB driver. I also just tested it using node.js, and got the same erroneous results: As the Python, Go and node.js drivers all exhibit the bug, and the mongo shell doesn't, the bug can be traced to the differences between external drivers and the mongo shell. A wild guess, but perhaps using the mongo shell - which I assume is more of a debugging environment - disables some kind of optimisation that is applied during normal use? That would explain why it works in the shell and doesn't with external drivers, which are made for real usage, which includes optimisations. Could you try to reproduce the bug using an external driver? |
| Comment by Nick Brewer [ 19/Sep/18 ] |
|
jaksz I've run the test you added in
Then:
|
| Comment by Jakub Szypulka [ 18/Sep/18 ] |
|
Regarding your point that $arrayToObject only uses the first field when duplicate fields are present: This is correct, but has nothing to do with the inconsistency between the field of $object and $object.field. Either both values are the first occurrence of the key when traversing the array (the design MongoDB chose, as you pointed out), or both values are the last occurrence (the other option, which MongoDB did not choose), but they should never differ. Yet they appear to differ (1 vs 2). |
| Comment by Jakub Szypulka [ 18/Sep/18 ] |
|
Nick, would this mongoplayground help you? They are running 4.0.1, and I also ran into this problem using a 3.6 installation on mlab.com. |
| Comment by Nick Brewer [ 18/Sep/18 ] |
|
jaksz Can you provide a reproduction that is similar to actual field names / values you're using? I have not managed to reproduce this so far. Note that $arrayToObject only uses the first field when duplicate fields are present. -Nick |