[SERVER-5030] Document equality should be independent of field insertion order Created: 21/Feb/12 Updated: 06/Dec/22 |
|
| Status: | Backlog |
| Project: | Core Server |
| Component/s: | Querying |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Minor - P4 |
| Reporter: | Glenn Maynard | Assignee: | Backlog - Query Execution |
| Resolution: | Unresolved | Votes: | 9 |
| Labels: | bson2, query_triage | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Attachments: |
|
||||||||||||||||||||||||||||||||
| Issue Links: |
|
||||||||||||||||||||||||||||||||
| Assigned Teams: |
Query Execution
|
||||||||||||||||||||||||||||||||
| Participants: | |||||||||||||||||||||||||||||||||
| Case: | (copied to CRM) | ||||||||||||||||||||||||||||||||
| Description |
|
Equality of documents should not depend on the order of the fields. The following documents should be considered equal: {a: 1, b: 2} {b: 2, a: 1}The above documents would be considered the same document for purposes of unique indexes, and would be considered a match for searching. This allows documents to map much more naturally to the native, built-in dictionary types in various languages. Most native dictionary types don't preserve order, and builtin dictionary types that support equality (Python, Ruby) don't compare insertion order. Ruby, which does explicitly preserve insertion order, also still does not consider insertion order for equality; {:a=>1, :b=>2} == {:b=>2, :a=>1} is true. This would reduce the need to use inconvenient non-builtin dictionary types, and align documents to typical expectations of dictionary types. Note: this is not a recommendation to stop preserving insertion order. Although I don't find that useful, it's not obviously harmful, and it should be possible to do both efficiently. Additional discussion at http://groups.google.com/group/mongodb-user/browse_thread/thread/c908e572d6e89c49. |
| Comments |
| Comment by Daniel Sinclair [ 08/Oct/13 ] | |||||||||||
|
I totally agree. Don't get me wrong, I don't want to have order dependent Since it might hamper DB performance, and we can't guarantee order in the Either way - you're absolutely right, it's a real issue that needs to be | |||||||||||
| Comment by Glenn Maynard [ 07/Oct/13 ] | |||||||||||
|
The issue isn't that it's completely impossible to preserve dictionary order. The problem is that it's easy to get lurking bugs in day-to-day user code. For example, it's easy to write {"a": 1, "b": 2}in Python, and without realizing it depend on the key order that Python's hashing happens to give you. You'll often get the same key order every time, so everything seems to work, but it's actually depending on the hash behavior of your current version of Python, so it'll break if Python changes its hash algorithm, and it won't work in any language except Python. I've done this myself without noticing what I was doing, even though I know about it, and it's very easy for this to slip through testing. It's also painful to not be able to use dictionary literals in most languages. It doesn't mean that it can't be done at all; drivers could easily guarantee a particular order in their BSON handling if that was wanted. The issue here is about sanity of user code, but the drivers can handle this however they want internally. | |||||||||||
| Comment by Daniel Sinclair [ 07/Oct/13 ] | |||||||||||
|
Isn't the main issue that most languages don't preserve dictionary order? | |||||||||||
| Comment by Daniel Sinclair [ 05/Oct/13 ] | |||||||||||
|
So can you ensure a particular field order from every language ? I'm | |||||||||||
| Comment by Eliot Horowitz (Inactive) [ 05/Oct/13 ] | |||||||||||
|
Glenn, yes, unless in bson 2 we declare all bson document fields are sorted by key name. Or say that bson 2 documents compare differently than 1 docs in this regard. | |||||||||||
| Comment by Glenn Maynard [ 05/Oct/13 ] | |||||||||||
|
Just to be clear, James's idea of updating subdocuments in-place is what would require BSON format changes, not the change this bug is about. | |||||||||||
| Comment by Glenn Maynard [ 05/Oct/13 ] | |||||||||||
|
James: That sounds like it would require significant changes to the BSON data format, so it doesn't sound like an optimization that would ever happen. (Even simpler fixes, like not storing the index of each item in BSON arrays, are hard to get any traction on, since changing the BSON format affects every driver and probably third-party software.) | |||||||||||
| Comment by James Blackburn [ 05/Oct/13 ] | |||||||||||
I've got one strong counter-argument to this: efficiency of small updates to large documents. As things currently stand updating a document dirties all the document's pages, even if only a single fixed-length field is changed. For arrays this is worse: ss the document size changes subsequent fields move. This means multi-document updates can cause significant unnecessary I/O. If instead mutated fields, and variably-sized fields were stored/moved to the end of the document then the update code could be easily optimised to touch just the page(s) containing the mutated fields at the end of the document. If this happened dynamically during update then mongo would be much friendlier to the underlying storage. | |||||||||||
| Comment by Daniel Sinclair [ 04/Oct/13 ] | |||||||||||
|
Actually, you're correct about JS - I misspoke, I guess I was just thinking Your point about the drivers I agree with. Sorting is OK, but I'd much Indexes though was how I first found this thread. I wanted to use an object If we can't guarantee order of fields from the client, then the DB needs to | |||||||||||
| Comment by Eliot Horowitz (Inactive) [ 04/Oct/13 ] | |||||||||||
|
Glenn, correct. JavaScript maintains order and has no notion of document equality. Daniel, all the drivers support building order dependent documents, but agree its probably not ideal. Its just not totally clear what the right path forward is. Going to move this ticket to put it near the top of things to think about. | |||||||||||
| Comment by Glenn Maynard [ 04/Oct/13 ] | |||||||||||
|
Sorting the keys is fine. In most languages, dictionaries are inherently unordered, so having keys come back in a different order is OK. I prefer this, actually, since it helps guarantee that I won't accidentally write code that depends on the ordering behavior of a particular language/driver. For people depending on order-sensitivity or order-preservation, an option would be to make this a collection-level flag to enable and disable this. It would probably be OK if changing this flag after collection creation wasn't allowed, so supporting migrating the data wouldn't be required. For example, since this flag would change compound shard keys, this restriction means upgrading wouldn't cause lots of documents to be moved across shards. On upgrade, set this flag for all existing collections; users would need to copy data to a new collection if they want to migrate existing data. system.indexes does depend on order-preservation, so that collection could use that flag, avoiding the need to restructure it. > fundamentally broken. By not supporting equivalence the way JavaScript does Actually, JavaScript is one of the only languages I've used that does always preserve dictionary order. Every other major language I've used doesn't unless you jump hoops. (I don't think JS has any concept of object equality.) | |||||||||||
| Comment by Daniel Sinclair [ 04/Oct/13 ] | |||||||||||
|
Presumably you can have a "strict" or "fast" mode for those that don't want If we can't guarantee on the order of fields out of the driver (and I don't If you were going to keep to an order, then I'd have said sorted I'd vote firstly for equivalence {a:1,b:2}=== {b:2,a:1}, failing that, | |||||||||||
| Comment by Eliot Horowitz (Inactive) [ 04/Oct/13 ] | |||||||||||
|
We've gone back and forth on this one a lot over the last couple of years. Many strong opinions on both sides. In 2.6, we're actually changing the update code so that it never re-orders, which should guarantee that the order that documents come in stay in that order. If we were going to change, to make things reasonably efficient, we'd probably have to sort all keys so that comparison are fast. Also not sure how to migrate if we did want to change. | |||||||||||
| Comment by Daniel Sinclair [ 04/Oct/13 ] | |||||||||||
|
I also think this is a problem; db.xxx.drop(); "MongoDB does not make guarantees regarding the order of fields in a BSON document. Drivers and MongoDB will reorder the fields of a documents upon insertion and following updates." Which is clearly only true for top-level fields. However, id is also considered a BsonDocument. | |||||||||||
| Comment by Glenn Maynard [ 27/Sep/12 ] | |||||||||||
|
It happens everywhere subdocuments are compared against each other; from what I recall it's actually just a byte-for-byte comparison of the BSON data, without even looking at its contents. | |||||||||||
| Comment by Caleb Jones [ 27/Sep/12 ] | |||||||||||
|
Note that this occurs with $addToSet:
| |||||||||||
| Comment by Glenn Maynard [ 18/Sep/12 ] | |||||||||||
|
(Why is a bugtracker hideously image-izing smileys? Who seriously thought that was a good idea?) | |||||||||||
| Comment by Glenn Maynard [ 18/Sep/12 ] | |||||||||||
|
Because it doesn't solve the problem. | |||||||||||
| Comment by Julien [ 18/Sep/12 ] | |||||||||||
|
Hum ... when you say : "For example, if you insert a record manually in the shell, you need to remember to apply the same ordering" it's true, with AND without my solution ... | |||||||||||
| Comment by Glenn Maynard [ 18/Sep/12 ] | |||||||||||
|
That's brittle: you have to make sure you apply something like this in every place you access the database. For example, if you insert a record manually in the shell, you need to remember to apply the same ordering. If you forget, or get it wrong (eg. if the sort order isn't identical for Unicode characters), you get subtle breakage. Avoiding brittleness is a major reason to want this in the first place. This also doesn't work when setting subkeys of an object, rather than replacing the whole object. The only way I can think of to fix this properly is a collection-level flag. | |||||||||||
| Comment by Julien [ 18/Sep/12 ] | |||||||||||
|
Or a SON ctor from a dict, something like :
Does it seems a good idea ? | |||||||||||
| Comment by Julien [ 18/Sep/12 ] | |||||||||||
|
And what about providing a example code of an "Ordering key SON manipulator" in drivers ? | |||||||||||
| Comment by Glenn Maynard [ 13/Sep/12 ] | |||||||||||
|
It would have been good to do that from the start, but unfortunately, MongoDB currently guarantees preservation of dictionary order. Applications probably depends on that, so normalizing dictionaries would probably cause backwards-compatibility issues. | |||||||||||
| Comment by Julien [ 13/Sep/12 ] | |||||||||||
|
I got an idea about this problem, what about ordering by key, server side, at insert time ? It would hide the "order sensitive" problem, leaving a natural, order insensitive database, without dropping any performance nor touching any loc of the storage engine ... |