[JAVA-403] UUIDs are stored as little endian (should be big endian) Created: 28/Jul/11 Updated: 01/May/19 Resolved: 09/Oct/14 |
|
| Status: | Closed |
| Project: | Java Driver |
| Component/s: | Codecs, Configuration |
| Affects Version/s: | None |
| Fix Version/s: | 3.0.0 |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Victor Boivie | Assignee: | Robert Guo (Inactive) |
| Resolution: | Done | Votes: | 7 |
| Labels: | SERVER_V2, uuid | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
MacOS X 10.6 (which implies a little endian x86/64 CPU) |
||
| Description |
|
In python: ) In java: Result: (yes, with the patch from Java seems to serialize/deserialize the UUIDs as little endian, according to BSONEncoder.java, line 354. Changing this would, however, break a lot of applications out there. Still, the python/java incompatibility is really bad, so in my humble opinion it should be fixed. |
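The byte-order mismatch described above can be illustrated with a minimal, self-contained sketch. It does not use the driver; the class and method names are hypothetical, and the "little endian" layout shown (each 64-bit half of the UUID written little endian) is an assumption about what the legacy encoder produced.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.UUID;

public class UuidByteOrderDemo {
    // Standard layout: both 64-bit halves written big endian,
    // so the bytes read in the same order as the UUID's text form.
    public static byte[] toBigEndian(UUID uuid) {
        return ByteBuffer.allocate(16).order(ByteOrder.BIG_ENDIAN)
                .putLong(uuid.getMostSignificantBits())
                .putLong(uuid.getLeastSignificantBits())
                .array();
    }

    // Assumed legacy layout: each 64-bit half written little endian,
    // which reverses the bytes within each half.
    public static byte[] toLittleEndianHalves(UUID uuid) {
        return ByteBuffer.allocate(16).order(ByteOrder.LITTLE_ENDIAN)
                .putLong(uuid.getMostSignificantBits())
                .putLong(uuid.getLeastSignificantBits())
                .array();
    }

    // Hex-dump helper for comparing the two layouts.
    public static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        UUID uuid = UUID.fromString("00112233-4455-6677-8899-aabbccddeeff");
        System.out.println(hex(toBigEndian(uuid)));          // 00112233445566778899aabbccddeeff
        System.out.println(hex(toLittleEndianHalves(uuid))); // 7766554433221100ffeeddccbbaa9988
    }
}
```

A client that reads the second byte string as a big-endian UUID sees 77665544-3322-1100-ffee-ddccbbaa9988, which is why the same document looks different from Java and from other drivers.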
| Comments |
| Comment by Jeffrey Yemin [ 31/Mar/15 ] |
|
Closing all resolved 3.0.0 issues, as 3.0.0 has been tagged and released. |
| Comment by Githook User [ 30/Jan/15 ] |
|
Author: Robert Guo (username: guoyr, email: robert.guo@10gen.com) Message: |
| Comment by Githook User [ 09/Oct/14 ] |
|
Author: Robert Guo (username: guoyr, email: robert.guo@10gen.com) Message: |
| Comment by Jeffrey Yemin [ 12/Jan/13 ] |
|
Moving to 3.0 release, unfortunately. The only way to do this in 2.x is with the DBEncoder/DBDecoder framework, and we're likely going to be getting rid of that framework in 3.0. |
| Comment by Bernie Hackett [ 26/Dec/12 ] |
|
Philipp, you can already do this in PyMongo: Valid settings are OLD_UUID_SUBTYPE (3 - the current default), UUID_SUBTYPE (4 - will be the default in some future release), JAVA_LEGACY, and CSHARP_LEGACY. This is a per-collection setting so that you can use subtype 4 in new collections but use legacy byte orders in existing collections. |
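To make the four settings concrete, here is a rough Java sketch of how each one could map a UUID to a binary subtype and byte order. It is not PyMongo or Java driver code: the enum and method names are hypothetical and only mirror the setting names above, and the byte layouts are assumptions (Java legacy reverses each 8-byte half; C# legacy reverses the first three fields, as .NET's Guid.ToByteArray does; the Python legacy setting keeps standard byte order under subtype 3).

```java
import java.nio.ByteBuffer;
import java.util.UUID;

public class UuidRepresentationSketch {
    // Hypothetical names mirroring the settings listed in the comment above.
    public enum Representation {
        UUID_SUBTYPE(4), OLD_UUID_SUBTYPE(3), JAVA_LEGACY(3), CSHARP_LEGACY(3);

        public final int subtype;

        Representation(int subtype) {
            this.subtype = subtype;
        }
    }

    // Reverse len bytes of b in place, starting at index from.
    private static void reverse(byte[] b, int from, int len) {
        for (int i = from, j = from + len - 1; i < j; i++, j--) {
            byte t = b[i];
            b[i] = b[j];
            b[j] = t;
        }
    }

    // Each swap is its own inverse, so the same routine serves encode and decode.
    private static void swap(byte[] bytes, Representation rep) {
        switch (rep) {
            case JAVA_LEGACY:   // each 64-bit half is little endian
                reverse(bytes, 0, 8);
                reverse(bytes, 8, 8);
                break;
            case CSHARP_LEGACY: // first three fields (4, 2, 2 bytes) are little endian
                reverse(bytes, 0, 4);
                reverse(bytes, 4, 2);
                reverse(bytes, 6, 2);
                break;
            default:            // UUID_SUBTYPE and OLD_UUID_SUBTYPE keep standard order
                break;
        }
    }

    public static byte[] encode(UUID uuid, Representation rep) {
        byte[] bytes = ByteBuffer.allocate(16)
                .putLong(uuid.getMostSignificantBits())
                .putLong(uuid.getLeastSignificantBits())
                .array();
        swap(bytes, rep);
        return bytes;
    }

    public static UUID decode(byte[] stored, Representation rep) {
        byte[] bytes = stored.clone();
        swap(bytes, rep);
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        return new UUID(buf.getLong(), buf.getLong());
    }
}
```

Because three of the four settings share subtype 3, the subtype alone cannot tell a reader which byte order was used, which is why the choice has to be configured per collection.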
| Comment by Philipp Schneider (coresystems) [ 26/Dec/12 ] |
|
@Nils From a MongoDB user's point of view, it would be nice if the final fix/solution had an option that lets the user of the driver choose between subtype 4 and subtype 3 (3 will have no backward compatibility). An option in some kind of settings would be nice. |
| Comment by Nils [ 19/Dec/12 ] |
|
The fix appears to still use subtype 3, which will break backwards compatibility. Any reason not to switch to subtype 4, which arguably is the correct subtype, while retaining backwards compatibility? |
| Comment by Philipp Schneider (coresystems) [ 14/Nov/12 ] |
|
Hello, Use at own risk and only if you have NO DATA yet in the database |
| Comment by Jeffrey Yemin [ 30/Oct/12 ] |
|
Nils, we're working on a way to do this without breaking compatibility. Should know more in a week or so. |
| Comment by Nils [ 24/Oct/12 ] |
|
According to the BSON spec, binary subtype 4 is standard UUID, and 3 is legacy, so there should be no problem implementing subtype 4 encoding, while falling back to properly decoding old subtype 3 legacy. |
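The approach suggested here (always write subtype 4, but keep reading old subtype 3 values) could look roughly like the sketch below. It is a standalone illustration, not the driver's codec: the class, the `Binary` holder, and the method names are hypothetical, and it assumes every subtype 3 value in the collection was written by the old Java driver with each 64-bit half little endian.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.UUID;

public class SubtypeAwareUuidCodecSketch {
    public static final byte LEGACY_UUID_SUBTYPE = 3;
    public static final byte STANDARD_UUID_SUBTYPE = 4;

    // Hypothetical stand-in for a BSON binary value: a subtype plus payload.
    public static final class Binary {
        public final byte subtype;
        public final byte[] data;

        public Binary(byte subtype, byte[] data) {
            this.subtype = subtype;
            this.data = data;
        }
    }

    // New values are always written as subtype 4 in big-endian order.
    public static Binary encode(UUID uuid) {
        byte[] data = ByteBuffer.allocate(16).order(ByteOrder.BIG_ENDIAN)
                .putLong(uuid.getMostSignificantBits())
                .putLong(uuid.getLeastSignificantBits())
                .array();
        return new Binary(STANDARD_UUID_SUBTYPE, data);
    }

    // Reads pick the byte order from the stored subtype.
    public static UUID decode(Binary value) {
        if (value.data.length != 16) {
            throw new IllegalArgumentException("UUID payload must be 16 bytes");
        }
        ByteOrder order;
        switch (value.subtype) {
            case STANDARD_UUID_SUBTYPE:
                order = ByteOrder.BIG_ENDIAN;
                break;
            case LEGACY_UUID_SUBTYPE:
                // Assumption: legacy values came from the old Java driver.
                order = ByteOrder.LITTLE_ENDIAN;
                break;
            default:
                throw new IllegalArgumentException("Not a UUID subtype: " + value.subtype);
        }
        ByteBuffer buf = ByteBuffer.wrap(value.data).order(order);
        long msb = buf.getLong();
        long lsb = buf.getLong();
        return new UUID(msb, lsb);
    }
}
```

One caveat with this design: a query that matches on a UUID compares the stored subtype and bytes, so documents still holding subtype 3 values would not match a subtype 4 query value until they are rewritten.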
| Comment by Nils [ 24/Oct/12 ] |
|
Can we at the very least get something like a static global switch to fix this issue for those of us who do not have legacy issues and would like to not start creating legacy issues? The fix should be simple enough and be backwards compatible. |
| Comment by Nils [ 24/Oct/12 ] |
|
I've just been hit with this problem. Unable to get consistent view of data in Java and everything else. And apparently no pending fix. I don't understand how this is not a higher priority. The UUID, as seen from Java, does not match what the shell or any other tool is seeing. It makes it almost impossible to work with the data. |
| Comment by Armin Ronacher [ 24/Jan/12 ] |
|
The official BSON specification does not list Binary subtype 4 at all at the moment but some drivers have already started to accept it (for instance the Python one) and MongoDB itself also already handles binary subtype 4. May I suggest adding at the very least a note to the binary specification and adding a link to this issue or related ones? |
| Comment by Mark Lewis [ 31/Oct/11 ] |
|
A graph showing insertion speed of an object with a UUID value as its _id. For this issue, the difference between Timestamp UUID and Timestamp (order swap) is the important bit. |
| Comment by Mark Lewis [ 31/Oct/11 ] |
|
I recently performed some benchmarking of Mongo insert performance where a UUID field is indexed. The current (wrong) implementation performed about 50% slower than a corrected UUID implementation for timestamp-based UUIDs, because timestamp-based UUIDs are supposed to have the same mostly-increasing behavior as ObjectId, but due to this issue they are not. So there is a performance aspect to this issue as well. |
| Comment by Robert Stam [ 03/Oct/11 ] |
|
Maybe not, because the binary subtype would be different (3 for legacy representations and 4 for the new standard representation). |
| Comment by Jeff Yemin (Inactive) [ 03/Oct/11 ] |
|
Isn't it the case that $in may produce false positives? |
| Comment by Antoine Girbal [ 29/Jul/11 ] |
|
This is the plan. |
| Comment by Victor Boivie [ 29/Jul/11 ] |
|
>Also, keeping the type as a binary subtype makes it easier and more like the current impl. Understandable. >the old one will be read as new class (UUIDJava/Legacy probably) Can't you do it like this (assuming here that the new binary subtype for 'new-style UUID' is 6): dbo.put("foo", new java.util.UUID(1,2)) -> save as subtype 6. That way, old code will work (and will automatically convert to the new format as the documents are updated). And we could still use the de facto java.util.UUID class and not care about the legacy type, provided that we don't care how the documents are stored in the DB. |
| Comment by Scott Hernandez (Inactive) [ 28/Jul/11 ] |
|
It would be great to be able to subclass UUID, but unfortunately it is final. We will make UUID be stored as the new subtype, and the old one will be read as a new class (UUIDJava/Legacy probably). The rest of the details are yet to be finished. You will be able to store/retrieve either type, but new UUIDs will be stored as the new, and compatible, BSON type. Also, keeping the type as a binary subtype makes it easier and more like the current impl. |
| Comment by Victor Boivie [ 28/Jul/11 ] |
|
It could even be a top level BSON type (and not a binary type), which could save 5 additional bytes per entry as the length is always the same. How would that be activated? By setting an option to the driver somehow? Because I really hope that you will still use the languages' own native UUID classes as it is today. And not reinvent a new one in the BSON namespace. |
| Comment by Scott Hernandez (Inactive) [ 28/Jul/11 ] |
|
Yes, there is a plan to deprecate the current UUID binary subtype and create a new one which is binary compatible for all drivers. It will be a breaking change but one that the developers can choose to implement by migrating data, or having handler code in the app. |