[JAVA-403] UUIDs are stored as little endian (should be big endian) Created: 28/Jul/11  Updated: 01/May/19  Resolved: 09/Oct/14

Status: Closed
Project: Java Driver
Component/s: Codecs, Configuration
Affects Version/s: None
Fix Version/s: 3.0.0

Type: Improvement Priority: Major - P3
Reporter: Victor Boivie Assignee: Robert Guo (Inactive)
Resolution: Done Votes: 7
Labels: SERVER_V2, uuid
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

MacOS X 10.6 (which implies a little endian x86/64 CPU)


Attachments: PNG File GUID-Insertion-Speed.png    
Issue Links:
Depends
Related
is related to DOCS-1543 UUID BinData subtype Closed
is related to PYTHON-387 Support Java and C# legacy byte order... Closed

 Description   

In Python:
>>> import pymongo
>>> import uuid
>>> from pymongo import Connection
>>> c = Connection('localhost', 27017)
>>> db = c.test
>>> db.foo.insert({"python": uuid.UUID("aaf4c61d-dcc5-e8a2-dabe-de0f3b482cd9")})
ObjectId('4e318fb98f1e81eac4000001')

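In Java:
import java.util.UUID;
import com.mongodb.BasicDBObject;
import com.mongodb.DB;
import com.mongodb.DBObject;
import com.mongodb.Mongo;

UUID id = UUID.fromString("aaf4c61d-dcc5-e8a2-dabe-de0f3b482cd9");
Mongo mongo = new Mongo();
DB db = mongo.getDB("test");
DBObject dbo = new BasicDBObject();
dbo.put("java", id);
db.getCollection("foo").insert(dbo);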

Result:
> db.foo.find()
{ "_id" : ObjectId("4e318fb98f1e81eac4000001"), "python" : UUID('aaf4c61ddcc5e8a2dabede0f3b482cd9') }
{ "_id" : ObjectId("4e318fc12746ac3aa375aee9"), "java" : UUID('a2e8c5dc1dc6f4aad92c483b0fdebeda') }

(yes, with the patch from SERVER-1201 applied)

Java seems to serialize/deserialize UUIDs as little-endian (each 64-bit half least significant byte first), according to BSONEncoder.java, line 354.

Changing this would, however, break a lot of existing applications. Still, the Python/Java incompatibility is really bad, so in my humble opinion it should be fixed...
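The two values in the result can be reproduced with plain JDK classes. This is a sketch, not the driver's BSONEncoder code: it assumes the legacy Java encoding writes getMostSignificantBits() and then getLeastSignificantBits(), each as a little-endian 64-bit value, which matches both rows of the shell output. The class and method names are made up for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.UUID;

public class JavaLegacyUuidDemo {

    // Writes the two 64-bit halves of the UUID in the given byte order.
    static byte[] encode(UUID uuid, ByteOrder order) {
        return ByteBuffer.allocate(16)
                .order(order)
                .putLong(uuid.getMostSignificantBits())
                .putLong(uuid.getLeastSignificantBits())
                .array();
    }

    // Standard layout: bytes in the same order as the canonical string.
    static byte[] standardBytes(UUID uuid) {
        return encode(uuid, ByteOrder.BIG_ENDIAN);
    }

    // Assumed legacy Java driver layout: each half least significant byte first.
    static byte[] javaLegacyBytes(UUID uuid) {
        return encode(uuid, ByteOrder.LITTLE_ENDIAN);
    }

    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        UUID id = UUID.fromString("aaf4c61d-dcc5-e8a2-dabe-de0f3b482cd9");
        System.out.println(toHex(standardBytes(id)));   // aaf4c61ddcc5e8a2dabede0f3b482cd9 (the Python row)
        System.out.println(toHex(javaLegacyBytes(id))); // a2e8c5dc1dc6f4aad92c483b0fdebeda (the Java row)
    }
}
```

Reading the bytes back with the same little-endian order recovers the original UUID, which is why Java round-trips its own data while the shell and Python, reading the same bytes in standard order, show a different value.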



 Comments   
Comment by Jeffrey Yemin [ 31/Mar/15 ]

Closing all resolved 3.0.0 issues, as 3.0.0 has been tagged and released.

Comment by Githook User [ 30/Jan/15 ]

Author: Robert Guo (username: guoyr, email: robert.guo@10gen.com)

Message: JAVA-403 allow user to specify UUID format
Branch: master
https://github.com/mongodb/mongo-java-driver/commit/3d44a7e300f5359690d1b5870c7509f3617552f2

Comment by Githook User [ 09/Oct/14 ]

Author: Robert Guo (username: guoyr, email: robert.guo@10gen.com)

Message: JAVA-403 allow user to specify UUID format
Branch: 3.0.x
https://github.com/mongodb/mongo-java-driver/commit/3d44a7e300f5359690d1b5870c7509f3617552f2

Comment by Jeffrey Yemin [ 12/Jan/13 ]

Moving to the 3.0 release, unfortunately. The only way to do this in 2.x is with the DBEncoder/DBDecoder framework, and we're likely going to get rid of that framework in 3.0.

Comment by Bernie Hackett [ 26/Dec/12 ]

Philipp, you can already do this in PyMongo:

http://api.mongodb.org/python/current/api/pymongo/collection.html#pymongo.collection.Collection.uuid_subtype

Valid settings are OLD_UUID_SUBTYPE (3 - the current default), UUID_SUBTYPE (4 - will be the default in some future release), JAVA_LEGACY, and CSHARP_LEGACY.

This is a per-collection setting so that you can use subtype 4 in new collections but use legacy byte orders in existing collections.
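The four settings differ only in the subtype byte and in how the 16 bytes are ordered. A minimal Java sketch of the write side follows; the enum and method names are hypothetical (they mirror the PyMongo constants but are not a Java driver API), and the C# layout assumes .NET's Guid.ToByteArray() order, where the first three fields are little-endian and the last eight bytes are unchanged.

```java
import java.nio.ByteBuffer;
import java.util.UUID;

public class UuidRepresentations {

    // Hypothetical enum mirroring the four PyMongo settings.
    enum Representation {
        PYTHON_LEGACY(3), // OLD_UUID_SUBTYPE: standard byte order, subtype 3
        STANDARD(4),      // UUID_SUBTYPE: standard byte order, subtype 4
        JAVA_LEGACY(3),   // each 64-bit half reversed, subtype 3
        CSHARP_LEGACY(3); // Guid.ToByteArray() order, subtype 3

        final int subtype;

        Representation(int subtype) {
            this.subtype = subtype;
        }
    }

    static byte[] encode(UUID uuid, Representation rep) {
        byte[] b = ByteBuffer.allocate(16) // ByteBuffer defaults to big-endian
                .putLong(uuid.getMostSignificantBits())
                .putLong(uuid.getLeastSignificantBits())
                .array();
        switch (rep) {
            case PYTHON_LEGACY:
            case STANDARD:
                return b;
            case JAVA_LEGACY:
                reverse(b, 0, 8);
                reverse(b, 8, 16);
                return b;
            case CSHARP_LEGACY:
                reverse(b, 0, 4); // 32-bit field little-endian
                reverse(b, 4, 6); // 16-bit field little-endian
                reverse(b, 6, 8); // 16-bit field little-endian
                return b;         // last 8 bytes unchanged
            default:
                throw new IllegalArgumentException("unknown representation: " + rep);
        }
    }

    // Reverses b[from, to) in place.
    static void reverse(byte[] b, int from, int to) {
        for (int i = from, j = to - 1; i < j; i++, j--) {
            byte tmp = b[i];
            b[i] = b[j];
            b[j] = tmp;
        }
    }

    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte x : bytes) {
            sb.append(String.format("%02x", x & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        UUID id = UUID.fromString("aaf4c61d-dcc5-e8a2-dabe-de0f3b482cd9");
        for (Representation rep : Representation.values()) {
            System.out.printf("%-13s subtype %d  %s%n", rep, rep.subtype, toHex(encode(id, rep)));
        }
    }
}
```

Note that three of the four share subtype 3, so the subtype alone cannot tell a reader which legacy byte order was used; that is why the setting has to be chosen per collection.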

Comment by Philipp Schneider (coresystems) [ 26/Dec/12 ]

@Nils
In our case we needed subtype 3 because Python was (and still is) using the same database. We did not need backward compatibility because the database was empty.
Python and Java needed to read and write the same database.
Our patch is not a general fix for this issue. It is a quick, working solution that only works if you start with an empty database and do not care about backward compatibility.

From a MongoDB user's point of view, it would be nice if the final fix had an option letting the user of the driver choose between subtype 4 and subtype 3 (3 will have no backward compatibility). An option in some kind of settings would be nice.

Comment by Nils [ 19/Dec/12 ]

The fix appears to still use subtype 3, which will break backwards compatibility. Any reason not to switch to subtype 4, which arguably is the correct subtype, while retaining backwards compatibility?

Comment by Philipp Schneider (coresystems) [ 14/Nov/12 ]

Hello,
we have patched the driver and have been using the patched version for a few months. We have just upgraded to the newest driver version.
You can download the patched version here:
https://github.com/mila-labs/mongo-java-driver/

Use at your own risk, and only if you have NO DATA in the database yet.

Comment by Jeffrey Yemin [ 30/Oct/12 ]

Nils, we're working on a way to do this without breaking compatibility. Should know more in a week or so.

Comment by Nils [ 24/Oct/12 ]

According to the BSON spec, binary subtype 4 is the standard UUID and 3 is legacy, so there should be no problem implementing subtype 4 encoding while falling back to properly decoding old subtype 3 legacy data.

http://bsonspec.org/#/specification
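A minimal sketch of the decoding side described here: dispatch on the subtype, read subtype 4 in standard order, and undo the legacy layout for subtype 3. It assumes every subtype 3 value was written by the legacy Java driver; subtype 3 by itself does not say which driver wrote the bytes, so a real implementation would need that as configuration. The class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.util.UUID;

public class UuidDecoder {

    // Decodes a 16-byte BSON binary payload into a UUID, choosing the byte order
    // from the subtype. ASSUMPTION: all subtype 3 values were written by the
    // legacy Java driver (each 8-byte half reversed).
    static UUID decode(int subtype, byte[] data) {
        if (data.length != 16) {
            throw new IllegalArgumentException("UUID payload must be 16 bytes, got " + data.length);
        }
        byte[] b = data.clone();
        if (subtype == 3) {
            reverse(b, 0, 8);
            reverse(b, 8, 16);
        } else if (subtype != 4) {
            throw new IllegalArgumentException("not a UUID subtype: " + subtype);
        }
        ByteBuffer buf = ByteBuffer.wrap(b); // big-endian once normalized
        long msb = buf.getLong();
        long lsb = buf.getLong();
        return new UUID(msb, lsb);
    }

    // Reverses b[from, to) in place.
    static void reverse(byte[] b, int from, int to) {
        for (int i = from, j = to - 1; i < j; i++, j--) {
            byte tmp = b[i];
            b[i] = b[j];
            b[j] = tmp;
        }
    }

    static byte[] fromHex(String hex) {
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    public static void main(String[] args) {
        UUID fromLegacy = decode(3, fromHex("a2e8c5dc1dc6f4aad92c483b0fdebeda"));
        UUID fromStandard = decode(4, fromHex("aaf4c61ddcc5e8a2dabede0f3b482cd9"));
        System.out.println(fromLegacy);                      // aaf4c61d-dcc5-e8a2-dabe-de0f3b482cd9
        System.out.println(fromLegacy.equals(fromStandard)); // true
    }
}
```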

Comment by Nils [ 24/Oct/12 ]

Can we at the very least get something like a static global switch to fix this issue for those of us who do not have legacy issues and would like to not start creating legacy issues? The fix should be simple enough and be backwards compatible.

Comment by Nils [ 24/Oct/12 ]

I've just been hit with this problem. Unable to get consistent view of data in Java and everything else. And apparently no pending fix. I don't understand how this is not a higher priority. The UUID, as seen from Java, does not match what the shell or any other tool is seeing. It makes it almost impossible to work with the data.

Comment by Armin Ronacher [ 24/Jan/12 ]

The official BSON specification does not list Binary subtype 4 at all at the moment but some drivers have already started to accept it (for instance the Python one) and MongoDB itself also already handles binary subtype 4. May I suggest adding at the very least a note to the binary specification and adding a link to this issue or related ones?

Comment by Mark Lewis [ 31/Oct/11 ]

A graph showing insertion speed of an object with a UUID value as its _id. For this issue, the difference between Timestamp UUID and Timestamp (order swap) is the important bit.

Comment by Mark Lewis [ 31/Oct/11 ]

I recently performed some benchmarking of Mongo insert performance where a UUID field is indexed. The current (wrong) implementation performed about 50% slower than a corrected UUID implementation for timestamp-based UUIDs, because timestamp-based UUIDs are supposed to have the same mostly-increasing behavior as ObjectId, but due to this issue they are not. So there is a performance aspect to this issue as well.
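The underlying effect can be illustrated without a database. If a value increases as a 64-bit integer, writing it big-endian keeps byte-wise order in step with numeric order, while writing it little-endian scatters consecutive values across the key space. This sketch assumes keys compare as unsigned bytes, which is how MongoDB orders BinData values of equal length and subtype. It uses a plain counter as a stand-in for the increasing part of a timestamp UUID and sidesteps the real RFC 4122 version 1 field layout (which puts the low-order time bits first). It does not reproduce Mark's benchmark, and the names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.ArrayList;
import java.util.List;

public class UuidOrderDemo {

    // Unsigned byte-by-byte comparison of two keys.
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int cmp = Integer.compare(a[i] & 0xff, b[i] & 0xff);
            if (cmp != 0) {
                return cmp;
            }
        }
        return Integer.compare(a.length, b.length);
    }

    // Encodes `count` 16-byte keys whose high 64 bits increase by one and
    // reports whether byte-wise sorting keeps them in generation order.
    static boolean preservesOrder(ByteOrder order, int count) {
        List<byte[]> keys = new ArrayList<>();
        for (long i = 0; i < count; i++) {
            keys.add(ByteBuffer.allocate(16).order(order).putLong(i).putLong(0L).array());
        }
        List<byte[]> sorted = new ArrayList<>(keys);
        sorted.sort(UuidOrderDemo::compareUnsigned);
        for (int i = 0; i < count; i++) {
            if (sorted.get(i) != keys.get(i)) { // same array objects, so identity is enough
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(preservesOrder(ByteOrder.BIG_ENDIAN, 1000));    // true
        System.out.println(preservesOrder(ByteOrder.LITTLE_ENDIAN, 1000)); // false
    }
}
```

With little-endian keys, value 256 sorts between 0 and 1, so consecutive inserts land far apart in the index instead of at its right-hand edge.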

Comment by Robert Stam [ 03/Oct/11 ]

Maybe not, because the binary subtype would be different (3 for legacy representations and 4 for the new standard representation).
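Robert's point can be checked with a concrete pair. This sketch models a binary value as subtype plus bytes, with equality over both (an assumption matching how BinData values compare). The second UUID is chosen so that its standard bytes equal the first UUID's legacy Java bytes. BinData here is a hypothetical stand-in, not a driver type.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;
import java.util.UUID;

public class BinDataCollisionDemo {

    // Hypothetical stand-in for a BSON binary value: equality covers the
    // subtype as well as the payload bytes.
    static final class BinData {
        final int subtype;
        final byte[] data;

        BinData(int subtype, byte[] data) {
            this.subtype = subtype;
            this.data = data.clone();
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof BinData)) {
                return false;
            }
            BinData other = (BinData) o;
            return subtype == other.subtype && Arrays.equals(data, other.data);
        }

        @Override
        public int hashCode() {
            return 31 * subtype + Arrays.hashCode(data);
        }
    }

    static byte[] bytes(UUID u, ByteOrder order) {
        return ByteBuffer.allocate(16).order(order)
                .putLong(u.getMostSignificantBits())
                .putLong(u.getLeastSignificantBits())
                .array();
    }

    public static void main(String[] args) {
        UUID a = UUID.fromString("aaf4c61d-dcc5-e8a2-dabe-de0f3b482cd9");
        // Chosen so that b's standard bytes equal a's legacy Java bytes.
        UUID b = UUID.fromString("a2e8c5dc-1dc6-f4aa-d92c-483b0fdebeda");

        BinData aLegacy = new BinData(3, bytes(a, ByteOrder.LITTLE_ENDIAN));
        BinData bStandard = new BinData(4, bytes(b, ByteOrder.BIG_ENDIAN));

        System.out.println(Arrays.equals(aLegacy.data, bStandard.data)); // true: same 16 bytes
        System.out.println(aLegacy.equals(bStandard));                   // false: subtypes differ
    }
}
```

The protection depends on the two encodings carrying different subtypes; two subtype 3 values written with different byte orders can still collide, so the false-positive concern remains for data that mixes legacy layouts.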

Comment by Jeff Yemin (Inactive) [ 03/Oct/11 ]

Isn't it the case that $in may produce false positives?

Comment by Antoine Girbal [ 29/Jul/11 ]

This is the plan.
The only issue is that when you want to find a doc by its UUID, you won't be sure which format it's stored under, so you will have to try both in your app (I guess an $in op will be clean enough).
Also, if you sort on that field, the order will not be consistent (hopefully not a problem in most cases).
We can't do tricks on the db side because other drivers were storing UUIDs correctly.
AG
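A sketch of the $in lookup suggested here, producing a mongo shell filter string rather than calling a driver. It assumes old documents were written by the legacy Java driver as subtype 3 and new ones in the standard layout as subtype 4; the class and method names are hypothetical. BinData(subtype, "base64") is the shell's literal syntax for binary values.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Base64;
import java.util.UUID;

public class UuidInQuery {

    static byte[] bytes(UUID u, ByteOrder order) {
        return ByteBuffer.allocate(16).order(order)
                .putLong(u.getMostSignificantBits())
                .putLong(u.getLeastSignificantBits())
                .array();
    }

    // Builds a shell filter matching the UUID under either assumed encoding:
    // legacy Java (subtype 3, each half little-endian) or standard (subtype 4).
    static String inFilter(String field, UUID u) {
        Base64.Encoder b64 = Base64.getEncoder();
        String legacy = b64.encodeToString(bytes(u, ByteOrder.LITTLE_ENDIAN));
        String standard = b64.encodeToString(bytes(u, ByteOrder.BIG_ENDIAN));
        return String.format("{ \"%s\" : { $in : [ BinData(3, \"%s\"), BinData(4, \"%s\") ] } }",
                field, legacy, standard);
    }

    public static void main(String[] args) {
        UUID id = UUID.fromString("aaf4c61d-dcc5-e8a2-dabe-de0f3b482cd9");
        System.out.println("db.foo.find(" + inFilter("java", id) + ")");
    }
}
```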

Comment by Victor Boivie [ 29/Jul/11 ]

>Also, keeping the type as a binary subtype makes it easier and more like the current impl.

Understandable.

>the old one will be read as new class (UUIDJava/Legacy probably)

Can't you do it like this: (we here assume that the new binary subtype for 'new-style UUID' is 6)

dbo.put("foo", new java.util.UUID(1,2)) -> save as subtype 6
dbo.get("foo") and stored as subtype 6 -> return java.util.UUID class
dbo.put("foo", new org.bson.LegacyUUID(1,2)) -> save as subtype 3
dbo.get("foo") and stored as subtype 3 -> return java.util.UUID class, but deserialized using the old style

That way, old code will work (and will automatically convert to the new format as the documents are updated). And we could still use the de facto java.util.UUID class and not care about the legacy type, provided that we don't care how the documents are stored in the DB.
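This scheme can be sketched as a write path that picks the encoding from the Java type and a read path that always returns java.util.UUID. LegacyUuid is a hypothetical wrapper standing in for the proposed org.bson.LegacyUUID; a wrapper is needed because java.util.UUID is final and cannot be subclassed. The sketch uses subtype 4 for the new format instead of the placeholder 6, since 4 is what the BSON spec assigns to UUIDs, and assumes the legacy layout is the Java one (each 64-bit half little-endian).

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.UUID;

public class UuidWriteByType {

    // Hypothetical marker wrapper: values that must keep the legacy encoding.
    static final class LegacyUuid {
        final UUID value;

        LegacyUuid(UUID value) {
            this.value = value;
        }
    }

    // Result of encoding: BSON binary subtype plus the 16 payload bytes.
    static final class Encoded {
        final int subtype;
        final byte[] data;

        Encoded(int subtype, byte[] data) {
            this.subtype = subtype;
            this.data = data;
        }
    }

    static byte[] bytes(UUID u, ByteOrder order) {
        return ByteBuffer.allocate(16).order(order)
                .putLong(u.getMostSignificantBits())
                .putLong(u.getLeastSignificantBits())
                .array();
    }

    // Write side: the Java type decides the subtype and byte order.
    static Encoded encode(Object value) {
        if (value instanceof LegacyUuid) {
            return new Encoded(3, bytes(((LegacyUuid) value).value, ByteOrder.LITTLE_ENDIAN));
        }
        if (value instanceof UUID) {
            return new Encoded(4, bytes((UUID) value, ByteOrder.BIG_ENDIAN));
        }
        throw new IllegalArgumentException("not a UUID: " + value);
    }

    // Read side: always java.util.UUID; the subtype says which layout to undo.
    static UUID decode(Encoded e) {
        ByteOrder order;
        if (e.subtype == 3) {
            order = ByteOrder.LITTLE_ENDIAN; // legacy Java layout
        } else if (e.subtype == 4) {
            order = ByteOrder.BIG_ENDIAN;    // standard layout
        } else {
            throw new IllegalArgumentException("not a UUID subtype: " + e.subtype);
        }
        ByteBuffer buf = ByteBuffer.wrap(e.data).order(order);
        long msb = buf.getLong();
        long lsb = buf.getLong();
        return new UUID(msb, lsb);
    }

    public static void main(String[] args) {
        UUID id = new UUID(1, 2);
        Encoded modern = encode(id);
        Encoded legacy = encode(new LegacyUuid(id));
        System.out.println(modern.subtype + " " + decode(modern)); // 4 00000000-0000-0001-0000-000000000002
        System.out.println(legacy.subtype + " " + decode(legacy)); // 3 00000000-0000-0001-0000-000000000002
    }
}
```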

Comment by Scott Hernandez (Inactive) [ 28/Jul/11 ]

It would be great to be able to subclass UUID, but unfortunately it is final. We will make UUID be stored as the new subtype, and the old one will be read as a new class (UUIDJava/Legacy probably). The rest of the details are yet to be finished. You will be able to store/retrieve either type, but new UUIDs will be stored as the new, and compatible, BSON type.

Also, keeping the type as a binary subtype makes it easier and more like the current impl.

Comment by Victor Boivie [ 28/Jul/11 ]

It could even be a top level BSON type (and not a binary type), which could save 5 additional bytes per entry as the length is always the same.

How would that be activated? By setting an option to the driver somehow? Because I really hope that you will still use the languages' own native UUID classes as it is today. And not reinvent a new one in the BSON namespace.
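The 5 bytes come from the binary type's framing: a BSON binary value is an int32 length plus a one-byte subtype in front of the payload, so a 16-byte UUID costs 21 bytes, versus 16 for a hypothetical fixed-size type. A small sketch that builds just the binary value bytes (the element type byte and field name are the same either way and are left out); the names are made up for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class BinaryValueSize {

    // Value part of a BSON binary element: int32 payload length, one subtype
    // byte, then the payload itself.
    static byte[] binaryValue(int subtype, byte[] payload) {
        return ByteBuffer.allocate(4 + 1 + payload.length)
                .order(ByteOrder.LITTLE_ENDIAN) // BSON integers are little-endian
                .putInt(payload.length)
                .put((byte) subtype)
                .put(payload)
                .array();
    }

    public static void main(String[] args) {
        byte[] uuid = new byte[16];
        int asBinary = binaryValue(4, uuid).length; // 21
        int asFixedType = uuid.length;              // 16 for a hypothetical fixed-size type
        System.out.println(asBinary + " vs " + asFixedType + ": saves " + (asBinary - asFixedType) + " bytes");
    }
}
```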

Comment by Scott Hernandez (Inactive) [ 28/Jul/11 ]

Yes, there is a plan to deprecate the current UUID binary subtype and create a new one which is binary compatible for all drivers. It will be a breaking change but one that the developers can choose to implement by migrating data, or having handler code in the app.
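For the data-migration route, converting a legacy Java value to the standard form is a pure byte permutation: reverse each 8-byte half, then store the result with subtype 4 instead of 3. A minimal sketch of that per-value step, assuming the values were written by the legacy Java driver (other drivers' legacy layouts differ); reading and rewriting the documents is left out, and the names are hypothetical.

```java
public class LegacyToStandard {

    // Converts a legacy Java-driver UUID payload (each 8-byte half reversed)
    // to the standard layout. The caller must also change the subtype to 4.
    static byte[] javaLegacyToStandard(byte[] legacy) {
        if (legacy.length != 16) {
            throw new IllegalArgumentException("UUID payload must be 16 bytes, got " + legacy.length);
        }
        byte[] out = new byte[16];
        for (int i = 0; i < 8; i++) {
            out[i] = legacy[7 - i];
            out[8 + i] = legacy[15 - i];
        }
        return out;
    }

    static byte[] fromHex(String hex) {
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] legacy = fromHex("a2e8c5dc1dc6f4aad92c483b0fdebeda"); // the Java row from the issue
        System.out.println(toHex(javaLegacyToStandard(legacy)));     // aaf4c61ddcc5e8a2dabede0f3b482cd9
    }
}
```

The permutation is its own inverse, so the same function also converts standard bytes back to the legacy Java layout if a rollback is needed.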

Generated at Thu Feb 08 08:52:12 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.