[JAVA-2546] GridFSBucket requires key filename Created: 26/Jun/17  Updated: 29/Oct/23  Resolved: 30/Jun/17

Status: Closed
Project: Java Driver
Component/s: GridFS
Affects Version/s: 3.4.2
Fix Version/s: 3.5.0

Type: Bug Priority: Major - P3
Reporter: Kevin Adistambha Assignee: Ross Lawley
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related

 Description   

The spec for GridFS mentions that filename is optional. However, the Java driver GridFSBucket seems to require this field.

For example, this fs.files collection using GridFS (note the missing filename field):

> db.fs.files.find()
{
  "_id": ObjectId("5950700998e91439516f0abe"),
  "chunkSize": 261120,
  "uploadDate": ISODate("2017-06-26T02:23:05.529Z"),
  "length": 8466,
  "md5": "c09d698b0e623e5cc05cb05d49002a2d"
}

This Java code snippet will fail to download the file:

GridFSBucket bucket = GridFSBuckets.create(database);
bucket.downloadToStream(new ObjectId("5950700998e91439516f0abe"), System.out);

The error displayed is:

org.bson.BsonInvalidOperationException: Document does not contain key filename

However the same code in Python will work:

fs = gridfs.GridFSBucket(conn.test)
print fs.open_download_stream(bson.ObjectId("5950700998e91439516f0abe")).read()

There seems to be a discrepancy in GridFS behaviour between the Java driver and the Python driver.



 Comments   
Comment by Jan S. [ 27/Oct/17 ]

Hi Ross.

so you expect the people to access GridFS as collection and execute the following update command, right?

MongoCollection<Document> gridFSCollection = currentDatabase.getCollection("fs.files");
gridFSCollection.updateMany(Filters.eq("filename", null), Updates.set("filename", ""));

If you do so then please write a migration guide and add this information.

Comment by Ross Lawley [ 20/Oct/17 ]

Hi mango,

The GridFS implementation follows the GridFS Specification and is a convention built upon the database. Regarding a migration guide, this was new code and the original GridFS implementation still exists. However, it looks like that there are some data inconsistencies, that haven't previously been reported.

You are correct that server will allow differing types for fields, however, here the convention expects the fieldname to be a String. As the underlying data is just stored in collections in the database, it is possible that data maybe manipulated outside the GridFS convention, so that it no longer fits the conventions expectations. Also, the drivers team understands that prior to the specification the different language GridFS implementations, did differ slightly.

With the new GridFS implementation the driver reads the data from BSON and into the strongly typed BsonDocument before creating a GridFSFile. The error is because the filename is expected to be a BsonString type and not a BsonNull type.

Before opening a new ticket to support BsonNull values as the filename, it would be good to findout how a null value was set for the fieldname? Also, as expectations are made for the types of data for the GridFSFile it would be good to understand if there are other alternative types for that metadata.

Thanks for your help,

Ross

Comment by Jan S. [ 19/Oct/17 ]

Is this topic already covered by a migration guide? I did not find anything regarding that topic in the Mongo server changelog of 3.0, 3.2 and 3.4.

BTW: What is allowed or not is IMHO up to the server. If the drivers do disallow certain data to be stored the server also should disallow it. The current situation is totally confusing as it depends on the driver and it's version what is allowed or not.

Comment by Ross Lawley [ 19/Oct/17 ]

Hi mango,

Yes this was released in 3.5.0 and fixes the issue where the filename is missing. From your report it looks like you have a null value stored which is still invalid.

Ross

Comment by Jan S. [ 19/Oct/17 ]

@Ross Lawley Does this mean the Java Driver 3.5.0 already does contain the change to "defaulting the filename to an empty string" or is this future work?

Because in my tests it does not matter which version I use (3.4.? or 3.5.0) I can't retrieve GridFS objects that have a null filename (org.bson.BsonInvalidOperationException: Value expected to be of type STRING is of unexpected type NULL).

Comment by Ross Lawley [ 30/Jun/17 ]

Just to clarify what has been fixed - the 3.5 Java driver will handle legacy GridFS data in the future by defaulting the filename to an empty string. DRIVERS-392 will look into relaxing the filename being required in spec.

Comment by Githook User [ 30/Jun/17 ]

Author:

{u'username': u'rozza', u'name': u'Ross Lawley', u'email': u'ross.lawley@gmail.com'}

Message: Support decoding legacy files that don't contain filenames

JAVA-2546
Branch: master
https://github.com/mongodb/mongo-java-driver/commit/12c389442b28ebbb253db324237098980f7b3047

Comment by Ross Lawley [ 27/Jun/17 ]

PR: https://github.com/rozza/mongo-java-driver/pull/205

Comment by Ross Lawley [ 27/Jun/17 ]

Hi,

Thanks for the responses. The filename is required on upload, this is explicit in the specification and defined in the fields list for the files collection. It is unspecified if we support legacy files that were created without a name and there is no test case for that scenario.

The reason we see an error in the Java driver is internally it uses a strictly typed collection of GridFSFile (where filename is non optional). The GridFSFile contains the information needed for the download. I've raised DRIVERS-392 to clarify if filename should be optional in the future and asked for a test case to support legacy data. Either way, we should support legacy data going forward - marking to fix in 3.5

Thanks again,

Ross

Comment by Kevin Adistambha [ 27/Jun/17 ]

Hi ross.lawley, as a sanity check, I tried doing the same using the node driver and the C# driver. Both allowed me to download without the filename field present in the fs.files collection.

For reference, this is what I tried using the node driver:

MongoClient.connect('mongodb://localhost/test', function(err, db) {
    var bucket = new mongodb.GridFSBucket(db);
    var id = bson.ObjectId('5950700998e91439516f0abe');
    bucket.openDownloadStream(id).pipe(process.stdout);
});

and the same in the C# driver:

public static void DownloadGridFS(IMongoDatabase database) {
    var bucket = new MongoDB.Driver.GridFS.GridFSBucket(database);
    var id = new ObjectId("5950700998e91439516f0abe");
    bucket.DownloadToStream(id, Console.OpenStandardOutput());
}

As mentioned by andrew.ryder, we're not entirely sure which one is the intended behaviour.

Thanks!

Comment by Andrew Ryder (Inactive) [ 27/Jun/17 ]

G'day ross.lawley, it isn't clear to me where in the specification it says filename is required. If we take the wording to mean it's required it would imply that metadata is required too, which makes no sense.

In any case, even if the filename were required, it still means there is a bug somewhere and we're simply trying to understand where:

  • If the specification is ambiguous then we need to clarify it.
  • If two drivers implement the specification differently then one or both have a bug.
  • If the documentation contradicts the specification then there is a documentation bug.

Given the specification is not entirely clear – to our reading, please provide a quote if you believe it is clear – and the documentation says the filename is optional, and the Python driver allows it as optional, the Occam's Razor solution says that the Java driver is the odd man out.

I think we're fully prepared to accept this may be new and that the Java driver may be the only thing doing it right, we just need some clear indication that is so before raising a series of change requests to fix those others sites.

Please let us know.

Comment by Ross Lawley [ 26/Jun/17 ]

Hi kevin.adistambha,

Thanks for the ticket. The GridFSBucket follows the latest GridFS Specification where filename is required. As a result the GridFSFile class in Java does not allow null values for the filename string.

Ross

Generated at Thu Feb 08 08:57:30 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.