[SERVER-9888] GridFS should support sharding of the chunks collection with hashed shard keys Created: 10/Jun/13  Updated: 31/Jan/23  Resolved: 31/Jan/23

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 2.4.0, 2.4.4
Fix Version/s: None

Type: Improvement Priority: Minor - P4
Reporter: Robert Moore Assignee: Matt Panton
Resolution: Won't Do Votes: 8
Labels: ShardingRoughEdges, community-team, gridfs, hashed, sharded-cluster, sharding-common-backlog
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Sharded gridfs.


Attachments: Text File gridfs-stats.txt    
Issue Links:
Depends
Duplicate
is duplicated by SERVER-19789 filemd5 command does not jive with ha... Closed
Related
is related to DOCS-1597 Clarify supported shard keys for Grid... Closed
Assigned Teams:
Sharding EMEA
Participants:
Case:

 Description   

I have created a sharded cluster for use with GridFS.

I would like to use a hashed shard key for the chunks to ensure the data is uniformly distributed across the cluster. When I try to use the filemd5 command to validate the files I receive an error saying the collection has to be indexed on {files_id:1} or {files_id:1, n:1}.

mongos> db.gridfs.chunks.runCommand( { filemd5 : ObjectId("51b535788544eb45b4cdaab5"), root : 'gridfs' } )
{
    "ok" : 0,
    "errmsg" : "GridFS fs.chunks collection must be sharded on either {files_id:1} or {files_id:1, n:1}"
}

I think the set of acceptable indexes should be updated to include the hashed type.
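
To make the request concrete, here is a minimal sketch of the kind of shard-key check the error message implies, extended to also accept a hashed files_id key. It is plain JavaScript for illustration only, not the server's actual code; the function name isAcceptableChunksShardKey and the allowHashed flag are hypothetical.

```javascript
// Hypothetical helper: models the rule stated in the filemd5 error message.
// Illustration only; this is not the mongos/mongod implementation.
function isAcceptableChunksShardKey(keyPattern, allowHashed) {
    var fields = Object.keys(keyPattern);

    // { files_id : 1 }
    if (fields.length === 1 && fields[0] === "files_id") {
        if (keyPattern.files_id === 1) {
            return true;
        }
        // Proposed addition: { files_id : "hashed" }
        return allowHashed === true && keyPattern.files_id === "hashed";
    }

    // { files_id : 1, n : 1 } (field order matters in a key pattern)
    return fields.length === 2 &&
        fields[0] === "files_id" && keyPattern.files_id === 1 &&
        fields[1] === "n" && keyPattern.n === 1;
}
```

With allowHashed set to false the sketch rejects { files_id : "hashed" }, which mirrors the behaviour reported above; with allowHashed set to true it accepts that key in addition to the two existing patterns.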

The stats() for the chunks collection is attached.



 Comments   
Comment by Matt Panton [ 31/Jan/23 ]

At this time the team has decided not to pursue a fix for the filemd5 command in a sharded environment, as the filemd5 command is no longer supported for GridFS on the server.
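
For readers who still want an integrity check, one option is to compute the digest on the client instead. The following is a minimal Node.js sketch, assuming the chunk documents of one file have already been fetched into an array and that each chunk's data field is a Buffer; the md5OfChunks name and the in-memory chunks array are illustrative and not part of any driver API.

```javascript
// Sketch: compute a GridFS file's MD5 on the client from its chunk documents.
// `chunks` stands in for the documents of one file from the chunks collection.
var crypto = require("crypto");

function md5OfChunks(chunks) {
    var hash = crypto.createHash("md5");
    // Chunks must be hashed in ascending `n` order; copy before sorting
    // so the caller's array is left untouched.
    chunks.slice()
        .sort(function (a, b) { return a.n - b.n; })
        .forEach(function (chunk) { hash.update(chunk.data); });
    return hash.digest("hex");
}
```

Because the chunks are sorted by n before hashing, the result does not depend on the order in which the shards return them, so it works regardless of the shard key in use.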

Comment by Matt Kangas [ 23/Jun/14 ]

Updating title to state the goal more clearly. Closing pull request per offline discussion with the developer.

Comment by Randolph Tan [ 28/Aug/13 ]

It looks like the filemd5 command is failing on the shard itself because it cannot figure out the right index to use. More specifically, it is failing in this part of the code:

            // inside CmdFileMD5::run @ db/dbcommands.cpp
            shared_ptr<Cursor> cursor = getBestGuessCursor(ns.c_str(), query, sort);
            if ( ! cursor ) {
                errmsg = "need an index on { files_id : 1 , n : 1 }";
                return false;
            }

Edit: showed the code instead of a GitHub link, since the file has changed since last time and the highlighted line no longer matched.

Comment by Stennie Steneker (Inactive) [ 19/Jun/13 ]

castiel: Can you please make a separate issue in the PERL project for your pull request? I've flagged this server issue as needing review for similar changes in other drivers.

Thanks,
Stephen

Comment by Mark Burazin [ 12/Jun/13 ]

Stephen,

I have also made a Perl driver patch related to this change, which allows inserting into such sharded GridFS databases. Should I make a ticket in the Perl driver Jira first, or just make a pull request referencing this ticket?

Here is an example error from the Perl driver:

can't use unique indexes with sharding  ns:dload.fs.chunks key: { files_id: 1, n: 1 } at /usr/lib64/perl5/vendor_perl/5.12.4/x86_64-linux/MongoDB/GridFS.pm line 93

Thanks,
Mark

Comment by Robert Moore [ 12/Jun/13 ]

Stephen -

I wrapped the line. I can't modify the previous commit messages but the latest one has the ticket number.

The contributor's agreement was done a while ago. Github user name is 'allanbank'.

Rob.

Comment by Stennie Steneker (Inactive) [ 12/Jun/13 ]

Hi Robert,

Great, thanks for the pull request! Have you read our guide to Contributing to the MongoDB project?

There are a few extra steps that will help this request be ready for review by our kernel team:

  • add a reference to this Jira ticket in the commit comment
  • adhere to the Kernel Code Style; in particular, your code change has a long line (should be limited to 100 columns)
  • sign the contributor agreement if you haven't already

Regards,
Stephen

Comment by Robert Moore [ 12/Jun/13 ]

Stephen,

I have created a pull request with the required changes:
https://github.com/mongodb/mongo/pull/442

Rob.

Comment by Stennie Steneker (Inactive) [ 11/Jun/13 ]

Hi Robert,

The GridFS implementation as at MongoDB 2.4 currently only supports calculating md5 sums for fs.chunks collections sharded on either {files_id:1} or {files_id:1, n:1}. The filemd5 command is used to validate the uploaded files.

I've raised DOCS-1597 to make this more explicit in the current documentation on Sharding GridFS Data Store.

Support for {files_id: hashed} seems a reasonable improvement.

Regards,
Stephen
