Core Server / SERVER-9888

'filemd5' command does not allow sharding of the chunks collection on the key { 'files_id' : 'hashed' }.

    Details

    • Backport: No
    • # Replies: 7
    • Last comment by Customer: true
    • Driver changes needed?: Driver changes needed

      Description

      I have created a sharded cluster for use with GridFS.

      I would like to use a hashed shard key for the chunks collection to ensure the data is uniformly distributed across the cluster. When I try to use the filemd5 command to validate the files, I receive an error saying the collection has to be sharded on either {files_id:1} or {files_id:1, n:1}.

      mongos> db.gridfs.chunks.runCommand( { filemd5 : ObjectId("51b535788544eb45b4cdaab5"), root : 'gridfs' } )
      {
          "ok" : 0,
          "errmsg" : "GridFS fs.chunks collection must be sharded on either {files_id:1} or {files_id:1, n:1}"
      }
      

      I think the set of acceptable shard keys should be extended to include the hashed type, {files_id: "hashed"}.
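The request amounts to widening the shard-key check that produces the error above. A minimal sketch of such a check follows, written in JavaScript to match the shell session; the function name `isSupportedGridFSShardKey` and the plain-object key representation are hypothetical, and the real check lives in the server's C++ code.

```javascript
// Hypothetical sketch: accept the two shard keys named in the error message,
// plus a hashed key on files_id. Not the server's actual implementation.
function isSupportedGridFSShardKey(key) {
  const fields = Object.keys(key); // insertion order is preserved for these names

  // {files_id: 1} or, as proposed, {files_id: "hashed"}
  if (fields.length === 1 && fields[0] === "files_id") {
    return key.files_id === 1 || key.files_id === "hashed";
  }

  // {files_id: 1, n: 1}
  if (fields.length === 2 && fields[0] === "files_id" && fields[1] === "n") {
    return key.files_id === 1 && key.n === 1;
  }

  return false;
}

console.log(isSupportedGridFSShardKey({ files_id: 1 }));        // true
console.log(isSupportedGridFSShardKey({ files_id: 1, n: 1 }));  // true
console.log(isSupportedGridFSShardKey({ files_id: "hashed" })); // true
console.log(isSupportedGridFSShardKey({ n: 1 }));               // false
```

Only the single-field form gains a hashed variant here; a compound key mixing a hashed `files_id` with `n` is still rejected in this sketch.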

      The stats() for the chunks collection is attached.

      Attachment: gridfs-stats.txt (2 kB, Robert Moore)

          Activity

          Stephen Steneker added a comment:

          Hi Robert,

          The GridFS implementation as of MongoDB 2.4 only supports calculating MD5 sums for fs.chunks collections sharded on either {files_id:1} or {files_id:1, n:1}. The filemd5 command is used to validate the uploaded files.

          I've raised DOCS-1597 to make this more explicit in the current documentation on Sharding GridFS Data Store.

          Support for {files_id: "hashed"} seems a reasonable improvement.

          Regards,
          Stephen
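For context on what is being validated: filemd5 returns the MD5 of a file's chunk data read in ascending order of `n`, which is why the server wants an index that yields the chunks of one `files_id` in `n` order. The sketch below reproduces that computation over in-memory chunk documents using Node's `crypto` module; the `fileMd5` helper and the sample documents are illustrative assumptions, not driver or server code.

```javascript
const crypto = require("crypto");

// Illustrative helper: MD5 over one file's chunk payloads, concatenated in n order.
function fileMd5(chunks, filesId) {
  const hash = crypto.createHash("md5");
  chunks
    .filter((c) => c.files_id === filesId) // the { files_id: ... } part of the query
    .sort((a, b) => a.n - b.n)             // the n ordering an index would normally supply
    .forEach((c) => hash.update(c.data));
  return hash.digest("hex");
}

// Stand-in for fs.chunks documents, deliberately stored out of order.
const chunks = [
  { files_id: "f1", n: 1, data: Buffer.from("world") },
  { files_id: "f1", n: 0, data: Buffer.from("hello ") },
  { files_id: "f2", n: 0, data: Buffer.from("other") },
];

const whole = crypto.createHash("md5").update("hello world").digest("hex");
console.log(fileMd5(chunks, "f1") === whole); // true
```

Hashing the chunks in storage order instead would digest "worldhello " and produce a different sum, which is why ordered access by `n` matters.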

          Robert Moore added a comment:

          Stephen,

          I have created a pull request with the required changes:
          https://github.com/mongodb/mongo/pull/442

          Rob.

          Stephen Steneker added a comment:

          Hi Robert,

          Great, thanks for the pull request! Have you read our guide to Contributing to the MongoDB project?

          There are a few extra steps that will help get this request ready for review by our kernel team:

          • add a reference to this Jira ticket in the commit comment
          • adhere to the Kernel Code Style; in particular, your change has a long line (lines should be limited to 100 columns)
          • sign the contributor agreement if you haven't already

          Regards,
          Stephen

          Robert Moore added a comment:

          Stephen -

          I wrapped the line. I can't modify the previous commit messages, but the latest one includes the ticket number.

          I signed the contributor agreement a while ago. My GitHub user name is 'allanbank'.

          Rob.

          Mark Burazin added a comment:

          Stephen,

          I have also made a Perl driver patch related to this change, so that the driver allows inserting into GridFS databases sharded this way. Should I open a ticket in the Perl driver Jira first, or just make a pull request referencing this ticket?

          Here is an example error from the Perl driver:

          can't use unique indexes with sharding  ns:dload.fs.chunks key: { files_id: 1, n: 1 } at /usr/lib64/perl5/vendor_perl/5.12.4/x86_64-linux/MongoDB/GridFS.pm line 93
          

          Thanks,
          Mark

          Stephen Steneker added a comment:

          Mark Burazin: Can you please make a separate issue in the PERL project for your pull request? I've flagged this server issue as needing review for similar changes in other drivers.

          Thanks,
          Stephen

          Randolph Tan added a comment (edited):

          It looks like the filemd5 command is failing on the shard itself, because it cannot find a suitable index to use. More specifically, it fails in this part of the code:

                      // inside CmdFileMD5::run @ db/dbcommands.cpp
                      shared_ptr<Cursor> cursor = getBestGuessCursor(ns.c_str(), query, sort);
                      if ( ! cursor ) {
                          // no usable index for the files_id/n query and sort on this shard
                          errmsg = "need an index on { files_id : 1 , n : 1 }";
                          return false;
                      }
          

          Edit: showed the code instead of a GitHub link, since the file has changed since the link was posted and the highlighted line no longer matched.
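Randolph's finding can be modeled as an index-suitability check. Assuming the chunks collection has only the `_id` index and the hashed shard-key index, nothing gives the shard an index on `{ files_id : 1 , n : 1 }`, which is what the error message asks for. The sketch below is a deliberate simplification; `hasFileMd5Index` is a hypothetical helper, and the real decision is made by `getBestGuessCursor`, whose planner logic is more involved.

```javascript
// Hypothetical simplification: is there an index whose key pattern starts
// with files_id ascending followed by n ascending?
function hasFileMd5Index(indexKeyPatterns) {
  return indexKeyPatterns.some((key) => {
    const fields = Object.keys(key);
    return (
      fields.length >= 2 &&
      fields[0] === "files_id" && key.files_id === 1 &&
      fields[1] === "n" && key.n === 1
    );
  });
}

// Chunks collection with only the _id index and a hashed shard-key index:
console.log(hasFileMd5Index([{ _id: 1 }, { files_id: "hashed" }])); // false

// The same collection with a { files_id: 1, n: 1 } index added:
console.log(
  hasFileMd5Index([{ _id: 1 }, { files_id: "hashed" }, { files_id: 1, n: 1 }])
); // true
```

A hashed index can locate one `files_id` by equality, but it stores hashes rather than ordered values, so it cannot hand back that file's chunks sorted by `n`.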

