  Core Server / SERVER-10478

Very large documents can cause premature migration commit

    • Type: Bug
    • Resolution: Done
    • Priority: Blocker - P1
    • Fix Version/s: 2.2.6, 2.4.6, 2.5.2
    • Affects Version/s: None
    • Component/s: Sharding
    • Labels: None
    • ALL

      Issue Status as of October 23rd, 2013

      ISSUE SUMMARY
      During a chunk migration, if one of the documents in the chunk has a size between 16,776,185 and 16,777,216 bytes (inclusive), then some documents in that chunk may be lost during the migration process.

      USER IMPACT
      Documents which are not migrated from the chunk are lost and need to be reinserted into the collection.

      MongoDB v2.2 maintains a backup of every document involved in a chunk migration in a moveChunk directory (http://docs.mongodb.org/manual/faq/sharding/). It is possible to examine this directory programmatically to find migrated documents whose size falls within the affected range.
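      As an illustration only, the backup files can be scanned for documents in the affected size range with a short script. The sketch below assumes the backup lives under <dbpath>/moveChunk, that the files are plain BSON dumps, and that the bson package shipped with pymongo is available; the exact directory layout and file names may differ.

        # Sketch: find documents in a moveChunk backup whose BSON size falls
        # in the affected range (16,776,185 - 16,777,216 bytes, inclusive).
        # The path and directory layout are assumptions; adjust as needed.
        import os
        import bson

        AFFECTED_MIN, AFFECTED_MAX = 16776185, 16777216

        def find_affected_docs(movechunk_dir):
            for root, _dirs, files in os.walk(movechunk_dir):
                for name in files:
                    with open(os.path.join(root, name), "rb") as f:
                        for doc in bson.decode_file_iter(f):
                            size = len(bson.encode(doc))
                            if AFFECTED_MIN <= size <= AFFECTED_MAX:
                                yield name, doc.get("_id"), size

        for filename, doc_id, size in find_affected_docs("/data/db/moveChunk"):
            print(filename, doc_id, size)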

      MongoDB v2.4 has this backup option disabled by default.

      SOLUTION
      Mongod needs to ensure that each migration batch transfers at least one document until all batches for that chunk are complete.
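      As a rough illustration (not the actual mongod C++ code), the sketch below shows batch filling that admits the first pending document unconditionally, so a document close to the 16MB limit can no longer yield an empty batch that looks like a completed migration. The batch size limit and function name are assumptions made for the example.

        # Illustration only: guarantee forward progress when filling a
        # migration transfer batch, even when a single document nearly
        # fills the whole batch by itself.
        MAX_BATCH_BYTES = 16 * 1024 * 1024  # assumed per-batch limit

        def next_batch(pending_docs):
            # pending_docs: encoded documents (bytes) still waiting to be sent
            batch, used = [], 0
            for doc in pending_docs:
                # Admit the first document unconditionally so the batch is
                # never empty while documents remain to be sent.
                if batch and used + len(doc) > MAX_BATCH_BYTES:
                    break
                batch.append(doc)
                used += len(doc)
            return batch  # empty only when nothing is left to migrate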

      WORKAROUNDS
      If there are very large documents in your cluster, disable the balancer until you have upgraded. See: http://docs.mongodb.org/manual/tutorial/manage-sharded-cluster-balancer/
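      For example, the balancer can be disabled from a driver connected to a mongos; the sketch below uses pymongo and writes the same config.settings flag that sh.setBalancerState(false) sets in the shell. The host name is a placeholder.

        # Sketch: disable the balancer cluster-wide until after the upgrade.
        # Equivalent to running sh.setBalancerState(false) in the mongo shell.
        from pymongo import MongoClient

        mongos = MongoClient("mongos.example.net", 27017)  # placeholder host
        mongos.config.settings.update_one(
            {"_id": "balancer"},
            {"$set": {"stopped": True}},
            upsert=True,
        )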

      If document loss is suspected, locate the moveChunk directory on the member that was primary on the donor shard at the time of the migration. The lost documents can be reinserted from that backup or from your own regular backups, as sketched below.
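      Continuing the earlier sketch, recovered documents can be reinserted with an upsert keyed on _id, so documents that did survive the migration are not duplicated. The host, namespace, and backup file path below are placeholders.

        # Sketch: reinsert documents recovered from a moveChunk backup file.
        # Upserting on _id is idempotent and leaves surviving documents alone.
        import bson
        from pymongo import MongoClient

        client = MongoClient("mongos.example.net", 27017)  # placeholder host
        coll = client["mydb"]["mycollection"]  # placeholder namespace

        backup = "/data/db/moveChunk/mydb.mycollection/backup.bson"  # placeholder path
        with open(backup, "rb") as f:
            for doc in bson.decode_file_iter(f):
                coll.replace_one({"_id": doc["_id"]}, doc, upsert=True)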

      PATCHES
      MongoDB v2.2.6 and v2.4.6 will address this problem. Downloads for the release candidates will be available at http://www.mongodb.org/downloads within 24 hours.

            Assignee:
            Greg Studer (greg_10gen)
            Reporter:
            Greg Studer (greg_10gen)
            Votes:
            0
            Watchers:
            14
