Core Server / SERVER-4941

collection rename may not replicate / clone properly during initial sync

    • Type: Bug
    • Resolution: Done
    • Priority: Major - P3
    • Fix Version/s: 3.6.0-rc0
    • Affects Version/s: None
    • Component/s: Storage
    • Labels:
      None
    • Fully Compatible
    • ALL
    • Storage 2017-07-10, Storage 2017-07-31, Storage 2017-09-11, Storage 2017-10-02
    • 0

      Issue Status as of Sep 25, 2017

      ISSUE DESCRIPTION AND IMPACT
      During initial sync, if a renameCollection operation is found during oplog application, the initial sync process is aborted and restarted to prevent data divergence in replica set nodes (see below for an example).

      In addition to the renameCollection command, operations such as aggregations using $out and MapReduce with output to a collection may implicitly use renameCollection operations to create their output collections.

      Users who resync a node and run any of the operations above before the process completes may see their initial sync abort and restart. In extreme cases (e.g., if users are constantly running aggregations that output to new collections), initial sync may never complete.

      DIAGNOSIS AND AFFECTED VERSIONS
      This abort-and-restart behavior was introduced in SERVER-26117 and affects MongoDB 3.2.12 and newer. During initial sync, users may encounter the following error messages:

      2017-09-05T17:56:04.348+0000 E REPL     [repl writer worker 5] Error applying operation: OplogOperationUnsupported: Applying renameCollection not supported in initial sync: { ts: Timestamp 1504588941000|592, h: 4948566672906734558, v: 2, op: "c", ns: "graphs.$cmd", o: { renameCollection: "graphs.tmp.agg_out.989", to: "graphs.graphs_temp", stayTemp: false, dropTarget: true } } ({ ts: Timestamp 1504588941000|592, h: 4948566672906734558, v: 2, op: "c", ns: "graphs.$cmd", o: { renameCollection: "graphs.tmp.agg_out.989", to: "graphs.graphs_temp", stayTemp: false, dropTarget: true } })
      2017-09-05T17:56:04.348+0000 E REPL     [replication-168] Failed to apply batch due to 'OplogOperationUnsupported: error applying batch: Applying renameCollection not supported in initial sync: { ts: Timestamp 1504588941000|592, h: 4948566672906734558, v: 2, op: "c", ns: "graphs.$cmd", o: { renameCollection: "graphs.tmp.agg_out.989", to: "graphs.graphs_temp", stayTemp: false, dropTarget: true } }'
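      When scanning logs for this failure, it can help to extract which rename triggered the restart. The following is a minimal sketch in plain JavaScript; the function name parseRenameFailure is hypothetical, and the pattern assumes log lines shaped like the two above (it has not been checked against other server versions). The temporary-collection check relies only on the tmp.agg_out naming seen in this ticket.

```javascript
// Hypothetical helper: extract the rename source and target from a mongod
// log line reporting the initial-sync failure quoted in this ticket.
function parseRenameFailure(line) {
  if (!line.includes("Applying renameCollection not supported in initial sync")) {
    return null; // not the error described here
  }
  const m = line.match(/renameCollection: "([^"]+)", to: "([^"]+)"/);
  if (!m) {
    return null; // message present, but the oplog entry was not printed as expected
  }
  return {
    from: m[1],
    to: m[2],
    // Assumption: aggregation $out temp collections are named like db.tmp.agg_out.N
    fromAggOut: /\.tmp\.agg_out\.\d+$/.test(m[1]),
  };
}

// Abbreviated copy of the second log line above.
const sample =
  "2017-09-05T17:56:04.348+0000 E REPL     [replication-168] Failed to apply batch due to " +
  "'OplogOperationUnsupported: error applying batch: Applying renameCollection not supported " +
  'in initial sync: { op: "c", ns: "graphs.$cmd", o: { renameCollection: "graphs.tmp.agg_out.989", ' +
  'to: "graphs.graphs_temp", stayTemp: false, dropTarget: true } }\'';

console.log(parseRenameFailure(sample));
```

      A result with fromAggOut set to true suggests that an aggregation with $out, rather than an explicit renameCollection command, caused the restart.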
      

      RATIONALE
      Prior to SERVER-26117, allowing the renameCollection operation to complete could cause data divergence. Here's an example of a situation that may occur using aggregation:

      1. User performs an aggregation against the test.foo collection, using $out to write to a test.aggResults collection.
      2. The aggregation starts generating results, and writes documents A and B to the temporary test.tmp.agg_out.1 collection.
      3. User adds a node to the replica set, and the node begins the initial sync process.
      4. The initial syncing node records minvalid, then lists all databases and collections that it needs to clone, and discovers test.tmp.agg_out.1.
      5. The user’s aggregation continues, writing documents C and D to test.tmp.agg_out.1.
      6. The user’s aggregation completes, renaming test.tmp.agg_out.1 to test.aggResults.
      7. The initial sync clones all the collections it knows about in the test database. This includes an attempt to clone test.tmp.agg_out.1; however, it finds no documents for that collection on the sync source, because the collection has already been renamed to test.aggResults there.
      8. Note the initial syncing node does not attempt to clone test.aggResults because that collection didn’t exist when it listed the collections it needed to clone in step 4.
      9. The initial sync finishes data cloning and moves on to oplog application. It replicates the inserts of documents C and D to test.tmp.agg_out.1 (which implicitly creates that collection).
      10. The initial sync encounters the renameCollection oplog entry and proceeds, renaming its copy of test.tmp.agg_out.1 to test.aggResults.
      11. Initial sync finishes.

      At this point the test.aggResults collection on the primary/sync source contains the documents A, B, C and D. On the newly added node, however, that collection contains only the documents C and D. The node believes itself consistent with the primary and caught up, but reads from it will return incomplete results. Additionally, if the user now writes to documents A or B, the newly added node may crash because it has no record of A or B.
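      The scenario above can be replayed with a toy model. This is an illustrative sketch only, not server code: collections are plain Maps of document names, the oplog is an array, and names such as simulateInitialSync and abortOnRename are invented for the example. With abortOnRename set to false it mimics the behavior prior to SERVER-26117; with true it mimics the abort introduced there.

```javascript
// Toy model: a "server" is a Map from namespace to an array of document names.
function simulateInitialSync({ abortOnRename }) {
  const source = new Map();
  const oplog = [];
  const insert = (ns, doc) => {
    if (!source.has(ns)) source.set(ns, []);
    source.get(ns).push(doc);
    oplog.push({ op: "i", ns, doc });
  };
  const rename = (from, to) => {
    source.set(to, source.get(from));
    source.delete(from);
    oplog.push({ op: "c", renameCollection: from, to });
  };

  // Steps 1-2: the aggregation writes A and B to the temporary collection.
  insert("test.tmp.agg_out.1", "A");
  insert("test.tmp.agg_out.1", "B");

  // Steps 3-4: the new node records its oplog start point and lists collections.
  const minValid = oplog.length;
  const toClone = [...source.keys()];

  // Steps 5-6: the aggregation writes C and D, then renames the temp collection.
  insert("test.tmp.agg_out.1", "C");
  insert("test.tmp.agg_out.1", "D");
  rename("test.tmp.agg_out.1", "test.aggResults");

  // Steps 7-8: clone only the collections listed earlier; the temp one is gone.
  const node = new Map();
  for (const ns of toClone) {
    const docs = source.get(ns) || [];
    if (docs.length > 0) node.set(ns, [...docs]);
  }

  // Steps 9-10: apply oplog entries recorded after the start point.
  for (const entry of oplog.slice(minValid)) {
    if (entry.op === "i") {
      if (!node.has(entry.ns)) node.set(entry.ns, []); // implicit create
      node.get(entry.ns).push(entry.doc);
    } else if (entry.op === "c") {
      if (abortOnRename) return { aborted: true, source, node };
      node.set(entry.to, node.get(entry.renameCollection) || []);
      node.delete(entry.renameCollection);
    }
  }
  return { aborted: false, source, node };
}

const divergent = simulateInitialSync({ abortOnRename: false });
console.log(divergent.source.get("test.aggResults")); // documents on the sync source
console.log(divergent.node.get("test.aggResults"));   // documents on the new node
console.log(simulateInitialSync({ abortOnRename: true }).aborted);
```

      In the non-aborting run, the sync source holds A, B, C and D while the new node holds only C and D, which is the divergence described above. The aborting run stops at the rename entry instead of producing an inconsistent copy.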

      REMEDIATION AND WORKAROUNDS
      Users of renameCollection and of aggregation with $out who are affected by this behavior need to pause the use of these features in order to complete an initial sync operation.

      Users of mapReduce() may likewise pause their mapReduce() operations. Alternatively, they may use the "Output to a Collection with an Action" form of the out option as a workaround, as this avoids the renameCollection operation that mapReduce otherwise performs internally. For example:

      db.outputcollection.drop()
      // The output collection can't be empty, so insert a marker document
      db.outputcollection.insert({marker:1})
      db.mycollection.mapReduce(myMapFunction, myReduceFunction, { out: { merge : "outputcollection" } })
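      To decide which workloads to pause, one might audit command documents before starting a resync. The sketch below is a hypothetical helper (mayRenameInternally is not a MongoDB API) that flags only the operations named in this ticket: explicit renameCollection, aggregate with a $out stage, and mapReduce whose out is a collection name or a replace action. It treats merge, reduce and inline output as safe, following the workaround above. It cannot see collection contents, so it does not verify that the output collection is non-empty.

```javascript
// Hypothetical audit helper, based only on the operations named in this ticket.
// Takes a database command document and returns true if, per the ticket, the
// command may produce a renameCollection oplog entry.
function mayRenameInternally(cmd) {
  if ("renameCollection" in cmd) return true;
  if ("aggregate" in cmd) {
    return (cmd.pipeline || []).some((stage) => "$out" in stage);
  }
  if ("mapReduce" in cmd) {
    const out = cmd.out;
    if (typeof out === "string") return true; // out: "coll" replaces the collection
    if (out && typeof out === "object") return "replace" in out;
    return false;
  }
  return false;
}

const examples = [
  { renameCollection: "test.a", to: "test.b" },
  { aggregate: "foo", pipeline: [{ $match: {} }, { $out: "aggResults" }] },
  { aggregate: "foo", pipeline: [{ $match: {} }] },
  { mapReduce: "mycollection", out: "outputcollection" },
  { mapReduce: "mycollection", out: { merge: "outputcollection" } },
];
for (const cmd of examples) {
  console.log(JSON.stringify(cmd), mayRenameInternally(cmd));
}
```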
      

      FIX VERSION
      A fix for this behavior is included in MongoDB 3.6.

            Assignee:
            geert.bosch@mongodb.com Geert Bosch
            Reporter:
            aaron Aaron Staple
            Votes:
            0
            Watchers:
            24
