[SERVER-4941] collection rename may not replicate / clone properly during initial sync Created: 12/Feb/12  Updated: 29/Apr/20  Resolved: 21/Sep/17

Status: Closed
Project: Core Server
Component/s: Storage
Affects Version/s: None
Fix Version/s: 3.6.0-rc0

Type: Bug Priority: Major - P3
Reporter: Aaron Staple Assignee: Geert Bosch
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
is depended on by SERVER-31093 Remove initialsync attempts from pass... Closed
Documented
is documented by DOCS-10847 Docs for SERVER-4941: collection rena... Closed
Duplicate
is duplicated by SERVER-5760 Collections remaining with different ... Closed
is duplicated by SERVER-18310 Can't rollback dropCollection if new ... Closed
is duplicated by SERVER-30952 Initial (re)sync never completes, stu... Closed
is duplicated by SERVER-35105 Applying renameCollection not support... Closed
is duplicated by SERVER-38524 Rename collection in initial sync Closed
Problem/Incident
Related
related to SERVER-4332 renameCollection across dbs doesn't r... Closed
related to SERVER-5694 renameCollection replication is not a... Closed
related to SERVER-40151 there is an error when adding a new s... Closed
related to SERVER-26117 renameCollection 'c' op should restar... Closed
is related to SERVER-31944 initial_sync_applier_error.js is now ... Backlog
is related to SERVER-15393 Renaming a collection with newly adde... Closed
is related to SERVER-30478 Add replset test for rename during in... Closed
is related to SERVER-30620 SyncTail::fetchAndInsertMissingDocume... Closed
is related to SERVER-29772 Provide option to 3.2 and 3.4 to allo... Closed
is related to SERVER-15359 Provide "id" of collection Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: Storage 2017-07-10, Storage 2017-07-31, Storage 2017-09-11, Storage 2017-10-02
Participants:
Case:
Linked BF Score: 0

 Description   
Issue Status as of Sep 25, 2017

ISSUE DESCRIPTION AND IMPACT
During initial sync, if a renameCollection operation is found during oplog application, the initial sync process is aborted and restarted to prevent data divergence in replica set nodes (see below for an example).

In addition to the renameCollection command, operations such as aggregations using $out and MapReduce with output to a collection may implicitly use renameCollection operations to create their output collections.

Users who attempt to resync a node and run any of the operations above before the process is complete may see their initial sync abort and restart. In extreme cases (e.g., if users are constantly running aggregations that output to new collections), initial sync may never complete.

DIAGNOSIS AND AFFECTED VERSIONS
The abort-and-restart behavior was introduced in SERVER-26117 and affects MongoDB 3.2.12 and newer. On initial sync, users may encounter the following error message:

2017-09-05T17:56:04.348+0000 E REPL     [repl writer worker 5] Error applying operation: OplogOperationUnsupported: Applying renameCollection not supported in initial sync: { ts: Timestamp 1504588941000|592, h: 4948566672906734558, v: 2, op: "c", ns: "graphs.$cmd", o: { renameCollection: "graphs.tmp.agg_out.989", to: "graphs.graphs_temp", stayTemp: false, dropTarget: true } } ({ ts: Timestamp 1504588941000|592, h: 4948566672906734558, v: 2, op: "c", ns: "graphs.$cmd", o: { renameCollection: "graphs.tmp.agg_out.989", to: "graphs.graphs_temp", stayTemp: false, dropTarget: true } })
2017-09-05T17:56:04.348+0000 E REPL     [replication-168] Failed to apply batch due to 'OplogOperationUnsupported: error applying batch: Applying renameCollection not supported in initial sync: { ts: Timestamp 1504588941000|592, h: 4948566672906734558, v: 2, op: "c", ns: "graphs.$cmd", o: { renameCollection: "graphs.tmp.agg_out.989", to: "graphs.graphs_temp", stayTemp: false, dropTarget: true } }'
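
To confirm that a restart was caused by this issue, the log line can be checked for the error name and the rename it reports. The following is a minimal sketch in plain JavaScript; parseRenameAbort is a hypothetical helper written for this illustration, not part of MongoDB or its tooling, and it assumes the log format shown above.

```javascript
// Hypothetical helper (not a MongoDB API): given one log line, return the
// source and target namespaces of the renameCollection that aborted initial
// sync, or null if the line does not report this error.
function parseRenameAbort(logLine) {
  if (!logLine.includes("OplogOperationUnsupported")) return null;
  // The phrase "Applying renameCollection not supported" has no colon and
  // quote after the command name, so this only matches the oplog entry body.
  const m = logLine.match(/renameCollection: "([^"]+)", to: "([^"]+)"/);
  return m ? { from: m[1], to: m[2] } : null;
}

// Sample line trimmed from the log output above.
const sampleLine =
  'E REPL     [repl writer worker 5] Error applying operation: ' +
  'OplogOperationUnsupported: Applying renameCollection not supported in ' +
  'initial sync: { op: "c", ns: "graphs.$cmd", o: { renameCollection: ' +
  '"graphs.tmp.agg_out.989", to: "graphs.graphs_temp", stayTemp: false, ' +
  'dropTarget: true } }';

console.log(parseRenameAbort(sampleLine));
```

A source namespace containing tmp.agg_out, as in this sample, suggests the rename came from an aggregation using $out rather than from an explicit renameCollection command.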

RATIONALE
Prior to SERVER-26117, allowing the renameCollection operation to complete could cause data divergence. Here's an example of a situation that may occur using aggregation:

  1. User performs an aggregation against test.foo, using $out to write to a test.aggResults collection.
  2. The aggregation starts generating results, and writes documents A and B to the temporary test.tmp.agg_out.1 collection.
  3. User adds a node to the replica set, and the node begins the initial sync process.
  4. Initial syncing node records minvalid, then lists all databases and collections that it needs to clone, and discovers test.tmp.agg_out.1.
  5. The user’s aggregation continues, writing documents C and D to test.tmp.agg_out.1.
  6. The user’s aggregation completes, renaming test.tmp.agg_out.1 to test.aggResults.
  7. The initial sync clones all the collections it knows about in the test database. This includes an attempt to clone test.tmp.agg_out.1; however, it finds no documents for that collection on the sync source, because the collection has already been renamed to test.aggResults there.
  8. Note the initial syncing node does not attempt to clone test.aggResults because that collection didn’t exist when it listed the collections it needed to clone in step 4.
  9. The initial sync finishes data cloning and moves on to oplog application. It replicates the inserts of documents C and D to test.tmp.agg_out.1 (which implicitly creates that collection).
  10. The initial sync encounters the renameCollection oplog entry and proceeds, renaming its copy of test.tmp.agg_out.1 to test.aggResults.
  11. Initial sync finishes.

At this point the test.aggResults collection on the primary/sync source contains the documents A, B, C and D. On the newly added node, however, that collection contains only the documents C and D. The node believes itself consistent with the primary and caught up, but reads from it will return incomplete results. Additionally, if the user now writes to documents A or B, the newly added node may crash, as it has no record of A or B.
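
The sequence above can be replayed with a small in-memory model. This is only a sketch in plain JavaScript: each node is modeled as a Map from namespace to an array of document ids, the oplog as an array of simplified entries, and none of it uses a MongoDB API. The function name and entry shapes are assumptions made for this illustration.

```javascript
// Illustration only: replays steps 2-10 of the scenario with name-based
// cloning and name-based oplog application, as before SERVER-26117.
function simulateRenameDuringInitialSync() {
  const source = new Map();   // sync source: namespace -> document ids
  const syncing = new Map();  // node performing initial sync
  const oplog = [];           // simplified ops recorded after minvalid

  const TMP = "test.tmp.agg_out.1";
  const OUT = "test.aggResults";

  // Step 2: the aggregation writes A and B to the temporary collection.
  source.set(TMP, ["A", "B"]);

  // Steps 3-4: the syncing node records minvalid and lists what to clone.
  const toClone = [...source.keys()];

  // Step 5: the aggregation writes C and D; these land in the oplog.
  for (const id of ["C", "D"]) {
    source.get(TMP).push(id);
    oplog.push({ op: "i", ns: TMP, id });
  }

  // Step 6: the aggregation completes by renaming TMP to OUT.
  source.set(OUT, source.get(TMP));
  source.delete(TMP);
  oplog.push({ op: "c", renameCollection: TMP, to: OUT });

  // Steps 7-8: clone by name. TMP is gone on the source and OUT was never
  // listed, so nothing is copied.
  for (const ns of toClone) {
    const docs = source.get(ns);
    if (docs !== undefined) syncing.set(ns, [...docs]);
  }

  // Steps 9-10: apply the oplog by namespace name.
  for (const entry of oplog) {
    if (entry.op === "i") {
      if (!syncing.has(entry.ns)) syncing.set(entry.ns, []); // implicit create
      const docs = syncing.get(entry.ns);
      if (!docs.includes(entry.id)) docs.push(entry.id);
    } else if (entry.op === "c") {
      syncing.set(entry.to, syncing.get(entry.renameCollection) || []);
      syncing.delete(entry.renameCollection);
    }
  }

  return { source: source.get(OUT), syncing: syncing.get(OUT) };
}

const result = simulateRenameDuringInitialSync();
console.log(result.source);   // [ 'A', 'B', 'C', 'D' ]
console.log(result.syncing);  // [ 'C', 'D' ]
```

In this model the syncing node ends up with only C and D in test.aggResults, matching the divergence described above.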

REMEDIATION AND WORKAROUNDS
Users of renameCollection and aggregation with $out who are affected by this behavior need to pause the use of these features in order to complete an initial sync operation.

Users of mapReduce() may also pause their mapReduce() operations. Alternatively, they may use the "Output to a Collection with an Action" form of the out option as a workaround, as this avoids the renameCollection operation that mapReduce otherwise performs internally when writing its output. For example:

db.outputcollection.drop()
// The output collection can't be empty, so insert a marker document
db.outputcollection.insert({marker:1})
db.mycollection.mapReduce(myMapFunction, myReduceFunction, { out: { merge : "outputcollection" } })

FIX VERSION
A fix for this behavior is included in MongoDB 3.6.



 Comments   
Comment by Githook User [ 21/Sep/17 ]

Author:

{'email': 'geert@mongodb.com', 'name': 'Geert Bosch', 'username': 'GeertBosch'}

Message: SERVER-4941 Fix lint
Branch: master
https://github.com/mongodb/mongo/commit/dd71aa808cc60bace3fcd629353897883f034ee1

Comment by Githook User [ 21/Sep/17 ]

Author:

{'email': 'geert@mongodb.com', 'name': 'Geert Bosch', 'username': 'GeertBosch'}

Message: SERVER-4941 Allow renameCollection during initial sync
Branch: master
https://github.com/mongodb/mongo/commit/d296e1dfed119fb3ef9d4907ac1875480f1408c8

Comment by Eric Milkie [ 02/May/17 ]

Once UUIDs are present and used by replication to apply ops (instead of using the namespace name), this problem will be solved.

Comment by Eric Milkie [ 02/May/16 ]

SERVER-23919 will alleviate the problems from this issue. The full solution will be SERVER-15359.

Comment by auto [ 13/Feb/12 ]

Author:

{u'login': u'astaple', u'name': u'Aaron', u'email': u'aaron@10gen.com'}

Message: SERVER-4941 test
Branch: master
https://github.com/mongodb/mongo/commit/8c13962e6878a9dd8de81ca7cebb729e9cc3ac43

Generated at Thu Feb 08 03:07:24 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.