Type: Bug
Resolution: Duplicate
Priority: Major - P3
Fix Version/s: None
Affects Version/s: 3.4.7
Component/s: Replication
Labels: None

Operating System: ALL
Steps To Reproduce:
- Run a 3-node cluster
- Run map/reduce jobs that require a temp collection
- Re-sync one of the nodes; the re-sync fails and starts again

We have a 3-node replica set running v3.4.7. The primary runs 6 map/reduce jobs every minute or so, and due to circumstances we had to re-sync one of the secondary nodes. However, at one of the last steps of the re-sync we get the following error:

2017-09-05T17:56:04.348+0000 E REPL     [repl writer worker 5] Error applying operation: OplogOperationUnsupported: Applying renameCollection not supported in initial sync: { ts: Timestamp 1504588941000|592, h: 4948566672906734558, v: 2, op: "c", ns: "graphs.$cmd", o: { renameCollection: "graphs.tmp.agg_out.989", to: "graphs.graphs_temp", stayTemp: false, dropTarget: true } } ({ ts: Timestamp 1504588941000|592, h: 4948566672906734558, v: 2, op: "c", ns: "graphs.$cmd", o: { renameCollection: "graphs.tmp.agg_out.989", to: "graphs.graphs_temp", stayTemp: false, dropTarget: true } })
2017-09-05T17:56:04.348+0000 E REPL     [replication-168] Failed to apply batch due to 'OplogOperationUnsupported: error applying batch: Applying renameCollection not supported in initial sync: { ts: Timestamp 1504588941000|592, h: 4948566672906734558, v: 2, op: "c", ns: "graphs.$cmd", o: { renameCollection: "graphs.tmp.agg_out.989", to: "graphs.graphs_temp", stayTemp: false, dropTarget: true } }'
2017-09-05T17:56:04.348+0000 I ASIO     [NetworkInterfaceASIO-RS-0] Ending connection to host graphs1-mongo3:27017 due to bad connection status; 2 connections to that host remain open
2017-09-05T17:56:04.348+0000 I REPL     [replication-167] Finished fetching oplog during initial sync: CallbackCanceled: Callback canceled. Last fetched optime and hash: { ts: Timestamp 1504634159000|4672, t: -1 }[-4041766555669726456]
2017-09-05T17:56:04.348+0000 I REPL     [replication-168] Initial sync attempt finishing up.
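To confirm which rename aborts the initial sync, the failing operation can be pulled out of the log lines above. The following is a minimal Python sketch, not a MongoDB tool: the function name `find_failed_rename` and the regular expression are hypothetical, and it assumes the log text has the same shape as the excerpt in this report.

```python
import re

# Hypothetical pattern: matches the renameCollection command document
# as it is printed in the log excerpt from this report.
RENAME_RE = re.compile(
    r'renameCollection: "(?P<source>[^"]+)", to: "(?P<target>[^"]+)"'
)

# Error text copied from the log excerpt in this report.
ERROR_MARKER = "Applying renameCollection not supported in initial sync"


def find_failed_rename(log_line):
    """Return (source, target) namespaces if the line reports a
    renameCollection that initial sync refused to apply, else None."""
    if ERROR_MARKER not in log_line:
        return None
    match = RENAME_RE.search(log_line)
    if match is None:
        return None
    return match.group("source"), match.group("target")


# Sample taken from the first error line in this report.
sample = (
    'Applying renameCollection not supported in initial sync: '
    '{ ts: Timestamp 1504588941000|592, h: 4948566672906734558, v: 2, '
    'op: "c", ns: "graphs.$cmd", o: { renameCollection: '
    '"graphs.tmp.agg_out.989", to: "graphs.graphs_temp", '
    'stayTemp: false, dropTarget: true } }'
)

print(find_failed_rename(sample))
```

Run over the whole mongod log, a helper like this would show whether every failed attempt stops on a rename of a temporary collection into `graphs.graphs_temp`.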

After this, MongoDB cleans up the files and starts the re-sync again, so the node is now stuck in a very long loop. This used to work fine with 2.x, where we re-synced quite a few times.

I'm not sure what to do about this. We can't stop the map/reduce jobs for the duration of the re-sync, as it takes about 8 hours to get to this point.

Duplicates: SERVER-4941 collection rename may not replicate / clone properly during initial sync (Closed)

Assignee: Kelsey Schubert
Reporter: Robert Beekman
Participants: Daniel Pasette, Kelsey Schubert, Ramon Fernandez, Robert Beekman, Spencer Brody, Thijs Cadier
Votes: 0
Watchers: 6

Created: Sep 05 2017 07:44:03 PM UTC
Updated: Apr 07 2023 04:19:45 PM UTC
Resolved: Sep 05 2017 07:55:28 PM UTC
