Core Server / SERVER-28709

Initial sync from MMAP to WT fails if a large document is grown over the 16MB limit by an oplog update

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Major - P3
    • Affects Version/s: 3.4.3
    • Component/s: Replication
    • Labels: None
      Steps to reproduce:
      • Create a large(ish) document (<16MB) in an MMAPv1 replica set.
      • Initiate an initial sync (for example by adding a new WiredTiger member).
      • While the sync is running, apply an update that would push the document size over the 16MB limit.
      • Wait for the initial copy to finish; the oplog catch-up will then fail. A reproduction sketch follows this list.
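
      A minimal reproduction sketch of the steps above, assuming the pymongo driver, a replica set named "rs0" on localhost, and placeholder database, collection and field names (the actual update in the log excerpts below is snipped):

      # Hypothetical reproduction sketch; names and sizes are illustrative only.
      from pymongo import MongoClient

      client = MongoClient("mongodb://localhost:27017/?replicaSet=rs0")
      coll = client["test"]["large_docs"]
      ONE_MB = 1024 * 1024

      # Step 1: create a large(ish) document well under the 16MB BSON limit (~14MB).
      coll.insert_one({"_id": 1, "chunks": ["x" * ONE_MB] * 14})

      # Step 2: add the new WiredTiger member now (rs.add() in the mongo shell),
      # so that its initial sync runs while the document keeps growing.
      input("Initial sync started? Press Enter to apply the growth update... ")

      # Step 3: grow the document while the new member is still syncing. The update
      # succeeds on the primary (~15.5MB), but if the syncing member replays it
      # against a copy of the document that already contains the growth, the result
      # can exceed 16MB and the batch fails with error 17419, as in the logs below.
      coll.update_one({"_id": 1}, {"$push": {"chunks": "x" * (ONE_MB + ONE_MB // 2)}})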

      Initial sync from an MMAPv1 replica set to a new WiredTiger secondary, both running 3.4.3:

      2017-04-10T06:00:01.604+0000 I CONTROL  [signalProcessingThread] db version v3.4.3
      2017-04-10T06:00:01.604+0000 I CONTROL  [signalProcessingThread] git version: f07437fb5a6cca07c10bafa78365456eb1d6d5e1
      2017-04-10T06:00:01.604+0000 I CONTROL  [signalProcessingThread] OpenSSL version: OpenSSL 1.0.1f 6 Jan 2014
      2017-04-10T06:00:01.604+0000 I CONTROL  [signalProcessingThread] allocator: tcmalloc
      2017-04-10T06:00:01.604+0000 I CONTROL  [signalProcessingThread] modules: none
      2017-04-10T06:00:01.604+0000 I CONTROL  [signalProcessingThread] build environment:
      2017-04-10T06:00:01.604+0000 I CONTROL  [signalProcessingThread]     distmod: ubuntu1404
      2017-04-10T06:00:01.604+0000 I CONTROL  [signalProcessingThread]     distarch: x86_64
      2017-04-10T06:00:01.604+0000 I CONTROL  [signalProcessingThread]     target_arch: x86_64
      

      The initial copy of the data was successful, but the oplog catchup stage failed due to an update that breached the 16MB limit:

      2017-04-06T16:04:04.054+0000 F REPL     [repl writer worker 11] warning: log line attempted (147kB) over max size (10kB), printing beginning and end ... writer worker caught exception:  :: caused by :: 17419 Resulting document after update is larger than 16777216 on: { ts: Timestamp 1491429857000|404, h: 8450293174279352223, v: 2, op: "u", ns: "[SNIP - large update]", o2: { _id: ObjectId('58961877c141580404e7f604') }, o: { $set: { [SNIP] } }
      2017-04-06T16:04:04.055+0000 I REPL     [repl writer worker 8] missing object not found on source. presumably deleted later in oplog
      2017-04-06T16:04:04.055+0000 I REPL     [repl writer worker 8] o2: { _id: ObjectId('58e4e2e9bc44b6c99ed9bb2b') }
      2017-04-06T16:04:04.055+0000 I REPL     [repl writer worker 8] o firstfield: $set
      2017-04-06T16:04:04.067+0000 E REPL     [replication-59] Failed to apply batch due to 'Location17419: error applying batch: Resulting document after update is larger than 16777216'
      2017-04-06T16:04:04.075+0000 I REPL     [replication-61] Finished fetching oplog during initial sync: CallbackCanceled: fetcher was shut down after previous batch was processed. Last fetched optime and hash: { ts: Timestamp 1491494643000|612, t: -1 }[1653548606848978015]
      2017-04-06T16:04:04.075+0000 I REPL     [replication-61] Initial sync attempt finishing up.
      2017-04-06T16:04:04.082+0000 I REPL     [replication-61] warning: log line attempted (491kB) over max size (10kB), printing beginning and end ... Initial Sync Attempt Statistics: { failedInitialSyncAttemp
      ts: 0, maxFailedInitialSyncAttempts: 10, initialSyncStart: new Date(1491397073200), initialSyncAttempts: [], fetchedMissingDocs: 110321, appliedOps: 27140000, initialSyncOplogStart: Timestamp 149139707400
      0|51, initialSyncOplogEnd: Timestamp 1491494643000|612, databases: { databasesCloned: 44, [SNIP]: { collections: 1, clonedCollections: 1, start: new Date(1491397074124), end: new Date(1491397074208), elapsedMillis: 84, [SNIP]: { documentsToCopy: 14, documentsCopied: 14, indexes: 1, fetchedBatches: 1, start: new Date(1491397074124), end: new Date(1491397074208), elapsedMillis: 84 } }, [SNIP]: { collections: 51, clonedCollections: 51, start: new Date(1491397074208), end: new Date(1491397729132), elapsedMillis: 654924, [SNIP]: { documentsToCopy: 1, documentsCopied: 1, indexes: 1, fetchedBatches: 1, start: new Date(1491397074209), end: new Date(1491397074287), elapsedMillis: 78 }, [SNIP]: { documentsToCopy: 310824, documentsCopied: 310834, indexes: 2, fetchedBatches: 129, start: new Date(1491397074287), end: new Date(1491397082539), elapsedMillis: 8252 }, [SNIP]: { documentsToCopy: 2236029, documentsCopied: 2236405, indexes: 2, fetchedBatches: 109, start: new Date(1491397082539), end: new Date(1491397133928), elapsedMillis: 51389 }, [SNIP]: { documentsToCopy: 30770, documentsCopied: 30770, indexes: 2, fetchedBatches: 1, start: new Date(1491397133928), end: new Date(1491397134604), elapsedMillis: 676 }, [SNIP]: { documentsToCopy: 1188, documentsCopied: 1188, indexes: 1, fetchedBatches: 1, start: new Date(1491397134604), end: new Date(1491397134679), elapsedMillis: 75 }, [SNIP]: { documentsToCopy: 111810, documentsCopied: 111811, indexes: 2, fetchedBatches: 5, start: new Date(1491397134679), end: new Date(1491397137511), elapsedMillis: 2832 }, [SNIP]: { documentsToCopy: 16767, documentsCopied: 16767, indexes: 1, fetchedBatches: 1, start: new Date(1491397137511), end: new Date(1491397137922), elapsedMillis: 411 }, [SNIP]: { documentsToCopy: 85965, documentsCopied: 85969, indexes: 2, fetchedBatches: 11, start: new Date(1491397137921), end: new Date(1491397146202), elapsedMillis: 8281 }, [SNIP]: { documentsToCopy: 67820, documentsCopied: 67826, indexes: 2, fetchedBatches: 15, start: new Date(1491397146202), end: new Date(1491397156706), elapsedMillis: 10504 }, [SNIP]: { documentsToCopy: 5, documentsCopied: 5, indexes: 2, fetchedBatches: 1, start: new Date(1491397156706), end: new Date(1491397156765), elapsedMillis: 59 }, [SNIP]: { documentsToCopy: 18827, documentsCopied: 18827, indexes: 2, fetchedBatches: 1, start: new Date(1491397156765), end: new Date(1491397157548), elapsedMillis: 783 }, [SNIP]: { documentsToCopy: 9603, documentsCopied: 9603, indexes: 1, fetchedBatches: 1, start: new Date(1491397157547), end: new Date(1491397157821), elapsedMillis: 274 }, [SNIP]: { documentsToCopy: 86, documentsCopied: 86, indexes: 1, fetchedBatches: 1, start: new Date(1491397157821), end: new Date(1491397157842), e .......... 
alerts.krwfihep_client_alerts: { documentsToCopy: 0, documentsCopied: 0, indexes: 7, fetchedBatches: 0, start: new Date(1491492468641), end: new Date(1491492468683), elapsedMillis: 42 }, [SNIP]: { documentsToCopy: 513, documentsCopied: 513, indexes: 7, fetchedBatches: 1, start: new Date(1491492468683), end: new Date(1491492468943), elapsedMillis: 260 }, [SNIP]: { documentsToCopy: 7, documentsCopied: 7, indexes: 7, fetchedBatches: 1, start: new Date(1491492468943), end: new Date(1491492469036), elapsedMillis: 93 }, [SNIP]: { documentsToCopy: 3245, documentsCopied: 3245, indexes: 7, fetchedBatches: 1, start: new Date(1491492469036), end: new Date(1491492470635), elapsedMillis: 1599 }, [SNIP]: { documentsToCopy: 921, documentsCopied: 921, indexes: 7, fetchedBatches: 1, start: new Date(1491492470635), end: new Date(1491492470715), elapsedMillis: 80 }, [SNIP]: { documentsToCopy: 1747, documentsCopied: 1747, indexes: 7, fetchedBatches: 1, start: new Date(1491492470715), end: new Date(1491492470993), elapsedMillis: 278 }, [SNIP]: { documentsToCopy: 0, documentsCopied: 0, indexes: 7, fetchedBatches: 0, start: new Date(1491492470993), end: new Date(1491492471052), elapsedMillis: 59 }, [SNIP]: { documentsToCopy: 21, documentsCopied: 21, indexes: 7, fetchedBatches: 1, start: new Date(1491492471052), end: new Date(1491492471102), elapsedMillis: 50 }, [SNIP]: { documentsToCopy: 0, documentsCopied: 0, indexes: 7, fetchedBatches: 0, start: new Date(1491492471102), end: new Date(1491492471168), elapsedMillis: 66 }, [SNIP]: { documentsToCopy: 4, documentsCopied: 4, indexes: 7, fetchedBatches: 1, start: new Date(1491492471168), end: new Date(1491492471241), elapsedMillis: 73 }, [SNIP]: { documentsToCopy: 6, documentsCopied: 6, indexes: 7, fetchedBatches: 1, start: new Date(1491492471240), end: new Date(1491492471317), elapsedMillis: 77 }, [SNIP]: { documentsToCopy: 0, documentsCopied: 0, indexes: 7, fetchedBatches: 0, start: new Date(1491492471317), end: new Date(1491492471389), elapsedMillis: 72 }, [SNIP]: { documentsToCopy: 0, documentsCopied: 0, indexes: 7, fetchedBatches: 0, start: new Date(1491492471389), end: new Date(1491492471468), elapsedMillis: 79 }, [SNIP]: { documentsToCopy: 1385, documentsCopied: 1385, indexes: 7, fetchedBatches: 1, start: new Date(1491492471468), end: new Date(1491492471827), elapsedMillis: 359 }, [SNIP]: { documentsToCopy: 598, documentsCopied: 598, indexes: 7, fetchedBatches: 1, start: new Date(1491492471827), end: new Date(1491492472019), elapsedMillis: 192 }, [SNIP]: { documentsToCopy: 100, documentsCopied: 100, indexes: 7, fetchedBatches: 1, start: new Date(1491492472019), end: new Date(1491492472094), elapsedMillis: 75 }, [SNIP]: { documentsToCopy: 79333, documentsCopied: 79333, indexes: 7, fetchedBatches: 14, start: new Date(1491492472094), end: new Date(1491492492457), elapsedMillis: 20363 } } } }
      2017-04-06T16:04:04.082+0000 E REPL     [replication-61] Initial sync attempt failed -- attempts left: 9 cause: Location17419: error applying batch: Resulting document after update is larger than 16777216
      2017-04-06T16:04:05.083+0000 I REPL     [replication-59] Starting initial sync (attempt 2 of 10)
      2017-04-06T16:04:05.133+0000 I REPL     [replication-59] sync source candidate: mongodbsvr1-p2:27028
      2017-04-06T16:04:06.220+0000 I STORAGE  [replication-59] dropAllDatabasesExceptLocal 44
      2017-04-06T16:04:24.446+0000 I REPL     [replication-59] ******
      2017-04-06T16:04:24.446+0000 I REPL     [replication-59] creating replication oplog of size: 51200MB...
      2017-04-06T16:04:24.451+0000 I STORAGE  [replication-59] WiredTigerRecordStoreThread local.oplog.rs already started
      2017-04-06T16:04:24.451+0000 I STORAGE  [replication-59] The size storer reports that the oplog contains 0 records totaling to 0 bytes
      2017-04-06T16:04:24.451+0000 I STORAGE  [replication-59] Scanning the oplog to determine where to place markers for truncation
      2017-04-06T16:04:24.515+0000 I REPL     [replication-59] ******
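
      The namespace and _id of the offending document are snipped in the log above. One rough way to locate documents on the MMAPv1 source that are close to the 16MB limit is to measure their BSON size directly; a sketch assuming a recent pymongo (bson.encode), with the connection string taken from the sync source named in the log and a 1MB safety threshold chosen arbitrarily:

      import bson
      from pymongo import MongoClient

      LIMIT = 16 * 1024 * 1024            # 16777216 bytes, the BSON document limit
      THRESHOLD = LIMIT - 1024 * 1024     # flag anything within 1MB of the limit

      # Allow reads even if the source member is a secondary.
      client = MongoClient("mongodb://mongodbsvr1-p2:27028/", readPreference="secondaryPreferred")
      for db_name in client.list_database_names():
          if db_name in ("admin", "local", "config"):
              continue
          db = client[db_name]
          for coll_name in db.list_collection_names():
              # Full collection scan: slow on big collections, but shows the idea.
              for doc in db[coll_name].find():
                  size = len(bson.encode(doc))
                  if size >= THRESHOLD:
                      print(db_name + "." + coll_name, doc.get("_id"), size)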
      

      The failed batch caused the whole initial sync to fail, dropping all of the copied data and starting again from the beginning. If the oplog contains an entry that the MMAP engine coped with when replicating to its existing secondaries, then the initial sync should cope with it in the same way (for example by rejecting the update and carrying on).
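
      For comparison, on a normal write path the same kind of oversized update is simply rejected and the server carries on, which is roughly the behaviour suggested above for the initial-sync applier. A sketch of what a client sees, again with placeholder names and sizes:

      from pymongo import MongoClient
      from pymongo.errors import WriteError

      coll = MongoClient("mongodb://localhost:27017/?replicaSet=rs0")["test"]["large_docs"]
      try:
          # An update that would take an already-large document past 16MB.
          coll.update_one({"_id": 1}, {"$set": {"extra": "y" * (3 * 1024 * 1024)}})
      except WriteError as exc:
          # The single write fails (e.g. "Resulting document after update is
          # larger than 16777216") while the primary keeps serving traffic.
          print("update rejected:", exc.code, exc.details.get("errmsg"))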

            Assignee: backlog-server-repl [DO NOT USE] Backlog - Replication Team
            Reporter: David Henderson (david.henderson@triggeredmessaging.com)
            Votes: 0
            Watchers: 10