Core Server / SERVER-47408

Oplog documents from transactions can breach the maximum BSON size and break mongorestore

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Major - P3
    • Fix Version/s: None
    • Affects Version/s: 4.2.5
    • Component/s: Replication, Tools
    • Environment:
      OS: CentOS Linux release 7.7.1908 (Core)
      Kernel: 3.10.0-1062.18.1.el7.x86_64 #1 SMP Tue Mar 17 23:49:17 UTC 2020

      Also affects the mongo:4.2.5 docker image
    • Tools
    • Operating System: ALL
    • Steps To Reproduce:

      Reproducing the large oplog entries is straightforward: simply make a number of very large writes in a transaction. It's easiest to demonstrate using mongodump; steps to reproduce with mongodump follow.

      Capturing the error with mongodump relies on timing, so depending on hardware the test case might not hit the issue, although it works reliably on my machine. As per the issue description, this affects MongoDB 4.2.5, installed on CentOS 7 via the official mongodb-org-4.2 RPM repo, but I've also been able to reproduce it using the mongo:4.2.5 docker image.

      To reproduce, it's necessary to write a large transaction while mongodump is dumping that database. To that end, I've written a small Go command that writes a large amount of dummy data to a collection (in order to slow down the mongodump operation), and another Go command that performs a short transaction writing a few large documents (a sketch of the latter follows the steps below). Steps:

      1. Create a sufficiently large collection such that mongodump will take at least a few seconds to complete.
      2. Start a mongodump with oplog writing enabled.
      3. While mongodump is running, commit a transaction that writes a very large amount of data, such that the large transaction is captured in mongodump's oplog.bson.
      4. Try to inspect the produced oplog.bson with bsondump, or try to restore the dump with mongorestore --oplogReplay.
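
      To sketch step 3 concretely: the following Go program, using the official mongo-go-driver, commits a transaction containing a few documents just under the 16 MiB document limit. This is a minimal sketch, not the code from the repository linked below; the connection string, database, and collection names are placeholders, and a replica set is assumed since transactions require one.

      package main

      import (
          "context"
          "log"
          "time"

          "go.mongodb.org/mongo-driver/bson"
          "go.mongodb.org/mongo-driver/mongo"
          "go.mongodb.org/mongo-driver/mongo/options"
      )

      func main() {
          ctx, cancel := context.WithTimeout(context.Background(), 2*time.Minute)
          defer cancel()

          // Placeholder URI; transactions require a replica set.
          client, err := mongo.Connect(ctx, options.Client().ApplyURI("mongodb://localhost:27017/?replicaSet=rs0"))
          if err != nil {
              log.Fatal(err)
          }
          defer client.Disconnect(ctx)

          coll := client.Database("foo").Collection("big")

          // Each document carries a payload close to, but under, the 16 MiB limit.
          payload := make([]byte, 15*1024*1024)

          session, err := client.StartSession()
          if err != nil {
              log.Fatal(err)
          }
          defer session.EndSession(ctx)

          // Committing several near-limit inserts in one transaction yields a
          // combined oplog entry that can exceed 16 MiB.
          _, err = session.WithTransaction(ctx, func(sc mongo.SessionContext) (interface{}, error) {
              for i := 0; i < 3; i++ {
                  if _, err := coll.InsertOne(sc, bson.M{"seq": i, "data": payload}); err != nil {
                      return nil, err
                  }
              }
              return nil, nil
          })
          if err != nil {
              log.Fatal(err)
          }
      }

      Run this while the mongodump --oplog from step 2 is still copying the large collection from step 1, so that the transaction's commit lands inside the dump's oplog window.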


      Some code to help reproduce can be found here, tested on Go 1.14:

      https://github.com/ks07/mongodb-oplog-bug

      A full log showing the steps to reproduce in action, along with the error output:

      https://gist.github.com/ks07/869229ea0fd26c6b0058f81427691070


      Note that the example writes a small number of large documents very close to the 16 MiB limit, but in production usage we have noticed this behaviour with transactions writing a large number of smaller documents.


      I've run into an issue where point-in-time snapshots of a mongo server produced using mongodump --oplog can be unusable if they happen to coincide with a large transaction write. In these cases, the oplog.bson produced in the dump will contain a document that exceeds the 16 MiB size limit set in mongo-tools-common, and thus restoring with mongorestore --oplogReplay will fail.

      # mongorestore --oplogReplay
      2020-04-08T13:15:13.749+0000  using default 'dump' directory
      2020-04-08T13:15:13.749+0000  preparing collections to restore from
      2020-04-08T13:15:13.751+0000  reading metadata for foo.junk from dump/foo/junk.metadata.json
      2020-04-08T13:15:13.762+0000  restoring foo.junk from dump/foo/junk.bson
      2020-04-08T13:15:16.748+0000  [........................]  foo.junk  4.55MB/131MB  (3.5%)
      2020-04-08T13:15:19.748+0000  [#.......................]  foo.junk  8.74MB/131MB  (6.7%)
      2020-04-08T13:15:22.749+0000  [##......................]  foo.junk  13.0MB/131MB  (10.0%)
      2020-04-08T13:15:25.751+0000  [###.....................]  foo.junk  17.6MB/131MB  (13.5%)
      2020-04-08T13:15:28.752+0000  [####....................]  foo.junk  22.1MB/131MB  (16.9%)
      2020-04-08T13:15:31.748+0000  [####....................]  foo.junk  26.5MB/131MB  (20.3%)
      2020-04-08T13:15:34.748+0000  [#####...................]  foo.junk  31.3MB/131MB  (24.0%)
      2020-04-08T13:15:37.748+0000  [######..................]  foo.junk  36.1MB/131MB  (27.7%)
      2020-04-08T13:15:40.753+0000  [#######.................]  foo.junk  40.7MB/131MB  (31.2%)
      2020-04-08T13:15:43.748+0000  [########................]  foo.junk  45.4MB/131MB  (34.8%)
      2020-04-08T13:15:46.748+0000  [#########...............]  foo.junk  49.7MB/131MB  (38.1%)
      2020-04-08T13:15:48.489+0000  [########################]  foo.junk  131MB/131MB  (100.0%)
      2020-04-08T13:15:48.489+0000  no indexes to restore
      2020-04-08T13:15:48.489+0000  finished restoring foo.junk (1000001 documents, 0 failures)
      2020-04-08T13:15:48.489+0000  replaying oplog
      2020-04-08T13:15:48.496+0000  applied 1 oplog entries
      2020-04-08T13:15:48.496+0000  Failed: restore error: error reading oplog bson input: invalid BSONSize: 16777499 bytes
      2020-04-08T13:15:48.496+0000  1000001 document(s) restored successfully. 0 document(s) failed to restore.
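
      For reference, the failing size of 16,777,499 bytes is 283 bytes over the 16 MiB (16,777,216 byte) limit. Since every BSON document starts with its total size as a little-endian int32, a dump can be checked for oversized entries without parsing it fully. A minimal sketch, assuming the dump lives at the placeholder path dump/oplog.bson:

      package main

      import (
          "encoding/binary"
          "fmt"
          "io"
          "log"
          "os"
      )

      const maxBSONSize = 16 * 1024 * 1024 // 16 MiB document limit

      func main() {
          f, err := os.Open("dump/oplog.bson") // placeholder path
          if err != nil {
              log.Fatal(err)
          }
          defer f.Close()

          var header [4]byte
          for i := 0; ; i++ {
              // Each BSON document begins with its total size (including
              // these four bytes) as a little-endian int32.
              if _, err := io.ReadFull(f, header[:]); err == io.EOF {
                  break
              } else if err != nil {
                  log.Fatal(err)
              }
              size := int64(binary.LittleEndian.Uint32(header[:]))
              if size > maxBSONSize {
                  fmt.Printf("entry %d: %d bytes exceeds the %d byte limit\n", i, size, maxBSONSize)
              }
              // Skip the remainder of this document.
              if _, err := f.Seek(size-4, io.SeekCurrent); err != nil {
                  log.Fatal(err)
              }
          }
      }

      The "invalid BSONSize" failure above appears to be the tools' version of this same length check rejecting the oversized entry.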


      From looking at the underlying local.oplog.rs collection, I think this may be a problem with how mongod splits transactions into documents when writing to the oplog. Querying local.oplog.rs directly (a sketch follows) shows the offending BSON documents in the oplog. Interestingly, replication still appears to work correctly despite this, although I haven't tested enough to say so confidently.
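
      A sketch of such a query with the Go driver; the filter on o.applyOps is an assumption about how the transaction appears in the oplog, and whether a driver will hand back an over-limit document intact may vary:

      package main

      import (
          "context"
          "fmt"
          "log"

          "go.mongodb.org/mongo-driver/bson"
          "go.mongodb.org/mongo-driver/mongo"
          "go.mongodb.org/mongo-driver/mongo/options"
      )

      func main() {
          ctx := context.Background()
          // Placeholder URI; connect to a replica set member.
          client, err := mongo.Connect(ctx, options.Client().ApplyURI("mongodb://localhost:27017"))
          if err != nil {
              log.Fatal(err)
          }
          defer client.Disconnect(ctx)

          // Transactions are written to the oplog as applyOps entries; report
          // any whose raw BSON exceeds the 16 MiB document limit.
          cur, err := client.Database("local").Collection("oplog.rs").
              Find(ctx, bson.M{"o.applyOps": bson.M{"$exists": true}})
          if err != nil {
              log.Fatal(err)
          }
          defer cur.Close(ctx)

          for cur.Next(ctx) {
              if len(cur.Current) > 16*1024*1024 {
                  fmt.Printf("oversized applyOps entry: %d bytes\n", len(cur.Current))
              }
          }
          if err := cur.Err(); err != nil {
              log.Fatal(err)
          }
      }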


      I have attached one of the offending oplog.bson dumps to this issue.

            Assignee: Unassigned
            Reporter: George Field (george@netcraft.com)
            Votes: 0
            Watchers: 12
