Core Server / SERVER-47408

Oplog documents from transactions can breach the maximum BSON size and break mongorestore

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Major - P3
    • Fix Version/s: None
    • Affects Version/s: 4.2.5
    • Component/s: Replication, Tools
    • Environment:
      OS: CentOS Linux release 7.7.1908 (Core)
      Kernel: 3.10.0-1062.18.1.el7.x86_64 #1 SMP Tue Mar 17 23:49:17 UTC 2020

      Also affects the mongo:4.2.5 docker image
    • Tools
    • Operating System: ALL
    • Steps To Reproduce:

      Reproducing the large oplog entries is straightforward: simply make a number of very large writes in a transaction. It's easiest to demonstrate using mongodump; steps to reproduce with mongodump follow.

      Capturing the error with mongodump relies on timing, so depending on hardware the test case might not hit the issue, although it works reliably on my machine. As per the issue description, this affects MongoDB 4.2.5, installed on CentOS 7 via the official mongodb-org-4.2 RPM repo, but I've also been able to reproduce it using the mongo:4.2.5 docker image.

      To reproduce, it's necessary to write a large transaction while mongodump is dumping that database. To that end, I've written a small Go command that writes a large amount of dummy data to a collection (in order to slow down the mongodump operation), and another Go command that performs a short transaction writing a few large documents (a sketch of the latter follows the steps below). Steps:

      1. Create a sufficiently large collection such that mongodump will take at least a few seconds to complete.
      2. Start a mongodump with oplog writing enabled.
      3. While mongodump is running, commit a transaction that writes a very large amount of data, such that the large transaction is captured in mongodump's oplog.bson.
      4. Try to inspect the produced oplog.bson with bsondump, or try to restore the dump with mongorestore --oplogReplay.
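
      To sketch step 3 concretely: the following Go program, using the official mongo-go-driver, commits a transaction containing a few documents just under the 16 MiB document limit. This is a minimal sketch, not the code from the repository linked below; the connection string, database, and collection names are placeholders, and a replica set is assumed since transactions require one.

      package main

      import (
          "context"
          "log"
          "time"

          "go.mongodb.org/mongo-driver/bson"
          "go.mongodb.org/mongo-driver/mongo"
          "go.mongodb.org/mongo-driver/mongo/options"
      )

      func main() {
          ctx, cancel := context.WithTimeout(context.Background(), 2*time.Minute)
          defer cancel()

          // Placeholder URI; transactions require a replica set.
          client, err := mongo.Connect(ctx, options.Client().ApplyURI("mongodb://localhost:27017/?replicaSet=rs0"))
          if err != nil {
              log.Fatal(err)
          }
          defer client.Disconnect(ctx)

          coll := client.Database("foo").Collection("big")

          // Each document carries a payload close to, but under, the 16 MiB limit.
          payload := make([]byte, 15*1024*1024)

          session, err := client.StartSession()
          if err != nil {
              log.Fatal(err)
          }
          defer session.EndSession(ctx)

          // Committing several near-limit inserts in one transaction yields a
          // combined oplog entry that can exceed 16 MiB.
          _, err = session.WithTransaction(ctx, func(sc mongo.SessionContext) (interface{}, error) {
              for i := 0; i < 3; i++ {
                  if _, err := coll.InsertOne(sc, bson.M{"seq": i, "data": payload}); err != nil {
                      return nil, err
                  }
              }
              return nil, nil
          })
          if err != nil {
              log.Fatal(err)
          }
      }

      Run this while the mongodump --oplog from step 2 is still copying the large collection from step 1, so that the transaction's commit lands inside the dump's oplog window.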


      Some code to help reproduce can be found here, tested on Go 1.14:

      https://github.com/ks07/mongodb-oplog-bug

      A full log showing the steps to reproduce in action, along with the error output:

      https://gist.github.com/ks07/869229ea0fd26c6b0058f81427691070


      Note that the example writes a small number of large documents very close to the 16 MiB limit, but in production usage we have noticed this behaviour with transactions writing a large number of smaller documents.


      I've run into an issue where point-in-time snapshots of a mongo server produced using mongodump --oplog can be unusable if they happen to coincide with a large transaction write. In these cases, the oplog.bson produced in the dump will contain a document that exceeds the 16 MiB size limit set in mongo-tools-common, and thus restoring with mongorestore --oplogReplay will fail.

      # mongorestore --oplogReplay
      2020-04-08T13:15:13.749+0000  using default 'dump' directory
      2020-04-08T13:15:13.749+0000  preparing collections to restore from
      2020-04-08T13:15:13.751+0000  reading metadata for foo.junk from dump/foo/junk.metadata.json
      2020-04-08T13:15:13.762+0000  restoring foo.junk from dump/foo/junk.bson
      2020-04-08T13:15:16.748+0000  [........................]  foo.junk  4.55MB/131MB  (3.5%)
      2020-04-08T13:15:19.748+0000  [#.......................]  foo.junk  8.74MB/131MB  (6.7%)
      2020-04-08T13:15:22.749+0000  [##......................]  foo.junk  13.0MB/131MB  (10.0%)
      2020-04-08T13:15:25.751+0000  [###.....................]  foo.junk  17.6MB/131MB  (13.5%)
      2020-04-08T13:15:28.752+0000  [####....................]  foo.junk  22.1MB/131MB  (16.9%)
      2020-04-08T13:15:31.748+0000  [####....................]  foo.junk  26.5MB/131MB  (20.3%)
      2020-04-08T13:15:34.748+0000  [#####...................]  foo.junk  31.3MB/131MB  (24.0%)
      2020-04-08T13:15:37.748+0000  [######..................]  foo.junk  36.1MB/131MB  (27.7%)
      2020-04-08T13:15:40.753+0000  [#######.................]  foo.junk  40.7MB/131MB  (31.2%)
      2020-04-08T13:15:43.748+0000  [########................]  foo.junk  45.4MB/131MB  (34.8%)
      2020-04-08T13:15:46.748+0000  [#########...............]  foo.junk  49.7MB/131MB  (38.1%)
      2020-04-08T13:15:48.489+0000  [########################]  foo.junk  131MB/131MB  (100.0%)
      2020-04-08T13:15:48.489+0000  no indexes to restore
      2020-04-08T13:15:48.489+0000  finished restoring foo.junk (1000001 documents, 0 failures)
      2020-04-08T13:15:48.489+0000  replaying oplog
      2020-04-08T13:15:48.496+0000  applied 1 oplog entries
      2020-04-08T13:15:48.496+0000  Failed: restore error: error reading oplog bson input: invalid BSONSize: 16777499 bytes
      2020-04-08T13:15:48.496+0000  1000001 document(s) restored successfully. 0 document(s) failed to restore.
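
      For reference, the failing size of 16,777,499 bytes is 283 bytes over the 16 MiB (16,777,216 byte) limit. Since every BSON document starts with its total size as a little-endian int32, a dump can be checked for oversized entries without parsing it fully. A minimal sketch, assuming the dump lives at the placeholder path dump/oplog.bson:

      package main

      import (
          "encoding/binary"
          "fmt"
          "io"
          "log"
          "os"
      )

      const maxBSONSize = 16 * 1024 * 1024 // 16 MiB document limit

      func main() {
          f, err := os.Open("dump/oplog.bson") // placeholder path
          if err != nil {
              log.Fatal(err)
          }
          defer f.Close()

          var header [4]byte
          for i := 0; ; i++ {
              // Each BSON document begins with its total size (including
              // these four bytes) as a little-endian int32.
              if _, err := io.ReadFull(f, header[:]); err == io.EOF {
                  break
              } else if err != nil {
                  log.Fatal(err)
              }
              size := int64(binary.LittleEndian.Uint32(header[:]))
              if size > maxBSONSize {
                  fmt.Printf("entry %d: %d bytes exceeds the %d byte limit\n", i, size, maxBSONSize)
              }
              // Skip the remainder of this document.
              if _, err := f.Seek(size-4, io.SeekCurrent); err != nil {
                  log.Fatal(err)
              }
          }
      }

      The "invalid BSONSize" failure above appears to be the tools' version of this same length check rejecting the oversized entry.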


      From looking at the underlying local.oplog.rs collection, I think this may be a problem with how mongod splits transactions into documents when writing to the oplog. Querying local.oplog.rs directly (a sketch follows) shows the offending BSON documents in the oplog. Interestingly, replication still appears to work correctly despite this, although I haven't tested enough to say so confidently.
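
      A sketch of such a query with the Go driver; the filter on o.applyOps is an assumption about how the transaction appears in the oplog, and whether a driver will hand back an over-limit document intact may vary:

      package main

      import (
          "context"
          "fmt"
          "log"

          "go.mongodb.org/mongo-driver/bson"
          "go.mongodb.org/mongo-driver/mongo"
          "go.mongodb.org/mongo-driver/mongo/options"
      )

      func main() {
          ctx := context.Background()
          // Placeholder URI; connect to a replica set member.
          client, err := mongo.Connect(ctx, options.Client().ApplyURI("mongodb://localhost:27017"))
          if err != nil {
              log.Fatal(err)
          }
          defer client.Disconnect(ctx)

          // Transactions are written to the oplog as applyOps entries; report
          // any whose raw BSON exceeds the 16 MiB document limit.
          cur, err := client.Database("local").Collection("oplog.rs").
              Find(ctx, bson.M{"o.applyOps": bson.M{"$exists": true}})
          if err != nil {
              log.Fatal(err)
          }
          defer cur.Close(ctx)

          for cur.Next(ctx) {
              if len(cur.Current) > 16*1024*1024 {
                  fmt.Printf("oversized applyOps entry: %d bytes\n", len(cur.Current))
              }
          }
          if err := cur.Err(); err != nil {
              log.Fatal(err)
          }
      }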


      I have attached one of the offending oplog.bson dumps to this issue.

            Assignee: Unassigned
            Reporter: George Field (george@netcraft.com)
            Votes: 0
            Watchers: 12
