Type: Bug
Resolution: Duplicate
Priority: Major - P3
Fix Version/s: None
Affects Version/s: 4.2.5
Component/s: Replication, Tools
Environment: OS: CentOS Linux release 7.7.1908 (Core)
Kernel: 3.10.0-1062.18.1.el7.x86_64 #1 SMP Tue Mar 17 23:49:17 UTC 2020
Also affects the mongo:4.2.5 Docker image
Operating System: ALL
I've run into an issue where point-in-time snapshots of a MongoDB server produced using mongodump --oplog can be unusable if they happen to coincide with a large transaction write. In these cases, the oplog.bson produced in the dump contains a document that exceeds the 16 MiB size limit enforced in mongo-tools-common, so restoring with mongorestore --oplogReplay fails. A sketch of a setup that reproduces this is at the end of this description.
# mongorestore --oplogReplay
2020-04-08T13:15:13.749+0000 using default 'dump' directory
2020-04-08T13:15:13.749+0000 preparing collections to restore from
2020-04-08T13:15:13.751+0000 reading metadata for foo.junk from dump/foo/junk.metadata.json
2020-04-08T13:15:13.762+0000 restoring foo.junk from dump/foo/junk.bson
2020-04-08T13:15:16.748+0000 [........................] foo.junk 4.55MB/131MB (3.5%)
2020-04-08T13:15:19.748+0000 [.......................] foo.junk 8.74MB/131MB (6.7%)
2020-04-08T13:15:22.749+0000 [#......................] foo.junk 13.0MB/131MB (10.0%)
2020-04-08T13:15:25.751+0000 [##.....................] foo.junk 17.6MB/131MB (13.5%)
2020-04-08T13:15:28.752+0000 [###....................] foo.junk 22.1MB/131MB (16.9%)
2020-04-08T13:15:31.748+0000 [###....................] foo.junk 26.5MB/131MB (20.3%)
2020-04-08T13:15:34.748+0000 [####...................] foo.junk 31.3MB/131MB (24.0%)
2020-04-08T13:15:37.748+0000 [#####..................] foo.junk 36.1MB/131MB (27.7%)
2020-04-08T13:15:40.753+0000 [######.................] foo.junk 40.7MB/131MB (31.2%)
2020-04-08T13:15:43.748+0000 [#######................] foo.junk 45.4MB/131MB (34.8%)
2020-04-08T13:15:46.748+0000 [########...............] foo.junk 49.7MB/131MB (38.1%)
2020-04-08T13:15:48.489+0000 [#######################] foo.junk 131MB/131MB (100.0%)
2020-04-08T13:15:48.489+0000 no indexes to restore
2020-04-08T13:15:48.489+0000 finished restoring foo.junk (1000001 documents, 0 failures)
2020-04-08T13:15:48.489+0000 replaying oplog
2020-04-08T13:15:48.496+0000 applied 1 oplog entries
2020-04-08T13:15:48.496+0000 Failed: restore error: error reading oplog bson input: invalid BSONSize: 16777499 bytes
2020-04-08T13:15:48.496+0000 1000001 document(s) restored successfully. 0 document(s) failed to restore.
Looking at the underlying local.oplog.rs collection, I think this may be a problem with how mongod splits large transactions into documents when writing them to the oplog: querying local.oplog.rs directly shows the offending oversized BSON entries (a sketch of such a query follows below). Interestingly, replication itself still appears to work correctly despite this, although I haven't tested enough to say so with confidence.
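For reference, here is a minimal mongo shell sketch of that kind of query; the applyOps filter and the 16 MiB threshold are my assumptions about what to look for, not output from the tools:

// Run with the mongo shell against the dumped server.
// Object.bsonsize() returns the serialized size of a document, so any
// oplog entry over the 16 MiB BSON limit is what trips up
// mongorestore --oplogReplay.
var LIMIT = 16 * 1024 * 1024; // 16 MiB
db.getSiblingDB("local").oplog.rs.find({ "o.applyOps": { $exists: true } })
    .forEach(function (doc) {
        var size = Object.bsonsize(doc);
        if (size > LIMIT) {
            print("ts: " + tojson(doc.ts) + "  size: " + size + " bytes");
        }
    });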
I have attached one of the offending oplog.bson dumps to this issue.
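For anyone else trying to reproduce this, a rough sketch of the setup I believe triggers it follows; the database/collection names, document count, and payload size are illustrative, and mongodump --oplog needs to be running from another shell around the time the transaction commits:

// Run against a replica-set member (multi-document transactions
// require a replica set). foo.junk must already exist, since 4.2
// cannot create collections inside a transaction. The goal is a
// single transaction whose commit produces oplog entries at or
// over the 16 MiB BSON limit.
var session = db.getMongo().startSession();
var coll = session.getDatabase("foo").getCollection("junk");
var payload = new Array(1024 * 1024).join("x"); // ~1 MB string

session.startTransaction();
for (var i = 0; i < 20; i++) {        // ~20 MB of writes in one transaction
    coll.insertOne({ _id: i, payload: payload });
}
session.commitTransaction();          // oplog entries are written here
session.endSession();

// Meanwhile, from another terminal:
//   mongodump --oplog
// then attempt:
//   mongorestore --oplogReplay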
Is duplicated by: TOOLS-2495 Oplog replay can't handle entries > 16 MB (Closed)
Related to: TOOLS-2542 Investigate oplog document size limit exceeded (Closed)