[SERVER-4425] Mongodump occasionally fails to dump database with --oplog Created: 04/Dec/11  Updated: 30/Mar/12  Resolved: 22/Dec/11

Status: Closed
Project: Core Server
Component/s: Tools
Affects Version/s: 2.0.1
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Aristarkh Zagorodnikov Assignee: Eric Milkie
Resolution: Cannot Reproduce Votes: 0
Labels: dump
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Backwards Compatibility: Fully Compatible
Operating System: Linux
Participants:

 Description   

Mongodump command:
"/usr/bin/mongodump" -o "/path/to/dump" -h target.host.name --oplog

Output:
assertion: 13106 nextSafe(): { $err: "capped cursor overrun during query: local.oplog.rs", code: 13338 }

Server log extract:
Sun Dec 4 05:09:56 [conn16022] Assertion: 13338:capped cursor overrun during query: local.oplog.rs
Sun Dec 4 05:09:56 [conn16022] assertion 13338 capped cursor overrun during query: local.oplog.rs ns:local.oplog.rs query:{ ts: { $gt: Timestamp 1322960989000|38 } }
Sun Dec 4 05:09:56 [conn16022] query local.oplog.rs exception: capped cursor overrun during query: local.oplog.rs code:13338 reslen:96 468ms
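The `ts` bound in the logged query can be decoded to see which point in the oplog the dump was reading from. The following is a minimal Python sketch. It assumes the server prints `Timestamp <milliseconds>|<increment>`, which the trailing `000` suggests. `parse_log_timestamp` is a hypothetical helper, not part of mongodump or any MongoDB driver.

```python
from datetime import datetime, timezone


def parse_log_timestamp(text: str) -> tuple[datetime, int]:
    """Split a logged 'Timestamp <millis>|<inc>' value into a UTC datetime and increment.

    Assumes the first field is milliseconds since the Unix epoch and the
    second is the per-second increment (hypothetical helper, not a MongoDB API).
    """
    body = text.strip()
    if body.startswith("Timestamp "):
        body = body[len("Timestamp "):]
    millis_str, inc_str = body.split("|")
    # The seconds field of an oplog timestamp has one-second resolution,
    # so integer division by 1000 loses nothing here.
    seconds = int(millis_str) // 1000
    return datetime.fromtimestamp(seconds, tz=timezone.utc), int(inc_str)


when, inc = parse_log_timestamp("Timestamp 1322960989000|38")
print(when.isoformat(), inc)  # 2011-12-04T01:09:49+00:00 38
```

Under that assumption, the bound is 2011-12-04 01:09:49 UTC with increment 38. The log lines carry no time zone, so comparing this value with the 05:09:56 log time requires knowing the server's UTC offset.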



 Comments   
Comment by Aristarkh Zagorodnikov [ 22/Dec/11 ]

Good, I hope it won't show up anymore.

Comment by Eric Milkie [ 22/Dec/11 ]

I was unsuccessful in reproducing the behavior as well. While investigating, I did clean up some threading issues and uninitialized variables in code that mongodump exercises, so it's possible that I've fixed the issue; these changes will be released in v2.1.0.
I'm going to resolve this for now, but please reopen if it happens again. Thanks for taking the time to file this issue.

Comment by Aristarkh Zagorodnikov [ 22/Dec/11 ]

Yes, we have had this failure only once. Repeating the backup solved the problem, and it has not recurred even though we run the same command in the same environment daily.

Comment by Eric Milkie [ 21/Dec/11 ]

When you run mongodump, how often does the failure occur? I presume when you retry immediately with the same command, it succeeds?

Comment by Aristarkh Zagorodnikov [ 11/Dec/11 ]

Unfortunately our case is different. The backup takes about three minutes to complete (total uncompressed BSON size is ~1.8GiB), and the oplog has enough space to store about 20 hours of writes in the worst case (checked on each server using http://target.host.name:28017/_replSetOplog?_id=1, which shows at least a 21-hour difference between the top and bottom oplog records).

Comment by Eliot Horowitz (Inactive) [ 11/Dec/11 ]

That means the dump is taking longer than the time span covered by the data in your oplog.
You'll need to increase the size of the oplog to use this.
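The explanation above reduces to a comparison between the oplog window (newest entry minus oldest entry) and the dump duration. The following is a minimal Python sketch of that check. It assumes you already have the oldest and newest oplog `ts` values in seconds, for example read off the `_replSetOplog` page mentioned above. `oplog_window`, `dump_fits_in_oplog`, and `safety_factor` are hypothetical names, not mongodump options.

```python
from datetime import timedelta


def oplog_window(first_ts_secs: int, last_ts_secs: int) -> timedelta:
    """Time span covered by the oplog: newest entry minus oldest entry."""
    return timedelta(seconds=last_ts_secs - first_ts_secs)


def dump_fits_in_oplog(window: timedelta, dump_duration: timedelta,
                       safety_factor: float = 2.0) -> bool:
    """True if the dump (with a margin) finishes well inside the oplog window.

    safety_factor is a hypothetical margin for write bursts that shrink the
    window while the dump runs; it is not a mongodump setting.
    """
    return dump_duration * safety_factor < window


# Figures from the reporter's comment: ~21-hour oplog window, ~3-minute dump.
window = oplog_window(0, 21 * 3600)
print(dump_fits_in_oplog(window, timedelta(minutes=3)))  # True
```

The reporter's numbers pass this check by a wide margin. So, as the thread notes, an undersized oplog does not appear to explain this particular failure, which was closed as Cannot Reproduce.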

Generated at Thu Feb 08 03:05:57 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.