[SERVER-32382] Rollback can time out if oplog entries are large Created: 18/Dec/17  Updated: 30/Oct/23  Resolved: 16/May/18

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: 3.4.10
Fix Version/s: 3.6.6, 4.0.0-rc0

Type: Bug Priority: Major - P3
Reporter: Bruce Lucas (Inactive) Assignee: Judah Schvimer
Resolution: Fixed Votes: 0
Labels: neweng, rollback-optional
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
depends on SERVER-34345 Make it easier to provide validation ... Closed
Related
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v3.6, v3.4
Sprint: Repl 2018-05-21
Participants:

 Description   

During rollback we query the remote oplog, fetching only a couple of small fields from each oplog entry, and we return up to 16 MB per batch, or about 600 k entries. This requires reading up to 600 k entire oplog entries on the remote end, and if the oplog entries are large and not in cache this can be a very substantial amount of data to be read from disk (tens or hundreds of GB), and may require more than the hard-coded 10-minute timeout to complete. In this case the rollback times out and cannot complete.
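The arithmetic behind these figures can be sketched as follows. This is a rough illustration, not a measurement: the 16 MB batch limit and the 10-minute timeout come from the description above, the 28 bytes per projected entry is inferred from the "about 600 k entries" figure, and the 100 KiB full oplog entry size is a hypothetical value chosen for illustration.

```python
MIB = 1024 * 1024


def estimate_remote_read(batch_limit_bytes, projected_entry_bytes, full_entry_bytes):
    """Return (entries per batch, bytes the remote must read to fill one batch)."""
    entries = batch_limit_bytes // projected_entry_bytes
    return entries, entries * full_entry_bytes


# 16 MB batch limit: from the description.
# 28 bytes per projected entry: inferred from "about 600 k entries".
# 100 KiB per full oplog entry: hypothetical, for illustration only.
entries, read_bytes = estimate_remote_read(16 * MIB, 28, 100 * 1024)

timeout_seconds = 10 * 60  # the hard-coded 10-minute timeout
required_mib_per_s = read_bytes / MIB / timeout_seconds

print(f"{entries} entries per batch")
print(f"{read_bytes / 1024**3:.1f} GiB read on the remote per batch")
print(f"{required_mib_per_s:.1f} MiB/s sustained read rate needed to beat the timeout")
```

Under these assumptions a single batch already requires tens of GB of reads on the remote node, and larger oplog entries scale that figure linearly.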



 Comments   
Comment by Githook User [ 26/Jun/18 ]

Author: Judah Schvimer (judahschvimer) <judah@mongodb.com>

Message: SERVER-32382 set a default rollback batch size

(cherry picked from commit e716867bb5c36f7ad4686cf020f5f35b9cd9636e)
Branch: v3.6
https://github.com/mongodb/mongo/commit/1ba1a9fad2d065243a704b6338812406ac445eb0

Comment by Githook User [ 26/Jun/18 ]

Author: Judah Schvimer (judahschvimer) <judah@mongodb.com>

Message: SERVER-32382 add rollback remote oplog batch size

(cherry picked from commit 9a112a8cb260bfc65bb2bfa3118044744e91a8cb)
Branch: v3.6
https://github.com/mongodb/mongo/commit/ba312234b81a51a397833d1438c2f83aa2a90aa1

Comment by Githook User [ 16/May/18 ]

Author: Judah Schvimer (judahschvimer) <judah@mongodb.com>

Message: SERVER-32382 set a default rollback batch size
Branch: master
https://github.com/mongodb/mongo/commit/e716867bb5c36f7ad4686cf020f5f35b9cd9636e

Comment by Githook User [ 15/May/18 ]

Author: Judah Schvimer (judahschvimer) <judah@mongodb.com>

Message: SERVER-32382 add rollback remote oplog batch size
Branch: master
https://github.com/mongodb/mongo/commit/9a112a8cb260bfc65bb2bfa3118044744e91a8cb

Comment by Spencer Brody (Inactive) [ 06/Mar/18 ]

A 10 minute socket timeout already seems ridiculously large, but you're right that in this case increasing it would work around the real issue and be a very small code change, so I'm open to it.

EDIT: Although it's probably just as easy to make the batch size configurable as to make the timeout configurable, and changing the batch size is probably the better solution.

Comment by Gregory McKeon (Inactive) [ 06/Mar/18 ]

Back to triage to consider Cailin's comment.

Comment by Cailin Nelson [ 03/Mar/18 ]

I don't think it's super important. According to Judah's investigation on HELP-5504, we can probably avoid falling into this situation by switching to w:majority writes, so improving what happens in this situation is not a high priority.

That said... why not make the magic 10 minutes a setParameter, so that customers have an emergency "out" if they need it?

Comment by Gregory McKeon (Inactive) [ 02/Mar/18 ]

cailin.nelson Could you comment on the impact of this ticket for Cloud? We're planning to prioritize based on how much pain this will cause you.

Comment by Bruce Lucas (Inactive) [ 18/Dec/17 ]

The issue, as I understand it, is that we already project only the needed fields but don't set a batch size, so we may have to read a very large amount of oplog data in order to return a 16 MB batch of very small documents extracted from those large oplog entries. Can we just set a small enough batch size on this query to limit the amount of data that has to be read on the remote end in order to return each batch?
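A minimal sketch of the idea, written as a `find` command document rather than the server's actual rollback code. The helper names, the projected field names (`ts`, `h`), the reverse natural-order sort, and the batch size of 2000 are all assumptions made for illustration; only the `find`, `projection`, `sort`, and `batchSize` command fields are standard.

```python
def build_rollback_oplog_query(batch_size):
    """Build a find command for the remote oplog with an explicit batch size.

    Hypothetical helper: the projected fields and the sort order are
    assumptions, not taken from the server source.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    return {
        "find": "oplog.rs",               # oplog collection in the "local" database
        "projection": {"ts": 1, "h": 1},  # assumed small fields; the ticket does not name them
        "sort": {"$natural": -1},         # assumed: scan the oplog newest-first
        "batchSize": batch_size,          # caps the number of entries returned per batch
    }


def max_remote_read_bytes(batch_size, full_entry_bytes):
    """Upper bound on the data the remote reads to fill one batch."""
    return batch_size * full_entry_bytes


# Illustrative values: 2000 entries per batch, 100 KiB per full oplog entry.
cmd = build_rollback_oplog_query(2000)
bound = max_remote_read_bytes(cmd["batchSize"], 100 * 1024)
print(cmd)
print(f"at most {bound / 1024**2:.0f} MiB read per batch")
```

With an explicit batch size, the per-batch read on the remote is bounded by the entry count rather than by the 16 MB reply limit, so each round trip stays well within the socket timeout even when individual oplog entries are large.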

Comment by Judah Schvimer [ 18/Dec/17 ]

All of this code is used in the new rollback algorithm for 3.8. I expect this problem to exist in 3.6 and continue to exist in 3.8 until addressed. We certainly can project out the needed fields very easily. I'm not sure of the best solution to the socket timeout problem.

Generated at Thu Feb 08 04:30:04 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.