[SERVER-25071] Ensure replication batch finishes before shutdown Created: 14/Jul/16  Updated: 26/Jan/18  Resolved: 26/Aug/16

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: 3.3.0
Fix Version/s: 3.3.12

Type: Improvement Priority: Major - P3
Reporter: Scott Hernandez (Inactive) Assignee: Mathias Stearn
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Documented
is documented by DOCS-9468 Docs for SERVER-25071: Ensure replica... Closed
Duplicate
is duplicated by SERVER-12528 SIGTERM can cause an fassert if we're... Closed
Related
related to SERVER-24933 Clean shutdown of secondaries should ... Closed
related to SERVER-32935 improve exception handling in SyncTai... Closed
related to WT-2649 Some way to indicate valid points in ... Closed
related to SERVER-7200 use oplog as op buffer on secondaries Closed
is related to SERVER-25248 minvalid should be set while appropri... Closed
Backwards Compatibility: Fully Compatible
Sprint: Repl 2016-08-29
Participants:

 Description   

Much like SERVER-24933, we should ensure that our shutdown lands between apply batches.

There are a few ways to get there:

  1. Ignore shutdown interruptions during an apply batch (not good for long-running applies like index builds)
  2. Have shutdown wait for the current apply batch to finish (similar drawback to the above)
  3. Undo/reset the current batch's work on shutdown so the node is left in a consistent state

In addition, we should move setting the minvalid boundaries under the PBWM lock so that minvalid reflects the completed apply batch (see the sketch below).
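To make the intent concrete, here is a minimal, hypothetical C++ sketch of option 2 combined with the minvalid change: shutdown waits for the in-flight batch instead of interrupting it, and the minvalid boundary is only written and cleared while the PBWM lock is held. All names here (BatchApplier, setMinValid, the stand-in PBWM mutex, and so on) are illustrative assumptions, not the server's actual API.

{code:cpp}
// Hypothetical sketch only -- not the actual server code.
#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <vector>

struct OplogEntry { int64_t opTime; };

struct Batch {
    std::vector<OplogEntry> ops;
    int64_t lastOpTime;
};

class BatchApplier {
public:
    // Applier thread: runs one batch at a time. Returns false if shutdown
    // was requested before the batch started, so shutdown always "lands
    // between" apply batches.
    bool applyBatch(const Batch& batch) {
        {
            std::lock_guard<std::mutex> lk(_mutex);
            if (_shutdownRequested)
                return false;
            _batchInProgress = true;
        }

        {
            // Hold the PBWM stand-in for the whole batch so that minvalid
            // is only ever set or cleared at a batch boundary.
            std::lock_guard<std::mutex> pbwm(_pbwmLock);
            setMinValid(batch.lastOpTime);   // data is "invalid" until done
            for (const OplogEntry& op : batch.ops)
                applyOp(op);
            clearMinValid();                 // batch complete: consistent again
        }

        std::lock_guard<std::mutex> lk(_mutex);
        _batchInProgress = false;
        _cv.notify_all();
        return true;
    }

    // Shutdown path (option 2): wait for the in-flight batch to finish
    // instead of interrupting it.
    void shutdown() {
        std::unique_lock<std::mutex> lk(_mutex);
        _shutdownRequested = true;
        _cv.wait(lk, [this] { return !_batchInProgress; });
    }

private:
    void setMinValid(int64_t opTime) { _minValid = opTime; }
    void clearMinValid() { _minValid = -1; }
    void applyOp(const OplogEntry&) { /* apply one oplog entry */ }

    std::mutex _mutex;
    std::condition_variable _cv;
    std::mutex _pbwmLock;                // stand-in for the server's PBWM lock
    int64_t _minValid = -1;
    bool _batchInProgress = false;
    bool _shutdownRequested = false;
};
{code}

In this shape a clean shutdown can only become slower, never inconsistent; that trade-off is exactly the index-build concern raised in the comments below.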



 Comments   
Comment by Githook User [ 08/Sep/16 ]

Author: Mathias Stearn (RedBeard0531) <mathias@10gen.com>

Message: SERVER-25071 Remove temporary invariants
Branch: master
https://github.com/mongodb/mongo/commit/ae122d196abc39a05bb26230cd7901faecb3859d

Comment by Mathias Stearn [ 26/Aug/16 ]

It may be worth mentioning that this change can cause clean shutdowns to take longer, since they will no longer interrupt operations on secondaries. This is mostly an issue with index builds.

Comment by Githook User [ 26/Aug/16 ]

Author: Mathias Stearn (RedBeard0531) <mathias@10gen.com>

Message: SERVER-25071 Flush data replication queue as part of clean shutdown
Branch: master
https://github.com/mongodb/mongo/commit/15c19250190932511229ac0e70bacb4c9b107b82

Comment by Kevin Pulo [ 25/Jul/16 ]

scotthernandez, although it's related, I think the issue of setting minvalid while the appropriate locks are held is really a separate one. I've filed SERVER-25248 for it because:

  • It has separate impacts which are unrelated to shutdown (prevents meaningful backups of fsyncLocked secondaries)
  • It affects both 3.2 and master (whereas this ticket is just about master, with 3.2 covered by SERVER-24933)
  • I already had most of it written out (including the fsyncLock-based repro steps)

Together, SERVER-25248 and this ticket mean that there's no reliable way to take a backup from a secondary. Fixing either ticket would at least be some progress in this regard.

However, I think it might be possible to shut down during the "way in" race on SERVER-25248, in which case that ticket would be more of a dependency for this one. (It would probably also benefit from the inShutdownStrict() check that was added to multiApply in SERVER-24933; the order would be: take the fsyncLock mutex, take the PBWM lock, check shutdown, write minvalid. A sketch of that ordering follows.)
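In code, that ordering might look like the following hypothetical sketch. The fsyncLock mutex, pbwmLock, inShutdownStrict(), and writeMinValid() below are illustrative stand-ins assumed for this example, not the server's actual declarations.

{code:cpp}
// Hypothetical sketch of the suggested ordering; all names are stand-ins.
#include <atomic>
#include <cstdint>
#include <mutex>

std::mutex fsyncLockMutex;                  // stand-in for the fsyncLock mutex
std::mutex pbwmLock;                        // stand-in for the PBWM lock
std::atomic<bool> shutdownRequested{false};

bool inShutdownStrict() { return shutdownRequested.load(); }
void writeMinValid(int64_t /*opTime*/) { /* persist the minvalid boundary */ }

void setMinValidForBatch(int64_t lastOpTime) {
    std::lock_guard<std::mutex> fsync(fsyncLockMutex);  // 1. take fsyncLock mutex
    std::lock_guard<std::mutex> pbwm(pbwmLock);         // 2. take PBWM
    if (inShutdownStrict())                             // 3. check shutdown
        return;                                         //    bail before writing
    writeMinValid(lastOpTime);                          // 4. write minvalid
}
{code}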

Comment by Kevin Pulo [ 25/Jul/16 ]

"while it is in this state, the node is invalid and has no (usable) data, until it can finish the batch. It cannot for example transition to secondary, nor become primary."

More importantly (IMHO), it's not a useful backup for DR purposes. This is a problem because a shut-down mongod is considered the gold standard for being quiesced for the purposes of taking a backup. Users rightly expect that a mongod which has been cleanly shut down has data which is "good" in every way imaginable. For a replica set, this includes consistency with the primary. It will be quite shocking for the user who later discovers that the data they saved from a shut-down mongod is only useful in the presence of other data which is stored on other hosts (or rather, was stored on other hosts...).

The server has worked this way in all previous versions (including 3.2, once 3.2.9 has been released), so this feels more like a regression to me.

Comment by Scott Hernandez (Inactive) [ 15/Jul/16 ]

When the member is restarted, it cannot get to a consistent data set unless it can contact another consistent member so it can finish applying. In essence, while it is in this state, the node is invalid and has no (usable) data, until it can finish the batch. It cannot, for example, transition to secondary, nor become primary.

A 3-node set with a single arbiter is a good example of such a failure: if the primary is lost while the secondary is in the middle of a batch during shutdown, you lose the ability to recover when the secondary and arbiter restart.

Comment by Andy Schwerin [ 15/Jul/16 ]

What is the harm of shutting down mid-batch?
