[SERVER-8139] New replication dep. on minvalid collection causes bad behavior Created: 10/Jan/13  Updated: 11/Jul/16  Resolved: 24/Jan/13

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: 2.4.0-rc0

Type: Improvement
Priority: Blocker - P1
Reporter: Scott Hernandez (Inactive)
Assignee: Eric Milkie
Resolution: Done
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
depends on DOCS-961 Update/create documentation for "prim... Closed
Related
is related to SERVER-7652 Initial sync criteria needs to check ... Closed
is related to SERVER-8556 Add failpoint tests to initial sync t... Backlog
Participants:

 Description   

The new behavior of using minvalid to determine whether an initial sync has completed, and therefore whether regular replication should start, leads to some significant problems.

Cases:

  • Upgrading a set without a minvalid collection: all nodes go into startup2 (forever) if no primary/secondary is up; otherwise all nodes drop and resync all data.
  • Variations of this will cause the initial sync to fail/elect a primary because of the oplog stale rules.
  • Removing the minvalid collection causes a full resync (independent of data/indexes), or the host goes offline in startup2 until one can be done.

The upgrade case is bad because, until now, there was no need for the minvalid collection: it was neither maintained nor guaranteed to exist (especially on the primary), and it may also be missing if replicas were seeded with a copy of the files that did not include it.



 Comments   
Comment by auto [ 24/Jan/13 ]

Author: Eric Milkie <milkie@10gen.com> (2013-01-24T18:24:56Z)

Message: SERVER-8139 cleanup

'h' is not needed in minValid recorded in the database; it is never
read.
The dummy minValid value is no longer needed either.
Branch: master
https://github.com/mongodb/mongo/commit/4622d34b15e3e941d9164d099dc320bfdb5cdb62

Comment by auto [ 24/Jan/13 ]

Author: Eric Milkie <milkie@10gen.com> (2013-01-24T16:37:32Z)

Message: SERVER-8139 add special flag to minvalid during initial sync
Branch: master
https://github.com/mongodb/mongo/commit/334c13c492b98cf82e1605cba5ca774bb52014f8

Comment by Scott Hernandez (Inactive) [ 16/Jan/13 ]

Eric,

So the only logic change is that if the minvalid collection has the 0'd doc, then initial sync has not completed and the node must wipe and restart? At startup, if the minvalid collection is missing, or has any non-zero ts/h fields, then replication works normally and does not trigger an initial sync.
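
The startup rule described above can be sketched in a few lines of Python. This is only an illustration of the stated logic, not the server's code: the function name `needs_initial_sync`, the use of plain integers for `ts` and `h`, and the choice to treat an absent field as zero are all assumptions made for the example.

```python
from typing import Mapping, Optional


def needs_initial_sync(minvalid_doc: Optional[Mapping[str, int]]) -> bool:
    """Return True only when minvalid holds the zeroed marker document.

    minvalid_doc is None when the minvalid collection is missing or empty.
    Plain ints stand in for the real ts/h types (assumption for this sketch).
    """
    if minvalid_doc is None:
        # Missing collection (e.g. an upgraded set): replicate normally.
        return False
    # Absent fields are treated as zero here; that is an assumption.
    ts = minvalid_doc.get("ts", 0)
    h = minvalid_doc.get("h", 0)
    # Any non-zero ts/h means a previous initial sync completed.
    return ts == 0 and h == 0
```

With this rule, a node upgraded from a version that never wrote minvalid is not forced to resync, which is the upgrade case the description calls out.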

This sounds reasonable, and it is similar to the suggestion Kristina and I made to keep a different collection with more state about the initial sync (steps) as an indication of the initial sync state (and completion). I see some advantages to keeping more diagnostic information within that collection (rather than in minvalid, which is basically a boolean of initial sync active/done), but both effectively provide the same marker indicating whether the initial sync has started/is active or is done.

Comment by Eric Milkie [ 15/Jan/13 ]

Proposal:
At the beginning of initial sync, we can set minValid to ts:0,h:0. It will get replaced at the end of a successful initial sync.

Then, we can use this value as part of the initial sync criteria. It will only exist if an initial sync was attempted but never completed.
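
A rough Python sketch of the marker lifecycle in this proposal follows. It is a toy model, not the actual implementation: `MinValidStore`, `run_initial_sync`, `interrupted_initial_sync`, and the `clone` callback are hypothetical names, and an in-memory dict stands in for the minvalid collection.

```python
from typing import Callable, Optional

# Zeroed marker written at the start of initial sync (ts:0, h:0).
SENTINEL = {"ts": 0, "h": 0}


class MinValidStore:
    """In-memory stand-in for the single-document minvalid collection."""

    def __init__(self) -> None:
        self.doc: Optional[dict] = None


def run_initial_sync(store: MinValidStore, clone: Callable[[], dict]) -> None:
    """Write the marker, clone the data, then replace the marker on success."""
    store.doc = dict(SENTINEL)
    # clone() is a hypothetical callback that copies data and returns the
    # last applied op as {"ts": ..., "h": ...}; if it raises, the marker stays.
    last_op = clone()
    store.doc = {"ts": last_op["ts"], "h": last_op["h"]}


def interrupted_initial_sync(store: MinValidStore) -> bool:
    """True if an initial sync was attempted but never completed."""
    return store.doc == SENTINEL
```

Because the marker is replaced only after the clone step returns, a crash or failure in the middle leaves the zeroed document behind, and that is the condition the initial sync criteria would check for.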
