[SERVER-42255] Replica set initialization writes first oplog entry with no term Created: 17/Jul/19 Updated: 06/Dec/22 |
|
| Status: | Backlog |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Bernard Gorman | Assignee: | Backlog - Replication Team |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
| Assigned Teams: | Replication |
| Operating System: | ALL |
| Participants: |
| Description |
|
When a replica set is first initialized using rs.initiate(), it writes a note to that effect into the oplog as its first entry. However, this entry is written while the term is still OpTime::kUninitializedTerm, so it has no term field t.
This has been the case since 3.6 and remains so on current master. |
| Comments |
| Comment by Lingzhi Deng [ 22/Jul/19 ] |
|
The cause is that the first oplog entry is written in initializeReplSetStorage, which eventually calls getNextOpTimes and replCoord->getTerm() to determine the term. However, initializeReplSetStorage is called before _finishReplSetInitiate, which calls TopologyCoordinator::updateConfig() to reset the term from OpTime::kUninitializedTerm (-1) to OpTime::kInitialTerm (0). We don't write the t field when the term is OpTime::kUninitializedTerm, so the t field is missing from the "initiating set" entry. If we initialized the TopologyCoordinator's term to OpTime::kInitialTerm (0) before initializeReplSetStorage is called, we could perhaps log {t: 0} for this entry. The first election will always start in term 1. |
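
The ordering described in the comment above can be illustrated with a small toy model. This is only a sketch in Python, not the server's C++ code: the class `ToyNode`, its methods, the `init_term_first` flag, and the simplified entry shape are all hypothetical names introduced for illustration. Only the two term constants and the order of the steps are taken from the comment.

```python
# Toy model only: mimics the ordering described in the comment above.
# ToyNode and its methods are hypothetical, not MongoDB's implementation.
K_UNINITIALIZED_TERM = -1  # stands in for OpTime::kUninitializedTerm
K_INITIAL_TERM = 0         # stands in for OpTime::kInitialTerm


class ToyNode:
    def __init__(self):
        self.term = K_UNINITIALIZED_TERM
        self.oplog = []

    def write_oplog_entry(self, msg):
        # Simplified entry shape; real oplog entries carry more fields.
        entry = {"msg": msg}
        # The t field is only written once the term has been initialized.
        if self.term != K_UNINITIALIZED_TERM:
            entry["t"] = self.term
        self.oplog.append(entry)

    def update_config(self):
        # Models the term reset attributed to TopologyCoordinator::updateConfig().
        if self.term == K_UNINITIALIZED_TERM:
            self.term = K_INITIAL_TERM

    def initiate(self, init_term_first=False):
        if init_term_first:
            # Hypothetical reordering: initialize the term before the write.
            self.update_config()
        # Step attributed to initializeReplSetStorage.
        self.write_oplog_entry("initiating set")
        # Step attributed to _finishReplSetInitiate.
        self.update_config()


current = ToyNode()
current.initiate()

reordered = ToyNode()
reordered.initiate(init_term_first=True)
```

In this model, `current.oplog[0]` has no `t` key because the write happens while the term is still -1, whereas `reordered.oplog[0]` carries `t` equal to 0. Whether the real initialization path can safely be reordered this way is exactly what the ticket leaves open.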
| Comment by William Schultz (Inactive) [ 22/Jul/19 ] |
|
Another question that came up in discussion: is it possible to run concurrent replSetInitiate commands against separate replica set nodes? In that case, would both nodes write divergent "initiation" entries and would this cause any problems? |
| Comment by Ratika Gandhi [ 22/Jul/19 ] |
|
We want to understand why this has no term and what happens when the oplog entry gets rolled back. |
| Comment by Judah Schvimer [ 17/Jul/19 ] |
|
This has no term because it is written before any node is elected primary. We'll need to write it in a real term for it to be safe. We may also want to make sure this oplog entry cannot be rolled back, which would be valuable for change streams. |