[DOCS-12975] Investigate changes in SERVER-42022: Attempt to remove initial sync missing document fetching Created: 21/Aug/19  Updated: 13/Nov/23  Resolved: 17/Mar/20

Status: Closed
Project: Documentation
Component/s: manual, Server
Affects Version/s: None
Fix Version/s: 4.3.1, Server_Docs_20231030, Server_Docs_20231106, Server_Docs_20231105, Server_Docs_20231113

Type: Task Priority: Major - P3
Reporter: Backlog - Core Eng Program Management Team Assignee: Kay Kim (Inactive)
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Documented
documents SERVER-42022 Attempt to remove initial sync missin... Closed
Duplicate
Participants:
Days since reply: 3 years, 47 weeks, 2 days ago
Epic Link: DOCS: 4.4 Server Release Work

 Description   

Description

Downstream Change Summary

During initial sync, a replica clones all collections while buffering its sync source's new oplog entries. Once cloning completes, the replica sets its "minValid" timestamp to the latest oplog entry and replays the oplog up to minValid. At this point the replica has constructed a consistent snapshot of the data, and it can transition to a secondary.

In MMAPv1 days, it was possible to miss a document while cloning a collection, even if the document existed during the whole time we were cloning, because the document was moved during the clone. Therefore, during recovery or initial sync, when a replica would apply an update operation that referred to a missing document (either from a regular update or the "applyOps" command) it would attempt to fetch the document from the sync source, in case the document had been missed during cloning. If the document was found, the replica would advance its minValid timestamp to the sync source's current opTime, setting back its progress toward achieving a consistent snapshot.

In MongoDB 4.3 we no longer support MMAPv1, and we don't even support syncing from a server that uses MMAPv1. Post-MMAPv1 we can't miss documents during collection clone, so it is no longer necessary to fetch missing documents, nor to advance minValid. If an update refers to a missing document, then there must be a later oplog entry that deletes the document, so the replica can skip the update operation.
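
The post-MMAPv1 behavior described above can be sketched as a toy model. This is an illustration only, not the server's implementation: the function name replay_oplog, the simplified entry shape (ts, op, _id, doc, set), and the sample data are all assumptions made for this example. It shows oplog replay up to minValid where an update that refers to a missing document is skipped instead of triggering a fetch from the sync source.

```python
def replay_oplog(cloned_docs, oplog, min_valid):
    """Toy model: apply buffered oplog entries with ts <= min_valid.

    cloned_docs maps _id -> document as seen by the collection clone.
    Each oplog entry is a simplified, hypothetical dict with keys
    ts, op ("i", "u" or "d"), _id, and doc/set where relevant.
    Returns the resulting documents and the timestamps of skipped updates.
    """
    docs = {_id: dict(doc) for _id, doc in cloned_docs.items()}
    skipped = []
    for entry in sorted(oplog, key=lambda e: e["ts"]):
        if entry["ts"] > min_valid:
            break  # entries past minValid are not part of this replay
        op, _id = entry["op"], entry["_id"]
        if op == "i":
            docs[_id] = dict(entry["doc"])
        elif op == "u":
            if _id not in docs:
                # Missing document: a later entry must delete it,
                # so skip the update rather than fetch the document.
                skipped.append(entry["ts"])
                continue
            docs[_id].update(entry["set"])
        elif op == "d":
            docs.pop(_id, None)
    return docs, skipped


# Document 2 was deleted on the sync source before the cloner reached it,
# so the clone never saw it, but its earlier update is still in the oplog.
cloned = {1: {"x": 0}}
buffered = [
    {"ts": 1, "op": "u", "_id": 1, "set": {"x": 1}},
    {"ts": 2, "op": "u", "_id": 2, "set": {"y": 5}},
    {"ts": 3, "op": "d", "_id": 2},
]
result = replay_oplog(cloned, buffered, min_valid=3)
```

In this sketch the update at ts 2 is skipped and minValid never moves, which is the simplification the ticket describes.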

We don't have much documentation of this process, so not much has to change. The replSetGetStatus.initialSyncStatus.fetchedMissingDocs field is removed; see two references in replSetGetStatus.txt and one in getLog.txt. There may be other places that need updating that I have not found, however!
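
For tooling that still reads the removed field, a defensive check might look like the sketch below. The helper name has_fetched_missing_docs and the sample status documents are hypothetical and exist only for this example; only the field path replSetGetStatus.initialSyncStatus.fetchedMissingDocs comes from this ticket. The status dict is assumed to be the parsed output of the replSetGetStatus command (for example, as returned by a driver).

```python
def has_fetched_missing_docs(status):
    """Return True if a replSetGetStatus-style dict still reports
    initialSyncStatus.fetchedMissingDocs (hypothetical helper)."""
    initial_sync = status.get("initialSyncStatus") or {}
    return "fetchedMissingDocs" in initial_sync


# Made-up sample documents, trimmed to the relevant field.
old_style = {"initialSyncStatus": {"fetchedMissingDocs": 0}}
new_style = {"initialSyncStatus": {}}
no_initial_sync = {}

checks = [
    has_fetched_missing_docs(old_style),
    has_fetched_missing_docs(new_style),
    has_fetched_missing_docs(no_initial_sync),
]
```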

Description of Linked Ticket

We currently think missing document fetching was only necessary for MMAPv1, where updates could cause a collection scan to miss a document. If we could remove this step, it could significantly speed up initial sync in certain workloads and simplify initial sync.

This would require the storage engine API to prohibit updates from causing collection scans to miss a document.

Scope of changes

Impact to Other Docs

MVP (Work and Date)

Resources (Scope or Design Docs, Invision, etc.)



 Comments   
Comment by Githook User [ 16/Mar/20 ]

Author:

{'username': 'kay-kim', 'name': 'Kay Kim', 'email': 'kay.kim@10gen.com'}

Message: DOCS-12975: 4.4 remove mmapv1 fetchedMissingDocs metrics
Branch: master
https://github.com/mongodb/docs/commit/549da48db7be4b80cc63623eb5a7a0c4d7d54e11
