[SERVER-24482] Initial sync during high document update/churn causes repl-worker slowness, connection churn Created: 09/Jun/16 Updated: 03/Jan/20 Resolved: 03/Jan/20 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | 3.0.12 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Andrew Ryder (Inactive) | Assignee: | Evin Roesle |
| Resolution: | Done | Votes: | 3 |
| Labels: | RF, initialSync | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Operating System: | ALL | ||||||||
| Sprint: | Repl 2019-07-01 | ||||||||
| Participants: | |||||||||
| Case: | (copied to CRM) | ||||||||
| Description |
|
Summary Scenario
These messages are not fatal and the scenario is recoverable, but their appearance correlates with both very slow oplog application and connection churn on the sync source, originating from the member being synced. Oplog application speeds up dramatically once it passes the crossover point after which no more operations target deleted documents.
Reproduction
If this is monitored closely, the syncing member is likely to keep falling further behind because it applies oplog entries too slowly. As the number of "missing object not found on source" messages drops, the replication rate increases. Once no more messages of that type occur, the replication rate rises sharply. |
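One way to watch for the crossover point described above is to count the "missing object not found on source" messages per minute in the syncing member's log. The following is a minimal sketch, not an official tool. It assumes each log line begins with an ISO-8601 timestamp, so that the first 16 characters identify the minute. The function name and the sample lines are hypothetical placeholders, not real mongod output.

```python
from collections import Counter

# Message text quoted in this ticket.
MISSING_DOC_MARKER = "missing object not found on source"


def count_missing_per_minute(log_lines):
    """Count marker lines per minute.

    Assumption: every line starts with an ISO-8601 timestamp, so the first
    16 characters (for example "2016-06-09T10:15") identify the minute.
    """
    counts = Counter()
    for line in log_lines:
        if MISSING_DOC_MARKER in line:
            counts[line[:16]] += 1
    return dict(sorted(counts.items()))


# Hypothetical placeholder lines, made up for illustration only.
sample_lines = [
    "2016-06-09T10:15:01 [placeholder] missing object not found on source",
    "2016-06-09T10:15:40 [placeholder] missing object not found on source",
    "2016-06-09T10:15:59 [placeholder] some unrelated message",
    "2016-06-09T10:16:05 [placeholder] missing object not found on source",
]

per_minute = count_missing_per_minute(sample_lines)
for minute, count in per_minute.items():
    print(minute, count)
```

A per-minute count that trends toward zero would be consistent with the member approaching the point where the replication rate recovers.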
| Comments |
| Comment by Evin Roesle [ 03/Jan/20 ] |
|
This issue has gone away in MongoDB 4.3+ so we are closing this ticket. |
| Comment by Scott Hernandez (Inactive) [ 09/Jun/16 ] |
|
During the first apply phase of initial sync, when the member is attempting to reach a consistent point, any update that fails because its target document is missing must re-fetch that document from the sync source. In this case the document had been deleted, so it could not be found. Each of those queries can be costly, and they are currently run sequentially and in line with the applies. Because fetching the document, even when it is not found, widens the window of operations that must be applied before initial sync can complete, these cases can also lengthen the time until initial sync finishes. This is a known performance issue, and I've linked a work item related to improving performance in these cases. |
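The behavior described in this comment can be illustrated with a toy model. This is a simplified sketch and not the server's actual implementation: the function name, the tuple-based operation format, and the in-memory dictionaries standing in for the local member and the sync source are all assumptions made for illustration. It shows that every update to a document that is missing locally and already deleted on the source costs one blocking lookup, performed in line with the apply loop.

```python
def apply_ops_sequentially(ops, local_docs, source_docs):
    """Apply ops in order and return (fetches, not_found).

    Toy model only: ops are (kind, doc_id, value) tuples, and both stores
    are plain dicts. Each lookup in source_docs stands in for one blocking
    round trip to the sync source.
    """
    fetches = 0
    not_found = 0
    for kind, doc_id, value in ops:
        if kind == "insert":
            local_docs[doc_id] = value
        elif kind == "delete":
            local_docs.pop(doc_id, None)
        elif kind == "update":
            if doc_id not in local_docs:
                # Missing locally: fetch from the source before continuing.
                fetches += 1
                fetched = source_docs.get(doc_id)
                if fetched is None:
                    # Deleted on the source: the lookup was paid for nothing.
                    not_found += 1
                    continue
                local_docs[doc_id] = fetched
            local_docs[doc_id] = value
    return fetches, not_found


# Document "b" was deleted on the source before it could be cloned.
source = {"a": 1}
local = {"a": 1}
ops = [
    ("update", "b", 2),
    ("update", "b", 3),
    ("update", "a", 5),
    ("delete", "b", None),
]

result = apply_ops_sequentially(ops, local, source)
print(result)
```

In this model the two updates to the deleted document each trigger a separate failed lookup, which is why a workload with heavy update and delete churn can slow the apply phase until the oplog passes the corresponding delete operations.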