- Type: Task
- Resolution: Unresolved
- Priority: Major - P3
- Affects Version/s: None
- Component/s: None
- Cluster Scalability
- ClusterScalability 22Jun-6Jul
Description
The resharding operations registry tracks active resharding operations and their metadata. This metadata is used to block conflicting operations, look up the operation UUID for commit, abort, or cleanup, and route donor-side oplog handling.
Because the registry is an in-memory cache that must stay in sync with durable state, incorrect registration, early unregistration, or bad reconstruction during resync or restart can cause the node to return incorrect metadata or fail to find metadata that should exist.
This can lead to subtle correctness issues: operations may fail to find the resharding instance they should act on, conflicting operations may not be blocked, or donor oplog entries may be routed to the wrong recipient. We therefore need observability into incorrect registry state.
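To make the failure modes above concrete, here is a minimal sketch of how cached registry entries could be compared against durable state. It is illustrative only and not the server's implementation: `ReshardingMetadata`, its fields, and `diff_registry` are hypothetical names, and both inputs are assumed to be dictionaries keyed by resharding UUID.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReshardingMetadata:
    # Hypothetical shape of one registry entry; field names are assumptions.
    resharding_uuid: str
    source_ns: str
    temp_ns: str
    role: str  # e.g. "donor", "recipient", "coordinator"


def diff_registry(in_memory, durable):
    """Compare the in-memory cache with durable state, both keyed by UUID."""
    cached, stored = set(in_memory), set(durable)
    return {
        # On disk but not cached: lookups fail to find metadata that should exist.
        "missing": sorted(stored - cached),
        # Cached but not on disk: the node may return metadata that is stale.
        "stale": sorted(cached - stored),
        # Present in both but with different values: incorrect metadata.
        "mismatched": sorted(
            u for u in cached & stored if in_memory[u] != durable[u]
        ),
    }
```

An empty result for all three keys would mean the cache agrees with durable state; any non-empty list points at one of the incorrect-state cases described above.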
Mitigation
Logs
| Log | Why it helps |
| Log every registry registration and unregistration event, including the resharding UUID, namespace or temporary namespace, participant role, and relevant metadata values at the time of the update. | Trace the sequence of registrations and unregistrations to determine why a registry entry might be stale or missing. |
| Log every registry resync from disk, including why the resync happened and which entries were loaded or removed. | Trace what each resync loaded or removed, and why it ran, to determine why a registry entry might be stale or missing. |
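The log lines proposed in the table could look roughly like the sketch below. It uses Python's standard `logging` and `json` modules purely for illustration; the logger name, the function names, and the JSON field names are all assumptions rather than the server's actual log format.

```python
import json
import logging

# Hypothetical logger name, chosen for this sketch only.
logger = logging.getLogger("resharding.registry")


def log_registry_event(event, resharding_uuid, namespace, role, metadata):
    """Emit one structured line per registration or unregistration."""
    if event not in ("register", "unregister"):
        raise ValueError(f"unexpected registry event: {event}")
    line = json.dumps(
        {
            "event": event,
            "reshardingUUID": resharding_uuid,
            "namespace": namespace,  # source or temporary namespace
            "role": role,
            "metadata": metadata,  # values at the time of the update
        },
        sort_keys=True,
    )
    logger.info(line)
    return line


def log_registry_resync(reason, loaded, removed):
    """Emit one structured line per resync from disk."""
    line = json.dumps(
        {
            "event": "resync",
            "reason": reason,
            "loaded": sorted(loaded),
            "removed": sorted(removed),
        },
        sort_keys=True,
    )
    logger.info(line)
    return line
```

Keeping the resharding UUID in every line means a single search on that UUID reconstructs the register, unregister, and resync history for one operation.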
FTDC metrics
| Metric | Why it helps |
| Counter for registry registration and unregistration events. | Visually correlate register/unregister events in FTDC with other operations, such as a resharding operation starting. |
| Counter for registry resyncs. | Visually correlate registry resyncs in FTDC with other operations, such as a resharding operation starting. |
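As a rough sketch of the counters in the table, the class below keeps three monotonically increasing counts and exposes a snapshot that a periodic collector could sample. `RegistryMetrics` and its counter names are hypothetical; how the values would actually be exported to FTDC is not specified here.

```python
import threading
from collections import Counter


class RegistryMetrics:
    """Monotonic counters for registry events (names are assumptions)."""

    _NAMES = ("registrations", "unregistrations", "resyncs")

    def __init__(self):
        self._lock = threading.Lock()
        self._counts = Counter()

    def _bump(self, name):
        with self._lock:
            self._counts[name] += 1

    def on_register(self):
        self._bump("registrations")

    def on_unregister(self):
        self._bump("unregistrations")

    def on_resync(self):
        self._bump("resyncs")

    def snapshot(self):
        # Always report every counter, including those still at zero,
        # so a sampled time series has a stable set of fields.
        with self._lock:
            return {name: self._counts[name] for name in self._NAMES}
```

Because the counters only increase, a gap between registrations and unregistrations that persists after a resharding operation finishes would stand out when plotted alongside other metrics.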