[SERVER-56471] Donor's listDatabases response missing operationTime Created: 29/Apr/21 Updated: 29/Oct/23 Resolved: 05/May/21 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | 4.9.0-rc1, 5.0.0-rc0 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Lingzhi Deng | Assignee: | Jack Mulrow |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | pm-1791_non-cloud-blocking, post-rc0 | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||||||||||||||||||
| Backwards Compatibility: | Fully Compatible | ||||||||||||||||||||||||
| Operating System: | ALL | ||||||||||||||||||||||||
| Backport Requested: |
v4.9
|
||||||||||||||||||||||||
| Sprint: | Sharding 2021-05-03, Sharding 2021-05-17 | ||||||||||||||||||||||||
| Participants: | |||||||||||||||||||||||||
| Linked BF Score: | 53 | ||||||||||||||||||||||||
| Description |
|
The recipient's tenant all databases cloner uses the operationTime in the listDatabases response as the afterClusterTime to wait for the listDatabases result to be majority committed on the donor. But if the listDatabases response doesn't contain an operationTime. The afterClusterTime would be null and the migration would fail with "afterClusterTime cannot be a null timestamp".
|
| Comments |
| Comment by Githook User [ 06/May/21 ] | ||||||||||
|
Author: {'name': 'Jack Mulrow', 'email': 'jack.mulrow@mongodb.com', 'username': 'jsmulrow'}Message: (cherry picked from commit 395510381dc73a1741d139fbf8d5de266ef21384) | ||||||||||
| Comment by Githook User [ 05/May/21 ] | ||||||||||
|
Author: {'name': 'Jack Mulrow', 'email': 'jack.mulrow@mongodb.com', 'username': 'jsmulrow'}Message: | ||||||||||
| Comment by Jack Mulrow [ 05/May/21 ] | ||||||||||
|
A note for anyone who's curious, the linked BF failed for technically a different reason than the failure in this ticket - a node triggers a key refresh on startup but doesn't wait for it to complete before accepting requests, and in that BF the recipient sent a request before its sync source had finished refreshing after a restart, so it received no operationTime - but it should also be fixed by my proposed change to give the recipient the advanceClusterTime privilege, so it's fine to link it as depending on this ticket. | ||||||||||
| Comment by Lingzhi Deng [ 29/Apr/21 ] | ||||||||||
|
Note: If we want to fix this on the recipient, we need to setOperationTime to a null timestamp before each command because otherwise getOperationTime would just return the operationTime from the last command it ran and that would be wrong. | ||||||||||
| Comment by Jack Mulrow [ 29/Apr/21 ] | ||||||||||
|
Nodes only clear their cluster time keys on rollback, so the only other case this should happen is after a restart, which should also be rare, so cloning from a different node could be fine. Adding the privilege should be pretty easy though, so I'll give that a shot first. Also, I'm realizing the keys cache only refreshes when it tries to validate a request with $clusterTime from an unauthorized client or when it expects keys to expire (which happens every 90 days, but we cap the refresh interval at 20 days to avoid overflow issues), but I don't think the recipient sends $clusterTime, so we'd have to do work to trigger the refresh some other way if we go with the "retry until the recipient receives operationTime" approach (probably triggering one after rollback or after failing to find keys in the cache for a best effort signing). | ||||||||||
| Comment by Lingzhi Deng [ 29/Apr/21 ] | ||||||||||
|
That works too. We then need to make it a retryable error unless we think it is reasonable to fail the migration. But that should be straightforward too. | ||||||||||
| Comment by Matthew Russotto [ 29/Apr/21 ] | ||||||||||
|
Rollback should be rare so it's actually reasonable to just fail the clone and try another donor node, unless this happens in other cases. | ||||||||||
| Comment by Jack Mulrow [ 29/Apr/21 ] | ||||||||||
|
It looks like what happened is the node that didn't return $clusterTime or operationTime had just exited rollback, which cleared its key cache, and since signing cluster times for unauthorized external clients is best-effort, the node saw it didn't have a key and returned the listDatabases responses without cluster times.
I can think of a couple ways to get around this, assuming the recipient needs an operationTime to operate correctly. One is to have the recipient retry the listDatabases command until it includes cluster times, which should happen after the donor node's next periodic refresh. Or we could give the recipient the "advanceClusterTime" privilege so it will always receive cluster times even if the donor's key cache is empty. |