[SERVER-56471] Donor's listDatabases response missing operationTime Created: 29/Apr/21  Updated: 29/Oct/23  Resolved: 05/May/21

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 4.9.0-rc1, 5.0.0-rc0

Type: Bug Priority: Major - P3
Reporter: Lingzhi Deng Assignee: Jack Mulrow
Resolution: Fixed Votes: 0
Labels: pm-1791_non-cloud-blocking, post-rc0
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
is depended on by SERVER-52713 [testing] Add stepdown/kill/terminate... Closed
Related
related to SERVER-47568 No keys found for HMAC in RECOVERING ... Closed
related to SERVER-55371 4.2.13 removed rs.initiate() return o... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v4.9
Sprint: Sharding 2021-05-03, Sharding 2021-05-17
Participants:
Linked BF Score: 53

 Description   

The recipient's tenant all databases cloner uses the operationTime in the listDatabases response as the afterClusterTime to wait for the listDatabases result to be majority committed on the donor. But if the listDatabases response doesn't contain an operationTime. The afterClusterTime would be null and the migration would fail with "afterClusterTime cannot be a null timestamp".

buildscripts.resmokelib.errors.ServerFailure: Tenant migration 'b7b18b5e-2986-44d1-93dc-05e1171fee1d' with donor replica set 'rs0' aborted due to an error: {'state': 'aborted', 'abortReason': {'code': 72, 'codeName': 'InvalidOptions', 'errmsg': 'Tenant migration recipient command failed :: caused by :: TenantAllDatabaseCloner failed to get listDatabases result majority-committed :: caused by :: afterClusterTime cannot be a null timestamp'}, 'ok': 1.0, '$clusterTime': {'clusterTime': Timestamp(1619673554, 10), 'signature': {'hash': b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00', 'keyId': 0}}, 'operationTime': Timestamp(1619673554, 10)}



 Comments   
Comment by Githook User [ 06/May/21 ]

Author:

{'name': 'Jack Mulrow', 'email': 'jack.mulrow@mongodb.com', 'username': 'jsmulrow'}

Message: SERVER-56471 Give tenant migration recipients privilege to skip cluster time validation

(cherry picked from commit 395510381dc73a1741d139fbf8d5de266ef21384)
Branch: v4.9
https://github.com/mongodb/mongo/commit/d3559beab25b08afa292529182ce100922596741

Comment by Githook User [ 05/May/21 ]

Author:

{'name': 'Jack Mulrow', 'email': 'jack.mulrow@mongodb.com', 'username': 'jsmulrow'}

Message: SERVER-56471 Give tenant migration recipients privilege to skip cluster time validation
Branch: master
https://github.com/mongodb/mongo/commit/395510381dc73a1741d139fbf8d5de266ef21384

Comment by Jack Mulrow [ 05/May/21 ]

A note for anyone who's curious, the linked BF failed for technically a different reason than the failure in this ticket - a node triggers a key refresh on startup but doesn't wait for it to complete before accepting requests, and in that BF the recipient sent a request before its sync source had finished refreshing after a restart, so it received no operationTime - but it should also be fixed by my proposed change to give the recipient the advanceClusterTime privilege, so it's fine to link it as depending on this ticket.

Comment by Lingzhi Deng [ 29/Apr/21 ]

Note: If we want to fix this on the recipient, we need to setOperationTime to a null timestamp before each command because otherwise getOperationTime would just return the operationTime from the last command it ran and that would be wrong.

Comment by Jack Mulrow [ 29/Apr/21 ]

Nodes only clear their cluster time keys on rollback, so the only other case this should happen is after a restart, which should also be rare, so cloning from a different node could be fine. Adding the privilege should be pretty easy though, so I'll give that a shot first.

Also, I'm realizing the keys cache only refreshes when it tries to validate a request with $clusterTime from an unauthorized client or when it expects keys to expire (which happens every 90 days, but we cap the refresh interval at 20 days to avoid overflow issues), but I don't think the recipient sends $clusterTime, so we'd have to do work to trigger the refresh some other way if we go with the "retry until the recipient receives operationTime" approach (probably triggering one after rollback or after failing to find keys in the cache for a best effort signing).

Comment by Lingzhi Deng [ 29/Apr/21 ]

That works too. We then need to make it a retryable error unless we think it is reasonable to fail the migration. But that should be straightforward too.

Comment by Matthew Russotto [ 29/Apr/21 ]

Rollback should be rare so it's actually reasonable to just fail the clone and try another donor node, unless this happens in other cases.

Comment by Jack Mulrow [ 29/Apr/21 ]

It looks like what happened is the node that didn't return $clusterTime or operationTime had just exited rollback, which cleared its key cache, and since signing cluster times for unauthorized external clients is best-effort, the node saw it didn't have a key and returned the listDatabases responses without cluster times.

[j7:rs0:n2] | 2021-04-29T05:19:13.793+00:00 I ROLLBACK 21593 [BackgroundSync] "Transition to ROLLBACK"
...
[j7:rs0:n2] | 2021-04-29T05:19:14.185+00:00 I REPL 20716 [BackgroundSync] "Resetting key manager cache"
[j7:rs0:n2] | 2021-04-29T05:19:14.185+00:00 I ROLLBACK 21592 [BackgroundSync] "Rollback complete"
...
[j7:rs0:n2] | 2021-04-29T05:19:14.185+00:00 I ROLLBACK 21611 [BackgroundSync] "Transition to SECONDARY"
...
[j7:rs0:n2] | 2021-04-29T05:19:14.186+00:00 I REPL 82002 [BackgroundSync] "LogicalTimeValidator::trySignLogicalTime KeyNotFound","attr":\{"keyStatus":{"code":211,"codeName":"KeyNotFound","errmsg":"No key found that is valid for { ts: Timestamp(1619673552, 37) }"}}
...
buildscripts.resmokelib.errors.ServerFailure: Tenant migration 'b7b18b5e-2986-44d1-93dc-05e1171fee1d' with donor replica set 'rs0' aborted due to an error: \{'state': 'aborted', 'abortReason': {'code': 72, 'codeName': 'InvalidOptions', 'errmsg': 'Tenant migration recipient command failed :: caused by :: TenantAllDatabaseCloner failed to get listDatabases result majority-committed :: caused by :: afterClusterTime cannot be a null timestamp'}, 'ok': 1.0, '$clusterTime': \{'clusterTime': Timestamp(1619673554, 10), 'signature': {'hash': b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00', 'keyId': 0}}, 'operationTime': Timestamp(1619673554, 10)}

I can think of a couple ways to get around this, assuming the recipient needs an operationTime to operate correctly. One is to have the recipient retry the listDatabases command until it includes cluster times, which should happen after the donor node's next periodic refresh. Or we could give the recipient the "advanceClusterTime" privilege so it will always receive cluster times even if the donor's key cache is empty.

Generated at Thu Feb 08 05:39:21 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.