- Type: Bug
- Resolution: Fixed
- Priority: Major - P3
- Affects Version/s: None
- Component/s: Sharding
- Fully Compatible
- ALL
- v4.4
- Sharding 2020-03-23, Sharding 2020-04-06, Sharding 2020-04-20
- 15
After a migration commits, a shard updates its knowledge of its shard version and logs a message which looks like this:
[ShardedClusterFixture:job0:shard1:primary] 2020-02-04T15:18:11.560+0000 I SHARDING [conn55] Updating metadata for collection config.system.sessions from collection version: 15|0||5e398add924cca4d6c4487b2, shard version: 0|0||5e398add924cca4d6c4487b2 to collection version: 16|0||5e398add924cca4d6c4487b2, shard version: 16|0||5e398add924cca4d6c4487b2 due to version change
This log line comes from here. If we zoom inside CollectionMetadata::toStringBasic(), the call that logs the current shard version invokes ChunkManager::getVersion(ShardId).
If the CatalogCache's entry for a collection gets invalidated with the local shard id while a migration is running concurrently, the completion of the chunk migration can get stuck indefinitely. ChunkManager::getVersion will keep throwing ShardInvalidatedForTargetingInfo exceptions, and the operation will keep being retried under refreshFilteringMetadataUntilSuccess, so the retry loop never makes progress.
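The failure mode can be illustrated with a minimal, self-contained sketch. Everything in it is a hypothetical simplification and not the server's actual code: FakeChunkManager, ShardInvalidated, retryUntilSuccess and refreshAndLog are invented stand-ins for ChunkManager, ShardInvalidatedForTargetingInfo, refreshFilteringMetadataUntilSuccess and the logging call, and the retry loop is bounded only so that the demonstration terminates.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the ShardInvalidatedForTargetingInfo exception.
struct ShardInvalidated : std::runtime_error {
    ShardInvalidated() : std::runtime_error("shard invalidated for targeting") {}
};

// Hypothetical, heavily simplified routing-table entry.
struct FakeChunkManager {
    bool localShardMarkedStale = false;  // set when the cache entry is invalidated
    int shardVersion = 16;

    // Assumed behavior: throws for as long as the local shard is marked stale.
    int getVersion() const {
        if (localShardMarkedStale) {
            throw ShardInvalidated();
        }
        return shardVersion;
    }
};

// Retry loop in the spirit of "retry until success". It is bounded by
// maxAttempts here so the sketch terminates; an unbounded loop would spin.
// Returns the attempt number that succeeded, or -1 if none did.
int retryUntilSuccess(const std::function<void()>& refresh, int maxAttempts) {
    assert(maxAttempts > 0);
    for (int attempt = 1; attempt <= maxAttempts; ++attempt) {
        try {
            refresh();
            return attempt;
        } catch (const ShardInvalidated&) {
            // Nothing in this loop clears the stale flag, so the next
            // attempt fails in exactly the same way.
        }
    }
    return -1;
}

// A refresh step whose log statement itself calls getVersion().
int refreshAndLog(const FakeChunkManager& cm, std::string& logOut, int maxAttempts) {
    return retryUntilSuccess(
        [&] { logOut = "shard version: " + std::to_string(cm.getVersion()); },
        maxAttempts);
}
```

With the stale flag unset, refreshAndLog succeeds on the first attempt. With it set, every attempt throws from inside the log statement, which corresponds to the indefinite retry described above once the attempt bound is removed.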
I think the bug is not happening currently because, for some reason, this line is no longer logged in the test output since the logging changes were committed. For example, see here.