[SERVER-53111] Add invariant to CatalogCache that same collection version has same allowMigrations setting
Created: 30/Nov/20  Updated: 29/Oct/23  Resolved: 14/Dec/20

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: 4.9.0

Type: Task
Priority: Major - P3
Reporter: Max Hirschhorn
Assignee: Jaume Moragues (Inactive)
Resolution: Fixed
Votes: 0
Labels: PM-1965-Milestone-0-Metadata-Format
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Backwards Compatibility: Fully Compatible
Sprint: Sharding 2020-12-14
Participants:

 Description   

Doing an updateOne() by _id during a resharding operation, when the existing collection isn't sharded on _id, hits an invariant in CollectionShardingRuntime::getOwnershipFilter(). (This is a bug in resharding.) The logs indicate that collection version 2|1||5fc2510e13f353da220dd16c was first refreshed with {"allowMigrations":true} and later with {"allowMigrations":false}. The {"allowMigrations":false} value is ignored during the second refresh because the collection version is already known, which is why the invariant in CollectionShardingRuntime::getOwnershipFilter() was triggered. Surfacing this as an invariant failure from the CatalogCache refresh itself would make the issue easier to track down.
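
The proposed check can be modeled roughly as follows. This is a minimal Python sketch, not the server's C++ code: `CachedRouting`, `CatalogCacheModel`, `refresh`, and `InvariantFailure` are hypothetical names chosen for illustration, and the real CatalogCache differs in detail. The sketch only shows the idea that a refresh returning an already-known collection version with a different allowMigrations value should fail loudly instead of being silently ignored.

```python
from dataclasses import dataclass


class InvariantFailure(Exception):
    """Stands in for the server's invariant(); hypothetical name."""


@dataclass(frozen=True)
class CachedRouting:
    # Collection version as printed in the logs,
    # e.g. "2|1||5fc2510e13f353da220dd16c".
    version: str
    allow_migrations: bool


class CatalogCacheModel:
    """Toy routing cache keyed by namespace (not the real CatalogCache)."""

    def __init__(self):
        self._entries = {}

    def refresh(self, namespace, fetched):
        cached = self._entries.get(namespace)
        if cached is not None and cached.version == fetched.version:
            # Same collection version: the setting must not have changed.
            if cached.allow_migrations != fetched.allow_migrations:
                raise InvariantFailure(
                    f"{namespace}: version {fetched.version} seen with "
                    f"allowMigrations={cached.allow_migrations} and "
                    f"allowMigrations={fetched.allow_migrations}"
                )
            # Version already known and consistent: keep the cached entry.
            return cached
        # New or different version: install the fetched entry.
        self._entries[namespace] = fetched
        return fetched
```

In this model, refreshing `reshardingDb.coll` twice at the same version, first with `allow_migrations=True` and then with `allow_migrations=False`, raises `InvariantFailure` on the second call rather than keeping the stale `True` value.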

[js_test:resharding_clones_initial_data] 2020-11-28T13:30:55.118+0000 d20021| {"t":{"$date":"2020-11-28T13:30:55.117+00:00"},"s":"I",  "c":"SH_REFR",  "id":4619901, "ctx":"CatalogCache-0","msg":"Refreshed cached collection","attr":{"namespace":"reshardingDb.coll","newVersion":{"chunkVersion":{"0":{"$timestamp":{"t":2,"i":1}},"1":{"$oid":"5fc2510e13f353da220dd16c"}},"forcedRefreshSequenceNum":1,"epochDisambiguatingSequenceNum":7},"oldVersion":{"chunkVersion":"None","forcedRefreshSequenceNum":0,"epochDisambiguatingSequenceNum":0},"allowMigrations":true,"newRoutingHistory":"RoutingTableHistory: reshardingDb.coll key: { oldKey: 1.0 }\nChunks:\n\tshard: resharding_clones_initial_data-rs0, lastmod: 2|0||5fc2510e13f353da220dd16c, [{ oldKey: MinKey }, { oldKey: 0.0 })\n\tshard: resharding_clones_initial_data-rs1, lastmod: 2|1||5fc2510e13f353da220dd16c, [{ oldKey: 0.0 }, { oldKey: MaxKey })\nShard versions:\n\tresharding_clones_initial_data-rs0: 2|0||5fc2510e13f353da220dd16c\n\tresharding_clones_initial_data-rs1: 2|1||5fc2510e13f353da220dd16c\n","durationMillis":6}}
[js_test:resharding_clones_initial_data] 2020-11-28T13:30:55.118+0000 d20020| {"t":{"$date":"2020-11-28T13:30:55.117+00:00"},"s":"I",  "c":"SH_REFR",  "id":4619901, "ctx":"CatalogCache-0","msg":"Refreshed cached collection","attr":{"namespace":"reshardingDb.coll","newVersion":{"chunkVersion":{"0":{"$timestamp":{"t":2,"i":1}},"1":{"$oid":"5fc2510e13f353da220dd16c"}},"forcedRefreshSequenceNum":1,"epochDisambiguatingSequenceNum":9},"oldVersion":{"chunkVersion":"None","forcedRefreshSequenceNum":0,"epochDisambiguatingSequenceNum":0},"allowMigrations":true,"newRoutingHistory":"RoutingTableHistory: reshardingDb.coll key: { oldKey: 1.0 }\nChunks:\n\tshard: resharding_clones_initial_data-rs0, lastmod: 2|0||5fc2510e13f353da220dd16c, [{ oldKey: MinKey }, { oldKey: 0.0 })\n\tshard: resharding_clones_initial_data-rs1, lastmod: 2|1||5fc2510e13f353da220dd16c, [{ oldKey: 0.0 }, { oldKey: MaxKey })\nShard versions:\n\tresharding_clones_initial_data-rs0: 2|0||5fc2510e13f353da220dd16c\n\tresharding_clones_initial_data-rs1: 2|1||5fc2510e13f353da220dd16c\n","durationMillis":6}}
...
[js_test:resharding_clones_initial_data] 2020-11-28T13:30:55.275+0000 d20020| {"t":{"$date":"2020-11-28T13:30:55.275+00:00"},"s":"I",  "c":"SH_REFR",  "id":4619901, "ctx":"CatalogCache-0","msg":"Refreshed cached collection","attr":{"namespace":"reshardingDb.coll","newVersion":{"chunkVersion":{"0":{"$timestamp":{"t":2,"i":1}},"1":{"$oid":"5fc2510e13f353da220dd16c"}},"forcedRefreshSequenceNum":1,"epochDisambiguatingSequenceNum":12},"oldVersion":{"chunkVersion":"None","forcedRefreshSequenceNum":0,"epochDisambiguatingSequenceNum":0},"allowMigrations":false,"newRoutingHistory":"RoutingTableHistory: reshardingDb.coll key: { oldKey: 1.0 }\nChunks:\n\tshard: resharding_clones_initial_data-rs0, lastmod: 2|0||5fc2510e13f353da220dd16c, [{ oldKey: MinKey }, { oldKey: 0.0 })\n\tshard: resharding_clones_initial_data-rs1, lastmod: 2|1||5fc2510e13f353da220dd16c, [{ oldKey: 0.0 }, { oldKey: MaxKey })\nShard versions:\n\tresharding_clones_initial_data-rs0: 2|0||5fc2510e13f353da220dd16c\n\tresharding_clones_initial_data-rs1: 2|1||5fc2510e13f353da220dd16c\n","durationMillis":6}}
[js_test:resharding_clones_initial_data] 2020-11-28T13:30:55.275+0000 d20021| {"t":{"$date":"2020-11-28T13:30:55.275+00:00"},"s":"I",  "c":"SH_REFR",  "id":4619901, "ctx":"CatalogCache-0","msg":"Refreshed cached collection","attr":{"namespace":"reshardingDb.coll","newVersion":{"chunkVersion":{"0":{"$timestamp":{"t":2,"i":1}},"1":{"$oid":"5fc2510e13f353da220dd16c"}},"forcedRefreshSequenceNum":1,"epochDisambiguatingSequenceNum":10},"oldVersion":{"chunkVersion":"None","forcedRefreshSequenceNum":0,"epochDisambiguatingSequenceNum":0},"allowMigrations":false,"newRoutingHistory":"RoutingTableHistory: reshardingDb.coll key: { oldKey: 1.0 }\nChunks:\n\tshard: resharding_clones_initial_data-rs0, lastmod: 2|0||5fc2510e13f353da220dd16c, [{ oldKey: MinKey }, { oldKey: 0.0 })\n\tshard: resharding_clones_initial_data-rs1, lastmod: 2|1||5fc2510e13f353da220dd16c, [{ oldKey: 0.0 }, { oldKey: MaxKey })\nShard versions:\n\tresharding_clones_initial_data-rs0: 2|0||5fc2510e13f353da220dd16c\n\tresharding_clones_initial_data-rs1: 2|1||5fc2510e13f353da220dd16c\n","durationMillis":6}}
...
[js_test:resharding_clones_initial_data] 2020-11-28T13:30:57.054+0000 d20021| {"t":{"$date":"2020-11-28T13:30:57.054+00:00"},"s":"I",  "c":"SHARDING", "id":0,       "ctx":"conn28","msg":"CollectionShardingRuntime::getOwnershipFilter","attr":{"namespace":"reshardingDb.coll","allowMigrations":true,"metadata":"collection version: 2|1||5fc2510e13f353da220dd16c, shard version: 2|1||5fc2510e13f353da220dd16c"}}
[js_test:resharding_clones_initial_data] 2020-11-28T13:30:57.055+0000 d20020| {"t":{"$date":"2020-11-28T13:30:57.054+00:00"},"s":"I",  "c":"SHARDING", "id":0,       "ctx":"conn30","msg":"CollectionShardingRuntime::getOwnershipFilter","attr":{"namespace":"reshardingDb.coll","allowMigrations":true,"metadata":"collection version: 2|1||5fc2510e13f353da220dd16c, shard version: 2|0||5fc2510e13f353da220dd16c"}}
...
[js_test:resharding_clones_initial_data] 2020-11-28T13:30:57.055+0000 d20021| {"t":{"$date":"2020-11-28T13:30:57.054+00:00"},"s":"F",  "c":"-",        "id":23081,   "ctx":"conn28","msg":"Invariant failure","attr":{"expr":"!ChunkVersion::isIgnoredVersion(*optReceivedShardVersion) || !metadata->get().allowMigrations() || !metadata->get().isSharded()","msg":"For sharded collections getOwnershipFilter cannot be relied on without a valid shard version","file":"src/mongo/db/s/collection_sharding_runtime.cpp","line":137}}
[js_test:resharding_clones_initial_data] 2020-11-28T13:30:57.055+0000 d20020| {"t":{"$date":"2020-11-28T13:30:57.054+00:00"},"s":"F",  "c":"-",        "id":23081,   "ctx":"conn30","msg":"Invariant failure","attr":{"expr":"!ChunkVersion::isIgnoredVersion(*optReceivedShardVersion) || !metadata->get().allowMigrations() || !metadata->get().isSharded()","msg":"For sharded collections getOwnershipFilter cannot be relied on without a valid shard version","file":"src/mongo/db/s/collection_sharding_runtime.cpp","line":137}}
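
The same inconsistency can be looked for directly in test logs. The following is a rough Python sketch that assumes the line layout shown above (a bracketed test name, a timestamp, a node prefix such as `d20021|`, then one JSON document per line) and the attribute names visible in the "Refreshed cached collection" entries. The function name `find_conflicting_refreshes` is made up for this example.

```python
import json
import re
from collections import defaultdict

# Matches the prefix seen above: "[test] <timestamp> <node>| <json>".
LINE_RE = re.compile(r"^\[[^\]]+\]\s+\S+\s+(\w+)\|\s+(\{.*\})\s*$")
REFRESH_LOG_ID = 4619901  # "Refreshed cached collection"


def find_conflicting_refreshes(lines):
    """Map (node, namespace, version) to the allowMigrations values seen,
    keeping only keys that were logged with more than one value."""
    seen = defaultdict(set)
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            continue  # e.g. the "..." elisions
        node, payload = match.groups()
        try:
            entry = json.loads(payload)
        except json.JSONDecodeError:
            continue
        if entry.get("id") != REFRESH_LOG_ID:
            continue
        attr = entry["attr"]
        chunk_version = attr["newVersion"]["chunkVersion"]
        if not isinstance(chunk_version, dict):
            continue  # e.g. the string "None"
        ts = chunk_version["0"]["$timestamp"]
        version = f'{ts["t"]}|{ts["i"]}||{chunk_version["1"]["$oid"]}'
        seen[(node, attr["namespace"], version)].add(attr["allowMigrations"])
    return {k: sorted(v) for k, v in seen.items() if len(v) > 1}
```

Applied to the four refresh lines above, this should report version 2|1||5fc2510e13f353da220dd16c of reshardingDb.coll on both d20020 and d20021, since each node logged that version once with allowMigrations true and once with false.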



 Comments   
Comment by Githook User [ 14/Dec/20 ]

Author: Jaume Moragues (jaume.moragues@mongodb.com)

Message: SERVER-53111 Add invariant to CatalogCache that same collection version has same allowMigrations
Branch: master
https://github.com/mongodb/mongo/commit/17db86d32461a9f48680d27ec286a66f2f3203af

Comment by Blake Oler [ 01/Dec/20 ]

Can we also invariant that the reshardingFields is the same?
