[SERVER-34632] config.chunks change to config.cache.chunks creates a collection long name after upgrade Created: 24/Apr/18 Updated: 26/Oct/23 |
|
| Status: | Backlog |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | 3.6.4 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Shay | Assignee: | Backlog - Catalog and Routing |
| Resolution: | Unresolved | Votes: | 4 |
| Labels: | SSCCL-BUG, oldshardingemea |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: | |
| Assigned Teams: | Catalog and Routing |
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Sprint: | Sharding 2018-05-21, Sharding 2018-06-04, Sharding 2018-06-18, Sharding 2018-07-02, Sharding 2018-07-16, Sharding 2018-07-30, Sharding 2018-08-13, Sharding EMEA 2021-06-14, Sharding EMEA 2021-06-28, Sharding EMEA 2021-07-12 |
| Participants: | |
| Case: | (copied to CRM) |
| Linked BF Score: | 0 |
| Description |
|
After upgrading from 3.4 to 3.6 I get many of these errors for different sharded collections:
This did not happen before the upgrade. I suspect the issue is the change from config.chunks to the per-collection config.cache.chunks.<db>.<collection> cache collections, which produces cache collection names that are too long for some of our namespaces. |
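A quick way to check which sharded collections would be affected is to compare each namespace against the limit once the "config.cache.chunks." prefix is added. The sketch below is illustrative only: it assumes the pre-4.4 namespace limit of 120 bytes and a hypothetical mongos address.

```python
# Hedged sketch: list sharded collections whose 3.6 per-collection chunk cache
# namespace "config.cache.chunks.<db>.<collection>" would exceed the limit.
from pymongo import MongoClient

NS_LIMIT = 120                     # assumed pre-4.4 <db>.<collection> limit in bytes
PREFIX = "config.cache.chunks."    # per-collection cache written by 3.6 shards

client = MongoClient("mongodb://localhost:27017")  # hypothetical mongos address
for coll in client["config"]["collections"].find({"dropped": {"$ne": True}}):
    cache_ns = PREFIX + coll["_id"]                # _id is "<db>.<collection>"
    size = len(cache_ns.encode("utf-8"))
    if size > NS_LIMIT:
        print(f"too long ({size} bytes): {cache_ns}")
```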
| Comments |
| Comment by Kaloian Manassiev [ 18/Oct/21 ] |
|
The feature was disabled in 5.1 (under |
| Comment by Githook User [ 09/Jul/21 ] |
|
Author: {'name': 'Antonio Fuschetto', 'email': 'antonio.fuschetto@mongodb.com', 'username': 'afuschetto'}
Message: SERVER-34632 config.chunks change to config.cache.chunks creates a collection long name after upgrade |
| Comment by Antonio Fuschetto [ 06/Jul/21 ] |
|
Code review url: https://mongodbcr.appspot.com/798010001 |
| Comment by Julio Viera [ 17/Dec/18 ] |
|
Is there any update, ETA or workaround available for this other than renaming the collections? Thanks! |
| Comment by Githook User [ 24/May/18 ] |
|
Author: {'username': 'kaloianm', 'name': 'Kaloian Manassiev', 'email': 'kaloian.manassiev@mongodb.com'}
Message: SERVER-34632 Rename `struct dbTask` to DBTask ... to follow naming conventions |
| Comment by Githook User [ 24/May/18 ] |
|
Author: {'username': 'kaloianm', 'name': 'Kaloian Manassiev', 'email': 'kaloian.manassiev@mongodb.com'}
Message: SERVER-34632 Use alias for the callback of CatalogCacheLoader::getChunksSince

Also use StringMap in CollectionShardingState instead of std::unordered_map. |
| Comment by Esha Maharishi (Inactive) [ 16/May/18 ] |
|
I agree, since the 3.6.5 primary's behavior in the mixed-version replica set is pretty much the same as before the fix. It might be good to confirm that the 3.6.6 secondary will fail reads with a readConcern other than "available" in this case. |
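One hedged way to perform that check (hosts and collection name below are assumptions, and the read may block rather than error depending on timeouts) is to issue a secondary read with a non-"available" readConcern while the cache cannot be persisted:

```python
# Illustrative check only: a secondary read with readConcern "local" should not
# succeed while the shard cannot persist its chunk filtering metadata.
from pymongo import MongoClient
from pymongo.errors import PyMongoError
from pymongo.read_concern import ReadConcern
from pymongo.read_preferences import Secondary

client = MongoClient("mongodb://mongos.example.net:27017")  # hypothetical router
coll = client["test"].get_collection(
    "sharded_coll",                      # hypothetical sharded collection
    read_preference=Secondary(),
    read_concern=ReadConcern("local"),   # anything other than "available"
)
try:
    print(coll.count_documents({}))
except PyMongoError as exc:
    print("secondary read did not succeed:", exc)
```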
| Comment by Kaloian Manassiev [ 16/May/18 ] |
|
Remember that this will be backported to 3.6 as well. But yes, if a 3.6.6 node is promoted to primary, manages to create the view, and then a 3.6.5 node is promoted to primary before the entire shard is upgraded, the 3.6.5 primary will fail. Getting out of this situation will require either finishing the upgrade or downgrading to 3.6.5 and manually deleting the created views. I think this is a reasonable tradeoff compared to the alternative, which would require writing to two collections (UUID-suffixed and namespace-suffixed) and a complex handoff protocol for which collection the secondaries should be reading from. |
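For readers who end up in this scenario, the manual cleanup Kaloian mentions could look roughly like the sketch below; this is not an official downgrade procedure, and the connection string is an assumption.

```python
# Hedged sketch: on a shard primary rolled back to 3.6.5 after a partial
# upgrade, list and drop any "config.cache.chunks.<ns>" views a 3.6.6 node created.
from pymongo import MongoClient

client = MongoClient("mongodb://shard0-primary.example.net:27018")  # hypothetical
config = client["config"]
for info in config.list_collections(filter={"type": "view"}):
    name = info["name"]
    if name.startswith("cache.chunks."):
        print("dropping view config." + name)
        config.drop_collection(name)
```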
| Comment by Esha Maharishi (Inactive) [ 16/May/18 ] |
|
Hmm, ok, and finally, how does this work in a mixed-version replica set (say two nodes, one is 3.6.x, other is 4.0)? If the 3.6.x node steps up, will it try to write to "config.cache.chunks.<ns>", see it's a view, and fail? |
| Comment by Kaloian Manassiev [ 15/May/18 ] |
|
esha.maharishi, I edited the description above to clarify. The drop of the namespace-suffixed collections will happen as part of the regular update of the cache collection. The sequence is: set the cache collection as "in-update" (which will cause the secondaries to disregard what they read and wait until the "in-update" flag is cleared), drop the namespace-suffixed collection (this means secondaries may briefly get no chunks, but it doesn't matter because they will loop around and retry), then create the UUID-suffixed collection plus the view (only if the view name doesn't exceed the name size limitations, which I don't expect it will). |
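To make the sequence concrete, here is a rough illustration in pymongo-style pseudocode; the "refreshing" field, the helper name, and the connection string are assumptions for readability, not the server's actual implementation.

```python
# Hedged sketch of the described sequence: mark the cache entry in-update,
# drop the namespace-suffixed collection, create the UUID-suffixed collection
# plus a view under the old name, then clear the flag.
from pymongo import MongoClient

client = MongoClient("mongodb://shard0-primary.example.net:27018")  # hypothetical
config = client["config"]

def migrate_chunk_cache(ns, uuid_hex):
    # 1. Secondaries disregard what they read until the flag is cleared.
    config["cache.collections"].update_one({"_id": ns}, {"$set": {"refreshing": True}})
    # 2. Secondaries may briefly see no chunks; they loop around and retry.
    config.drop_collection("cache.chunks." + ns)
    # 3. Create the UUID-suffixed collection and a same-named view on top of it
    #    (only if the view name fits within the namespace limits).
    config.create_collection("cache.chunks." + uuid_hex)
    config.command("create", "cache.chunks." + ns,
                   viewOn="cache.chunks." + uuid_hex, pipeline=[])
    # 4. Clear the in-update flag so secondaries resume reading the cache.
    config["cache.collections"].update_one({"_id": ns}, {"$set": {"refreshing": False}})
```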
| Comment by Esha Maharishi (Inactive) [ 15/May/18 ] |
|
The view idea sounds neat. A couple of questions:
- When will this occur? (On startup/transition to primary? On setFCV=4.0?)
- Will the drop + creates be atomic? (We might be able to just drop, and let the next refresh do the creates.)
Using config.cache.chunks.<uuid> might also solve |
| Comment by Kaloian Manassiev [ 15/May/18 ] |
|
The plan is to fix this problem through the following changes:
renctan, esha.maharishi, schwerin, can you please review this plan? |
| Comment by Shay [ 30/Apr/18 ] |
|
Hi Kal,
Thank you for your response.
We are working on shortening the collection names; however, this is hard to do without downtime for our application.
Until a fix is made, I would suggest adding a note to the limits documentation and to the 3.4-to-3.6 upgrade documentation to help others avoid this issue.
Please update this ticket when you have an estimate of when a fix will be available and what changes it will include.
Regards, Shay |
| Comment by Kaloian Manassiev [ 30/Apr/18 ] |
|
Hi Rybak, Sorry for the silence on this ticket. We are aware of what is causing the problem and are working on a solution. Unfortunately, there is currently no workaround other than using a shorter collection name (I noticed there is some name duplication in the collection name you pasted). To give you a little bit of context: these warning messages are an indication that the shard chunk filtering metadata could not be persisted on the primary, and as a result reads against secondary nodes with anything other than the default read concern will not work. In addition, because these failed operations are retried internally, they may build up in-memory state over time and cause the server's memory usage to grow without bound. Apologies for the inconvenience, and please continue monitoring this ticket for when the fix will be available. Best regards, -Kal.
|
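For operators seeing these warnings, a small check can confirm on each shard primary whether the per-collection chunk cache was persisted at all; the host names and namespace below are placeholders.

```python
# Hedged sketch: report whether "config.cache.chunks.<ns>" exists on each
# shard primary for a given sharded collection.
from pymongo import MongoClient

SHARD_PRIMARIES = [
    "mongodb://shard0-primary.example.net:27018",  # hypothetical hosts
    "mongodb://shard1-primary.example.net:27018",
]
NS = "mydb.my_sharded_collection"                  # hypothetical namespace

for uri in SHARD_PRIMARIES:
    names = MongoClient(uri)["config"].list_collection_names()
    persisted = ("cache.chunks." + NS) in names
    print(f"{uri}: chunk cache persisted = {persisted}")
```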