[SERVER-53098] Optimize roundtrips on routers after an operation changed the collection metadata Created: 27/Nov/20  Updated: 10/Feb/22  Resolved: 09/Feb/22

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: None

Type: Task Priority: Major - P3
Reporter: Marcos José Grillo Ramirez Assignee: Marcos José Grillo Ramirez
Resolution: Won't Do Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-63502 Complete TODO listed in SERVER-53098 Closed
related to SERVER-63515 Remove old unnecessary TODO referenci... Closed
Sprint: Sharding EMEA 2021-11-15, Sharding EMEA 2021-11-29, Sharding EMEA 2021-12-13, Sharding EMEA 2021-12-27, Sharding EMEA 2022-01-10, Sharding EMEA 2022-01-24, Sharding EMEA 2022-02-07, Sharding EMEA 2022-02-21
Participants:

 Description   

The current behavior when a DDL operation changes the full collection metadata (for example, a collection rename) is to advance the collection version and mark only the database's primary shard as stale. This behavior is an optimization originally designed to improve the performance of migrations: only the operations involving the affected shards need to block, while all other operations can still be served. Its unintended consequence is the following:

Suppose a DDL operation on namespace nss changes the collection metadata. Then:

1. The router advances the shard version and marks the database's primary shard as stale.
2. Every operation that targets a shard other than the database's primary shard first incurs an extra roundtrip to that shard and then waits for the cache entry to be refreshed.
3. Once all shards have been marked stale, every operation on namespace nss waits for the new version.
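To make the cost of steps 1 to 3 concrete, here is a toy Python model that counts the extra roundtrips a single router pays after such a DDL. It is a sketch under stated assumptions, not the router's catalog cache: shards are plain string ids, the function name `count_wasted_roundtrips` is hypothetical, and we assume that any request reaching a shard not yet marked stale carries the outdated version and is rejected once before the router refreshes.

```python
from collections.abc import Iterable


def count_wasted_roundtrips(targets: Iterable[str], primary: str) -> int:
    """Toy model (hypothetical, not MongoDB code) of one router after a DDL
    changed the collection metadata. `targets` lists the shard that each
    subsequent operation on nss is routed to, in arrival order."""
    marked_stale = {primary}  # step 1: only the primary shard is marked stale
    refreshed = set()  # shards whose new version the router already holds
    wasted = 0
    for shard in targets:
        if shard in refreshed:
            continue  # router already sends the new version; no extra cost
        if shard not in marked_stale:
            # Step 2 (assumed): the request goes out with the old version,
            # the shard rejects it, and only then is the shard marked stale.
            wasted += 1
            marked_stale.add(shard)
        # The operation waits for the cache refresh before proceeding.
        refreshed.add(shard)
    return wasted
```

With shards A (primary), B and C, the sequence `["B", "A", "C", "B", "C"]` yields 2: one wasted roundtrip for each non-primary shard the first time it is targeted, after which operations on B and C proceed normally.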

This behavior is correct; however, a more efficient approach would be to wait for the new shard version. We could perform a linearizable invalidation on the router that issued the DDL command, but this would only improve performance on that one router. We should come up with a way to reduce the number of roundtrips once we have determined that an operation changed the collection metadata.



 Comments   
Comment by Marcos José Grillo Ramirez [ 09/Feb/22 ]

Having finer granularity in the catalog cache is performant enough for now.

Comment by Kaloian Manassiev [ 21/Oct/21 ]

marcos.grillo, please update the description to match the discussion with the team that you are referencing.

Comment by Marcos José Grillo Ramirez [ 04/Jun/21 ]

After discussing it with the team, this ticket will be repurposed to optimize the roundtrips on routers instead.

Generated at Thu Feb 08 05:29:55 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.