[SERVER-33766] Secondary may not invalidate in-memory routing table cache after primary refresh Created: 08/Mar/18  Updated: 29/Oct/23  Resolved: 20/Apr/18

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 3.6.3, 3.7.2
Fix Version/s: 3.6.5, 3.7.6

Type: Bug Priority: Major - P3
Reporter: Esha Maharishi (Inactive) Assignee: Janna Golden
Resolution: Fixed Votes: 0
Labels: ShardingTechDebt
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v3.6
Sprint: Sharding 2018-04-23, Sharding 2018-05-07
Participants:
Linked BF Score: 25

 Description   

Secondaries have an OpObserver on "config.cache.collections" to invalidate their in-memory routing table cache entry for the namespace whose routing table is being updated, if the oplog entry has field 'lastRefreshedCollectionVersion'.

Primaries write the 'lastRefreshedCollectionVersion' field when marking the config.cache.collections entry as done refreshing. However, they only write the major/minor versions of the collection version; that is, they do not include the epoch.

If a sharded collection exists with one chunk (e.g., collectionVersion=(1,0, epoch)), then is dropped, recreated, and re-sharded, but the best-effort setShardVersion for the drop fails, then the refresh after the shardCollection will persist lastRefreshedCollectionVersion=(1,0).

Since this is no different than the lastRefreshedCollectionVersion already in the config.cache.collections document, the oplog entry for this update will not include the 'lastRefreshedCollectionVersion' field, so the OpObserver on the secondary will not fire.

Also dianna.hohensee, shouldn't the primary write 'lastRefreshedCollectionVersion' when starting the refresh, so that the secondary invalidates its in-memory cache sooner?



 Comments   
Comment by Githook User [ 23/Apr/18 ]

Author:

{'email': 'golden.janna@gmail.com', 'username': 'jannaerin', 'name': 'jannaerin'}

Message: SERVER-33766 Ensure secondaries always invalidate their in-memory routing table when new metadata arrives
Branch: v3.6
https://github.com/mongodb/mongo/commit/9bcf96d7c37bde286dd0602053d97d7a20f28bff

Comment by Githook User [ 21/Apr/18 ]

Author:

{'email': 'golden.janna@gmail.com', 'username': 'jannaerin', 'name': 'jannaerin'}

Message: SERVER-33766 Ensure that secondaries always invalidate the in-memory routing table when new
metadata arrives
Branch: master
https://github.com/mongodb/mongo/commit/b48579fcba7dfe3c7178b60c88feec96955c02f7

Comment by Githook User [ 20/Apr/18 ]

Author:

{'email': 'golden.janna@gmail.com', 'username': 'jannaerin', 'name': 'jannaerin'}

Message: SERVER-33766 Ensure that secondaries always invalidate the in-memory routing table when new
metadata arrives
Branch: master
https://github.com/mongodb/mongo/commit/a458c159eb3d3bf5260bbde38eeed22a45586636

Comment by Dianna Hohensee (Inactive) [ 09/Mar/18 ]

Technically, config.cache.collections already has the epoch field, but the OpObserver isn't looking at it.

However,
1) We're going to remove the epoch field someday soon, since we have a UUID field in there, too.
2) You're suggesting that we now watch 3 fields, rather than 2 or even just 1, for a simple stable / not stable question. You'll frankly make me cry if it gets so over-complicated
3) There isn't a reason to be using a ChunkVersion. I added 'lastRefreshedCollectionVersion', rather than a sequence number, because it was thought that a version would be more useful for some undefined diagnostic purpose. But that never eventuated: we don't use it for logging anywhere, nor have we ever wanted to consult it for system diagnosis. You could technically merge 'refreshing' and 'refreshSequenceNumber' into one value and be just fine.

You're going to have to deal with backwards compatibility, by the way. I hesitate to say this, but the simplest fix would indeed be to start watching the 'epoch' field for changes. Backport it, and you're done without backwards compatibility issues. That's probably the v3.6 fix regardless.

Comment by Esha Maharishi (Inactive) [ 08/Mar/18 ]

dianna.hohensee, wouldn't it also work to just write the entire ChunkVersion (including epoch), rather than only writing the major/minor versions?

Comment by Dianna Hohensee (Inactive) [ 08/Mar/18 ]

A solution for this ticket would be to go back to the original implementation/design proposal, where instead of 'lastRefreshedCollectionVersion' and Timestamp, we had 'refreshSequenceNumber' as an int we alway incremented.

Here's the commit that changed it from 'lastRefreshedCollectionVersion' to 'refreshSequenceNumber' https://github.com/mongodb/mongo/commit/125497e1ba8b4cd10ae5fa84fc66e70cfe46ccce#diff-de22a7651073aa247f338651b950a06c

Comment by Dianna Hohensee (Inactive) [ 08/Mar/18 ]

esha.maharishi, there's no point in invalidating the routing table before the updates fully propagate, because there's already a window of time before anything replicates where secondaries will continue to service requests: replication lag. If the secondary decides to refresh from the persisted routing table, it will see 'refreshing' and wait for a stable view. Seeing 'lastRefreshedCollectionVersion' will cause the secondary OpObserver to wake up any threads waiting to load the persisted routing table – threads waiting because they saw 'refreshing' true.

Generated at Thu Feb 08 04:34:31 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.