[SERVER-21911] ShardRegistry::reload can overwrite existing entry with an older one temporarily in SCCC Created: 15/Dec/15  Updated: 06/Dec/22  Resolved: 15/Dec/15

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 3.2.0
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Randolph Tan Assignee: [DO NOT USE] Backlog - Sharding Team
Resolution: Won't Fix Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Related
related to SERVER-22797 Calls to ShardRegistry::reload needs ... Closed
Assigned Teams:
Sharding
Operating System: ALL
Participants:
Linked BF Score: 0

 Description   

The outline for ShardRegistry::reload goes like this (as of 4b37c81ddfd33f550f2f42e1a14a56e427620db4):

1. Query config.shards.
2. Grab mutex.
3. Clear everything and repopulate from the query result.

The issue arises when two threads call reload and get different results from the query in step 1 (that is, they observe the state at different points in time). If the thread with the newer result finishes first, the thread with the older result overwrites it after grabbing the mutex. The ShardRegistry then contains the old entry until the next reload.
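
To make the interleaving concrete, here is a minimal sketch that replays it on a single thread so the outcome is deterministic. ToyShardRegistry, ShardSnapshot, replayStaleOverwrite and the host names are hypothetical stand-ins for illustration only; this is not the actual ShardRegistry code.

```cpp
#include <map>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical stand-in for the result of querying config.shards:
// shard name -> host string.
using ShardSnapshot = std::map<std::string, std::string>;

// Simplified, hypothetical model of the registry; not the real ShardRegistry.
class ToyShardRegistry {
public:
    // Steps 2 and 3: grab the mutex, then clear and repopulate.
    // Nothing here checks whether `snapshot` is newer than what is stored.
    void applySnapshot(ShardSnapshot snapshot) {
        std::lock_guard<std::mutex> lk(_mutex);
        _shards = std::move(snapshot);
    }

    // Returns the stored host, or an empty string if the shard is unknown.
    std::string hostFor(const std::string& shard) const {
        std::lock_guard<std::mutex> lk(_mutex);
        auto it = _shards.find(shard);
        return it == _shards.end() ? std::string() : it->second;
    }

private:
    mutable std::mutex _mutex;
    ShardSnapshot _shards;
};

// Replays the problematic ordering sequentially.
std::string replayStaleOverwrite() {
    ToyShardRegistry registry;

    // Step 1 on thread A: its query runs first and sees the old host.
    ShardSnapshot seenByA;
    seenByA["shard0000"] = "old-host:27017";

    // Step 1 on thread B: its query runs later and sees the updated host.
    ShardSnapshot seenByB;
    seenByB["shard0000"] = "new-host:27017";

    registry.applySnapshot(seenByB);  // B grabs the mutex first
    registry.applySnapshot(seenByA);  // A then overwrites with older data

    return registry.hostFor("shard0000");
}
```

In this sketch the registry ends up holding the host that thread A read, even though thread B's result was more recent.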

This is only a problem with SCCC because the CSRS implementation has a guard against this (Note: opTime is always zero for SCCC):

https://github.com/mongodb/mongo/blob/r3.2.0/src/mongo/s/client/shard_registry.cpp#L190-l195
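
As a rough illustration of how a guard of this kind can work, the sketch below rejects a snapshot whose opTime is older than the last one applied. It is an assumption about the general shape of such a check, not a copy of the code at the linked lines. GuardedToyRegistry, the integer opTime and the strict `<` comparison are all hypothetical simplifications.

```cpp
#include <cstdint>
#include <map>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical stand-in for the result of querying config.shards.
using ShardSnapshot = std::map<std::string, std::string>;

// Hypothetical registry with a staleness guard; not the real ShardRegistry.
class GuardedToyRegistry {
public:
    // Applies the snapshot unless it is older than the last one applied.
    // Returns true if applied, false if rejected as stale.
    // Assumption: opTime is modelled as a plain integer for illustration.
    bool applySnapshot(ShardSnapshot snapshot, std::uint64_t opTime) {
        std::lock_guard<std::mutex> lk(_mutex);
        if (opTime < _lastOpTime) {
            return false;  // older than what is already stored
        }
        _lastOpTime = opTime;
        _shards = std::move(snapshot);
        return true;
    }

    // Returns the stored host, or an empty string if the shard is unknown.
    std::string hostFor(const std::string& shard) const {
        std::lock_guard<std::mutex> lk(_mutex);
        auto it = _shards.find(shard);
        return it == _shards.end() ? std::string() : it->second;
    }

private:
    mutable std::mutex _mutex;
    std::uint64_t _lastOpTime = 0;
    ShardSnapshot _shards;
};

// Helper for the examples: builds a one-entry snapshot.
ShardSnapshot oneShard(const std::string& host) {
    ShardSnapshot s;
    s["shard0000"] = host;
    return s;
}
```

With distinct opTimes (the CSRS-like case), a snapshot tagged 1 that arrives after a snapshot tagged 2 is rejected. When every opTime is zero, as the description notes for SCCC, the comparison never rejects anything, so the older snapshot still overwrites the newer one.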

