[SERVER-21911] ShardRegistry::reload can overwrite existing entry with an older one temporarily in SCCC Created: 15/Dec/15 Updated: 06/Dec/22 Resolved: 15/Dec/15 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | 3.2.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Randolph Tan | Assignee: | [DO NOT USE] Backlog - Sharding Team |
| Resolution: | Won't Fix | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Assigned Teams: | Sharding |
| Operating System: | ALL |
| Linked BF Score: | 0 |
| Description |
|
The outline for ShardRegistry::reload goes like this (as of 4b37c81ddfd33f550f2f42e1a14a56e427620db4): 1. Query config.shards. The issue arises when two threads call reload and get different results from the query in step 1 (that is, each result reflects the state of config.shards at a different point in time). If the thread holding the newer result finishes first, the thread holding the older result overwrites it once it acquires the lock, so the ShardRegistry contains the old entry until the next reload. This is only a problem with SCCC, because the CSRS implementation has a guard against it (note: opTime is always zero for SCCC): https://github.com/mongodb/mongo/blob/r3.2.0/src/mongo/s/client/shard_registry.cpp#L190-l195 |
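The race described above can be illustrated with a small standalone sketch. This is not the MongoDB code: `Snapshot`, `ToyRegistry`, `applyUnguarded`, `applyGuarded`, and `hostFor` are hypothetical names, and the guard shown (reject a snapshot whose opTime is older than the one already installed) is only an assumption about what an opTime-based check might look like. It also shows why such a check cannot help when opTime is always zero, as in SCCC.

```cpp
#include <cstdint>
#include <map>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical stand-in for one result of querying config.shards.
struct Snapshot {
    std::uint64_t opTime;                       // assumed to be 0 in the SCCC case
    std::map<std::string, std::string> shards;  // shard name -> host
};

// Hypothetical toy registry; not the real ShardRegistry.
class ToyRegistry {
public:
    // No guard: whichever caller takes the lock last wins,
    // even if its snapshot was read earlier.
    void applyUnguarded(Snapshot s) {
        std::lock_guard<std::mutex> lk(_mutex);
        _current = std::move(s);
    }

    // Assumed opTime-style guard: keep the installed state and return
    // false when the incoming snapshot is strictly older.
    bool applyGuarded(Snapshot s) {
        std::lock_guard<std::mutex> lk(_mutex);
        if (s.opTime < _current.opTime) {
            return false;
        }
        _current = std::move(s);
        return true;
    }

    // Returns the host for a shard, or an empty string if unknown.
    std::string hostFor(const std::string& name) const {
        std::lock_guard<std::mutex> lk(_mutex);
        auto it = _current.shards.find(name);
        return it == _current.shards.end() ? std::string() : it->second;
    }

private:
    mutable std::mutex _mutex;
    Snapshot _current{0, {}};
};
```

If the newer snapshot is applied first and the older one second, `applyUnguarded` leaves the older host in place. `applyGuarded` rejects the older snapshot only when the two opTimes differ; with both at zero the comparison never fires and the stale entry still wins.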