[SERVER-76559] Lock-free reads can read from sharded collection as unsharded when collection is dropped and recreated (ABA problem) Created: 26/Apr/23 Updated: 09/Jun/23 Resolved: 09/Jun/23 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Daniel Gomez Ferro | Assignee: | Henrik Edin |
| Resolution: | Won't Fix | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||
| Assigned Teams: | Storage Execution |
||||||||
| Operating System: | ALL | ||||||||
| Sprint: | Execution Team 2023-05-15, Execution Team 2023-05-29, Execution NAMR Team 2023-06-26 | ||||||||
| Participants: | |||||||||
| Description |
|
This is hypothetical; we haven't tried to reproduce it. The scenario looks like this:
Currently we have a check to prevent this exact scenario described in |
| Comments |
| Comment by Henrik Edin [ 09/Jun/23 ] |
|
After discussing with kaloian.manassiev@mongodb.com and daniel.gomezferro@mongodb.com, we will close this as Won't Fix because this ticket relates to the old AutoGet*** handlers. We want to fix it in the new Acquisitions classes, but that work is tracked in SERVER-76561. |
| Comment by Dianna Hohensee (Inactive) [ 02/May/23 ] |
|
I think this can happen. I haven't tested it; I'm just logically walking through what I know. It involves a user querying a collection while shardCollection, dropCollection, and createCollection commands are concurrently issued one after another. The user will end up reading from the original version of the collection without shard filtering (a routing table). I think this is a perfectly OK result for the user given the concurrency involved. The only way it might not be OK would be if the collection were sharded, moveChunk redistributed the data away from the shard, and then the collection were dropped and recreated, all in a very small window of time. I think it is a very unlikely user scenario with moveChunk, but it's definitely a loophole we should close sometime, after verifying the scenario is reproducible. In the master code, the shardCollection can happen anytime after the mongos sends the query to the mongod and before the mongod opens the storage snapshot, and then the drop/recreate must happen after the snapshot but before the isSharded check. In v6.0, all the commands must run between the isSharded check prior to opening the storage snapshot and the shard version (SV) check right after establishing a storage snapshot. It's a good bit more likely in master/7.0, actually, if I've analyzed this correctly. In retrospect, |
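The master/7.0 interleaving described in the comment above can be sketched as a small, single-threaded simulation. This is only an illustration, not server code: `Catalog`, `CollectionEntry`, `lock_free_read`, and `run_scenario` are hypothetical names, and the UUID comparison is an assumed guard used to show how the drop/recreate could be detected. It is not the fix tracked in SERVER-76561.

```python
from dataclasses import dataclass
import itertools

_uuid_counter = itertools.count(1)  # stand-in for collection UUID generation


@dataclass(frozen=True)
class CollectionEntry:
    uuid: int      # hypothetical stand-in for the collection UUID
    sharded: bool  # True when a routing table (shard filter) applies


class Catalog:
    """Hypothetical in-memory catalog: namespace -> latest CollectionEntry."""

    def __init__(self):
        self._entries = {}

    def create(self, ns):
        self._entries[ns] = CollectionEntry(next(_uuid_counter), sharded=False)

    def shard(self, ns):
        old = self._entries[ns]
        self._entries[ns] = CollectionEntry(old.uuid, sharded=True)

    def drop(self, ns):
        del self._entries[ns]

    def lookup(self, ns):
        return self._entries.get(ns)


def lock_free_read(catalog, ns, concurrent_ddl, compare_uuid):
    """Simulate one read: snapshot, racing DDL, then the isSharded check."""
    snapshot = catalog.lookup(ns)  # "open the storage snapshot"
    concurrent_ddl()               # drop/recreate lands inside the window
    latest = catalog.lookup(ns)    # isSharded check sees the latest metadata
    if compare_uuid and (latest is None or latest.uuid != snapshot.uuid):
        return "retry"             # assumed guard: incarnation changed
    if latest is not None and latest.sharded:
        return "filtered"
    return "unfiltered"            # data from the snapshot, no shard filter


def run_scenario(compare_uuid):
    catalog = Catalog()
    catalog.create("db.coll")
    catalog.shard("db.coll")       # shardCollection before the snapshot

    def drop_and_recreate():
        catalog.drop("db.coll")
        catalog.create("db.coll")  # same name, new unsharded incarnation

    return lock_free_read(catalog, "db.coll", drop_and_recreate, compare_uuid)
```

Under these assumptions, `run_scenario(False)` reads the sharded snapshot as if it were unsharded, because the name-based check cannot tell the two incarnations apart. `run_scenario(True)` notices that the UUID changed and asks for a retry instead.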