[SERVER-47646] Scope::_lastVersion optimization breaks with concurrent readers
| Created: | 17/Apr/20 | Updated: | 06/Dec/22 |
| Status: | Backlog |
| Project: | Core Server |
| Component/s: | Concurrency, JavaScript |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | David Percy | Assignee: | Backlog - Service Architecture |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | sa-remove-fv-backlog-22 |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Assigned Teams: | Service Arch |
| Operating System: | ALL |
| Steps To Reproduce: | Check out this branch: https://github.com/mongodb/mongo/compare/v4.4...dpercy:BF-16735-parallel-systemjs#diff-48f56cf0f9676df3fcbc02748a95e4e6 Compile the server and run ./repro.sh, which reruns resmoke in a loop until the test fails. For me it usually takes fewer than 20 iterations to reproduce. |
| Linked BF Score: | 10 |
| Description |
|
The scenario is two clients running in parallel: client A inserts a function into system.js and then tries to call it; client B just runs a $where. They shouldn't interfere with each other, because client B doesn't write to system.js. But somehow client A fails with a ReferenceError: the function it inserted is not defined.

This bug was revealed by a new test in 4.4, but I was able to repro on 4.2, so it may go back several versions.

The cause has to do with an optimization in Scope::loadStored. Instead of loading system.js procedures on every call, it uses a global atomic counter, _lastVersion, to avoid reloading when nothing has changed. When I disable this optimization, the bug goes away.

I added log statements and found this order of events:

Step 3 is the surprising part: client A inserts the document before bumping the counter, so if client B reads the new counter value, why doesn't it also read the inserted document?

I think what's happening must be something like this:

The two clients are communicating through two different kinds of state: a WT collection and a global atomic counter. So WT doesn't see all the dependencies between the two clients, and it thinks it can serialize client B before client A.

I think one solution would be to store the _lastVersion state in WT somehow. Then client B would never see an inconsistent state where _lastVersion is bumped but the collection is still empty. |
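The kind of interleaving described above can be sketched as a small deterministic simulation. This is only an illustration under stated assumptions, not the server's code: `Store` is a hypothetical toy stand-in for a snapshot-isolated WT collection, `Scope` is a simplified model of the version check in Scope::loadStored, and the idea that both clients end up using the same cached scope is an assumption made so that the stale load becomes visible to client A. The exact event order from the logs is not reproduced here; this is one plausible ordering consistent with the description.

```python
class Store:
    """Hypothetical toy snapshot-isolated key-value store (stand-in for WT)."""

    def __init__(self):
        self._log = []   # entries of (commit_ts, key, value)
        self._clock = 0

    def commit(self, writes):
        """Atomically commit a dict of writes at a single timestamp."""
        self._clock += 1
        for key, value in writes.items():
            self._log.append((self._clock, key, value))

    def snapshot(self):
        """Return a timestamp; reads at it see only commits up to that point."""
        return self._clock

    def read(self, snapshot_ts):
        view = {}
        for ts, key, value in self._log:
            if ts <= snapshot_ts:
                view[key] = value
        return view


class Scope:
    """Simplified model of the reload-skipping optimization."""

    def __init__(self):
        self.loaded_version = -1
        self.functions = {}

    def load_stored(self, view, last_version):
        if self.loaded_version == last_version:
            return  # optimization: nothing changed, skip the reload
        self.loaded_version = last_version
        self.functions = {k: v for k, v in view.items() if k != "_version"}


def counter_outside_store():
    """Version counter lives outside the store (the buggy arrangement)."""
    store, scope = Store(), Scope()   # shared scope is an assumption
    last_version = 0                  # models the global atomic counter
    b_snapshot = store.snapshot()     # client B's snapshot predates the insert
    store.commit({"myFunc": "function() { return 1; }"})  # client A inserts
    last_version += 1                                     # client A bumps
    # Client B: sees the new counter value but reads through its old snapshot.
    scope.load_stored(store.read(b_snapshot), last_version)
    # Client A: fresh snapshot, but the version matches, so no reload happens.
    scope.load_stored(store.read(store.snapshot()), last_version)
    return scope.functions


def counter_inside_store():
    """Version is committed with the insert, so a snapshot sees both or neither."""
    store, scope = Store(), Scope()
    store.commit({"_version": 0})
    b_snapshot = store.snapshot()
    store.commit({"myFunc": "function() { return 1; }", "_version": 1})
    view_b = store.read(b_snapshot)
    scope.load_stored(view_b, view_b["_version"])   # B: old version, old data
    view_a = store.read(store.snapshot())
    scope.load_stored(view_a, view_a["_version"])   # A: new version, reloads
    return scope.functions
```

In this model, `counter_outside_store()` leaves the scope marked as up to date while `myFunc` is missing, which corresponds to the ReferenceError. `counter_inside_store()` mirrors the proposed fix: because the version and the document are committed together, no snapshot can observe the bumped version without the inserted function.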