SERVER-47646 (Core Server)

Scope::_lastVersion optimization breaks with concurrent readers

    • Service Arch
    • ALL
    • Steps to reproduce:

      Check out this branch: https://github.com/mongodb/mongo/compare/v4.4...dpercy:BF-16735-parallel-systemjs#diff-48f56cf0f9676df3fcbc02748a95e4e6

      Compile the server and run ./repro.sh. The script reruns resmoke in a loop until the test fails. For me it usually takes fewer than 20 iterations to reproduce.

    • 10

      The scenario is two clients running in parallel: client A inserts a function into system.js then tries to call it; client B just runs a $where. They shouldn't interfere with each other because client B doesn't write to system.js. But somehow client A fails with a ReferenceError: the function it inserted is not defined.

      This bug was revealed by a new test in 4.4, but I was able to repro on 4.2, so it may go back several versions.

      The cause has to do with an optimization in Scope::loadStored. Instead of loading system.js procedures on every call, it uses a global atomic counter, _lastVersion, to avoid reloading when nothing changed. When I disable this optimization, the bug goes away.
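
For readers unfamiliar with the pattern, here is a minimal single-threaded sketch of a version-counter cache of the kind described above. It is not the server code: ScopeModel, StoredProcs, gLastVersion, and countReloads are hypothetical names chosen for illustration, and the real Scope::loadStored may differ in its details.

```cpp
#include <atomic>
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for the system.js collection: function name -> source.
using StoredProcs = std::map<std::string, std::string>;

// Global change counter; a writer bumps it after modifying system.js.
std::atomic<long long> gLastVersion{0};

class ScopeModel {
public:
    // Reload stored procedures only if the global counter has moved since
    // this scope last loaded them.
    void loadStored(const StoredProcs& systemJs) {
        const long long current = gLastVersion.load();
        if (_loadedVersion == current) {
            return;  // fast path: assume nothing changed since the last load
        }
        _functions = systemJs;  // drop the old functions, copy the current ones
        _loadedVersion = current;
        ++_reloads;
    }

    bool has(const std::string& name) const { return _functions.count(name) > 0; }
    int reloads() const { return _reloads; }

private:
    StoredProcs _functions;
    long long _loadedVersion = -1;  // -1 forces the first load
    int _reloads = 0;
};

// Three calls with one write in between: only two of them actually reload.
int countReloads() {
    StoredProcs systemJs;
    ScopeModel scope;
    scope.loadStored(systemJs);  // first call always loads
    scope.loadStored(systemJs);  // counter unchanged: skipped
    assert(scope.reloads() == 1);
    systemJs["f"] = "function() { return 1; }";
    gLastVersion.fetch_add(1);   // writer bumps the counter after the insert
    scope.loadStored(systemJs);  // counter moved: reloads and picks up "f"
    assert(scope.has("f"));
    return scope.reloads();
}
```

The fast path is only safe if "counter unchanged" really implies "nothing this scope should see has changed", which is the assumption the rest of this ticket calls into question.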

      I added log statements and found this order of events:
      1. Client A bumps _lastVersion.
      2. Client B reads the new value of _lastVersion.
      3. Client B updates its Scope instance by deleting all the JS functions in it. It sets _loadedVersion = _lastVersion.
      4. Client A gets that same Scope instance out of the pool. It doesn't reload anything because _loadedVersion == _lastVersion, so the function it inserted is never loaded into that Scope.

      Step 3 is the surprising part: client A inserts the document before bumping the counter, so if client B reads the new counter value, why doesn't it also read the inserted document?

      I think what's happening must be something like this:
      1. Client B begins a WiredTiger transaction.
      2. Client A inserts into system.js.
      3. Client A writes _lastVersion.
      4. Client B reads the new value of _lastVersion.
      5. Client B reads zero documents from system.js, because client B's WiredTiger snapshot is from before the insert.
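
The suspected interleaving above can be replayed deterministically on one thread. The following is a toy model under stated assumptions, not MongoDB or WiredTiger code: a snapshot is modeled as a plain copy of the collection, and Server, Scope, fnA, and clientASeesItsFunction are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <map>
#include <string>

using StoredProcs = std::map<std::string, std::string>;

// Toy server state: the collection is snapshotted, the counter is not.
struct Server {
    StoredProcs systemJs;                    // stands in for the WT-backed collection
    std::atomic<long long> lastVersion{0};   // lives outside the storage engine
};

struct Scope {
    StoredProcs functions;
    long long loadedVersion = -1;

    // `visible` is system.js as seen through the caller's snapshot.
    void loadStored(const Server& server, const StoredProcs& visible) {
        const long long current = server.lastVersion.load();  // always the latest value
        if (loadedVersion == current) return;
        functions = visible;      // may be older than `current` implies
        loadedVersion = current;
    }
};

// Returns whether client A can see the function it inserted.
// bStartsBeforeInsert selects whether B's snapshot predates A's insert.
bool clientASeesItsFunction(bool bStartsBeforeInsert) {
    Server server;
    Scope pooledScope;  // the one Scope instance both clients end up using

    StoredProcs snapshotB;
    if (bStartsBeforeInsert) snapshotB = server.systemJs;   // 1. B opens its snapshot
    server.systemJs["fnA"] = "function() {}";               // 2. A inserts
    server.lastVersion.fetch_add(1);                        // 3. A bumps the counter
    if (!bStartsBeforeInsert) snapshotB = server.systemJs;  // control: B starts late

    pooledScope.loadStored(server, snapshotB);              // 4-5. B: new counter, B's data
    assert(pooledScope.loadedVersion == 1);

    // A takes the same scope from the pool; the versions match, so no reload.
    pooledScope.loadStored(server, server.systemJs);
    return pooledScope.functions.count("fnA") > 0;
}
```

In this model, clientASeesItsFunction(true) returns false (the scope was marked current while holding B's stale view), while clientASeesItsFunction(false) returns true.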

      The two clients are communicating using two different kinds of state: a WT collection and a global atomic counter. So WT doesn't see all the dependencies between the two clients, and it thinks it can serialize client B before client A.

      I think one solution would be to store the _lastVersion state in WT somehow. Then client B would never see an inconsistent state where _lastVersion is bumped but the collection is still empty.
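
One way to read this proposal, sketched in the same toy style: if the version is part of the snapshotted state, every reader sees a version that matches the documents it sees. This is only an illustration of the idea; SystemJsState, Scope, and the function name below are hypothetical, and how the version would actually be stored in WT is left open here.

```cpp
#include <cassert>
#include <map>
#include <string>

using StoredProcs = std::map<std::string, std::string>;

// Version and documents change together, so any snapshot holds a matching pair.
struct SystemJsState {
    StoredProcs docs;
    long long version = 0;
};

struct Scope {
    StoredProcs functions;
    long long loadedVersion = -1;

    // `visible` is the state as seen through the caller's snapshot.
    void loadStored(const SystemJsState& visible) {
        if (loadedVersion == visible.version) return;
        functions = visible.docs;
        loadedVersion = visible.version;
    }
};

// Same interleaving as in the bug, but the version travels with the snapshot.
bool clientASeesItsFunctionWithVersionInSnapshot() {
    SystemJsState committed;
    Scope pooledScope;

    SystemJsState snapshotB = committed;      // B's snapshot: version 0, no docs
    committed.docs["fnA"] = "function() {}";  // A's insert and version bump,
    committed.version += 1;                   // modeled as one atomic commit

    pooledScope.loadStored(snapshotB);        // B records version 0, not 1
    assert(pooledScope.loadedVersion == 0);

    pooledScope.loadStored(committed);        // A sees 0 != 1 and reloads
    return pooledScope.functions.count("fnA") > 0;
}
```

Here client B can no longer mark the scope as being at a version newer than the data it loaded, so client A's check triggers a reload and the function returns true.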

            Assignee:
            backlog-server-servicearch [DO NOT USE] Backlog - Service Architecture
            Reporter:
            david.percy@mongodb.com David Percy