Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-35316

Causal Consistency Violation for local read/write concerns in Jepsen Sharded Cluster Test

    XMLWordPrintable

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Works as Designed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Sharding
    • Labels:
      None
    • Operating System:
      ALL

      Description

      Original Problem:

       

      When running a test against 2 replica set shards, with rc “local” and wc “w1” we get reads returning the base value of the document, nil, despite occurring after acknowledged writes in the session. Each single threaded client is writing to one key at a time, using one session, against a single mongos router, and does not have writes to secondaries enabled. The nemesis partitions the network into random halves for 10 seconds, with a 10 second wait in between (This failure has not appeared with partitions disabled.

      The expected pattern of operations in this test is read nil/0, write 1, read 1, write 2, read 2. In the test histories I’ve attached below, the :position field is the op’s optime value, read from the session after acknowledgement. It’s :link field is the previous optime value the client has seen in that key’s session. 

      In the first set of results (rwr-initial-read-1), this occurs in 3 keys over a 40 second test. In each failing key (15 23 16), we see a read of 0 (representing the initial empty document’s read nil for the checker), and a successful write of 1. Then the read following write 2 returns an empty document.  

      The second result set (rwr-initial-read-2) provided is a longer test over 300 seconds, where we observe this anomaly 6 times. Keys 50, 51, 82, 125, and 143 appear to drop the value on write 2. However, key 116 is missing the value for write 1.  Also of note is that in the history for key 116, (under independent/116) write 2 succeeds and appears in the final read for the key. See op `{:type :ok, :f :read, :value 2, :process 5, :time 205602064887, :position 6559250071353819138, :link 6559250071353819137, :index 1206}`

       

       

        Attachments

        1. diagram1.png
          diagram1.png
          50 kB
        2. diagram2.png
          diagram2.png
          54 kB

          Issue Links

            Activity

              People

              • Votes:
                0 Vote for this issue
                Watchers:
                14 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: