Hey jerrytstng@gmail.com, thank you for reaching out. I reviewed the jepsen.log file and noticed that none of the --read-concern, --write-concern, --txn-read-concern, or --txn-write-concern options were specified in your command invocation. My understanding of the Jepsen framework is that the system-default read and write concerns are therefore used for the test run. In MongoDB 6.0, the default read concern level is "local" and the default write concern is "majority". Read concern level "local" is not sufficient to provide snapshot isolation in cross-shard transactions (our documentation on Transactions and Read Concern discusses this), so certain classes of anomalies that the Elle verifier checks for are to be expected. Below I explain why I believe the anomaly you observed is one of these expected ones.

From jepsen.log:

lein run test --workload list-append --nemesis all --nodes-file /root/nodes --time-limit 18000 --test-count 1 --sharded

The G-single.txt file describes the detected anomaly in prose, and an accompanying SVG diagram illustrates it. To summarize the anomaly here:
- Since T1 and T2 both write to the same document, either [T1 commits before T2] xor [T2 commits before T1]. T1 and T2 cannot commit concurrently because the MongoDB Server serializes writes to a single document.
- At snapshot isolation level, either [T1 sees value=14 in the document with _id=9] xor [T2 sees value=3 in the document with _id=8]. Whichever transaction commits second must have read from a snapshot that includes the other's writes, since otherwise the write conflict on the shared document would have aborted it.
- This property need not hold at read committed isolation level.

In a sharded cluster, transactions running with {readConcern: {level: "local"}} or {readConcern: {level: "majority"}} behave like read committed isolation level, because each shard chooses its read timestamp independently. Only with {readConcern: {level: "snapshot"}} does mongos choose a single read timestamp that is used on all shards.

In other words, a transaction T2 running with {readConcern: {level: "local"}} may see some of the effects of a committed transaction T1 on Shard_A yet miss the effects of the same transaction on Shard_B, because T2's read timestamp on Shard_A is independent of its read timestamp on Shard_B. Expressed as an inequality, the following ordering is permissible:

readTimestamp(T2, Shard_B) < commitTimestamp(T1) <= readTimestamp(T2, Shard_A) < commitTimestamp(T2)
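
To make the inequality concrete, here is a toy model in JavaScript (the same language as the shell sessions in this ticket). It is only a sketch under simplifying assumptions, not how MongoDB is implemented: each shard is a map of versioned values, timestamps are small integers, the names Shard_A/Shard_B and docOnA/docOnB are hypothetical, and T2's own writes are left out. It shows that independent per-shard read timestamps let T2 see only part of T1, while a single shared read timestamp yields all of T1 or none of it.

```javascript
// Toy multi-version store for one shard. Timestamps are small integers,
// a simplification of MongoDB's cluster time.
class Shard {
  constructor(name) {
    this.name = name;
    this.versions = new Map(); // key -> [{ commitTs, value }, ...]
  }

  // Record a committed write of `value` to `key` at `commitTs`.
  write(key, value, commitTs) {
    if (!this.versions.has(key)) this.versions.set(key, []);
    this.versions.get(key).push({ commitTs, value });
  }

  // Return the newest value committed at or before `readTs` (undefined if none).
  read(key, readTs) {
    let newest;
    for (const v of this.versions.get(key) ?? []) {
      if (v.commitTs <= readTs && (newest === undefined || v.commitTs > newest.commitTs)) {
        newest = v;
      }
    }
    return newest?.value;
  }
}

const shardA = new Shard("Shard_A");
const shardB = new Shard("Shard_B");

// Hypothetical cross-shard transaction T1: one write on each shard,
// both becoming visible at the same commit timestamp.
const T1_COMMIT_TS = 10;
shardA.write("docOnA", "T1", T1_COMMIT_TS);
shardB.write("docOnB", "T1", T1_COMMIT_TS);

// T2 reads one document per shard using the given per-shard read timestamps.
function t2Reads(readTsA, readTsB) {
  return { docOnA: shardA.read("docOnA", readTsA), docOnB: shardB.read("docOnB", readTsB) };
}

// "local"/"majority": each shard may pick its own read timestamp, so
// readTs(Shard_B) = 8 < commitTs(T1) = 10 <= readTs(Shard_A) = 12 is allowed.
console.log(t2Reads(12, 8)); // { docOnA: 'T1', docOnB: undefined }

// "snapshot": one read timestamp for every shard, so T2 sees all or none of T1.
console.log(t2Reads(12, 12)); // { docOnA: 'T1', docOnB: 'T1' }
console.log(t2Reads(8, 8)); // { docOnA: undefined, docOnB: undefined }
```

The first call is the "fractured" read that read committed permits; no choice of a single shared timestamp can reproduce it.
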

Based on the offending _id values and the deterministic chunk ranges created when the list-append test case shards by {_id: "hashed"} in a 3-shard cluster, I am confident that the documents with _id=8 and _id=9 live on two different shards: the hashed value of _id=8 falls in a chunk owned by shard02, and the hashed value of _id=9 falls in a chunk owned by shard00.

[direct: mongos] test> db.getSiblingDB("config").chunks.find({}, {_id: 0, min: 1, max: 1, shard: 1})
[
  { min: { _id: MinKey() }, max: { _id: Long("-6148914691236517204") }, shard: 'shard02' },
  { min: { _id: Long("-6148914691236517204") }, max: { _id: Long("-3074457345618258602") }, shard: 'shard02' },
  { min: { _id: Long("-3074457345618258602") }, max: { _id: Long("0") }, shard: 'shard01' },
  { min: { _id: Long("0") }, max: { _id: Long("3074457345618258602") }, shard: 'shard01' },
  { min: { _id: Long("3074457345618258602") }, max: { _id: Long("6148914691236517204") }, shard: 'shard00' },
  { min: { _id: Long("6148914691236517204") }, max: { _id: MaxKey() }, shard: 'shard00' }
]

> convertShardKeyToHashed(8)
NumberLong("-6200100076853976706")

> convertShardKeyToHashed(9)
NumberLong("6497670140411665948")
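
The shard placement can also be checked mechanically. The sketch below copies the chunk boundaries and hashed values from the output above into JavaScript BigInts; MinKey/MaxKey are modelled as null, and owningShard and hashedSplitPoints are hypothetical helpers, not MongoDB APIs. It assumes each chunk owns the half-open range [min, max). hashedSplitPoints further assumes that the initial split points are spaced (INT64_MAX / numChunks) * 2 apart and symmetric about 0; this reproduces the six-chunk boundaries above, but it is a reconstruction rather than MongoDB's own code.

```javascript
// Chunk ranges copied from the config.chunks output above.
// MinKey and MaxKey are modelled as null, meaning "unbounded".
const chunks = [
  { min: null, max: -6148914691236517204n, shard: "shard02" },
  { min: -6148914691236517204n, max: -3074457345618258602n, shard: "shard02" },
  { min: -3074457345618258602n, max: 0n, shard: "shard01" },
  { min: 0n, max: 3074457345618258602n, shard: "shard01" },
  { min: 3074457345618258602n, max: 6148914691236517204n, shard: "shard00" },
  { min: 6148914691236517204n, max: null, shard: "shard00" },
];

// Hypothetical helper: find the chunk whose range [min, max) contains
// the hashed shard key value and return the shard that owns it.
function owningShard(hashedKey) {
  const chunk = chunks.find(
    ({ min, max }) => (min === null || hashedKey >= min) && (max === null || hashedKey < max)
  );
  return chunk.shard;
}

// Hashed values reported by convertShardKeyToHashed(8) and convertShardKeyToHashed(9).
console.log(owningShard(-6200100076853976706n)); // shard02
console.log(owningShard(6497670140411665948n)); // shard00

// Hypothetical helper (even chunk counts only): split points spaced
// (INT64_MAX / numChunks) * 2 apart, symmetric around 0. An assumption
// that happens to reproduce the boundaries listed above.
const INT64_MAX = 9223372036854775807n;
function hashedSplitPoints(numChunks) {
  if (numChunks % 2 !== 0) throw new RangeError("this sketch only models an even number of chunks");
  const interval = (INT64_MAX / BigInt(numChunks)) * 2n; // BigInt division truncates
  const half = BigInt(numChunks / 2 - 1);
  const points = [];
  for (let k = -half; k <= half; k++) points.push(k * interval);
  return points;
}

// 3 shards x 2 initial chunks each = 6 chunks
console.log(hashedSplitPoints(6).join(", "));
// -6148914691236517204, -3074457345618258602, 0, 3074457345618258602, 6148914691236517204
```

Because _id=8 maps to shard02 and _id=9 maps to shard00, a transaction touching both documents reads from two shards, which is exactly the situation where per-shard read timestamps matter.
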
Hopefully this explains the behavior you reported. Please let me know if you have any further questions.
Thanks,
Max