[SERVER-54979] Calling move/split/mergeChunk after one another from different MongoS is not causally consistent Created: 05/Mar/21  Updated: 26/Oct/23

Status: Backlog
Project: Core Server
Component/s: Sharding
Affects Version/s: 3.6.22, 4.2.12, 4.0.23, 4.4.4, 4.9.0-alpha4, 5.0.1
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Kaloian Manassiev Assignee: Backlog - Catalog and Routing
Resolution: Unresolved Votes: 0
Labels: oldshardingemea
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
is duplicated by SERVER-71626 Failed to Presplit and create chunks ... Closed
Related
is related to SERVER-74600 Fix test bugs due to secondary reads ... Closed
Assigned Teams:
Catalog and Routing
Sprint: Sharding EMEA 2021-09-06, Sharding EMEA 2021-09-20, Sharding EMEA 2021-10-04, Sharding EMEA 2021-10-18, Sharding EMEA 2021-11-01, Sharding EMEA 2021-11-15, Sharding EMEA 2021-11-29, Sharding EMEA 2021-12-13, Sharding EMEA 2021-12-27, Sharding EMEA 2022-01-10, Sharding EMEA 2022-01-24, Sharding EMEA 2022-02-07
Participants:
Linked BF Score: 37

 Description   

Note: This is not a correctness bug, just an annoyance for tests and for people who do manual chunk operations outside of the Balancer.

The move/split/mergeChunk set of commands only involves the chunk's owner shard and the config server, but these commands don't propagate any kind of causality token to the client, the way causally consistent writes do, for example.

This means that if one issues a split on one MongoS and then a move from another, the move may not see the effects of the split and may return an error saying that a chunk with the exact specified bounds doesn't exist.

This is not a problem for the Balancer, because (a) it always runs on the config server primary, which is as up-to-date as can be, and (b) it almost always runs on the same node.
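
The scenario above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not MongoDB code: ConfigReplicaSet, Router and all of their methods are hypothetical names, and the model assumes the second router validates chunk bounds against metadata that has not yet caught up with the split.

```python
# Toy model: every class and method name here is hypothetical, not a MongoDB API.

class ConfigReplicaSet:
    """Chunk metadata held on a primary plus one lagging secondary."""

    def __init__(self, chunks):
        self.primary = set(chunks)
        self.secondary = set(chunks)

    def split(self, bounds, at):
        # Metadata writes always go to the primary.
        lo, hi = bounds
        if bounds not in self.primary or not lo < at < hi:
            raise LookupError(f"no chunk with bounds {bounds}")
        self.primary.remove(bounds)
        self.primary |= {(lo, at), (at, hi)}

    def replicate(self):
        # The secondary catches up at some later point.
        self.secondary = set(self.primary)


class Router:
    """Stand-in for a MongoS; checks chunk bounds against whatever it can read."""

    def __init__(self, config, read_from_primary):
        self.config = config
        self.read_from_primary = read_from_primary

    def _view(self):
        if self.read_from_primary:
            return self.config.primary
        return self.config.secondary

    def split_chunk(self, bounds, at):
        self.config.split(bounds, at)

    def move_chunk(self, bounds, to_shard):
        if bounds not in self._view():
            raise LookupError(f"no chunk with bounds {bounds}")
        return (bounds, to_shard)


config = ConfigReplicaSet([(0, 100)])
router_a = Router(config, read_from_primary=True)
router_b = Router(config, read_from_primary=False)  # assumed to see lagging data

router_a.split_chunk((0, 100), at=50)
try:
    router_b.move_chunk((0, 50), "shard1")
    stale_error = None
except LookupError as exc:
    stale_error = str(exc)  # the move did not see the split

config.replicate()
moved = router_b.move_chunk((0, 50), "shard1")  # succeeds once metadata catches up
```

In this model nothing returned by the split tells router_b how recent its view must be, which is the missing causality token the description refers to.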



 Comments   
Comment by Githook User [ 08/Aug/21 ]

Author: Simon Graetzer <simon.gratzer@mongodb.com>

Message: SERVER-54979 Let chunkSplit+ splitVector participate in the shard versioning protocol
Branch: master
https://github.com/mongodb/mongo/commit/8974dbdec0286ac47086b794c49214a9f26677bc

Comment by Kaloian Manassiev [ 22/Jul/21 ]

Passing on to simon.gratzer to confirm that with his changes for split/merge to participate in the shard versioning protocol, this has now gone away.

Comment by Kaloian Manassiev [ 01/Apr/21 ]

There is really no good way to fix this other than making all refreshes from the MongoS linearisable, so that they always read from the latest primary and are guaranteed to see the effects of all previous operations.

Given that this is just a rare annoyance for tests, I am putting this on the agenda for the sharding/product sync to get permission to close it as Won't Fix.
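
The linearisable-refresh idea from this comment can be sketched with a toy model. All names below (ConfigReplicaSet, Router, linearisable_refresh) are hypothetical and do not correspond to MongoDB internals. The sketch only assumes that a non-linearisable refresh may be served by a node that lags behind the primary.

```python
# Toy model: names are hypothetical and do not correspond to MongoDB internals.

class ConfigReplicaSet:
    def __init__(self, chunks):
        self.primary = set(chunks)    # always holds the latest committed metadata
        self.secondary = set(chunks)  # may lag behind the primary

    def commit_split(self, bounds, at):
        lo, hi = bounds
        self.primary.remove(bounds)
        self.primary |= {(lo, at), (at, hi)}


class Router:
    def __init__(self, config, linearisable_refresh):
        self.config = config
        self.linearisable_refresh = linearisable_refresh
        self.primary_reads = 0  # counts the extra load placed on the primary
        self.cache = set()

    def refresh(self):
        if self.linearisable_refresh:
            # Always read from the latest primary.
            self.primary_reads += 1
            self.cache = set(self.config.primary)
        else:
            # May be served by a node that has not applied recent changes.
            self.cache = set(self.config.secondary)

    def has_chunk(self, bounds):
        self.refresh()
        return bounds in self.cache


config = ConfigReplicaSet([(0, 100)])
config.commit_split((0, 100), at=50)  # split issued through some other router

default_router = Router(config, linearisable_refresh=False)
strict_router = Router(config, linearisable_refresh=True)

sees_split_default = default_router.has_chunk((0, 50))  # stale view misses the split
sees_split_strict = strict_router.has_chunk((0, 50))    # primary read sees the split
```

The counter on the strict router hints at the trade-off: in this model every refresh becomes a read against the primary.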

Generated at Thu Feb 08 05:35:06 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.