- Type: Bug
- Resolution: Gone away
- Priority: Major - P3
- None
- Affects Version/s: None
- Component/s: Sharding
- None
- Fully Compatible
- ALL
The configuration:
client (C), router (R), replica set shard1 (S1), replica set shard2 (S2), balancer (B)
The scenario:
The client submits a write command to the router and learns the operationTime T1 returned by this write, which was performed on shard1 (S1).
Then the balancer initiates a migration from S1 to S2 of a range of data that includes the data written by that write.
The client issues a read with (afterClusterTime = T1, readPreference = secondary).
The read ends up on a different shard (S2) than the write, but waiting for T1 there may not be sufficient: S2 has a different oplog, which may already contain entries at time T1, so an S2 secondary can satisfy afterClusterTime = T1 without having the migrated data.
In this case the read returns a result that is not causally consistent with the write, because it does not contain the written values.
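The scenario can be reproduced with a toy model. The sketch below is only an illustration, not server code: the `Secondary` class, the integer timestamps (10, 12, 15) and the key names are all assumptions made for the example, and afterClusterTime is reduced to "the node has applied an oplog entry at or after this time".

```python
from dataclasses import dataclass, field


@dataclass
class Secondary:
    """Toy secondary (hypothetical): tracks the latest time it applied and its data."""
    applied_time: int = 0
    data: dict = field(default_factory=dict)

    def apply(self, time, key, value):
        # Apply one oplog entry replicated from this shard's primary.
        self.applied_time = max(self.applied_time, time)
        self.data[key] = value

    def read(self, key, after_cluster_time):
        # afterClusterTime only requires that this node has reached that time
        # in its OWN shard's oplog; it says nothing about other shards.
        if self.applied_time < after_cluster_time:
            raise TimeoutError("secondary has not reached afterClusterTime yet")
        return self.data.get(key)


def run_scenario():
    s1_secondary = Secondary()
    s2_secondary = Secondary()

    # The write lands on S1 and the client learns T1 (assumed value: 10).
    t1 = 10
    s1_secondary.apply(t1, "x", 1)

    # S2 has an unrelated entry at time 12 >= T1 in its own oplog.
    s2_secondary.apply(12, "y", 7)

    # The balancer migrates "x" to S2. The migrated data is written on S2 at
    # time 15 (assumed), but the S2 secondary has not replicated it yet.
    migration_time = 15

    # The router now sends the read for "x" to S2. The secondary already
    # satisfies afterClusterTime = T1, so it answers without the written value.
    stale = s2_secondary.read("x", after_cluster_time=t1)

    # Once the secondary replicates the migrated data, the same read succeeds.
    s2_secondary.apply(migration_time, "x", 1)
    fresh = s2_secondary.read("x", after_cluster_time=t1)
    return stale, fresh
```

In this model the first read returns no value even though afterClusterTime = T1 is satisfied, which is the causal consistency violation described above.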
One possible solution is to modify afterClusterTime on the router if the requested time is earlier than the routing information change for the requested data.
I.e. shards may have different clusterTime values, as we cannot assume that they communicate with each other. However, a routing data change is an indirect communication that the client is unaware of, so afterClusterTime should be adjusted accordingly. An analogy is adjusting a clock when moving between time zones.
Implementation details:
There are 2 parts:
1. Keep the operationTime of the routing metadata refresh.
2. Inspect all incoming messages that have afterClusterTime; if there is a chance that the requested data has been moved, update afterClusterTime to the operationTime of the routing metadata refresh.
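The two parts might look roughly like the following sketch. It is not the server's implementation: `Router`, `on_routing_refresh`, `adjust_after_cluster_time` and the `data_may_have_moved` flag are hypothetical names introduced for illustration, and cluster times are simplified to integers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Router:
    """Hypothetical router state for the proposed fix."""
    # Part 1: operationTime observed at the last routing metadata refresh.
    last_refresh_operation_time: int = 0

    def on_routing_refresh(self, operation_time: int) -> None:
        # Remember the latest refresh time; never move it backwards.
        self.last_refresh_operation_time = max(
            self.last_refresh_operation_time, operation_time
        )

    def adjust_after_cluster_time(
        self, after_cluster_time: Optional[int], data_may_have_moved: bool
    ) -> Optional[int]:
        # Part 2: inspect a request that carries afterClusterTime.
        if after_cluster_time is None:
            return None  # request is not causally consistent; nothing to adjust
        if data_may_have_moved and after_cluster_time < self.last_refresh_operation_time:
            # The requested time predates the routing change, so raise it to
            # the operationTime of the routing metadata refresh.
            return self.last_refresh_operation_time
        return after_cluster_time
```

For example, with a refresh recorded at time 15, a read carrying afterClusterTime = 10 for data that may have moved would be forwarded with afterClusterTime = 15, while a read carrying afterClusterTime = 20 would be forwarded unchanged.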
- related to
-
SERVER-31275 Causal Consistency with secondary reads is broken by chunk migration commit
- Closed