[SERVER-32883] Enhanced FSM testing for reading from secondaries
Created: 24/Jan/18  Updated: 30/Oct/23  Resolved: 23/May/18

Status: Closed
Project: Core Server
Component/s: Replication, Testing Infrastructure
Affects Version/s: None
Fix Version/s: 4.0.0-rc1, 4.1.1

Type: Task
Priority: Major - P3
Reporter: Geert Bosch
Assignee: Xiangyu Yao (Inactive)
Resolution: Fixed
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
depends on SERVER-34465 Add a testing parameter to choose a p... Closed
Duplicate
is duplicated by SERVER-34242 Enable causal consistency in concurre... Closed
Related
related to SERVER-32606 Tailing oplog on secondary fails with... Closed
related to SERVER-35057 Notify oplog waiter after advancing t... Closed
related to SERVER-35130 view_catalog_cycle_lookup.js failed i... Closed
related to SERVER-35156 secondary reads return cluster time a... Closed
related to SERVER-33042 Add test coverage for tailing oplog o... Closed
related to SERVER-34383 FSM test of secondary reads during op... Closed
related to SERVER-34384 Passthrough test for secondary reads ... Closed
related to SERVER-35197 Change CleanEveryN to CleanupConcurre... Closed
Backwards Compatibility: Fully Compatible
Backport Requested: v4.0, v3.6
Sprint: TIG 2018-05-07, Storage NYC 2018-05-07, Storage NYC 2018-05-21, Storage NYC 2018-06-04
Participants:
Story Points: 8

 Description   

1. Change the secondary_reads_passthrough.yml test suite, which was added as part of SERVER-34384, to set the "forceSyncSourceCandidate" failpoint as a server parameter so that secondary #2 is forced to sync from secondary #1.

2. Add a new version of the concurrency_replication.yml test suite that uses a 5-node replica set in which each secondary syncs from the node before it (i.e. a linear chain), with writeConcern={w: 1}, readConcern={level: "local", afterClusterTime: ...}, and readPreference={mode: "secondary"}. We'll also likely want to make a wrapper around a Mongo connection object to the primary and one to a specific secondary, so that an individual worker thread always talks to the same secondary rather than leaving some secondaries potentially never read from.
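As a rough illustration of the topology and options in item 2, the sketch below builds the sync-source assignments for a linear chain and collects the read and write options named above. The helper names (buildLinearChain, secondaryReadOptions) are hypothetical and are not part of resmoke or the concurrency framework; node 0 is assumed to be the primary, and the afterClusterTime value is assumed to be supplied by the caller.

```javascript
// Hypothetical helper, not an existing framework function.
// Node 0 is assumed to be the primary; every later node syncs from the
// node immediately before it, which yields a linear chain.
function buildLinearChain(numNodes) {
    const chain = [];
    for (let i = 1; i < numNodes; ++i) {
        chain.push({node: i, syncSource: i - 1});
    }
    return chain;
}

// Read options from item 2. The caller passes in the cluster time to wait
// for; how that value is obtained is outside the scope of this sketch.
function secondaryReadOptions(clusterTime) {
    return {
        readConcern: {level: "local", afterClusterTime: clusterTime},
        readPreference: {mode: "secondary"},
    };
}

// Write options from item 2.
const writeOptions = {writeConcern: {w: 1}};

// A 5-node set has one primary and four chained secondaries.
const chain = buildLinearChain(5);
```

In this layout the last secondary is four hops away from the primary, so reads pinned to it are the ones most likely to exercise the afterClusterTime wait.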

I think there's some additional complexity in item 2 because we want FSM worker threads to read from different secondaries. (We'll probably pin each thread to a particular secondary, similar to how we "round-robin" when using multiple mongos processes.) It seems like we'll want a Mongo connection object implemented in JavaScript that routes commands present in this list via a direct connection to the secondary and routes all other commands via a direct connection to the primary. I think the existing "connection cache" in the concurrency framework makes it relatively straightforward to have direct connections to other nodes in the cluster.
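A minimal sketch of the routing idea, under stated assumptions: the connections are assumed to expose a runCommand(dbName, cmdObj) method, the set of secondary-routable commands is a placeholder (the actual list referenced above is not reproduced here), and all function names are hypothetical. Stub connections stand in for the entries of the connection cache so that the example runs on its own.

```javascript
// Placeholder list (assumption); the real list of commands is not shown here.
const kSecondaryCommands = new Set(["find", "count", "distinct", "aggregate"]);

// Pin a worker thread to one secondary by thread id, round-robin style.
function pickSecondary(secondaries, tid) {
    return secondaries[tid % secondaries.length];
}

// Hypothetical wrapper around two direct connections. Commands in
// secondaryCommands go to the secondary; everything else goes to the primary.
function makeRoutingConnection(primaryConn, secondaryConn, secondaryCommands) {
    return {
        runCommand(dbName, cmdObj) {
            // The command name is the first key of the command object.
            const cmdName = Object.keys(cmdObj)[0];
            const target = secondaryCommands.has(cmdName) ? secondaryConn : primaryConn;
            return target.runCommand(dbName, cmdObj);
        },
    };
}

// Stub connection used only for illustration; it reports which node served
// the command instead of talking to a server.
function makeStubConnection(name) {
    return {
        runCommand(dbName, cmdObj) {
            return {ok: 1, servedBy: name};
        },
    };
}

const primary = makeStubConnection("primary");
const secondaries = [0, 1, 2, 3].map((i) => makeStubConnection("secondary" + i));

// Worker thread 5 is pinned to secondaries[5 % 4], i.e. "secondary1".
const conn = makeRoutingConnection(primary, pickSecondary(secondaries, 5), kSecondaryCommands);
const readResult = conn.runCommand("test", {find: "coll"});
const writeResult = conn.runCommand("test", {insert: "coll", documents: [{x: 1}]});
```

Choosing the secondary by thread id means that with at least as many worker threads as secondaries, every secondary receives reads.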

In creating this wrapper around two separate Mongo connection objects, we may also want to change how SERVER-34383 was implemented to construct a wrapper around a secondary's connection from the connection cache instead of creating a replica set connection for the worker thread.

Original description

As part of SERVER-32606 it turned out that our testing of tailing the oplog on secondaries, including the case of chained replication, is light, while the code paths for secondary reads have diverged considerably from those for reads on primaries.

We should have a passthrough test where we test these behaviors. This is related to SERVER-32606, but was too big a task to do as part of that ticket.



 Comments   
Comment by Githook User [ 24/May/18 ]

Author: Xiangyu Yao (xy24) <xiangyu.yao@mongodb.com>

Message: SERVER-32883 Add concurrency_replication_causal_consistency suite

(cherry picked from commit 73cf3829a07f09bf35e1563a8cd0c1bad74bc226)
Branch: v4.0
https://github.com/mongodb/mongo/commit/a6bc80ed0712827f4793284c9d3a582260272e8b

Comment by Githook User [ 23/May/18 ]

Author: Xiangyu Yao (xy24) <xiangyu.yao@mongodb.com>

Message: SERVER-32883 Add concurrency_replication_causal_consistency suite
Branch: master
https://github.com/mongodb/mongo/commit/73cf3829a07f09bf35e1563a8cd0c1bad74bc226

Comment by Spencer Brody (Inactive) [ 24/Jan/18 ]

geert.bosch, why did the fix for SERVER-32606 not include a regression test of the issue it addressed? While I agree that better test coverage of oplog tailing in general is a noble goal, it is unlikely that we're going to have the time to do a comprehensive reconsideration of the test coverage here in the near future. That shouldn't prevent us from increasing test coverage for specific edge cases as we uncover them.

Comment by William Schultz (Inactive) [ 24/Jan/18 ]

I agree that a passthrough suite exercising various chaining configurations could be valuable. A 5-node replica set, for example, would have a number of different chaining topologies that would be good to test. It seems like an area where our test coverage may be lacking.

Comment by Max Hirschhorn [ 24/Jan/18 ]

geert.bosch, is this your idea of having a version of replica_sets_jscore_passthrough.yml where we have 2 secondaries and force one of them to chain off of the other?

Generated at Thu Feb 08 04:31:35 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.