[SERVER-34537] change_streams_shards_start_in_sync.js relies on ARS ordering Created: 18/Apr/18  Updated: 27/Oct/23  Resolved: 02/May/19

Status: Closed
Project: Core Server
Component/s: Internal Code
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Mira Carey Assignee: Backlog - Service Architecture
Resolution: Gone away Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
is related to SERVER-39163 Perform parallel targeting in the ARS Closed
Assigned Teams:
Service Arch
Operating System: ALL
Participants:

 Description   

The change_streams_shards_start_in_sync.js test relies on the order in which the ARS runs commands against shards in order to succeed.

In particular, it:

  1. Uses mongobridge to disconnect one shard (Process A)
  2. Starts a changestream via a mongos (Process B)
  3. Waits for the changestream to start on the other shards (Process A)
  4. Reconnects the shard (Process A)
  5. Finishes, now that the shard is connected (Process B)

Unfortunately, this relies on ordering in the AsyncRequestsSender (ARS), because ARS construction looks like:
For each request, use the ReplicaSetMonitor to target the request to a particular host, then call scheduleRemoteCommand. Targeting via the ReplicaSetMonitor is a blocking operation.

Because of the mongobridge disconnect, if the ARS happens to target the disconnected shard before it has scheduled the changestream on the other shards, the test hangs for 20 seconds in targeting and then fails.
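
The ordering dependence can be illustrated with a toy model. This is only a sketch of the serial "target, then schedule" loop described above, not the actual ARS code; the function name and the shard names are hypothetical, and the blocking targeting call is modeled as simply stopping the loop.

```python
def scheduled_before_block(shard_order, disconnected):
    """Return the shards whose commands get scheduled before targeting blocks.

    Toy model (assumption): requests are handled one at a time, and
    targeting the disconnected shard blocks the whole loop.
    """
    scheduled = []
    for shard in shard_order:
        if shard == disconnected:
            # Blocking targeting: the loop stalls here, so no later
            # shard is reached until the shard is reconnected.
            break
        # Targeting succeeded, so the command is scheduled on this shard.
        scheduled.append(shard)
    return scheduled


# Reachable shard first: its changestream starts, so the test can proceed.
print(scheduled_before_block(["shard0", "shard1"], disconnected="shard1"))
# Disconnected shard first: nothing is scheduled, so the test waits forever
# for the changestream to start on the other shard.
print(scheduled_before_block(["shard1", "shard0"], disconnected="shard1"))
```

In the second ordering the test's step 3 can never complete, so step 4 never runs and targeting eventually times out.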

One option would be to rewrite the replica set monitor to be fully async, at which point the order of targeting wouldn't matter.
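
As a rough sketch of why async targeting removes the ordering dependence, the following models each request as its own asyncio task. It is an illustration under assumptions, not the ReplicaSetMonitor or ARS API: `target`, `run`, and the shard names are hypothetical, and the mongobridge reconnect is modeled with an `asyncio.Event`.

```python
import asyncio


async def target(shard, disconnected, reconnected):
    # Hypothetical non-blocking targeting: only the request for the
    # disconnected shard waits; other requests are unaffected.
    if shard == disconnected:
        await reconnected.wait()
    return shard


async def run(shard_order, disconnected):
    reconnected = asyncio.Event()
    completed = []

    async def send(shard):
        await target(shard, disconnected, reconnected)
        completed.append(shard)  # stands in for scheduling the command

    tasks = {shard: asyncio.create_task(send(shard)) for shard in shard_order}

    # Step 3 of the test: wait for the changestream on the reachable shards.
    await asyncio.gather(*(t for s, t in tasks.items() if s != disconnected))
    started_before_reconnect = list(completed)

    # Step 4 of the test: reconnect the shard, then let everything finish.
    reconnected.set()
    await asyncio.gather(*tasks.values())
    return started_before_reconnect, completed


# The reachable shard starts before the reconnect in either ordering.
print(asyncio.run(run(["shard0", "shard1"], disconnected="shard1")))
print(asyncio.run(run(["shard1", "shard0"], disconnected="shard1")))
```

Because no request waits on another request's targeting, the reachable shards make progress regardless of where the disconnected shard falls in the request list.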



 Comments   
Comment by Mira Carey [ 24/Apr/19 ]

After SERVER-39163, the ARS will be 100% async in its internal operations, and this ticket will go away.
