[SERVER-46965] Test that we eliminate retries of short-running reads due to planned primary shutdown (without streamable isMaster) Created: 18/Mar/20  Updated: 21/May/20  Resolved: 21/May/20

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: None

Type: Task Priority: Major - P3
Reporter: Tess Avitabile (Inactive) Assignee: Jason Chan
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
depends on SERVER-46963 Test locally that we eliminate retrie... Closed
is depended on by SERVER-46967 Test locally that we eliminate retrie... Closed
Sprint: Repl 2020-06-01
Participants:

 Description   

Write a workload of short-running reads (<1 second) with readPreference “primaryPreferred”. This should be run against a ReplSetTest with 2 nodes, where the primary is periodically shut down. Run this workload with the 4.2 version of pymongo to test without streamable isMaster (consider using a smaller client heartbeat frequency or a longer quiesce mode, in order to be robust to slow machines). Simulate 1 second of network latency between the driver and the primary using mongobridge. The test should disable retries in pymongo and assert that all commands succeed. This demonstrates that the need for retries is eliminated. Consider only running this test on variants that are not typically slow.



 Comments   
Comment by Jason Chan [ 21/May/20 ]

We want to observe the behavior:
The primary node P goes into quiesce mode while a read is still in progress. P will return a ShutdownInProgress error as an isMaster response to the driver. The find should proceed to complete and return to the driver without raising an error even if the driver has already marked S as unknown.

Set a failpoint on primary node d20020 to hang during a find command:

[js_test:quiesce_mode_non_streamable_ismaster] 2020-05-20T22:01:41.847-0400 d20020| {"t":{"$date":"2020-05-20T22:01:41.846-04:00"},"s":"D2", "c":"COMMAND",  "id":21965,   "ctx":"conn1","msg":"About to run the command","attr":{"db":"admin","commandArgs":{"configureFailPoint":"waitAfterCommandFinishesExecution","mode":"alwaysOn","data":{"ns":"test.state","commands":["find"]},"lsid":{"id":{"$uuid":"d3b950ee-28b2-4a9d-a155-79f15a41a761"}},"$clusterTime":{"clusterTime":{"$timestamp":{"t":1590026501,"i":2}},"signature":{"hash":{"$binary":{"base64":"AAAAAAAAAAAAAAAAAAAAAAAAAAA=","subType":"0"}},"keyId":0}},"$readPreference":{"mode":"secondaryPreferred"},"$db":"admin"}}}
[js_test:quiesce_mode_non_streamable_ismaster] 2020-05-20T22:01:41.847-0400 d20020| {"t":{"$date":"2020-05-20T22:01:41.846-04:00"},"s":"D2", "c":"COMMAND",  "id":21961,   "ctx":"conn1","msg":"Admin only command","attr":{"command":"configureFailPoint"}}
[js_test:quiesce_mode_non_streamable_ismaster] 2020-05-20T22:01:41.847-0400 d20020| {"t":{"$date":"2020-05-20T22:01:41.847-04:00"},"s":"W",  "c":"CONTROL",  "id":23829,   "ctx":"conn1","msg":"Set failpoint","attr":{"failPointName":"waitAfterCommandFinishesExecution","failPoint":{"mode":1,"data":{"ns":"test.state","commands":["find"]},"timesEntered":0}}}
[js_test:quiesce_mode_non_streamable_ismaster] 2020-05-20T22:01:41.847-0400 d20020| {"t":{"$date":"2020-05-20T22:01:41.847-04:00"},"s":"I",  "c":"COMMAND",  "id":51803,   "ctx":"conn1","msg":"Slow query","attr":{"type":"command","ns":"admin.$cmd","appName":"MongoDB Shell","command":{"configureFailPoint":"waitAfterCommandFinishesExecution","mode":"alwaysOn","data":{"ns":"test.state","commands":["find"]},"lsid":{"id":{"$uuid":"d3b950ee-28b2-4a9d-a155-79f15a41a761"}},"$clusterTime":{"clusterTime":{"$timestamp":{"t":1590026501,"i":2}},"signature":{"hash":{"$binary":{"base64":"AAAAAAAAAAAAAAAAAAAAAAAAAAA=","subType":"0"}},"keyId":0}},"$readPreference":{"mode":"secondaryPreferred"},"$db":"admin"},"numYields":0,"reslen":174,"locks":{},"protocol":"op_msg","durationMillis":0}}

Set a failpoint to hang while in quiesce mode on node d20020:

[js_test:quiesce_mode_non_streamable_ismaster] 2020-05-20T22:01:41.850-0400 d20020| {"t":{"$date":"2020-05-20T22:01:41.850-04:00"},"s":"D2", "c":"COMMAND",  "id":21965,   "ctx":"conn1","msg":"About to run the command","attr":{"db":"admin","commandArgs":{"configureFailPoint":"hangDuringQuiesceMode","mode":"alwaysOn","data":{},"lsid":{"id":{"$uuid":"d3b950ee-28b2-4a9d-a155-79f15a41a761"}},"$clusterTime":{"clusterTime":{"$timestamp":{"t":1590026501,"i":2}},"signature":{"hash":{"$binary":{"base64":"AAAAAAAAAAAAAAAAAAAAAAAAAAA=","subType":"0"}},"keyId":0}},"$readPreference":{"mode":"secondaryPreferred"},"$db":"admin"}}}
[js_test:quiesce_mode_non_streamable_ismaster] 2020-05-20T22:01:41.850-0400 d20020| {"t":{"$date":"2020-05-20T22:01:41.850-04:00"},"s":"D2", "c":"COMMAND",  "id":21961,   "ctx":"conn1","msg":"Admin only command","attr":{"command":"configureFailPoint"}}
[js_test:quiesce_mode_non_streamable_ismaster] 2020-05-20T22:01:41.850-0400 d20020| {"t":{"$date":"2020-05-20T22:01:41.850-04:00"},"s":"W",  "c":"CONTROL",  "id":23829,   "ctx":"conn1","msg":"Set failpoint","attr":{"failPointName":"hangDuringQuiesceMode","failPoint":{"mode":1,"data":{},"timesEntered":0}}}
[js_test:quiesce_mode_non_streamable_ismaster] 2020-05-20T22:01:41.851-0400 d20020| {"t":{"$date":"2020-05-20T22:01:41.850-04:00"},"s":"I",  "c":"COMMAND",  "id":51803,   "ctx":"conn1","msg":"Slow query","attr":{"type":"command","ns":"admin.$cmd","appName":"MongoDB Shell","command":{"configureFailPoint":"hangDuringQuiesceMode","mode":"alwaysOn","data":{},"lsid":{"id":{"$uuid":"d3b950ee-28b2-4a9d-a155-79f15a41a761"}},"$clusterTime":{"clusterTime":{"$timestamp":{"t":1590026501,"i":2}},"signature":{"hash":{"$binary":{"base64":"AAAAAAAAAAAAAAAAAAAAAAAAAAA=","subType":"0"}},"keyId":0}},"$readPreference":{"mode":"secondaryPreferred"},"$db":"admin"},"numYields":0,"reslen":174,"locks":{},"protocol":"op_msg","durationMillis":0}}

Wait for a find command to hang on the primary d20020:

[js_test:quiesce_mode_non_streamable_ismaster] 2020-05-20T22:01:41.849-0400 d20020| {"t":{"$date":"2020-05-20T22:01:41.848-04:00"},"s":"I",  "c":"COMMAND",  "id":51803,   "ctx":"conn1","msg":"Slow query","attr":{"type":"command","ns":"admin.$cmd","appName":"MongoDB Shell","command":{"waitForFailPoint":"waitAfterCommandFinishesExecution","timesEntered":1.0,"maxTimeMS":300000.0,"lsid":{"id":{"$uuid":"d3b950ee-28b2-4a9d-a155-79f15a41a761"}},"$clusterTime":{"clusterTime":{"$timestamp":{"t":1590026501,"i":2}},"signature":{"hash":{"$binary":{"base64":"AAAAAAAAAAAAAAAAAAAAAAAAAAA=","subType":"0"}},"keyId":0}},"$readPreference":{"mode":"secondaryPreferred"},"$db":"admin"},"numYields":0,"reslen":163,"locks":{},"protocol":"op_msg","durationMillis":0}}

Primary node d20020 receives a shutdown command. It proceeds to shutdown and transition to quiesce mode. The node will hang here due to the failpoint:

[js_test:quiesce_mode_non_streamable_ismaster] 2020-05-20T22:01:41.852-0400 d20020| {"t":{"$date":"2020-05-20T22:01:41.851-04:00"},"s":"I",  "c":"CONTROL",  "id":23377,   "ctx":"SignalHandler","msg":"got signal {sig} ({strsignal_sig})","attr":{"sig":15,"strsignal_sig":"Terminated"}}
[js_test:quiesce_mode_non_streamable_ismaster] 2020-05-20T22:01:41.852-0400 d20020| {"t":{"$date":"2020-05-20T22:01:41.852-04:00"},"s":"I",  "c":"CONTROL",  "id":23378,   "ctx":"SignalHandler","msg":"kill from pid:{si_si_pid} uid:{si_si_uid}","attr":{"si_si_pid":18480,"si_si_uid":1000}}
[js_test:quiesce_mode_non_streamable_ismaster] 2020-05-20T22:01:41.852-0400 d20020| {"t":{"$date":"2020-05-20T22:01:41.852-04:00"},"s":"I",  "c":"CONTROL",  "id":23381,   "ctx":"SignalHandler","msg":"will terminate after current cmd ends"}
[js_test:quiesce_mode_non_streamable_ismaster] 2020-05-20T22:01:41.852-0400 d20020| {"t":{"$date":"2020-05-20T22:01:41.852-04:00"},"s":"I",  "c":"REPL",     "id":21343,   "ctx":"RstlKillOpThread","msg":"Starting to kill user operations"}
[js_test:quiesce_mode_non_streamable_ismaster] 2020-05-20T22:01:41.853-0400 d20020| {"t":{"$date":"2020-05-20T22:01:41.852-04:00"},"s":"I",  "c":"REPL",     "id":21344,   "ctx":"RstlKillOpThread","msg":"Stopped killing user operations"}
[js_test:quiesce_mode_non_streamable_ismaster] 2020-05-20T22:01:41.853-0400 d20020| {"t":{"$date":"2020-05-20T22:01:41.852-04:00"},"s":"I",  "c":"REPL",     "id":21340,   "ctx":"RstlKillOpThread","msg":"State transition ops metrics","attr":{"metrics":{"lastStateTransition":"stepDown","userOpsKilled":0,"userOpsRunning":1}}}
[js_test:quiesce_mode_non_streamable_ismaster] 2020-05-20T22:01:41.853-0400 d20020| {"t":{"$date":"2020-05-20T22:01:41.852-04:00"},"s":"I",  "c":"REPL",     "id":21358,   "ctx":"SignalHandler","msg":"Replica set state transition","attr":{"newState":"SECONDARY","oldState":"PRIMARY"},"tags":[]}
...
[js_test:quiesce_mode_non_streamable_ismaster] 2020-05-20T22:01:41.853-0400 d20020| {"t":{"$date":"2020-05-20T22:01:41.853-04:00"},"s":"I",  "c":"STORAGE",  "id":4695101, "ctx":"SignalHandler","msg":"hangDuringQuiesceMode failpoint enabled"}

Primary node d20020 responds to the driver with a ShutdownInProgress isMaster response:

[js_test:quiesce_mode_non_streamable_ismaster] 2020-05-20T22:01:42.762-0400 d20020| {"t":{"$date":"2020-05-20T22:01:42.761-04:00"},"s":"D2", "c":"COMMAND",  "id":21965,   "ctx":"conn16","msg":"About to run the command","attr":{"db":"admin","commandArgs":{"ismaster":1,"client":{"driver":{"name":"PyMongo","version":"3.10.1"},"os":{"type":"Linux","name":"Linux","architecture":"x86_64","version":"4.13.0-46-generic"},"platform":"CPython 3.7.0.final.0"},"compression":[],"$db":"admin"}}}
[js_test:quiesce_mode_non_streamable_ismaster] 2020-05-20T22:01:42.762-0400 d20020| {"t":{"$date":"2020-05-20T22:01:42.761-04:00"},"s":"I",  "c":"NETWORK",  "id":51800,   "ctx":"conn16","msg":"client metadata","attr":{"remote":"127.0.0.1:43268","client":"conn16","doc":{"driver":{"name":"PyMongo","version":"3.10.1"},"os":{"type":"Linux","name":"Linux","architecture":"x86_64","version":"4.13.0-46-generic"},"platform":"CPython 3.7.0.final.0"}}}
[js_test:quiesce_mode_non_streamable_ismaster] 2020-05-20T22:01:42.762-0400 d20020| {"t":{"$date":"2020-05-20T22:01:42.762-04:00"},"s":"D1", "c":"COMMAND",  "id":21962,   "ctx":"conn16","msg":"Assertion while executing command","attr":{"command":"ismaster","db":"admin","commandArgs":{"ismaster":1,"client":{"driver":{"name":"PyMongo","version":"3.10.1"},"os":{"type":"Linux","name":"Linux","architecture":"x86_64","version":"4.13.0-46-generic"},"platform":"CPython 3.7.0.final.0"},"compression":[],"$db":"admin"},"error":"ShutdownInProgress: The server is in quiesce mode and will shut down"}}
[js_test:quiesce_mode_non_streamable_ismaster] 2020-05-20T22:01:42.762-0400 d20020| {"t":{"$date":"2020-05-20T22:01:42.762-04:00"},"s":"I",  "c":"COMMAND",  "id":51803,   "ctx":"conn16","msg":"Slow query","attr":{"type":"command","ns":"admin.$cmd","command":{"ismaster":1,"client":{"driver":{"name":"PyMongo","version":"3.10.1"},"os":{"type":"Linux","name":"Linux","architecture":"x86_64","version":"4.13.0-46-generic"},"platform":"CPython 3.7.0.final.0"},"compression":[],"$db":"admin"},"numYields":0,"ok":0,"errMsg":"The server is in quiesce mode and will shut down","errName":91,"errCode":91,"reslen":344,"locks":{},"protocol":"op_query","durationMillis":0}}
[js_test:quiesce_mode_non_streamable_ismaster] 2020-05-20T22:01:42.762-0400 d20020| {"t":{"$date":"2020-05-20T22:01:42.762-04:00"},"s":"I",  "c":"NETWORK",  "id":22944,   "ctx":"conn16","msg":"connection ended","attr":{"remote":"127.0.0.1:43268","connectionCount":6}}

We turn off the failpoint for the find command and return to the driver:

[js_test:quiesce_mode_non_streamable_ismaster] 2020-05-20T22:01:42.954-0400 d20020| {"t":{"$date":"2020-05-20T22:01:42.954-04:00"},"s":"D2", "c":"COMMAND",  "id":21965,   "ctx":"conn1","msg":"About to run the command","attr":{"db":"admin","commandArgs":{"configureFailPoint":"waitAfterCommandFinishesExecution","mode":"off","lsid":{"id":{"$uuid":"d3b950ee-28b2-4a9d-a155-79f15a41a761"}},"$clusterTime":{"clusterTime":{"$timestamp":{"t":1590026501,"i":2}},"signature":{"hash":{"$binary":{"base64":"AAAAAAAAAAAAAAAAAAAAAAAAAAA=","subType":"0"}},"keyId":0}},"$readPreference":{"mode":"secondaryPreferred"},"$db":"admin"}}}
[js_test:quiesce_mode_non_streamable_ismaster] 2020-05-20T22:01:42.954-0400 d20020| {"t":{"$date":"2020-05-20T22:01:42.954-04:00"},"s":"D2", "c":"COMMAND",  "id":21961,   "ctx":"conn1","msg":"Admin only command","attr":{"command":"configureFailPoint"}}

The driver does not receive an error from the find command that returns to the driver after the ShutdownInProgress isMaster response. All future reads are directed to the secondary node d20021.

Comment by Jason Chan [ 21/May/20 ]

For this test, we expect to observe that a primary is able to continue to service reads after a stepdown and after entering quiesce mode. Once the driver receives a ShutdownInProgress error from an isMaster response, it should know to no longer target the primary for reads.

We can see the following sequence of events in our logs from local testing by increasing the log verbosity for the command component:

We send a series of reads to primary node d20020:

[js_test:quiesce_mode_non_streamable_ismaster] 2020-05-21T00:53:07.548+0000 d20020| 2020-05-21T00:53:07.548+00:00 D2 COMMAND  21965   [conn15] "About to run the command","attr":{"db":"test","commandArgs":{"find":"state","filter":{},"limit":1,"singleBatch":true,"lsid":{"id":{"$uuid":"c1e501a8-0f37-4708-9152-a8f5d63dfe4d"}},"$clusterTime":{"clusterTime":{"$timestamp":{"t":1590022387,"i":2}},"signature":{"hash":{"$binary":{"base64":"AAAAAAAAAAAAAAAAAAAAAAAAAAA=","subType":"0"}},"keyId":0}},"$db":"test","$readPreference":{"mode":"primaryPreferred"}}}
[js_test:quiesce_mode_non_streamable_ismaster] 2020-05-21T00:53:07.549+0000 d20020| 2020-05-21T00:53:07.548+00:00 I  COMMAND  51803   [conn15] "Slow query","attr":{"type":"command","ns":"test.state","command":{"find":"state","filter":{},"limit":1,"singleBatch":true,"lsid":{"id":{"$uuid":"c1e501a8-0f37-4708-9152-a8f5d63dfe4d"}},"$clusterTime":{"clusterTime":{"$timestamp":{"t":1590022387,"i":2}},"signature":{"hash":{"$binary":{"base64":"AAAAAAAAAAAAAAAAAAAAAAAAAAA=","subType":"0"}},"keyId":0}},"$db":"test","$readPreference":{"mode":"primaryPreferred"}},"planSummary":"COLLSCAN","keysExamined":0,"docsExamined":1,"cursorExhausted":true,"numYields":0,"nreturned":1,"reslen":272,"locks":{"ReplicationStateTransition":{"acquireCount":{"w":1}},"Global":{"acquireCount":{"r":1}},"Database":{"acquireCount":{"r":1}},"Collection":{"acquireCount":{"r":1}},"Mutex":{"acquireCount":{"r":1}}},"storage":{},"protocol":"op_msg","durationMillis":0}
...

Node d20020 receives a shutdown command, and proceeds to stepdown and enter quiesce mode:

[js_test:quiesce_mode_non_streamable_ismaster] 2020-05-21T00:53:07.798+0000 d20020| 2020-05-21T00:53:07.797+00:00 I  CONTROL  23377   [SignalHandler] "got signal {sig} ({strsignal_sig})","attr":{"sig":15,"strsignal_sig":"Terminated"}
[js_test:quiesce_mode_non_streamable_ismaster] 2020-05-21T00:53:07.798+0000 d20020| 2020-05-21T00:53:07.797+00:00 I  CONTROL  23378   [SignalHandler] "kill from pid:{si_si_pid} uid:{si_si_uid}","attr":{"si_si_pid":121827,"si_si_uid":0}
[js_test:quiesce_mode_non_streamable_ismaster] 2020-05-21T00:53:07.798+0000 d20020| 2020-05-21T00:53:07.797+00:00 I  CONTROL  23381   [SignalHandler] "will terminate after current cmd ends"
[js_test:quiesce_mode_non_streamable_ismaster] 2020-05-21T00:53:07.798+0000 d20020| 2020-05-21T00:53:07.798+00:00 I  REPL     21343   [RstlKillOpThread] "Starting to kill user operations"
[js_test:quiesce_mode_non_streamable_ismaster] 2020-05-21T00:53:07.798+0000 d20020| 2020-05-21T00:53:07.798+00:00 I  REPL     21344   [RstlKillOpThread] "Stopped killing user operations"
[js_test:quiesce_mode_non_streamable_ismaster] 2020-05-21T00:53:07.798+0000 d20020| 2020-05-21T00:53:07.798+00:00 I  REPL     21340   [RstlKillOpThread] "State transition ops metrics","attr":{"metrics":{"lastStateTransition":"stepDown","userOpsKilled":0,"userOpsRunning":1}}
[js_test:quiesce_mode_non_streamable_ismaster] 2020-05-21T00:53:07.798+0000 d20020| 2020-05-21T00:53:07.798+00:00 I  REPL     21358   [SignalHandler] "Replica set state transition","attr":{"newState":"SECONDARY","oldState":"PRIMARY"}
[js_test:quiesce_mode_non_streamable_ismaster] 2020-05-21T00:53:07.799+0000 d20020| 2020-05-21T00:53:07.799+00:00 I  STORAGE  4695102 [SignalHandler] "Entering quiesce mode for shutdown","attr":{"quiesceTimeMillis":4998}

We have yet to respond to the driver with a ShutdownInProgress isMaster response. The driver continues to target d20020 for reads:

[js_test:quiesce_mode_non_streamable_ismaster] 2020-05-21T00:53:08.302+0000 d20020| 2020-05-21T00:53:08.302+00:00 D2 COMMAND  21965   [conn15] "About to run the command","attr":{"db":"test","commandArgs":{"find":"state","filter":{},"limit":1,"singleBatch":true,"lsid":{"id":{"$uuid":"c1e501a8-0f37-4708-9152-a8f5d63dfe4d"}},"$clusterTime":{"clusterTime":{"$timestamp":{"t":1590022387,"i":2}},"signature":{"hash":{"$binary":{"base64":"AAAAAAAAAAAAAAAAAAAAAAAAAAA=","subType":"0"}},"keyId":0}},"$db":"test","$readPreference":{"mode":"primaryPreferred"}}}
[js_test:quiesce_mode_non_streamable_ismaster] 2020-05-21T00:53:08.304+0000 d20020| 2020-05-21T00:53:08.303+00:00 I  COMMAND  51803   [conn15] "Slow query","attr":{"type":"command","ns":"test.state","command":{"find":"state","filter":{},"limit":1,"singleBatch":true,"lsid":{"id":{"$uuid":"c1e501a8-0f37-4708-9152-a8f5d63dfe4d"}},"$clusterTime":{"clusterTime":{"$timestamp":{"t":1590022387,"i":2}},"signature":{"hash":{"$binary":{"base64":"AAAAAAAAAAAAAAAAAAAAAAAAAAA=","subType":"0"}},"keyId":0}},"$db":"test","$readPreference":{"mode":"primaryPreferred"}},"planSummary":"COLLSCAN","keysExamined":0,"docsExamined":1,"cursorExhausted":true,"numYields":0,"nreturned":1,"reslen":272,"locks":{"ReplicationStateTransition":{"acquireCount":{"w":1}},"Global":{"acquireCount":{"r":1}},"Database":{"acquireCount":{"r":1}},"Collection":{"acquireCount":{"r":1}},"Mutex":{"acquireCount":{"r":1}}},"storage":{},"protocol":"op_msg","durationMillis":0}
[js_test:quiesce_mode_non_streamable_ismaster] 2020-05-21T00:53:08.554+0000 d20020| 2020-05-21T00:53:08.554+00:00 D2 COMMAND  21965   [conn15] "About to run the command","attr":{"db":"test","commandArgs":{"find":"state","filter":{},"limit":1,"singleBatch":true,"lsid":{"id":{"$uuid":"c1e501a8-0f37-4708-9152-a8f5d63dfe4d"}},"$clusterTime":{"clusterTime":{"$timestamp":{"t":1590022387,"i":2}},"signature":{"hash":{"$binary":{"base64":"AAAAAAAAAAAAAAAAAAAAAAAAAAA=","subType":"0"}},"keyId":0}},"$db":"test","$readPreference":{"mode":"primaryPreferred"}}}
[js_test:quiesce_mode_non_streamable_ismaster] 2020-05-21T00:53:08.555+0000 d20020| 2020-05-21T00:53:08.554+00:00 I  COMMAND  51803   [conn15] "Slow query","attr":{"type":"command","ns":"test.state","command":{"find":"state","filter":{},"limit":1,"singleBatch":true,"lsid":{"id":{"$uuid":"c1e501a8-0f37-4708-9152-a8f5d63dfe4d"}},"$clusterTime":{"clusterTime":{"$timestamp":{"t":1590022387,"i":2}},"signature":{"hash":{"$binary":{"base64":"AAAAAAAAAAAAAAAAAAAAAAAAAAA=","subType":"0"}},"keyId":0}},"$db":"test","$readPreference":{"mode":"primaryPreferred"}},"planSummary":"COLLSCAN","keysExamined":0,"docsExamined":1,"cursorExhausted":true,"numYields":0,"nreturned":1,"reslen":272,"locks":{"ReplicationStateTransition":{"acquireCount":{"w":1}},"Global":{"acquireCount":{"r":1}},"Database":{"acquireCount":{"r":1}},"Collection":{"acquireCount":{"r":1}},"Mutex":{"acquireCount":{"r":1}}},"storage":{},"protocol":"op_msg","durationMillis":0}

We respond to the PyMongo driver with a ShutdownInProgress isMaster response:

[js_test:quiesce_mode_non_streamable_ismaster] 2020-05-21T00:53:08.783+0000 d20020| 2020-05-21T00:53:08.783+00:00 D2 COMMAND  21965   [conn17] "About to run the command","attr":{"db":"admin","commandArgs":{"ismaster":1,"client":{"driver":{"name":"PyMongo","version":"3.10.1"},"os":{"type":"Linux","name":"Linux","architecture":"x86_64","version":"2.6.32-220.el6.x86_64"},"platform":"CPython 3.7.0.final.0"},"$db":"admin"}}
[js_test:quiesce_mode_non_streamable_ismaster] 2020-05-21T00:53:08.783+0000 d20020| 2020-05-21T00:53:08.783+00:00 I  NETWORK  51800   [conn17] "client metadata","attr":{"remote":"10.122.71.235:35066","client":"conn17","doc":{"driver":{"name":"PyMongo","version":"3.10.1"},"os":{"type":"Linux","name":"Linux","architecture":"x86_64","version":"2.6.32-220.el6.x86_64"},"platform":"CPython 3.7.0.final.0"}}
[js_test:quiesce_mode_non_streamable_ismaster] 2020-05-21T00:53:08.784+0000 d20020| 2020-05-21T00:53:08.784+00:00 D1 COMMAND  21962   [conn17] "Assertion while executing command","attr":{"command":"ismaster","db":"admin","commandArgs":{"ismaster":1,"client":{"driver":{"name":"PyMongo","version":"3.10.1"},"os":{"type":"Linux","name":"Linux","architecture":"x86_64","version":"2.6.32-220.el6.x86_64"},"platform":"CPython 3.7.0.final.0"},"$db":"admin"},"error":"ShutdownInProgress: The server is in quiesce mode and will shut down"}
[js_test:quiesce_mode_non_streamable_ismaster] 2020-05-21T00:53:08.784+0000 d20020| 2020-05-21T00:53:08.784+00:00 I  COMMAND  51803   [conn17] "Slow query","attr":{"type":"command","ns":"admin.$cmd","command":{"ismaster":1,"client":{"driver":{"name":"PyMongo","version":"3.10.1"},"os":{"type":"Linux","name":"Linux","architecture":"x86_64","version":"2.6.32-220.el6.x86_64"},"platform":"CPython 3.7.0.final.0"},"$db":"admin"},"numYields":0,"ok":0,"errMsg":"The server is in quiesce mode and will shut down","errName":91,"errCode":91,"reslen":344,"locks":{},"protocol":"op_query","durationMillis":1}
[js_test:quiesce_mode_non_streamable_ismaster] 2020-05-21T00:53:08.785+0000 d20020| 2020-05-21T00:53:08.784+00:00 I  NETWORK  22944   [conn17] "connection ended","attr":{"remote":"10.122.71.235:35066","connectionCount":5}

We see that all future reads are sent to Secondary node d20021 for the rest of the quiesce period.

[js_test:quiesce_mode_non_streamable_ismaster] 2020-05-21T00:53:08.805+0000 d20021| 2020-05-21T00:53:08.805+00:00 D2 COMMAND  21965   [conn13] "About to run the command","attr":{"db":"test","commandArgs":{"find":"state","filter":{},"limit":1,"singleBatch":true,"lsid":{"id":{"$uuid":"c1e501a8-0f37-4708-9152-a8f5d63dfe4d"}},"$clusterTime":{"clusterTime":{"$timestamp":{"t":1590022387,"i":2}},"signature":{"hash":{"$binary":{"base64":"AAAAAAAAAAAAAAAAAAAAAAAAAAA=","subType":"0"}},"keyId":0}},"$db":"test","$readPreference":{"mode":"primaryPreferred"}}}
[js_test:quiesce_mode_non_streamable_ismaster] 2020-05-21T00:53:08.806+0000 d20021| 2020-05-21T00:53:08.806+00:00 I  COMMAND  51803   [conn13] "Slow query","attr":{"type":"command","ns":"test.state","command":{"find":"state","filter":{},"limit":1,"singleBatch":true,"lsid":{"id":{"$uuid":"c1e501a8-0f37-4708-9152-a8f5d63dfe4d"}},"$clusterTime":{"clusterTime":{"$timestamp":{"t":1590022387,"i":2}},"signature":{"hash":{"$binary":{"base64":"AAAAAAAAAAAAAAAAAAAAAAAAAAA=","subType":"0"}},"keyId":0}},"$db":"test","$readPreference":{"mode":"primaryPreferred"}},"planSummary":"COLLSCAN","keysExamined":0,"docsExamined":1,"cursorExhausted":true,"numYields":0,"nreturned":1,"reslen":272,"locks":{"ReplicationStateTransition":{"acquireCount":{"w":1}},"Global":{"acquireCount":{"r":1}},"Database":{"acquireCount":{"r":1}},"Collection":{"acquireCount":{"r":1}},"Mutex":{"acquireCount":{"r":1}}},"storage":{},"protocol":"op_msg","durationMillis":0}
[js_test:quiesce_mode_non_streamable_ismaster] 2020-05-21T00:53:09.057+0000 d20021| 2020-05-21T00:53:09.056+00:00 D2 COMMAND  21965   [conn13] "About to run the command","attr":{"db":"test","commandArgs":{"find":"state","filter":{},"limit":1,"singleBatch":true,"lsid":{"id":{"$uuid":"c1e501a8-0f37-4708-9152-a8f5d63dfe4d"}},"$clusterTime":{"clusterTime":{"$timestamp":{"t":1590022387,"i":2}},"signature":{"hash":{"$binary":{"base64":"AAAAAAAAAAAAAAAAAAAAAAAAAAA=","subType":"0"}},"keyId":0}},"$db":"test","$readPreference":{"mode":"primaryPreferred"}}}
[js_test:quiesce_mode_non_streamable_ismaster] 2020-05-21T00:53:09.057+0000 d20021| 2020-05-21T00:53:09.057+00:00 I  COMMAND  51803   [conn13] "Slow query","attr":{"type":"command","ns":"test.state","command":{"find":"state","filter":{},"limit":1,"singleBatch":true,"lsid":{"id":{"$uuid":"c1e501a8-0f37-4708-9152-a8f5d63dfe4d"}},"$clusterTime":{"clusterTime":{"$timestamp":{"t":1590022387,"i":2}},"signature":{"hash":{"$binary":{"base64":"AAAAAAAAAAAAAAAAAAAAAAAAAAA=","subType":"0"}},"keyId":0}},"$db":"test","$readPreference":{"mode":"primaryPreferred"}},"planSummary":"COLLSCAN","keysExamined":0,"docsExamined":1,"cursorExhausted":true,"numYields":0,"nreturned":1,"reslen":272,"locks":{"ReplicationStateTransition":{"acquireCount":{"w":1}},"Global":{"acquireCount":{"r":1}},"Database":{"acquireCount":{"r":1}},"Collection":{"acquireCount":{"r":1}},"Mutex":{"acquireCount":{"r":1}}},"storage":{},"protocol":"op_msg","durationMillis":0}
[js_test:quiesce_mode_non_streamable_ismaster] 2020-05-21T00:53:09.266+0000 d20021| 2020-05-21T00:53:09.266+00:00 D2 REPL_HB  4615670 [ReplCoord-1] "Sending heartbeat","attr":{"requestId":29,"target":"ip-10-122-71-235:20020","heartbeatObj":{"replSetHeartbeat":"quiesce_mode_test","configVersion":2,"configTerm":1,"hbv":1,"from":"ip-10-122-71-235:20021","fromId":1,"term":2,"primaryId":0}}
...

The test completes successfully with driver option retryReads set to false, indicating that all reads completed successfully without the driver ever having to retry.

Example logs here.

Generated at Thu Feb 08 05:12:55 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.