|
An additional test case we wanted to observe:
The secondary node S goes into quiesce mode while a read is still in progress. S will return a ShutdownInProgress error as an isMaster response to the driver. The find should proceed to complete and return to the driver without raising an error even if the driver has already marked S as unknown.
We can see the following by increasing the log verbosity for the command component and setting the appropriate failpoints:
Set a failpoint to hang during the find command on secondary node d20021:
[js_test:quiesce_mode_streamable_ismaster] 2020-05-20T21:45:38.743+0000 d20021| 2020-05-20T21:45:38.743+00:00 W CONTROL 23829 [conn1] "Set failpoint","attr":{"failPointName":"waitInFindBeforeMakingBatch","failPoint":{"mode":1,"data":{},"timesEntered":0}}
|
Set a failpoint to hang while in quiesce mode on node d20021:
[js_test:quiesce_mode_streamable_ismaster] 2020-05-20T21:45:38.845+0000 d20021| 2020-05-20T21:45:38.845+00:00 W CONTROL 23829 [conn1] "Set failpoint","attr":{"failPointName":"hangDuringQuiesceMode","failPoint":{"mode":1,"data":{},"timesEntered":0}}
|
Driver issues a read to node d20021 and hangs due to failpoint:
[js_test:quiesce_mode_streamable_ismaster] 2020-05-20T21:45:38.744+0000 d20021| 2020-05-20T21:45:38.744+00:00 D2 COMMAND 21965 [conn22] "About to run the command","attr":{"db":"test","commandArgs":{"find":"state","filter":{},"limit":1,"singleBatch":true,"lsid":{"id":{"$uuid":"382de285-7d34-4ebc-b2f0-9fc2458f9237"}},"$clusterTime":{"clusterTime":{"$timestamp":{"t":1590011138,"i":2}},"signature":{"hash":{"$binary":{"base64":"AAAAAAAAAAAAAAAAAAAAAAAAAAA=","subType":"0"}},"keyId":0}},"$db":"test","$readPreference":{"mode":"secondary"}}}
|
[js_test:quiesce_mode_streamable_ismaster] 2020-05-20T21:45:38.755+0000 d20021| 2020-05-20T21:45:38.755+00:00 I QUERY 20908 [conn22] "Waiting in find before making batch for query - {cq_Short}","attr":{"cq_Short":"ns: test.state query: {} sort: {} projection: {} limit: 1"}
|
Shutdown secondary d20021 and wait for it to enter quiesce mode and hang due to failpoint:
[js_test:quiesce_mode_streamable_ismaster] 2020-05-20T21:45:38.846+0000 d20021| 2020-05-20T21:45:38.846+00:00 I CONTROL 23377 [SignalHandler] "got signal {sig} ({strsignal_sig})","attr":{"sig":15,"strsignal_sig":"Terminated"}
|
[js_test:quiesce_mode_streamable_ismaster] 2020-05-20T21:45:38.846+0000 d20021| 2020-05-20T21:45:38.846+00:00 I CONTROL 23378 [SignalHandler] "kill from pid:{si_si_pid} uid:{si_si_uid}","attr":{"si_si_pid":10146,"si_si_uid":0}
|
[js_test:quiesce_mode_streamable_ismaster] 2020-05-20T21:45:38.847+0000 d20021| 2020-05-20T21:45:38.846+00:00 I CONTROL 23381 [SignalHandler] "will terminate after current cmd ends"
|
[js_test:quiesce_mode_streamable_ismaster] 2020-05-20T21:45:38.847+0000 d20021| 2020-05-20T21:45:38.846+00:00 I STORAGE 4695101 [SignalHandler] "hangDuringQuiesceMode failpoint enabled"
|
d20021 responds to the driver isMaster request with a ShutdownInProgress error. The driver will set d20021 as UNKNOWN:
[js_test:quiesce_mode_streamable_ismaster] 2020-05-20T21:45:38.848+0000 d20021| 2020-05-20T21:45:38.848+00:00 D2 COMMAND 21965 [conn23] "About to run the command","attr":{"db":"admin","commandArgs":{"ismaster":1,"client":{"driver":{"name":"PyMongo","version":"3.11.0b1.dev0"},"os":{"type":"Linux","name":"Linux","architecture":"x86_64","version":"2.6.32-220.el6.x86_64"},"platform":"CPython 3.7.0.final.0"},"$db":"admin"}}
|
[js_test:quiesce_mode_streamable_ismaster] 2020-05-20T21:45:38.849+0000 d20021| 2020-05-20T21:45:38.848+00:00 I NETWORK 51800 [conn23] "client metadata","attr":{"remote":"10.122.77.150:40030","client":"conn23","doc":{"driver":{"name":"PyMongo","version":"3.11.0b1.dev0"},"os":{"type":"Linux","name":"Linux","architecture":"x86_64","version":"2.6.32-220.el6.x86_64"},"platform":"CPython 3.7.0.final.0"}}
|
[js_test:quiesce_mode_streamable_ismaster] 2020-05-20T21:45:38.849+0000 d20021| 2020-05-20T21:45:38.849+00:00 D1 COMMAND 21962 [conn23] "Assertion while executing command","attr":{"command":"ismaster","db":"admin","commandArgs":{"ismaster":1,"client":{"driver":{"name":"PyMongo","version":"3.11.0b1.dev0"},"os":{"type":"Linux","name":"Linux","architecture":"x86_64","version":"2.6.32-220.el6.x86_64"},"platform":"CPython 3.7.0.final.0"},"$db":"admin"},"error":"ShutdownInProgress: The server is in quiesce mode and will shut down"}
|
We disable the failpoint on d20021 and the find is now allowed to complete:
[js_test:quiesce_mode_streamable_ismaster] 2020-05-20T21:45:38.856+0000 d20021| 2020-05-20T21:45:38.856+00:00 I COMMAND 51803 [conn22] "Slow query","attr":{"type":"command","ns":"test.state","command":{"find":"state","filter":{},"limit":1,"singleBatch":true,"lsid":{"id":{"$uuid":"382de285-7d34-4ebc-b2f0-9fc2458f9237"}},"$clusterTime":{"clusterTime":{"$timestamp":{"t":1590011138,"i":2}},"signature":{"hash":{"$binary":{"base64":"AAAAAAAAAAAAAAAAAAAAAAAAAAA=","subType":"0"}},"keyId":0}},"$db":"test","$readPreference":{"mode":"secondary"}},"planSummary":"COLLSCAN","keysExamined":0,"docsExamined":1,"cursorExhausted":true,"numYields":1,"nreturned":1,"reslen":272,"locks":{"ReplicationStateTransition":{"acquireCount":{"w":2}},"Global":{"acquireCount":{"r":2}},"Database":{"acquireCount":{"r":2}},"Collection":{"acquireCount":{"r":2}},"Mutex":{"acquireCount":{"r":1}}},"storage":{},"protocol":"op_msg","durationMillis":111}
|
[js_test:quiesce_mode_streamable_ismaster] 2020-05-20T21:45:38.857+0000 d20021| 2020-05-20T21:45:38.857+00:00 I NETWORK 22944 [conn22] "connection ended","attr":{"remote":"10.122.77.150:40029","connectionCount":7}
|
All future reads go to the other secondary as d20021 proceeds to shutdown. The test completes successfully with retryReads=false. The driver does not receive an error from the find command that returns to the driver after the ShutdownInProgress isMaster response.
|
|
The goal of this testing ticket is to show the following:
(1) A secondary node S entering quiesce mode will immediately return to all waiting isMasters with shutdownInProgress. This should prompt the driver to recognize S as unknown.
(2) A find that gets issued once the driver recognizes S is UNKNOWN should be sent to a different available secondary. We don't expect S to service any reads while in quiesce mode when using the streamable isMaster protocol. (EDIT: It is possible that we still service reads in quiesce mode if S was already servicing a read as it entered quiesce mode or if a read was issued before the driver received the isMaster response detailing the ShutdownInProgress error)
We can see the following by increasing the log verbosity for the command component:
The driver sends a series of finds to the replica set with read preference=secondary. The reads are sent to secondary node d20021:
[js_test:quiesce_mode_non_streamable_ismaster] 2020-05-19T21:17:50.514+0000 d20021| 2020-05-19T21:17:50.514+00:00 D2 COMMAND 21965 [conn22] "About to run the command","attr":{"db":"test","commandArgs":{"find":"state","filter":{},"limit":1,"singleBatch":true,"lsid":{"id":{"$uuid":"0b8b3a94-0866-4d35-bb87-2f038bf1bdde"}},"$clusterTime":{"clusterTime":{"$timestamp":{"t":1589923070,"i":2}},"signature":{"hash":{"$binary":{"base64":"AAAAAAAAAAAAAAAAAAAAAAAAAAA=","subType":"0"}},"keyId":0}},"$db":"test","$readPreference":{"mode":"secondary"}}}
|
[js_test:quiesce_mode_non_streamable_ismaster] 2020-05-19T21:17:50.514+0000 d20021| 2020-05-19T21:17:50.514+00:00 I COMMAND 51803 [conn22] "Slow query","attr":{"type":"command","ns":"test.state","command":{"find":"state","filter":{},"limit":1,"singleBatch":true,"lsid":{"id":{"$uuid":"0b8b3a94-0866-4d35-bb87-2f038bf1bdde"}},"$clusterTime":{"clusterTime":{"$timestamp":{"t":1589923070,"i":2}},"signature":{"hash":{"$binary":{"base64":"AAAAAAAAAAAAAAAAAAAAAAAAAAA=","subType":"0"}},"keyId":0}},"$db":"test","$readPreference":{"mode":"secondary"}},"planSummary":"COLLSCAN","keysExamined":0,"docsExamined":1,"cursorExhausted":true,"numYields":0,"nreturned":1,"reslen":272,"locks":{"ReplicationStateTransition":{"acquireCount":{"w":1}},"Global":{"acquireCount":{"r":1}},"Database":{"acquireCount":{"r":1}},"Collection":{"acquireCount":{"r":1}},"Mutex":{"acquireCount":{"r":1}}},"storage":{},"protocol":"op_msg","durationMillis":0}
|
[js_test:quiesce_mode_non_streamable_ismaster] 2020-05-19T21:17:50.766+0000 d20021| 2020-05-19T21:17:50.766+00:00 D2 COMMAND 21965 [conn22] "About to run the command","attr":{"db":"test","commandArgs":{"find":"state","filter":{},"limit":1,"singleBatch":true,"lsid":{"id":{"$uuid":"0b8b3a94-0866-4d35-bb87-2f038bf1bdde"}},"$clusterTime":{"clusterTime":{"$timestamp":{"t":1589923070,"i":2}},"signature":{"hash":{"$binary":{"base64":"AAAAAAAAAAAAAAAAAAAAAAAAAAA=","subType":"0"}},"keyId":0}},"$db":"test","$readPreference":{"mode":"secondary"}}}
|
[js_test:quiesce_mode_non_streamable_ismaster] 2020-05-19T21:17:50.766+0000 d20021| 2020-05-19T21:17:50.766+00:00 I COMMAND 51803 [conn22] "Slow query","attr":{"type":"command","ns":"test.state","command":{"find":"state","filter":{},"limit":1,"singleBatch":true,"lsid":{"id":{"$uuid":"0b8b3a94-0866-4d35-bb87-2f038bf1bdde"}},"$clusterTime":{"clusterTime":{"$timestamp":{"t":1589923070,"i":2}},"signature":{"hash":{"$binary":{"base64":"AAAAAAAAAAAAAAAAAAAAAAAAAAA=","subType":"0"}},"keyId":0}},"$db":"test","$readPreference":{"mode":"secondary"}},"planSummary":"COLLSCAN","keysExamined":0,"docsExamined":1,"cursorExhausted":true,"numYields":0,"nreturned":1,"reslen":272,"locks":{"ReplicationStateTransition":{"acquireCount":{"w":1}},"Global":{"acquireCount":{"r":1}},"Database":{"acquireCount":{"r":1}},"Collection":{"acquireCount":{"r":1}},"Mutex":{"acquireCount":{"r":1}}},"storage":{},"protocol":"op_msg","durationMillis":0}
|
...
|
Secondary node d20021 enters quiesce mode and immediately returns a ShutdownInProgress error to the waiting isMaster request from the driver:
[js_test:quiesce_mode_non_streamable_ismaster] 2020-05-19T21:17:50.938+0000 d20021| 2020-05-19T21:17:50.938+00:00 I STORAGE 4695102 [SignalHandler] "Entering quiesce mode for shutdown","attr":{"quiesceTimeMillis":5000}
|
[js_test:quiesce_mode_non_streamable_ismaster] 2020-05-19T21:17:50.941+0000 d20021| 2020-05-19T21:17:50.941+00:00 D1 COMMAND 21962 [conn20] "Assertion while executing command","attr":{"command":"ismaster","db":"admin","commandArgs":{"ismaster":1,"topologyVersion":{"processId":{"$oid":"5ec44cf109834d95ff232356"},"counter":4},"maxAwaitTimeMS":1000,"$clusterTime":{"clusterTime":{"$timestamp":{"t":1589923059,"i":2}},"signature":{"hash":{"$binary":{"base64":"AAAAAAAAAAAAAAAAAAAAAAAAAAA=","subType":"0"}},"keyId":0}},"$db":"admin","$readPreference":{"mode":"primary"}},"error":"ShutdownInProgress: The server is in quiesce mode and will shut down"}
|
[js_test:quiesce_mode_non_streamable_ismaster] 2020-05-19T21:17:50.941+0000 d20021| 2020-05-19T21:17:50.941+00:00 I COMMAND 51803 [conn20] "Slow query","attr":{"type":"command","ns":"admin.$cmd","command":{"ismaster":1,"topologyVersion":{"processId":{"$oid":"5ec44cf109834d95ff232356"},"counter":4},"maxAwaitTimeMS":1000,"$clusterTime":{"clusterTime":{"$timestamp":{"t":1589923059,"i":2}},"signature":{"hash":{"$binary":{"base64":"AAAAAAAAAAAAAAAAAAAAAAAAAAA=","subType":"0"}},"keyId":0}},"$db":"admin","$readPreference":{"mode":"primary"}},"numYields":0,"ok":0,"errMsg":"The server is in quiesce mode and will shut down","errName":91,"errCode":91,"reslen":329,"locks":{},"protocol":"op_msg","durationMillis":714}
|
[js_test:quiesce_mode_non_streamable_ismaster] 2020-05-19T21:17:50.941+0000 d20021| 2020-05-19T21:17:50.941+00:00 I NETWORK 22944 [conn20] "connection ended","attr":{"remote":"10.122.79.102:42175","connectionCount":8}
|
[js_test:quiesce_mode_non_streamable_ismaster] 2020-05-19T21:17:50.942+0000 d20021| 2020-05-19T21:17:50.942+00:00 I NETWORK 22944 [conn22] "connection ended","attr":{"remote":"10.122.79.102:42180","connectionCount":7}
|
[js_test:quiesce_mode_non_streamable_ismaster] 2020-05-19T21:17:50.942+0000 d20021| 2020-05-19T21:17:50.942+00:00 I NETWORK 22943 [listener] "connection accepted","attr":{"remote":"10.122.79.102:42181","sessionId":23,"connectionCount":8}
|
[js_test:quiesce_mode_non_streamable_ismaster] 2020-05-19T21:17:50.943+0000 d20021| 2020-05-19T21:17:50.943+00:00 D2 COMMAND 21965 [conn23] "About to run the command","attr":{"db":"admin","commandArgs":{"ismaster":1,"client":{"driver":{"name":"PyMongo","version":"3.11.0b1.dev0"},"os":{"type":"Linux","name":"Linux","architecture":"x86_64","version":"2.6.32-220.el6.x86_64"},"platform":"CPython 3.7.0.final.0"},"$db":"admin"}}
|
[js_test:quiesce_mode_non_streamable_ismaster] 2020-05-19T21:17:50.943+0000 d20021| 2020-05-19T21:17:50.943+00:00 I NETWORK 51800 [conn23] "client metadata","attr":{"remote":"10.122.79.102:42181","client":"conn23","doc":{"driver":{"name":"PyMongo","version":"3.11.0b1.dev0"},"os":{"type":"Linux","name":"Linux","architecture":"x86_64","version":"2.6.32-220.el6.x86_64"},"platform":"CPython 3.7.0.final.0"}}
|
[js_test:quiesce_mode_non_streamable_ismaster] 2020-05-19T21:17:50.945+0000 d20021| 2020-05-19T21:17:50.945+00:00 D1 COMMAND 21962 [conn23] "Assertion while executing command","attr":{"command":"ismaster","db":"admin","commandArgs":{"ismaster":1,"client":{"driver":{"name":"PyMongo","version":"3.11.0b1.dev0"},"os":{"type":"Linux","name":"Linux","architecture":"x86_64","version":"2.6.32-220.el6.x86_64"},"platform":"CPython 3.7.0.final.0"},"$db":"admin"},"error":"ShutdownInProgress: The server is in quiesce mode and will shut down"}
|
[js_test:quiesce_mode_non_streamable_ismaster] 2020-05-19T21:17:50.945+0000 d20021| 2020-05-19T21:17:50.945+00:00 I COMMAND 51803 [conn23] "Slow query","attr":{"type":"command","ns":"admin.$cmd","command":{"ismaster":1,"client":{"driver":{"name":"PyMongo","version":"3.11.0b1.dev0"},"os":{"type":"Linux","name":"Linux","architecture":"x86_64","version":"2.6.32-220.el6.x86_64"},"platform":"CPython 3.7.0.final.0"},"$db":"admin"},"numYields":0,"ok":0,"errMsg":"The server is in quiesce mode and will shut down","errName":91,"errCode":91,"reslen":344,"locks":{},"protocol":"op_query","durationMillis":2}
|
We see that reads are now being sent to secondary node d20022. d20021 does not service any reads during quiesce mode:
[js_test:quiesce_mode_non_streamable_ismaster] 2020-05-19T21:17:51.270+0000 d20022| 2020-05-19T21:17:51.270+00:00 D2 COMMAND 21965 [conn18] "About to run the command","attr":{"db":"test","commandArgs":{"find":"state","filter":{},"limit":1,"singleBatch":true,"lsid":{"id":{"$uuid":"0b8b3a94-0866-4d35-bb87-2f038bf1bdde"}},"$clusterTime":{"clusterTime":{"$timestamp":{"t":1589923070,"i":2}},"signature":{"hash":{"$binary":{"base64":"AAAAAAAAAAAAAAAAAAAAAAAAAAA=","subType":"0"}},"keyId":0}},"$db":"test","$readPreference":{"mode":"secondary"}}}
|
[js_test:quiesce_mode_non_streamable_ismaster] 2020-05-19T21:17:51.271+0000 d20022| 2020-05-19T21:17:51.271+00:00 I COMMAND 51803 [conn18] "Slow query","attr":{"type":"command","ns":"test.state","command":{"find":"state","filter":{},"limit":1,"singleBatch":true,"lsid":{"id":{"$uuid":"0b8b3a94-0866-4d35-bb87-2f038bf1bdde"}},"$clusterTime":{"clusterTime":{"$timestamp":{"t":1589923070,"i":2}},"signature":{"hash":{"$binary":{"base64":"AAAAAAAAAAAAAAAAAAAAAAAAAAA=","subType":"0"}},"keyId":0}},"$db":"test","$readPreference":{"mode":"secondary"}},"planSummary":"COLLSCAN","keysExamined":0,"docsExamined":1,"cursorExhausted":true,"numYields":0,"nreturned":1,"reslen":272,"locks":{"ReplicationStateTransition":{"acquireCount":{"w":1}},"Global":{"acquireCount":{"r":1}},"Database":{"acquireCount":{"r":1}},"Collection":{"acquireCount":{"r":1}},"Mutex":{"acquireCount":{"r":1}}},"storage":{},"protocol":"op_msg","durationMillis":0}
|
[js_test:quiesce_mode_non_streamable_ismaster] 2020-05-19T21:17:51.522+0000 d20022| 2020-05-19T21:17:51.522+00:00 D2 COMMAND 21965 [conn18] "About to run the command","attr":{"db":"test","commandArgs":{"find":"state","filter":{},"limit":1,"singleBatch":true,"lsid":{"id":{"$uuid":"0b8b3a94-0866-4d35-bb87-2f038bf1bdde"}},"$clusterTime":{"clusterTime":{"$timestamp":{"t":1589923070,"i":2}},"signature":{"hash":{"$binary":{"base64":"AAAAAAAAAAAAAAAAAAAAAAAAAAA=","subType":"0"}},"keyId":0}},"$db":"test","$readPreference":{"mode":"secondary"}}}
|
[js_test:quiesce_mode_non_streamable_ismaster] 2020-05-19T21:17:51.522+0000 d20022| 2020-05-19T21:17:51.522+00:00 I COMMAND 51803 [conn18] "Slow query","attr":{"type":"command","ns":"test.state","command":{"find":"state","filter":{},"limit":1,"singleBatch":true,"lsid":{"id":{"$uuid":"0b8b3a94-0866-4d35-bb87-2f038bf1bdde"}},"$clusterTime":{"clusterTime":{"$timestamp":{"t":1589923070,"i":2}},"signature":{"hash":{"$binary":{"base64":"AAAAAAAAAAAAAAAAAAAAAAAAAAA=","subType":"0"}},"keyId":0}},"$db":"test","$readPreference":{"mode":"secondary"}},"planSummary":"COLLSCAN","keysExamined":0,"docsExamined":1,"cursorExhausted":true,"numYields":0,"nreturned":1,"reslen":272,"locks":{"ReplicationStateTransition":{"acquireCount":{"w":1}},"Global":{"acquireCount":{"r":1}},"Database":{"acquireCount":{"r":1}},"Collection":{"acquireCount":{"r":1}},"Mutex":{"acquireCount":{"r":1}}},"storage":{},"protocol":"op_msg","durationMillis":0}
|
...
|
The test completes successfully with driver option retryReads set to false, indicating that all reads completed successfully without the driver ever having to retry. With this, the test is able to show that goals (1) and (2) are achieved.
Example logs here.
|