|
For this testing ticket and ones similar, we decided to perform local testing and to not commit any of the work as there is currently a race condition in Drivers(DRIVERS-1262) that prevents us from being able to eliminate retries completely. Instead, the goal of this testing ticket is to show two things:
(1) A find should still complete successfully when issued to a secondary S while in quiesce mode (the driver has yet to recognize S to be in the UNKNOWN state).
(2) A find that gets issued once the driver recognizes S is UNKNOWN should be sent to a different available secondary.
We can see the following sequence of events in our logs from local testing by increasing the log verbosity for the command component:
Secondary node d20021 enters quiesce mode:
[js_test:quiesce_mode_non_streamable_ismaster] 2020-05-18T23:34:54.733+0000 d20021| 2020-05-18T23:34:54.733+00:00 I STORAGE 4695102 [SignalHandler] "Entering quiesce mode for shutdown","attr":{"quiesceTimeMillis":5000}
|
d20021 continues to service reads once in quiesce mode:
[js_test:quiesce_mode_non_streamable_ismaster] 2020-05-18T23:34:55.325+0000 d20021| 2020-05-18T23:34:55.325+00:00 D2 COMMAND 21965 [conn20] "About to run the command","attr":{"db":"test","commandArgs":{"find":"state","filter":{},"limit":1,"singleBatch":true,"lsid":{"id":{"$uuid":"19665e22-f14e-44f4-b1cf-a15309cfecd4"}},"$clusterTime":{"clusterTime":{"$timestamp":{"t":1589844894,"i":2}},"signature":{"hash":{"$binary":{"base64":"AAAAAAAAAAAAAAAAAAAAAAAAAAA=","subType":"0"}},"keyId":0}},"$db":"test","$readPreference":{"mode":"secondary"}}}
|
[js_test:quiesce_mode_non_streamable_ismaster] 2020-05-18T23:34:55.326+0000 d20021| 2020-05-18T23:34:55.325+00:00 I COMMAND 51803 [conn20] "Slow query","attr":{"type":"command","ns":"test.state","command":{"find":"state","filter":{},"limit":1,"singleBatch":true,"lsid":{"id":{"$uuid":"19665e22-f14e-44f4-b1cf-a15309cfecd4"}},"$clusterTime":{"clusterTime":{"$timestamp":{"t":1589844894,"i":2}},"signature":{"hash":{"$binary":{"base64":"AAAAAAAAAAAAAAAAAAAAAAAAAAA=","subType":"0"}},"keyId":0}},"$db":"test","$readPreference":{"mode":"secondary"}},"planSummary":"COLLSCAN","keysExamined":0,"docsExamined":1,"cursorExhausted":true,"numYields":0,"nreturned":1,"reslen":272,"locks":{"ReplicationStateTransition":{"acquireCount":{"w":1}},"Global":{"acquireCount":{"r":1}},"Database":{"acquireCount":{"r":1}},"Collection":{"acquireCount":{"r":1}},"Mutex":{"acquireCount":{"r":1}}},"storage":{},"protocol":"op_msg","durationMillis":0}
|
…
|
Secondary d20021 Returns a ShutdownInProgress error to the driver:
[js_test:quiesce_mode_non_streamable_ismaster] 2020-05-18T23:34:56.540+0000 d20021| 2020-05-18T23:34:56.540+00:00 D2 COMMAND 21965 [conn21] "About to run the command","attr":{"db":"admin","commandArgs":{"ismaster":1,"client":{"driver":{"name":"PyMongo","version":"3.10.1"},"os":{"type":"Linux","name":"Linux","architecture":"x86_64","version":"2.6.32-220.el6.x86_64"},"platform":"CPython 3.7.0.final.0"},"$db":"admin"}}
|
[js_test:quiesce_mode_non_streamable_ismaster] 2020-05-18T23:34:56.540+0000 d20021| 2020-05-18T23:34:56.540+00:00 I NETWORK 51800 [conn21] "client metadata","attr":{"remote":"10.122.11.183:43715","client":"conn21","doc":{"driver":{"name":"PyMongo","version":"3.10.1"},"os":{"type":"Linux","name":"Linux","architecture":"x86_64","version":"2.6.32-220.el6.x86_64"},"platform":"CPython 3.7.0.final.0"}}
|
[js_test:quiesce_mode_non_streamable_ismaster] 2020-05-18T23:34:56.541+0000 d20021| 2020-05-18T23:34:56.540+00:00 D1 COMMAND 21962 [conn21] "Assertion while executing command","attr":{"command":"ismaster","db":"admin","commandArgs":{"ismaster":1,"client":{"driver":{"name":"PyMongo","version":"3.10.1"},"os":{"type":"Linux","name":"Linux","architecture":"x86_64","version":"2.6.32-220.el6.x86_64"},"platform":"CPython 3.7.0.final.0"},"$db":"admin"},"error":"ShutdownInProgress: The server is in quiesce mode and will shut down"}
|
[js_test:quiesce_mode_non_streamable_ismaster] 2020-05-18T23:34:56.541+0000 d20021| 2020-05-18T23:34:56.541+00:00 I COMMAND 51803 [conn21] "Slow query","attr":{"type":"command","ns":"admin.$cmd","command":{"ismaster":1,"client":{"driver":{"name":"PyMongo","version":"3.10.1"},"os":{"type":"Linux","name":"Linux","architecture":"x86_64","version":"2.6.32-220.el6.x86_64"},"platform":"CPython 3.7.0.final.0"},"$db":"admin"},"numYields":0,"ok":0,"errMsg":"The server is in quiesce mode and will shut down","errName":91,"errCode":91,"reslen":344,"locks":{},"protocol":"op_query","durationMillis":0}
|
We see that the finds are now being sent to the other secondary d20022 and we don't see d20021 service anymore as it transitions out of quiesce mode:
[js_test:quiesce_mode_non_streamable_ismaster] 2020-05-18T23:34:56.588+0000 d20022| 2020-05-18T23:34:56.588+00:00 D2 COMMAND 21965 [conn16] "About to run the command","attr":{"db":"test","commandArgs":{"find":"state","filter":{},"limit":1,"singleBatch":true,"lsid":{"id":{"$uuid":"19665e22-f14e-44f4-b1cf-a15309cfecd4"}},"$clusterTime":{"clusterTime":{"$timestamp":{"t":1589844894,"i":2}},"signature":{"hash":{"$binary":{"base64":"AAAAAAAAAAAAAAAAAAAAAAAAAAA=","subType":"0"}},"keyId":0}},"$db":"test","$readPreference":{"mode":"secondary"}}}
|
[js_test:quiesce_mode_non_streamable_ismaster] 2020-05-18T23:34:56.589+0000 d20022| 2020-05-18T23:34:56.588+00:00 I COMMAND 51803 [conn16] "Slow query","attr":{"type":"command","ns":"test.state","command":{"find":"state","filter":{},"limit":1,"singleBatch":true,"lsid":{"id":{"$uuid":"19665e22-f14e-44f4-b1cf-a15309cfecd4"}},"$clusterTime":{"clusterTime":{"$timestamp":{"t":1589844894,"i":2}},"signature":{"hash":{"$binary":{"base64":"AAAAAAAAAAAAAAAAAAAAAAAAAAA=","subType":"0"}},"keyId":0}},"$db":"test","$readPreference":{"mode":"secondary"}},"planSummary":"COLLSCAN","keysExamined":0,"docsExamined":1,"cursorExhausted":true,"numYields":0,"nreturned":1,"reslen":272,"locks":{"ReplicationStateTransition":{"acquireCount":{"w":1}},"Global":{"acquireCount":{"r":1}},"Database":{"acquireCount":{"r":1}},"Collection":{"acquireCount":{"r":1}},"Mutex":{"acquireCount":{"r":1}}},"storage":{},"protocol":"op_msg","durationMillis":0}
|
[js_test:quiesce_mode_non_streamable_ismaster] 2020-05-18T23:34:56.840+0000 d20022| 2020-05-18T23:34:56.840+00:00 D2 COMMAND 21965 [conn16] "About to run the command","attr":{"db":"test","commandArgs":{"find":"state","filter":{},"limit":1,"singleBatch":true,"lsid":{"id":{"$uuid":"19665e22-f14e-44f4-b1cf-a15309cfecd4"}},"$clusterTime":{"clusterTime":{"$timestamp":{"t":1589844894,"i":2}},"signature":{"hash":{"$binary":{"base64":"AAAAAAAAAAAAAAAAAAAAAAAAAAA=","subType":"0"}},"keyId":0}},"$db":"test","$readPreference":{"mode":"secondary"}}}
|
...
|
The test completes successfully with driver option retryReads set to false, indicating that all reads completed successfully without the driver ever having to retry. With this, the test is able to show that goals (1) and (2) are achieved.
Example logs here.
|