|
Similarly, we expect that if a mongos enters quiesce mode while a read is still in progress, the read should return successfully to the driver even if the driver has already marked the node as UNKNOWN.
Our observations for this test case in a sharded cluster with two mongos nodes:
We set two failpoints - one to hang a read on the mongos, and one to cause the same mongos to hang once it enters quiesce mode:
[js_test:mongos_quiesce_mode_non_streamable_ismaster_hanging_read] 2020-05-27T19:32:09.056-0400 s20024| 2020-05-27T19:32:09.056-04:00 W CONTROL 23829 [conn12] "Set failpoint","attr":{"failPointName":"waitInFindBeforeMakingBatch","failPoint":{"mode":1,"data":{},"timesEntered":0}}
|
[js_test:mongos_quiesce_mode_non_streamable_ismaster_hanging_read] 2020-05-27T19:32:09.057-0400 s20024| 2020-05-27T19:32:09.056-04:00 I COMMAND 51803 [conn12] "Slow query","attr":{"type":"command","ns":"admin.waitInFindBeforeMakingBatch","appName":"MongoDB Shell","command":{"configureFailPoint":"waitInFindBeforeMakingBatch","mode":"alwaysOn","data":{},"lsid":{"id":{"$uuid":"f0d038a1-93c4-4185-8761-243cdb41b7cb"}},"$clusterTime":{"clusterTime":{"$timestamp":{"t":1590622328,"i":7}},"signature":{"hash":{"$binary":{"base64":"AAAAAAAAAAAAAAAAAAAAAAAAAAA=","subType":"0"}},"keyId":0}},"$db":"admin"},"numYields":0,"reslen":174,"protocol":"op_msg","durationMillis":0}
|
...
|
[js_test:mongos_quiesce_mode_non_streamable_ismaster_hanging_read] 2020-05-27T19:32:09.262-0400 s20024| 2020-05-27T19:32:09.261-04:00 W CONTROL 23829 [conn12] "Set failpoint","attr":{"failPointName":"hangDuringQuiesceMode","failPoint":{"mode":1,"data":{},"timesEntered":0}}
|
[js_test:mongos_quiesce_mode_non_streamable_ismaster_hanging_read] 2020-05-27T19:32:09.262-0400 s20024| 2020-05-27T19:32:09.261-04:00 I COMMAND 51803 [conn12] "Slow query","attr":{"type":"command","ns":"admin.hangDuringQuiesceMode","appName":"MongoDB Shell","command":{"configureFailPoint":"hangDuringQuiesceMode","mode":"alwaysOn","data":{},"lsid":{"id":{"$uuid":"f0d038a1-93c4-4185-8761-243cdb41b7cb"}},"$clusterTime":{"clusterTime":{"$timestamp":{"t":1590622328,"i":7}},"signature":{"hash":{"$binary":{"base64":"AAAAAAAAAAAAAAAAAAAAAAAAAAA=","subType":"0"}},"keyId":0}},"$db":"admin"},"numYields":0,"reslen":174,"protocol":"op_msg","durationMillis":0}
|
Wait for the find to hang on mongos node s20024:
[js_test:mongos_quiesce_mode_non_streamable_ismaster_hanging_read] 2020-05-27T19:32:09.225-0400 s20024| 2020-05-27T19:32:09.224-04:00 I QUERY 20908 [conn17] "Waiting in find before making batch for query - {cq_Short}","attr":{"cq_Short":"ns: test.state query: {} sort: {} projection: {} limit: 1"}
|
Call shutdown on mongos node s20024 and wait for it to hang while in quiesce mode:
[js_test:mongos_quiesce_mode_non_streamable_ismaster_hanging_read] 2020-05-27T19:32:09.267-0400 s20024| 2020-05-27T19:32:09.266-04:00 I CONTROL 23377 [SignalHandler] "got signal {sig} ({strsignal_sig})","attr":{"sig":15,"strsignal_sig":"Terminated"}
|
[js_test:mongos_quiesce_mode_non_streamable_ismaster_hanging_read] 2020-05-27T19:32:09.267-0400 s20024| 2020-05-27T19:32:09.266-04:00 I CONTROL 23378 [SignalHandler] "kill from pid:{si_si_pid} uid:{si_si_uid}","attr":{"si_si_pid":9732,"si_si_uid":1000}
|
[js_test:mongos_quiesce_mode_non_streamable_ismaster_hanging_read] 2020-05-27T19:32:09.267-0400 s20024| 2020-05-27T19:32:09.266-04:00 I CONTROL 23381 [SignalHandler] "will terminate after current cmd ends"
|
[js_test:mongos_quiesce_mode_non_streamable_ismaster_hanging_read] 2020-05-27T19:32:09.267-0400 s20024| 2020-05-27T19:32:09.266-04:00 I COMMAND 4695700 [SignalHandler] "hangDuringQuiesceMode failpoint enabled"
|
[js_test:mongos_quiesce_mode_non_streamable_ismaster_hanging_read] 2020-05-27T19:32:09.267-0400 s20024| 2020-05-27T19:32:09.267-04:00 I COMMAND 51803 [conn12] "Slow query","attr":{"type":"command","ns":"admin.","appName":"MongoDB Shell","command":{"waitForFailPoint":"hangDuringQuiesceMode","timesEntered":1,"maxTimeMS":300000,"lsid":{"id":{"$uuid":"f0d038a1-93c4-4185-8761-243cdb41b7cb"}},"$clusterTime":{"clusterTime":{"$timestamp":{"t":1590622328,"i":7}},"signature":{"hash":{"$binary":{"base64":"AAAAAAAAAAAAAAAAAAAAAAAAAAA=","subType":"0"}},"keyId":0}},"$db":"admin"},"numYields":0,"reslen":163,"protocol":"op_msg","durationMillis":0}
|
Wait for mongos to reply to the driver with a ShutdownInProgress isMaster response:
[js_test:mongos_quiesce_mode_non_streamable_ismaster_hanging_read] 2020-05-27T19:32:10.852-0400 s20024| 2020-05-27T19:32:10.852-04:00 D1 - 23074 [conn16] "User assertion","attr":{"error":"ShutdownInProgress: Mongos is in quiesce mode and will shut down","file":"src/mongo/s/mongos_topology_coordinator.cpp","line":117}
|
[js_test:mongos_quiesce_mode_non_streamable_ismaster_hanging_read] 2020-05-27T19:32:10.853-0400 s20024| 2020-05-27T19:32:10.852-04:00 D1 SHARDING 22772 [conn16] "Exception thrown while processing command","attr":{"db":"admin","headerId":1350490027,"error":"ShutdownInProgress: Mongos is in quiesce mode and will shut down"}
|
[js_test:mongos_quiesce_mode_non_streamable_ismaster_hanging_read] 2020-05-27T19:32:10.853-0400 s20024| 2020-05-27T19:32:10.852-04:00 I COMMAND 51803 [conn16] "Slow query","attr":{"type":"command","ns":"admin.$cmd","command":{"ismaster":1,"$clusterTime":{"clusterTime":{"$timestamp":{"t":1590622328,"i":7}},"signature":{"hash":{"$binary":{"base64":"AAAAAAAAAAAAAAAAAAAAAAAAAAA=","subType":"0"}},"keyId":0}},"$db":"admin","$readPreference":{"mode":"primary"}},"numYields":0,"ok":0,"errMsg":"Mongos is in quiesce mode and will shut down","errName":91,"errCode":91,"reslen":325,"protocol":"op_msg","durationMillis":0}
|
We disable the failpoint to allow the hanging read to finish:
[js_test:mongos_quiesce_mode_non_streamable_ismaster_hanging_read] 2020-05-27T19:32:14.278-0400 s20024| 2020-05-27T19:32:14.278-04:00 I COMMAND 51803 [conn17] "Slow query","attr":{"type":"command","ns":"test.state","command":{"find":"state","filter":{},"limit":1,"singleBatch":true,"lsid":{"id":{"$uuid":"d73ff390-6f73-44fe-bb49-daf53853fc18"}},"$clusterTime":{"clusterTime":{"$timestamp":{"t":1590622328,"i":7}},"signature":{"hash":{"$binary":{"base64":"AAAAAAAAAAAAAAAAAAAAAAAAAAA=","subType":"0"}},"keyId":0}},"$db":"test","$readPreference":{"mode":"primary"}},"nShards":1,"cursorExhausted":true,"numYields":0,"nreturned":1,"reslen":272,"protocol":"op_msg","durationMillis":5064}
|
Future reads are sent to the other mongos node s20025. The logs do not show node s20024 receiving anymore reads for the rest of quiesce mode:
[js_test:mongos_quiesce_mode_non_streamable_ismaster_hanging_read] 2020-05-27T19:32:14.532-0400 s20025| 2020-05-27T19:32:14.531-04:00 I COMMAND 51803 [conn15] "Slow query","attr":{"type":"command","ns":"test.state","command":{"find":"state","filter":{},"limit":1,"singleBatch":true,"lsid":{"id":{"$uuid":"d73ff390-6f73-44fe-bb49-daf53853fc18"}},"$clusterTime":{"clusterTime":{"$timestamp":{"t":1590622328,"i":7}},"signature":{"hash":{"$binary":{"base64":"AAAAAAAAAAAAAAAAAAAAAAAAAAA=","subType":"0"}},"keyId":0}},"$db":"test","$readPreference":{"mode":"primary"}},"nShards":1,"cursorExhausted":true,"numYields":0,"nreturned":1,"reslen":269,"protocol":"op_msg","durationMillis":1}
|
...
|
The test completes successfully with driver option retryReads set to false, indicating that all reads completed successfully without the driver ever having to retry.
Example logs for this test case: mongos_quiesce_mode_non_streamable_ismaster_with_hanging_read.txt
|
|
For this test, we expect to observe that as a mongos enters quiesce mode, it should still be able to service short-running reads and writes. Once the mongos returns a ShutdownInProgress error to the driver, the driver should know to no longer target the shutting down mongos for future operations.
Our observations for this test case in a sharded cluster with two mongos nodes:
mongos node s20025 receives a series of read and write operations:
[js_test:mongos_quiesce_mode_non_streamable_ismaster] 2020-05-28T00:15:26.596-0400 s20025| 2020-05-28T00:15:26.595-04:00 I COMMAND 51803 [conn16] "Slow query","attr":{"type":"command","ns":"test.state","command":{"update":"state","ordered":true,"lsid":{"id":{"$uuid":"fa1fa44a-1501-4e63-afe5-ed285e530c98"}},"$clusterTime":{"clusterTime":{"$timestamp":{"t":1590639326,"i":2}},"signature":{"hash":{"$binary":{"base64":"AAAAAAAAAAAAAAAAAAAAAAAAAAA=","subType":"0"}},"keyId":0}},"$db":"test","$readPreference":{"mode":"primary"}},"nShards":1,"nMatched":1,"nModified":1,"numYields":0,"reslen":185,"protocol":"op_msg","durationMillis":1}{"mode":"primary"}},"nShards":1,"nMatched":1,"nModified":1,"numYields":0,"reslen":185,"protocol":"op_msg","durationMillis":1}
|
...
|
[js_test:mongos_quiesce_mode_non_streamable_ismaster] 2020-05-28T00:15:26.597-0400 s20025| 2020-05-28T00:15:26.597-04:00 I COMMAND 51803 [conn16] "Slow query","attr":{"type":"command","ns":"test.state","command":{"find":"state","filter":{},"limit":1,"singleBatch":true,"lsid":{"id":{"$uuid":"fa1fa44a-1501-4e63-afe5-ed285e530c98"}},"$clusterTime":{"clusterTime":{"$timestamp":{"t":1590639326,"i":3}},"signature":{"hash":{"$binary":{"base64":"AAAAAAAAAAAAAAAAAAAAAAAAAAA=","subType":"0"}},"keyId":0}},"$db":"test","$readPreference":{"mode":"primary"}},"nShards":1,"cursorExhausted":true,"numYields":0,"nreturned":1,"reslen":279,"protocol":"op_msg","durationMillis":0}
|
mongos node s20025 receives a shutdown command and enters quiesce mode:
[js_test:mongos_quiesce_mode_non_streamable_ismaster] 2020-05-28T00:15:26.791-0400 s20025| 2020-05-28T00:15:26.790-04:00 I CONTROL 23377 [SignalHandler] "got signal {sig} ({strsignal_sig})","attr":{"sig":15,"strsignal_sig":"Terminated"}
|
[js_test:mongos_quiesce_mode_non_streamable_ismaster] 2020-05-28T00:15:26.791-0400 s20025| 2020-05-28T00:15:26.790-04:00 I CONTROL 23378 [SignalHandler] "kill from pid:{si_si_pid} uid:{si_si_uid}","attr":{"si_si_pid":18828,"si_si_uid":1000}
|
[js_test:mongos_quiesce_mode_non_streamable_ismaster] 2020-05-28T00:15:26.791-0400 s20025| 2020-05-28T00:15:26.790-04:00 I CONTROL 23381 [SignalHandler] "will terminate after current cmd ends"
|
[js_test:mongos_quiesce_mode_non_streamable_ismaster] 2020-05-28T00:15:26.791-0400 s20025| 2020-05-28T00:15:26.791-04:00 I COMMAND 4695701 [SignalHandler] "Entering quiesce mode for mongos shutdown","attr":{"quiesceTimeMillis":5000}
|
mongos node s20025 is still able to service operations while in quiesce mode:
[js_test:mongos_quiesce_mode_non_streamable_ismaster] 2020-05-28T00:15:26.851-0400 s20025| 2020-05-28T00:15:26.850-04:00 I COMMAND 51803 [conn16] "Slow query","attr":{"type":"command","ns":"test.state","command":{"update":"state","ordered":true,"lsid":{"id":{"$uuid":"fa1fa44a-1501-4e63-afe5-ed285e530c98"}},"$clusterTime":{"clusterTime":{"$timestamp":{"t":1590639326,"i":3}},"signature":{"hash":{"$binary":{"base64":"AAAAAAAAAAAAAAAAAAAAAAAAAAA=","subType":"0"}},"keyId":0}},"$db":"test","$readPreference":{"mode":"primary"}},"nShards":1,"nMatched":1,"nModified":1,"numYields":0,"reslen":185,"protocol":"op_msg","durationMillis":1}
|
…
|
[js_test:mongos_quiesce_mode_non_streamable_ismaster] 2020-05-28T00:15:26.853-0400 s20025| 2020-05-28T00:15:26.853-04:00 I COMMAND 51803 [conn16] "Slow query","attr":{"type":"command","ns":"test.state","command":{"find":"state","filter":{},"limit":1,"singleBatch":true,"lsid":{"id":{"$uuid":"fa1fa44a-1501-4e63-afe5-ed285e530c98"}},"$clusterTime":{"clusterTime":{"$timestamp":{"t":1590639326,"i":4}},"signature":{"hash":{"$binary":{"base64":"AAAAAAAAAAAAAAAAAAAAAAAAAAA=","subType":"0"}},"keyId":0}},"$db":"test","$readPreference":{"mode":"primary"}},"nShards":1,"cursorExhausted":true,"numYields":0,"nreturned":1,"reslen":279,"protocol":"op_msg","durationMillis":1}
|
mongos node s20025 eventually replies to the driver with a ShutdownInProgress isMaster response:
[js_test:mongos_quiesce_mode_non_streamable_ismaster] 2020-05-28T00:15:27.367-0400 s20025| 2020-05-28T00:15:27.367-04:00 I NETWORK 51800 [conn21] "client metadata","attr":{"remote":"127.0.0.1:44210","client":"conn21","doc":{"driver":{"name":"PyMongo","version":"3.10.1"},"os":{"type":"Linux","name":"Linux","architecture":"x86_64","version":"4.13.0-46-generic"},"platform":"CPython 3.7.0.final.0"}}
|
[js_test:mongos_quiesce_mode_non_streamable_ismaster] 2020-05-28T00:15:27.367-0400 s20025| 2020-05-28T00:15:27.367-04:00 D1 - 23074 [conn21] "User assertion","attr":{"error":"ShutdownInProgress: Mongos is in quiesce mode and will shut down","file":"src/mongo/s/mongos_topology_coordinator.cpp","line":117}
|
[js_test:mongos_quiesce_mode_non_streamable_ismaster] 2020-05-28T00:15:27.368-0400 s20025| 2020-05-28T00:15:27.367-04:00 D1 SHARDING 22772 [conn21] "Exception thrown while processing command","attr":{"db":"admin","headerId":1141616124,"error":"ShutdownInProgress: Mongos is in quiesce mode and will shut down"}
|
[js_test:mongos_quiesce_mode_non_streamable_ismaster] 2020-05-28T00:15:27.368-0400 s20025| 2020-05-28T00:15:27.367-04:00 I COMMAND 51803 [conn21] "Slow query","attr":{"type":"command","ns":"admin.$cmd","command":{"ismaster":1,"client":{"driver":{"name":"PyMongo","version":"3.10.1"},"os":{"type":"Linux","name":"Linux","architecture":"x86_64","version":"4.13.0-46-generic"},"platform":"CPython 3.7.0.final.0"},"$db":"admin"},"numYields":0,"ok":0,"errMsg":"Mongos is in quiesce mode and will shut down","errName":91,"errCode":91,"reslen":340,"protocol":"op_query","durationMillis":0}
|
We see that all future operations are sent to the other mongos node s20024:
[js_test:mongos_quiesce_mode_non_streamable_ismaster] 2020-05-28T00:15:27.623-0400 s20024| 2020-05-28T00:15:27.622-04:00 I COMMAND 51803 [conn17] "Slow query","attr":{"type":"command","ns":"test.state","command":{"update":"state","ordered":true,"lsid":{"id":{"$uuid":"fa1fa44a-1501-4e63-afe5-ed285e530c98"}},"$clusterTime":{"clusterTime":{"$timestamp":{"t":1590639327,"i":2}},"signature":{"hash":{"$binary":{"base64":"AAAAAAAAAAAAAAAAAAAAAAAAAAA=","subType":"0"}},"keyId":0}},"$db":"test","$readPreference":{"mode":"primary"}},"nShards":1,"nMatched":1,"nModified":1,"numYields":0,"reslen":185,"protocol":"op_msg","durationMillis":4}
|
...
|
[js_test:mongos_quiesce_mode_non_streamable_ismaster] 2020-05-28T00:15:27.625-0400 s20024| 2020-05-28T00:15:27.625-04:00 I COMMAND 51803 [conn17] "Slow query","attr":{"type":"command","ns":"test.state","command":{"find":"state","filter":{},"limit":1,"singleBatch":true,"lsid":{"id":{"$uuid":"fa1fa44a-1501-4e63-afe5-ed285e530c98"}},"$clusterTime":{"clusterTime":{"$timestamp":{"t":1590639327,"i":3}},"signature":{"hash":{"$binary":{"base64":"AAAAAAAAAAAAAAAAAAAAAAAAAAA=","subType":"0"}},"keyId":0}},"$db":"test","$readPreference":{"mode":"primary"}},"nShards":1,"cursorExhausted":true,"numYields":0,"nreturned":1,"reslen":279,"protocol":"op_msg","durationMillis":1}
|
The test completes successfully with both driver options retryReads and retryWrites set to false, indicating that all operations completed successfully without the driver ever having to retry.
Example logs for this test case: mongos_quiesce_mode_non_streamable_ismaster.txt
|