|
Similarly, we expect that if a mongos enters quiesce mode while a read is still in progress, the read should return successfully to the driver even if the driver has already marked the node as UNKNOWN.
Our observations for this test case in a sharded cluster with two mongos nodes:
We set two failpoints - one to hang a read on the mongos, and one to cause the same mongos to hang once it enters quiesce mode:
[js_test:mongos_quiesce_mode_streamable_ismaster_with_hanging_read] 2020-05-27T20:12:15.988-0400 s20024| 2020-05-27T20:12:15.988-04:00 W CONTROL 23829 [conn12] "Set failpoint","attr":{"failPointName":"waitInFindBeforeMakingBatch","failPoint":{"mode":1,"data":{},"timesEntered":0}}
|
[js_test:mongos_quiesce_mode_streamable_ismaster_with_hanging_read] 2020-05-27T20:12:15.989-0400 s20024| 2020-05-27T20:12:15.988-04:00 I COMMAND 51803 [conn12] "Slow query","attr":{"type":"command","ns":"admin.waitInFindBeforeMakingBatch","appName":"MongoDB Shell","command":{"configureFailPoint":"waitInFindBeforeMakingBatch","mode":"alwaysOn","data":{},"lsid":{"id":{"$uuid":"aeadc76f-41a5-43d4-ad86-9f38f8db4351"}},"$clusterTime":{"clusterTime":{"$timestamp":{"t":1590624735,"i":7}},"signature":{"hash":{"$binary":{"base64":"AAAAAAAAAAAAAAAAAAAAAAAAAAA=","subType":"0"}},"keyId":0}},"$db":"admin"},"numYields":0,"reslen":174,"protocol":"op_msg","durationMillis":0}
|
...
|
[js_test:mongos_quiesce_mode_streamable_ismaster_with_hanging_read] 2020-05-27T20:12:16.093-0400 s20024| 2020-05-27T20:12:16.092-04:00 W CONTROL 23829 [conn12] "Set failpoint","attr":{"failPointName":"hangDuringQuiesceMode","failPoint":{"mode":1,"data":{},"timesEntered":0}}
|
[js_test:mongos_quiesce_mode_streamable_ismaster_with_hanging_read] 2020-05-27T20:12:16.093-0400 s20024| 2020-05-27T20:12:16.093-04:00 I COMMAND 51803 [conn12] "Slow query","attr":{"type":"command","ns":"admin.hangDuringQuiesceMode","appName":"MongoDB Shell","command":{"configureFailPoint":"hangDuringQuiesceMode","mode":"alwaysOn","data":{},"lsid":{"id":{"$uuid":"aeadc76f-41a5-43d4-ad86-9f38f8db4351"}},"$clusterTime":{"clusterTime":{"$timestamp":{"t":1590624735,"i":7}},"signature":{"hash":{"$binary":{"base64":"AAAAAAAAAAAAAAAAAAAAAAAAAAA=","subType":"0"}},"keyId":0}},"$db":"admin"},"numYields":0,"reslen":174,"protocol":"op_msg","durationMillis":0}
|
Wait for the read to hang on mongos node s20024:
[js_test:mongos_quiesce_mode_streamable_ismaster_with_hanging_read] 2020-05-27T20:12:16.074-0400 s20024| 2020-05-27T20:12:16.074-04:00 I QUERY 20908 [conn16] "Waiting in find before making batch for query - {cq_Short}","attr":{"cq_Short":"ns: test.state query: {} sort: {} projection: {} limit: 1"}
|
Call shutdown on mongos node s20024 and wait for it to hang in quiesce mode:
[js_test:mongos_quiesce_mode_streamable_ismaster_with_hanging_read] 2020-05-27T20:12:16.100-0400 s20024| 2020-05-27T20:12:16.099-04:00 I CONTROL 23377 [SignalHandler] "got signal {sig} ({strsignal_sig})","attr":{"sig":15,"strsignal_sig":"Terminated"}
|
[js_test:mongos_quiesce_mode_streamable_ismaster_with_hanging_read] 2020-05-27T20:12:16.100-0400 s20024| 2020-05-27T20:12:16.099-04:00 I CONTROL 23378 [SignalHandler] "kill from pid:{si_si_pid} uid:{si_si_uid}","attr":{"si_si_pid":13110,"si_si_uid":1000}
|
[js_test:mongos_quiesce_mode_streamable_ismaster_with_hanging_read] 2020-05-27T20:12:16.100-0400 s20024| 2020-05-27T20:12:16.099-04:00 I CONTROL 23381 [SignalHandler] "will terminate after current cmd ends"
|
[js_test:mongos_quiesce_mode_streamable_ismaster_with_hanging_read] 2020-05-27T20:12:16.101-0400 s20024| 2020-05-27T20:12:16.099-04:00 I COMMAND 4695700 [SignalHandler] "hangDuringQuiesceMode failpoint enabled"
|
The node s20024 returns a ShutdownInProgress isMaster response to the driver immediately:
[js_test:mongos_quiesce_mode_streamable_ismaster_with_hanging_read] 2020-05-27T20:12:16.102-0400 s20024| 2020-05-27T20:12:16.102-04:00 I NETWORK 51800 [conn21] "client metadata","attr":{"remote":"127.0.0.1:35914","client":"conn21","doc":{"driver":{"name":"PyMongo","version":"3.11.0b1.dev0"},"os":{"type":"Linux","name":"Linux","architecture":"x86_64","version":"4.13.0-46-generic"},"platform":"CPython 3.7.0.final.0"}}
|
[js_test:mongos_quiesce_mode_streamable_ismaster_with_hanging_read] 2020-05-27T20:12:16.103-0400 s20024| 2020-05-27T20:12:16.102-04:00 D1 - 23074 [conn21] "User assertion","attr":{"error":"ShutdownInProgress: Mongos is in quiesce mode and will shut down","file":"src/mongo/s/mongos_topology_coordinator.cpp","line":117}
|
[js_test:mongos_quiesce_mode_streamable_ismaster_with_hanging_read] 2020-05-27T20:12:16.103-0400 s20024| 2020-05-27T20:12:16.102-04:00 D1 SHARDING 22772 [conn21] "Exception thrown while processing command","attr":{"db":"admin","headerId":1102520059,"error":"ShutdownInProgress: Mongos is in quiesce mode and will shut down"}
|
[js_test:mongos_quiesce_mode_streamable_ismaster_with_hanging_read] 2020-05-27T20:12:16.103-0400 s20024| 2020-05-27T20:12:16.102-04:00 I COMMAND 51803 [conn21] "Slow query","attr":{"type":"command","ns":"admin.$cmd","command":{"ismaster":1,"client":{"driver":{"name":"PyMongo","version":"3.11.0b1.dev0"},"os":{"type":"Linux","name":"Linux","architecture":"x86_64","version":"4.13.0-46-generic"},"platform":"CPython 3.7.0.final.0"},"$db":"admin"},"numYields":0,"ok":0,"errMsg":"Mongos is in quiesce mode and will shut down","errName":91,"errCode":91,"reslen":340,"protocol":"op_query","durationMillis":0}'
|
Allow the hanging read to finish on s20024. The driver should not raise any errors here:
[js_test:mongos_quiesce_mode_streamable_ismaster_with_hanging_read] 2020-05-27T20:12:17.105-0400 s20024| 2020-05-27T20:12:17.105-04:00 I COMMAND 51803 [conn16] "Slow query","attr":{"type":"command","ns":"test.state","command":{"find":"state","filter":{},"limit":1,"singleBatch":true,"lsid":{"id":{"$uuid":"018e25e2-5ad8-4225-aaee-5e12e09a115d"}},"$clusterTime":{"clusterTime":{"$timestamp":{"t":1590624735,"i":7}},"signature":{"hash":{"$binary":{"base64":"AAAAAAAAAAAAAAAAAAAAAAAAAAA=","subType":"0"}},"keyId":0}},"$db":"test","$readPreference":{"mode":"primary"}},"nShards":1,"cursorExhausted":true,"numYields":0,"nreturned":1,"reslen":272,"protocol":"op_msg","durationMillis":1042}
|
We see that all future reads target the other mongos node s20025. Node s20024 is not targeted for the rest of its quiesce period:
[js_test:mongos_quiesce_mode_streamable_ismaster_with_hanging_read] 2020-05-27T20:12:17.358-0400 s20025| 2020-05-27T20:12:17.358-04:00 I COMMAND 51803 [conn14] "Slow query","attr":{"type":"command","ns":"test.state","command":{"find":"state","filter":{},"limit":1,"singleBatch":true,"lsid":{"id":{"$uuid":"018e25e2-5ad8-4225-aaee-5e12e09a115d"}},"$clusterTime":{"clusterTime":{"$timestamp":{"t":1590624735,"i":7}},"signature":{"hash":{"$binary":{"base64":"AAAAAAAAAAAAAAAAAAAAAAAAAAA=","subType":"0"}},"keyId":0}},"$db":"test","$readPreference":{"mode":"primary"}},"nShards":1,"cursorExhausted":true,"numYields":0,"nreturned":1,"reslen":269,"protocol":"op_msg","durationMillis":1}
|
The test completes successfully with driver option retryReads set to false, indicating that all reads completed successfully without the driver ever having to retry.
Example logs of test case: mongos_quiesce_mode_streamable_ismaster_with_hanging_read.txt
|
|
For this test, we expect to observe that as a mongos enters quiesce mode, it should still be able to service short-running reads and writes. Since we are using the streamable isMaster protocol, the mongos should return a ShutdownInProgress error to the driver immediately upon entering quiesce mode. The driver should then set the server to be in the UNKNOWN state and no longer consider it as a target for future operations.
Our observations for this test case in a sharded cluster with two mongos nodes:
Mongos node s20024 services a series of short-running reads and writes:
[js_test:mongos_quiesce_mode_streamable_ismaster] 2020-05-27T23:56:34.511-0400 s20024| 2020-05-27T23:56:34.510-04:00 I COMMAND 51803 [conn16] "Slow query","attr":{"type":"command","ns":"test.state","command":{"update":"state","ordered":true,"lsid":{"id":{"$uuid":"3379383f-b7b2-447d-9485-d0ab8fd5f169"}},"txnNumber":3,"$clusterTime":{"clusterTime":{"$timestamp":{"t":1590638194,"i":8}},"signature":{"hash":{"$binary":{"base64":"AAAAAAAAAAAAAAAAAAAAAAAAAAA=","subType":"0"}},"keyId":0}},"$db":"test","$readPreference":{"mode":"primary"}},"nShards":1,"nMatched":1,"nModified":1,"numYields":0,"reslen":185,"protocol":"op_msg","durationMillis":1}
|
...
|
[js_test:mongos_quiesce_mode_streamable_ismaster] 2020-05-27T23:56:34.514-0400 s20024| 2020-05-27T23:56:34.513-04:00 I COMMAND 51803 [conn16] "Slow query","attr":{"type":"command","ns":"test.state","command":{"find":"state","filter":{},"limit":1,"singleBatch":true,"lsid":{"id":{"$uuid":"3379383f-b7b2-447d-9485-d0ab8fd5f169"}},"$clusterTime":{"clusterTime":{"$timestamp":{"t":1590638194,"i":9}},"signature":{"hash":{"$binary":{"base64":"AAAAAAAAAAAAAAAAAAAAAAAAAAA=","subType":"0"}},"keyId":0}},"$db":"test","$readPreference":{"mode":"primary"}},"nShards":1,"cursorExhausted":true,"numYields":0,"nreturned":1,"reslen":279,"protocol":"op_msg","durationMillis":1}
|
We call shutdown on mongos node s20024, causing it to enter quiesce mode:
[js_test:mongos_quiesce_mode_streamable_ismaster] 2020-05-27T23:56:34.765-0400 s20024| 2020-05-27T23:56:34.764-04:00 I CONTROL 23377 [SignalHandler] "got signal {sig} ({strsignal_sig})","attr":{"sig":15,"strsignal_sig":"Terminated"}
|
[js_test:mongos_quiesce_mode_streamable_ismaster] 2020-05-27T23:56:34.766-0400 s20024| 2020-05-27T23:56:34.764-04:00 I CONTROL 23378 [SignalHandler] "kill from pid:{si_si_pid} uid:{si_si_uid}","attr":{"si_si_pid":15306,"si_si_uid":1000}
|
[js_test:mongos_quiesce_mode_streamable_ismaster] 2020-05-27T23:56:34.766-0400 s20024| 2020-05-27T23:56:34.765-04:00 I CONTROL 23381 [SignalHandler] "will terminate after current cmd ends"
|
[js_test:mongos_quiesce_mode_streamable_ismaster] 2020-05-27T23:56:34.766-0400 s20024| 2020-05-27T23:56:34.765-04:00 I COMMAND 4695701 [SignalHandler] "Entering quiesce mode for mongos shutdown","attr":{"quiesceTimeMillis":5000}
|
Mongos node s20024 responds to the driver immediately with a ShutdownInProgressError:
[js_test:mongos_quiesce_mode_streamable_ismaster] 2020-05-27T23:56:34.768-0400 s20024| 2020-05-27T23:56:34.767-04:00 I NETWORK 51800 [conn20] "client metadata","attr":{"remote":"127.0.0.1:36304","client":"conn20","doc":{"driver":{"name":"PyMongo","version":"3.11.0b1.dev0"},"os":{"type":"Linux","name":"Linux","architecture":"x86_64","version":"4.13.0-46-generic"},"platform":"CPython 3.7.0.final.0"}}
|
[js_test:mongos_quiesce_mode_streamable_ismaster] 2020-05-27T23:56:34.768-0400 s20024| 2020-05-27T23:56:34.767-04:00 D1 - 23074 [conn20] "User assertion","attr":{"error":"ShutdownInProgress: Mongos is in quiesce mode and will shut down","file":"src/mongo/s/mongos_topology_coordinator.cpp","line":117}
|
[js_test:mongos_quiesce_mode_streamable_ismaster] 2020-05-27T23:56:34.768-0400 s20024| 2020-05-27T23:56:34.767-04:00 D1 SHARDING 22772 [conn20] "Exception thrown while processing command","attr":{"db":"admin","headerId":2044897763,"error":"ShutdownInProgress: Mongos is in quiesce mode and will shut down"}
|
[js_test:mongos_quiesce_mode_streamable_ismaster] 2020-05-27T23:56:34.769-0400 s20024| 2020-05-27T23:56:34.768-04:00 I COMMAND 51803 [conn20] "Slow query","attr":{"type":"command","ns":"admin.$cmd","command":{"ismaster":1,"client":{"driver":{"name":"PyMongo","version":"3.11.0b1.dev0"},"os":{"type":"Linux","name":"Linux","architecture":"x86_64","version":"4.13.0-46-generic"},"platform":"CPython 3.7.0.final.0"},"$db":"admin"},"numYields":0,"ok":0,"errMsg":"Mongos is in quiesce mode and will shut down","errName":91,"errCode":91,"reslen":340,"protocol":"op_query","durationMillis":0}
|
The driver sends all future reads and writes to the other mongos node s20025:
[js_test:mongos_quiesce_mode_streamable_ismaster] 2020-05-27T23:56:35.023-0400 s20025| 2020-05-27T23:56:35.023-04:00 I COMMAND 51803 [conn16] "Slow query","attr":{"type":"command","ns":"test.state","command":{"update":"state","ordered":true,"lsid":{"id":{"$uuid":"3379383f-b7b2-447d-9485-d0ab8fd5f169"}},"txnNumber":5,"$clusterTime":{"clusterTime":{"$timestamp":{"t":1590638194,"i":10}},"signature":{"hash":{"$binary":{"base64":"AAAAAAAAAAAAAAAAAAAAAAAAAAA=","subType":"0"}},"keyId":0}},"$db":"test","$readPreference":{"mode":"primary"}},"nShards":1,"nMatched":1,"nModified":1,"numYields":0,"reslen":185,"protocol":"op_msg","durationMillis":1}
|
...
|
[js_test:mongos_quiesce_mode_streamable_ismaster] 2020-05-27T23:56:35.026-0400 s20025| 2020-05-27T23:56:35.026-04:00 I COMMAND 51803 [conn16] "Slow query","attr":{"type":"command","ns":"test.state","command":{"find":"state","filter":{},"limit":1,"singleBatch":true,"lsid":{"id":{"$uuid":"3379383f-b7b2-447d-9485-d0ab8fd5f169"}},"$clusterTime":{"clusterTime":{"$timestamp":{"t":1590638195,"i":1}},"signature":{"hash":{"$binary":{"base64":"AAAAAAAAAAAAAAAAAAAAAAAAAAA=","subType":"0"}},"keyId":0}},"$db":"test","$readPreference":{"mode":"primary"}},"nShards":1,"cursorExhausted":true,"numYields":0,"nreturned":1,"reslen":279,"protocol":"op_msg","durationMillis":1}
|
The test completes successfully with both driver options retryReads and retryWrites set to false, indicating that all operations completed successfully without the driver ever having to retry.
Example logs for this test case: mongos_quiesce_mode_streamable_ismaster_with_writes.txt
|