[SERVER-48633] WaitForReplication hook should ignore errors from quiesce mode Created: 05/Jun/20  Updated: 29/Oct/23  Resolved: 08/Jun/20

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: 4.7.0

Type: Bug Priority: Major - P3
Reporter: Lingzhi Deng Assignee: Lingzhi Deng
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: Repl 2020-06-15
Participants:
Linked BF Score: 21

 Description   

Based on this, it doesn't seem that we skip these errors; we still always throw. In a build failure, we didn't see the log line "Ignoring shutdown error" either. So we should verify whether the if-statement is still correct and whether InterruptedAtShutdown and ShutdownInProgress are the only two expected errors for quiesce mode.



 Comments   
Comment by Githook User [ 08/Jun/20 ]

Author: Lingzhi Deng (username: ldennis, email: lingzhi.deng@mongodb.com)

Message: SERVER-48633: WaitForReplication hook should ignore errors from quiesce mode
Branch: master
https://github.com/mongodb/mongo/commit/149fdde77c1ee408f4fc467a1022024ec8679044

Comment by Lingzhi Deng [ 08/Jun/20 ]

I reproduced this while logging all errors, and it seems that the error was a JavaScript exception rather than a command error with an error code:

[js_test:backup_restore_stop_start] 2020-06-06T00:50:57.421+0000 sh16276| [WaitForReplication:job0] [jsTest] ----
[js_test:backup_restore_stop_start] 2020-06-06T00:50:57.422+0000 sh16276| [WaitForReplication:job0] [jsTest] WaitForReplication got error: Error: The server is in quiesce mode and will shut down
[js_test:backup_restore_stop_start] 2020-06-06T00:50:57.422+0000 sh16276| [WaitForReplication:job0] [jsTest] ----
[js_test:backup_restore_stop_start] 2020-06-06T00:50:57.422+0000 sh16276| [WaitForReplication:job0]
[js_test:backup_restore_stop_start] 2020-06-06T00:50:57.422+0000 sh16276| [WaitForReplication:job0] Error: The server is in quiesce mode and will shut down :
[js_test:backup_restore_stop_start] 2020-06-06T00:50:57.422+0000 sh16276| [WaitForReplication:job0] _constructFromExistingSeedNode/self.nodes<@src/mongo/shell/replsettest.js:3310:48
[js_test:backup_restore_stop_start] 2020-06-06T00:50:57.422+0000 sh16276| [WaitForReplication:job0] _constructFromExistingSeedNode@src/mongo/shell/replsettest.js:3310:22
[js_test:backup_restore_stop_start] 2020-06-06T00:50:57.422+0000 sh16276| [WaitForReplication:job0] ReplSetTest/<@src/mongo/shell/replsettest.js:3321:13
[js_test:backup_restore_stop_start] 2020-06-06T00:50:57.422+0000 sh16276| [WaitForReplication:job0] retryOnNetworkError@src/mongo/shell/utils.js:57:20
[js_test:backup_restore_stop_start] 2020-06-06T00:50:57.422+0000 sh16276| [WaitForReplication:job0] ReplSetTest@src/mongo/shell/replsettest.js:3317:9
[js_test:backup_restore_stop_start] 2020-06-06T00:50:57.422+0000 sh16276| [WaitForReplication:job0] @(shell eval):4:23
[js_test:backup_restore_stop_start] 2020-06-06T00:50:57.422+0000 sh16276| [WaitForReplication:job0] exiting with code -4

So we probably need to check e.message instead, as we did in shell/utils.js.
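
A minimal sketch of that idea, not the code from the actual commit: the helper names isShutdownRelatedError and runIgnoringShutdownErrors are hypothetical, and the message string is copied from the log above. It assumes a thrown plain Error carries no code field, so a code-only check would never match it, and falls back to matching on e.message.

```javascript
// Hypothetical helpers for illustration; not the hook's real implementation.

// Numeric codes for the two errors named in the description.
const SHUTDOWN_CODES = new Set([
    91,     // ShutdownInProgress
    11600,  // InterruptedAtShutdown
]);

// Message text taken from the failure log in this ticket.
const QUIESCE_MESSAGE = "The server is in quiesce mode and will shut down";

// Returns true if the error looks shutdown-related, by code or by message.
function isShutdownRelatedError(e) {
    if (e === null || typeof e !== "object") {
        return false;
    }
    if (SHUTDOWN_CODES.has(e.code)) {
        return true;
    }
    // A plain JavaScript Error has no `code`, so fall back to the message.
    return typeof e.message === "string" && e.message.includes(QUIESCE_MESSAGE);
}

// Runs fn, swallowing shutdown-related errors and rethrowing everything else.
function runIgnoringShutdownErrors(fn, log) {
    try {
        return fn();
    } catch (e) {
        if (isShutdownRelatedError(e)) {
            log("Ignoring shutdown error: " + e.message);
            return undefined;
        }
        throw e;
    }
}
```

With a check like this, the exception in the log above would be matched by its message even though it has no error code, while unrelated errors would still propagate.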

Comment by Lingzhi Deng [ 05/Jun/20 ]

Yes, but the fact that we didn't even see the log line makes me think the if-statement could be wrong. Maybe the error code part alone is correct but something else is wrong.

Comment by Pavithra Vetriselvan [ 05/Jun/20 ]

Just a note: ShutdownInProgress is the only error that we would expect to receive during quiesce mode. When I talked to execution about the original fix, we decided that it is fine to ignore all shutdown errors, since the backup restore hook doesn't have anything to gain by running concurrently with shutdowns.

Generated at Thu Feb 08 05:17:40 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.