[SERVER-34799] Wrap SyncSourceFeedback thread in try/catch that shuts down the server Created: 02/May/18  Updated: 08/Jan/24  Resolved: 07/May/18

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Spencer Brody (Inactive) Assignee: Backlog - Replication Team
Resolution: Works as Designed Votes: 0
Labels: neweng
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Related
Assigned Teams:
Replication
Backport Requested:
v4.0, v3.6, v3.4
Participants:

 Description   

If the SyncSourceFeedback thread stops running for any reason, nothing will ever restart it and this node will stop forwarding replication progress to its sync source.  We should ensure that any errors in executing SSF don't result in the SSF thread dying, and to enforce that we should have the server fassert if any exception escapes from SSF execution.



 Comments   
Comment by Mira Carey [ 04/May/18 ]

stdx::thread's only substantial modification from std::thread is that it also calls std::terminate if construction fails.  It definitely calls terminate if the function passed to it leaks an exception out.

For types we have that run generic background tasks, we follow similar patterns:

  • PeriodicRunner - fasserts
  • BackgroundJob - throws, catchs, re-throws, lets std::thread terminate
  • TaskExecutor/NetworkInterface - invokes the callback in a noexcept function, which calls terminate
  • ThreadPool - invokes std::terminate

It's totally believable to me that we had some subsystems eating errors at some point, but I don't think any of them are now (and it'd be a bug if they did)

It also looks to me like the SyncSourceFeedback::run is called from a stdx::thread, and that there aren't any try/catcher's. Which makes me think that exceptions that bubble up terminate in this particular case.

Comment by Spencer Brody (Inactive) [ 03/May/18 ]

Oh, maybe.  I wasn't sure if the thread would just die.  I think in the past that was the case, but maybe we changed that at some point to abort the server, which does seem like preferable behavior.  mira.carey@mongodb.com

Comment by Andy Schwerin [ 03/May/18 ]

Don't exception that escape the thread already call std:: terminate you execute? That's the behavior unless we're catching the exception and swallowing it.

Generated at Thu Feb 08 04:37:53 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.