[SERVER-19872] logging subsystem may block the event loop in NetworkInterfaceASIO Created: 11/Aug/15  Updated: 06/Dec/22  Resolved: 29/Oct/19

Status: Closed
Project: Core Server
Component/s: Logging, Networking
Affects Version/s: None
Fix Version/s: None

Type: Task Priority: Major - P3
Reporter: Adam Midvidy Assignee: Backlog - Service Architecture
Resolution: Won't Fix Votes: 1
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-31891 Service entry point holds session loc... Closed
Assigned Teams:
Service Arch
Participants:
Linked BF Score: 0

 Description   

Logging calls can periodically block the calling thread in our log system when we fsync the log file.

Although we have not empirically measured the impact of this on NetworkInterfaceASIO, there is an assumption in the design of NIA that threads running in the event loop should never be blocked. If the event loop is blocked by the logging subsystem, this could potentially cause network operation handling to block as well.

We should investigate potential workarounds, including implementing limited support for asynchronous logging.

cc schwerin acm



 Comments   
Comment by Mira Carey [ 29/Oct/19 ]

We've lived with blocking logging on networking threads for years. I don't think we're going to end up fixing this.

Comment by Andy Schwerin [ 13/Aug/15 ]

I believe that implementing a non-blocking, best-effort logging subsystem is out of scope for the NetworkInterfaceASIO project. The actual networking code in NetworkInterfaceASIO shouldn't log much at the default log level anyway – maybe nothing – and we can stipulate that the callbacks must not log, just as they must not acquire database locks or perform other blocking actions.

Eventually, I believe we'll want to replace the entire implementation of the logging system with one that supports best-effort asynchronous logging, at least for certain timing-sensitive paths like replication liveness transmission and the async networking layer, but I do not think it is necessary at this time.

Generated at Thu Feb 08 03:52:20 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.