[SERVER-69945] Synchronize setting/canceling ASIO timers Created: 23/Sep/22  Updated: 29/Oct/23  Resolved: 02/Nov/22

Status: Closed
Project: Core Server
Component/s: Internal Code
Affects Version/s: None
Fix Version/s: 6.2.0-rc0

Type: Bug Priority: Major - P3
Reporter: Amirsaman Memaripour Assignee: Amirsaman Memaripour
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: Service Arch 2022-10-17, Service Arch 2022-10-31, Service Arch 2022-11-14
Participants:
Linked BF Score: 5

 Description   

Mongo's existing RPC system (i.e., NetworkInterface) may use ASIO timers to enforce timeouts. The behavior, however, is racy as threads may set and cancel a timer without synchronization. Consider the following:

  • We schedule a remote command through startCommand, but since there are no connections available, the invocation returns before starting any remote command.
  • We decide to cancel the scheduled remote command through cancelCommand, which attempts to cancel the timer.
  • At the same time, the connection pool provides a connection, so NetworkInterface starts sending the remote command and, as a prerequisite, sets the timer.
  • Now, we have two threads that are concurrently reading and writing an ASIO-internal value without synchronization:
    • The thread canceling the remote command is reading this value.
    • The thread that is trying to send the remote command is writing this value.

A possible solution to address this issue is to synchronize setting/canceling timers at the NetworkInterface layer.
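
To illustrate the proposed synchronization, here is a minimal sketch that uses only the C++ standard library. It is not the NetworkInterface code: FakeTimer, GuardedTimer and runConcurrentSetAndCancel are hypothetical names, and FakeTimer is only a stand-in for asio::system_timer, whose member functions are likewise unsafe to call concurrently on a shared object. The point is that the sending thread and the canceling thread both go through the same mutex.

```cpp
#include <cassert>
#include <chrono>
#include <mutex>
#include <thread>

// Hypothetical stand-in for asio::system_timer. Its members mutate shared
// state with no internal locking, so unsynchronized concurrent calls race.
class FakeTimer {
public:
    void expiresAfter(std::chrono::milliseconds d) {
        _deadline = d;
        _armed = true;
        ++_mutations;
    }
    void cancel() {
        _armed = false;
        ++_mutations;
    }
    int mutations() const { return _mutations; }

private:
    std::chrono::milliseconds _deadline{0};
    bool _armed = false;
    int _mutations = 0;
};

// Hypothetical wrapper: every access to the timer takes the same mutex, so
// setting and canceling can never overlap.
class GuardedTimer {
public:
    void arm(std::chrono::milliseconds d) {
        std::lock_guard<std::mutex> lk(_mutex);
        _timer.expiresAfter(d);
    }
    void cancel() {
        std::lock_guard<std::mutex> lk(_mutex);
        _timer.cancel();
    }
    int mutations() const {
        std::lock_guard<std::mutex> lk(_mutex);
        return _timer.mutations();
    }

private:
    mutable std::mutex _mutex;
    FakeTimer _timer;
};

// One thread plays the "send the command and set the timer" role, the other
// plays the "cancel the command" role. Returns the total number of timer
// mutations observed after both threads finish.
int runConcurrentSetAndCancel(int iterations) {
    assert(iterations >= 0);
    GuardedTimer timer;
    std::thread sender([&] {
        for (int i = 0; i < iterations; ++i)
            timer.arm(std::chrono::milliseconds(100));
    });
    std::thread canceler([&] {
        for (int i = 0; i < iterations; ++i)
            timer.cancel();
    });
    sender.join();
    canceler.join();
    return timer.mutations();
}
```

Because each mutation happens under the lock, no update is lost: calling runConcurrentSetAndCancel(n) returns 2 * n. Without the mutex the same program would contain a data race on FakeTimer's members.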



 Comments   
Comment by Githook User [ 02/Nov/22 ]

Author:

{'name': 'Celina Tala', 'email': 'celinahtala@gmail.com', 'username': 'celinatala-1'}

Message: SERVER-69945 Changed `_timer` in transport_layer_asio to synchronized_value
Branch: master
https://github.com/mongodb/mongo/commit/e39ef953f5836461e8593f9dd77e90b484390a4c

Comment by Amirsaman Memaripour [ 13/Oct/22 ]

Updating the ticket requirement:

I believe we need to add the synchronization to ASIOReactorTimer, in particular to ASIOReactorTimer::_timer. The data race concerns concurrent calls to async_wait and cancel on the asio::system_timer, so wrapping the timer in synchronized_value should fix it:

std::shared_ptr<synchronized_value<asio::system_timer, RawSynchronizedValueMutexPolicy>> _timer;
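
For readers unfamiliar with the pattern, the following is a rough sketch of how a synchronized-value wrapper can serialize calls on the wrapped object. It is an assumption-laden illustration, not MongoDB's synchronized_value: SynchronizedValueSketch, StubTimer and demoSharedTimer are hypothetical names, and StubTimer merely stands in for asio::system_timer. The wrapper's operator-> returns a short-lived proxy that holds the mutex for the duration of the call.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical wrapper: pairs a value with a mutex and only hands out access
// through a proxy that owns the lock.
template <typename T>
class SynchronizedValueSketch {
public:
    SynchronizedValueSketch() = default;
    explicit SynchronizedValueSketch(T value) : _value(std::move(value)) {}

    // Proxy that keeps the mutex locked for as long as it is alive.
    class LockedPtr {
    public:
        LockedPtr(std::mutex& m, T& v) : _lock(m), _value(&v) {}
        T* operator->() { return _value; }
        T& operator*() { return *_value; }

    private:
        std::unique_lock<std::mutex> _lock;
        T* _value;
    };

    // sv->f() locks, calls f() on the value, then unlocks at the end of the
    // full expression when the temporary proxy is destroyed.
    LockedPtr operator->() { return LockedPtr(_mutex, _value); }

    // Holds the lock across several operations.
    LockedPtr synchronize() { return LockedPtr(_mutex, _value); }

private:
    std::mutex _mutex;
    T _value{};
};

// Hypothetical stand-in for asio::system_timer with unsynchronized state.
struct StubTimer {
    int waits = 0;
    int cancels = 0;
    void asyncWait() { ++waits; }
    void cancel() { ++cancels; }
};

// Two threads share one wrapped timer: one keeps arming it, the other keeps
// canceling it. Returns the total number of calls the timer recorded.
int demoSharedTimer(int iterations) {
    assert(iterations >= 0);
    auto timer = std::make_shared<SynchronizedValueSketch<StubTimer>>();
    std::thread waiter([timer, iterations] {
        for (int i = 0; i < iterations; ++i)
            (*timer)->asyncWait();
    });
    std::thread canceler([timer, iterations] {
        for (int i = 0; i < iterations; ++i)
            (*timer)->cancel();
    });
    waiter.join();
    canceler.join();
    auto locked = timer->synchronize();
    return locked->waits + locked->cancels;
}
```

In this sketch each call through operator-> is individually atomic with respect to the others, which is what the async_wait/cancel race needs. A sequence of calls that must be atomic as a group would use synchronize() and keep the proxy alive across the sequence.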

Generated at Thu Feb 08 06:14:51 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.