[SERVER-51425] Restarting JournalFlusher after rollback is not thread-safe Created: 07/Oct/20  Updated: 29/Oct/23  Resolved: 13/Nov/20

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 4.9.0, 4.4.6

Type: Bug Priority: Major - P3
Reporter: Lingzhi Deng Assignee: Lingzhi Deng
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v4.4
Sprint: Execution Team 2020-11-16, Execution Team 2020-11-30
Participants:
Linked BF Score: 27

 Description   

In startStorageControls, we create a new instance of JournalFlusher and set it to the serviceContext. This is fine at startup time. But if startStorageControls is called during rollback, we will be destroying the existing instance of JournalFlusher which some preexisting callers of JournalFlusher::waitForJournalFlush may still have references to (via JournalFlusher::get). And so those callers could be accessing freed memory and would likely trash the memory.



 Comments   
Comment by Githook User [ 07/Apr/21 ]

Author:

{'name': 'Lingzhi Deng', 'email': 'lingzhi.deng@mongodb.com', 'username': 'ldennis'}

Message: SERVER-51425: Pause JournalFlusher during replication rollback

(cherry picked from commit 8a6082c74e1f6336784ad46cc609adc540d51712)
Branch: v4.4
https://github.com/mongodb/mongo/commit/d30814ebb3eb9146456b8f6b05cbb773ad60e19c

Comment by Githook User [ 13/Nov/20 ]

Author:

{'name': 'Lingzhi Deng', 'email': 'lingzhi.deng@mongodb.com', 'username': 'ldennis'}

Message: SERVER-51425: Pause JournalFlusher during replication rollback
Branch: master
https://github.com/mongodb/mongo/commit/8a6082c74e1f6336784ad46cc609adc540d51712

Comment by Lingzhi Deng [ 10/Nov/20 ]

The Checkpointer potentially has the same problem. If we reset the Checkpointer unique ptr on the ServiceContext while callers of Checkpointer::get are still holding references to the Checkpointer instance, we could end up accessing freed memory. But the Checkpointer is currently only used here in setStableTimestamp. So I guess we are probably safe for now because we probably don't expect setStableTimestamp to be called during rollback.

Generated at Thu Feb 08 05:25:26 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.