[SERVER-39915] Access to _localOplogCollection is not synchronized Created: 01/Mar/19  Updated: 20/Sep/19  Resolved: 20/Sep/19

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: 3.4.19
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Tess Avitabile (Inactive) Assignee: Siyuan Zhou
Resolution: Won't Fix Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Operating System: ALL
Sprint: Repl 2019-07-01
Participants:
Linked BF Score: 15

 Description   

Access to _localOplogCollection is not synchronized on 3.4. We take a Global X lock when setting the pointer to null, an IX lock on local.oplog.rs when setting the pointer to a non-null value, and no lock when reading the pointer. As a result, a reader can observe that the pointer is non-null, another thread can then set it to null, and the reader then calls a function through the null pointer, leading to an invalid access.
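
For illustration, a minimal C++ sketch of the check-then-use race described above and one way to close it. Collection and OplogPtrHolder are hypothetical stand-ins, not the server's classes, and a plain std::mutex stands in for the server's lock manager; this is not the fix that was actually applied.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <string>

// Hypothetical stand-in for the oplog collection object.
struct Collection {
    std::string ns;
    std::size_t inserted = 0;
    void insertDocument() { ++inserted; }
};

// Hypothetical holder that serializes every read and write of the pointer.
class OplogPtrHolder {
public:
    void set(Collection* coll) {
        std::lock_guard<std::mutex> lk(_mutex);
        _coll = coll;
    }

    void clear() { set(nullptr); }

    // Racy pattern from the ticket, shown as a comment only:
    //   if (_coll) { _coll->insertDocument(); }
    // With no lock on the read side, _coll can become null between the
    // check and the call.
    //
    // Safe pattern: the null check and the call happen under the same
    // mutex that the writers take, so the pointer cannot change in between.
    bool logOp() {
        std::lock_guard<std::mutex> lk(_mutex);
        if (!_coll) {
            return false;
        }
        _coll->insertDocument();
        return true;
    }

private:
    std::mutex _mutex;
    Collection* _coll = nullptr;
};

// Usage: logging is refused until the pointer is set, and again after it
// has been cleared.
inline void exampleUsage() {
    Collection oplog;
    OplogPtrHolder holder;
    assert(!holder.logOp());
    holder.set(&oplog);
    assert(holder.logOp());
    holder.clear();
    assert(!holder.logOp());
}
```

The point of the sketch is only that readers and writers must agree on one lock; which lock that is in the server is a separate question.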

This issue is fixed on versions 3.6 and later by SERVER-30639. That change ensures that we only set the pointer to null when dropping the local database, which is not allowed when replication is enabled. Since the pointer is only used when replication is enabled, it cannot be cleared while it is in use.



 Comments   
Comment by Siyuan Zhou [ 20/Sep/19 ]

In 4.2+, the oplog pointer is always protected by the global lock. acquireOplogCollectionForLogging() and establishOplogCollectionForLogging() either acquire locks themselves or invariant that the lock is already held. The oplog cannot be dropped or renamed while in replset mode. The only paths that destroy the oplog collection object (by calling clearLocalOplogPtr()) are rollback, catalog restart, and shutdown, where we either hold the global X lock or nothing else can run concurrently.
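
A rough C++ sketch of the "acquire the lock or check that it is already held" discipline described above. All names here (LockState, GlobalLock, OplogRegistry) are hypothetical, a std::mutex stands in for the global lock, and the checked accessor throws where the server's invariant would abort the process; this is an assumption-laden illustration, not the server code.

```cpp
#include <cassert>
#include <mutex>
#include <stdexcept>

// Hypothetical stand-in for the oplog collection object.
struct Collection {
    int id = 0;
};

// Hypothetical per-operation record of whether the global lock is held.
struct LockState {
    bool globalLockHeld = false;
};

// Hypothetical RAII guard: takes the mutex and records that on the operation.
class GlobalLock {
public:
    GlobalLock(std::mutex& m, LockState& s) : _lk(m, std::defer_lock), _s(s) {
        assert(!_s.globalLockHeld);  // the lock in this sketch is not re-entrant
        _lk.lock();
        _s.globalLockHeld = true;
    }
    ~GlobalLock() {
        _s.globalLockHeld = false;  // the mutex is released right after this
    }

private:
    std::unique_lock<std::mutex> _lk;
    LockState& _s;
};

class OplogRegistry {
public:
    std::mutex& globalMutex() { return _globalMutex; }

    // Variant 1: acquires the lock itself before touching the pointer.
    // The caller must not already hold it.
    void establish(LockState& op, Collection* coll) {
        GlobalLock lk(_globalMutex, op);
        _oplog = coll;
    }

    // Variant 2: requires the caller to hold the lock and verifies that
    // before reading the pointer.
    Collection* get(const LockState& op) const {
        if (!op.globalLockHeld) {
            throw std::logic_error("global lock not held");
        }
        return _oplog;
    }

    // Clearing follows the same rule as reading in this sketch.
    void clear(const LockState& op) {
        if (!op.globalLockHeld) {
            throw std::logic_error("global lock not held");
        }
        _oplog = nullptr;
    }

private:
    std::mutex _globalMutex;
    Collection* _oplog = nullptr;
};
```

With this shape, every access path either takes the lock or fails loudly when called without it, so an unlocked read like the one on 3.4 cannot go unnoticed.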

As mentioned by tess.avitabile, this is only an issue in 3.4 and has only occurred once in tests, so I'd prefer to close this ticket as "Won't Fix".
