[SERVER-45626] Consistent Oplog Locking Rules Created: 13/Jan/20  Updated: 29/Oct/23  Resolved: 13/Mar/20

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 4.7.0, 4.4.2

Type: Task Priority: Major - P3
Reporter: PM Bot Assignee: Lingzhi Deng
Resolution: Fixed Votes: 0
Labels: former-quick-wins
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
depends on SERVER-44906 Rollback should take global write loc... Backlog
Problem/Incident
causes SERVER-47959 Retry JournalFlusher oplog reads on W... Closed
Related
is related to SERVER-46930 AutoGetOplog doesn't acquire collecti... Closed
Backwards Compatibility: Fully Compatible
Backport Requested:
v4.4
Sprint: Repl 2020-03-09, Repl 2020-03-23
Participants:
Linked BF Score: 23

 Description   

In principle, reading or writing the oplog should require only the Global IS/IX lock. In practice we are inconsistent about this (see SERVER-44906). Furthermore, there are cases where our code requires a DB and Collection lock just to get a Collection pointer to the oplog, although this is not fundamentally necessary.

Inconsistent locking invites mistakes: deadlocks, race conditions, and unnecessary waiting.
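
To make the rule in the description concrete, here is a minimal toy sketch of an RAII helper that takes only a Global intent lock for oplog access: IS for reads, IX for writes, and no DB or Collection lock. Everything in it (ToyLocker, ToyAutoGetOplog, locksHeldDuring, and the string lock names) is hypothetical and exists only for illustration. It is not the server's lock manager and not the actual AutoGetOplog implementation.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy model only: these types are hypothetical stand-ins, not MongoDB's
// real lock manager or the real AutoGetOplog class.
enum class LockMode { kIS, kIX };
enum class OplogAccessMode { kRead, kWrite };

// Records which locks are currently held, in acquisition order.
struct ToyLocker {
    std::vector<std::string> held;

    void lock(const std::string& resource, LockMode mode) {
        held.push_back(resource + (mode == LockMode::kIS ? ":IS" : ":IX"));
    }

    void unlockLast() {
        held.pop_back();
    }
};

// RAII helper: acquires ONLY the Global intent lock needed for the requested
// oplog access, and releases it on destruction. No DB or Collection lock.
class ToyAutoGetOplog {
public:
    ToyAutoGetOplog(ToyLocker& locker, OplogAccessMode mode) : _locker(locker) {
        _locker.lock("Global",
                     mode == OplogAccessMode::kRead ? LockMode::kIS : LockMode::kIX);
    }

    ~ToyAutoGetOplog() {
        _locker.unlockLast();
    }

    ToyAutoGetOplog(const ToyAutoGetOplog&) = delete;
    ToyAutoGetOplog& operator=(const ToyAutoGetOplog&) = delete;

private:
    ToyLocker& _locker;
};

// Returns a snapshot of the locks held while the helper is in scope.
std::vector<std::string> locksHeldDuring(OplogAccessMode mode) {
    ToyLocker locker;
    ToyAutoGetOplog guard(locker, mode);
    return locker.held;  // copied before guard releases the lock
}
```

Routing every oplog access through a single helper like this is one way to keep the locking rule in one place, so that callers cannot accidentally take a different set of locks.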



 Comments   
Comment by Githook User [ 14/Sep/20 ]

Author:

{'name': 'Lingzhi Deng', 'email': 'lingzhi.deng@mongodb.com', 'username': 'ldennis'}

Message: SERVER-45626: Introduce AutoGetOplog for consistent oplog locking rules
SERVER-46930: Fix AutoGetOplog for non-document-locking storage engines
SERVER-47959 JournalFlusher will retry oplog reads on WriteConflictExceptions caused by a concurrent validate command on the oplog collection

(cherry picked from commit c15e8ae74071482d69179c7e5e5e6bdc882d2beb)
(cherry picked from commit 4c4cd897f09a85441ad60058b42ea1149b65d7de)
(cherry picked from commit dcc42b3db40ecc1cb3ca278d9dcc2208a6c7734a)
Branch: v4.4
https://github.com/mongodb/mongo/commit/503dce2ce6465116cd05d04e4ea88f837e406e8d

Comment by Tess Avitabile (Inactive) [ 14/Sep/20 ]

Sounds good to me!

Comment by Lingzhi Deng [ 11/Sep/20 ]

Ah I see. In that case, yes, we can schedule a backport I guess. Feel free to request one and we can put it in the execution sprint then. CC tess.avitabile any objections?

Comment by Louis Williams [ 11/Sep/20 ]

Yes, SERVER-48452 fixes the bug underlying SERVER-49781 (present in 4.4). That is: internal operations may unexpectedly read at timestamps, and this can lead to crashing.

"Because AutoGetOplog eliminated unnecessary callsites of AutoGetCollectionForRead for oplog I guess"

Correct.

Your commit landed only a week after the 4.4 branch (on Mar 4), so it seems possible that this isn't extremely risky to put into 4.4.

Comment by Lingzhi Deng [ 11/Sep/20 ]

Why does SERVER-48452 depend on this? Because AutoGetOplog eliminated unnecessary callsites of AutoGetCollectionForRead for oplog I guess? Does SERVER-48452 fix a bug in 4.4? Otherwise backporting this (and we also need SERVER-47959) seems to come with some risk of destabilizing the 4.4 branch.

Comment by Louis Williams [ 11/Sep/20 ]

lingzhi.deng I'm trying to backport SERVER-48452 to 4.4, but I'm realizing that it depends on this AutoGetOplog change. How realistic would it be to backport this to 4.4?

Comment by Githook User [ 13/Mar/20 ]

Author:

{'username': 'ldennis', 'name': 'Lingzhi Deng', 'email': 'lingzhi.deng@mongodb.com'}

Message: SERVER-45626: Introduce AutoGetOplog for consistent oplog locking rules
Branch: master
https://github.com/mongodb/mongo/commit/c15e8ae74071482d69179c7e5e5e6bdc882d2beb

Generated at Thu Feb 08 05:09:19 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.