Type: Bug
Resolution: Fixed
Priority: Major - P3
Fix Version/s: 4.0.2, 4.1.2
Affects Version/s: 4.0.0-rc1, 4.1.1
Component/s: Storage
Labels:
- SWNA
- nyc

Backwards Compatibility:
Fully Compatible
Operating System:
ALL
Backport Requested:
v4.0, v3.6
Sprint:
Repl 2018-07-30, Repl 2018-08-13, Repl 2018-08-27
Case:
Linked BF Score:
36
Confidence Status:
None
Work Order:
3
CAR Domain/s:
None

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name(s):
None
Goal Link:
None

ReplicationCoordinatorExternalStateImpl::waitForAllEarlierOplogWritesToBeVisible() holds a collection lock on the oplog while doing a blocking wait. This can cause the hang described below:

1. First, perform an insert into a replicated collection using insertDocuments(). An optime is generated, but not committed. If another write occurs after this at a later optime, the timestamped write that is not yet committed creates a "hole" in the oplog.

2. A reader using readConcern "atClusterTime" or "afterClusterTime" begins a read. This uses ReplicationCoordinatorExternalStateImpl::waitForAllEarlierOplogWritesToBeVisible() to wait for all uncommitted operations to become committed and visible.

This waits for the uncommitted insert in step 1 to be committed while holding a DBLock("local", MODE_IS).

3. A dropCollection command is received on the "local" database, and enqueues a DBLock("local", MODE_X).

4. The first insert completes the insert in the storage engine and attempts to write the oplog entry at the generated optime. It attempts to acquire a DBLock("local", MODE_IX).

The previously enqueued dropCollection operation prevents the insert from acquiring the "local" database lock.
waitForAllEarlierOplogWritesToBeVisible() holds its lock while waiting for the insert to become visible, and the insert in turn waits behind the dropCollection operation, which waits behind the reader.
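
The lock queue in steps 2 through 4 can be sketched with a toy model. This is not MongoDB's lock manager: ToyLock, simulate_hang, and the owner names are hypothetical, and the model assumes only what the steps above state, namely that a new request cannot be granted while a conflicting request is already queued ahead of it. The compatibility table is the standard one for intent locks.

```python
from collections import deque

# Standard intent-lock compatibility: modes a request can coexist with.
COMPATIBLE = {
    "IS": {"IS", "IX", "S"},
    "IX": {"IS", "IX"},
    "S": {"IS", "S"},
    "X": set(),
}


class ToyLock:
    """Toy fair lock on one resource (hypothetical, not MongoDB's LockManager)."""

    def __init__(self):
        self.granted = {}       # owner -> granted mode
        self.pending = deque()  # (owner, mode) in arrival order

    def request(self, owner, mode):
        """Grant now if nothing granted or queued conflicts; otherwise enqueue."""
        ahead = list(self.granted.values()) + [m for _, m in self.pending]
        if all(other in COMPATIBLE[mode] for other in ahead):
            self.granted[owner] = mode
            return True
        self.pending.append((owner, mode))
        return False

    def release(self, owner):
        """Release a granted lock, then grant queued requests in arrival order."""
        del self.granted[owner]
        while self.pending:
            next_owner, next_mode = self.pending[0]
            if not all(h in COMPATIBLE[next_mode] for h in self.granted.values()):
                break
            self.pending.popleft()
            self.granted[next_owner] = next_mode


def simulate_hang():
    local_db = ToyLock()
    results = {
        # Step 2: the reader takes MODE_IS, then blocks on oplog visibility.
        "reader": local_db.request("reader", "IS"),
        # Step 3: dropCollection asks for MODE_X and queues behind the reader.
        "dropCollection": local_db.request("dropCollection", "X"),
        # Step 4: the insert asks for MODE_IX and queues behind the drop.
        "insert": local_db.request("insert", "IX"),
    }
    return local_db, results
```

In this model the reader is granted its lock, while dropCollection and the insert both end up queued. MODE_IX is compatible with the reader's MODE_IS, so the insert is blocked only by the queued MODE_X request. Because the reader will not release its lock until the insert becomes visible, none of the three operations can make progress.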

This method should be redesigned so that a collection lock is not required to be held while waiting for the last oplog entry to become visible.
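
One way to picture the proposed redesign is to hold the lock only long enough to read the target timestamp, then block on visibility with no lock held. The following is a minimal sketch of that idea and not the change that was committed for this ticket: OplogVisibility, wait_for_earlier_writes, and latest_ts are hypothetical names, and a plain mutex stands in for the "local" database lock.

```python
import threading


class OplogVisibility:
    """Hypothetical stand-in for the storage engine's oplog visibility point."""

    def __init__(self):
        self._cond = threading.Condition()
        self._visible_through = 0

    def advance(self, ts):
        """Called by a writer once every entry up to ts is committed."""
        with self._cond:
            self._visible_through = max(self._visible_through, ts)
            self._cond.notify_all()

    def wait_until_visible(self, ts, timeout=None):
        """Block until entries up to ts are visible; False on timeout."""
        with self._cond:
            return self._cond.wait_for(
                lambda: self._visible_through >= ts, timeout
            )


def wait_for_earlier_writes(db_lock, visibility, latest_ts, timeout=5.0):
    # Hold the lock only while reading the timestamp to wait for.
    with db_lock:
        target = latest_ts()
    # The lock is released here, so writers and DDL operations that need it
    # are not stuck behind this blocking wait.
    return visibility.wait_until_visible(target, timeout)
```

With this shape, a writer that must take db_lock before it can call advance() is able to do so while the reader is waiting, which is the property the hang above lacks.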

causes

SERVER-37048 Hold global intent lock whenever accessing the oplog collection pointer

Closed

depends on

SERVER-36508 _getNextSessionMods command should not hold locks on migration collection while querying the oplog

Closed

is related to

SERVER-35365 MapReduce temporary inc collections should be written to the local database

Closed

SERVER-36514 Hold lock on oplog as soon as optime is reserved

Closed

SERVER-36534 Don't acquire locks on oplog when writing oplog entries

Closed

related to

SERVER-40498 Writing transaction oplog entries must not take locks while holding an oplog slot

Closed


Assignee: Spencer Brody (Inactive)
Reporter: Louis Williams
Participants: Eric Milkie, Githook User, Louis Williams, Spencer Brody, Tess Avitabile
Votes: 0
Watchers: 14

Created: Jun 01 2018 09:03:21 PM UTC
Updated: Oct 29 2023 10:31:09 PM UTC
Resolved: Aug 15 2018 09:25:03 PM UTC
