Type: Task
Resolution: Won't Fix
Priority: Major - P3
Fix Version/s: None
Affects Version/s: None
Component/s: Replication
Labels: None
Confidence Status: None
Work Order: 3
CAR Domain/s: None
Aha! Reference: None
Tracking Level: None
Risk Status: None
Exec Notes: None
Goal Name(s): None
Goal Link: None

There is currently a potential three-way deadlock involving three threads.

1. Thread 1 performs an insert into a replicated collection using insertDocuments(). An optime is generated for the write, but the write is not yet committed. If another write transaction with a later optime commits in the meantime, the timestamped but still-uncommitted write leaves a "hole" in the oplog.

2. Thread 2 tries to read the oplog, for example a secondary querying its sync source. Oplog reads call waitForAllEarlierOplogWritesToBeVisible() to wait for all uncommitted operations to become committed and visible.

This waits for the uncommitted insert in step 1 to be committed while holding an IS lock on the oplog (and thus on the 'local' database).

3. Thread 3 receives a DDL operation on the 'local' database and enqueues a request for an X lock on the 'local' database.
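The oplog "hole" from step 1 can be illustrated with a small toy model. This is only a sketch: the `ToyOplog` class and its methods are hypothetical names invented for illustration and do not correspond to MongoDB's actual storage-engine code. It shows why a reader that needs all earlier writes to be visible must wait for the uncommitted insert.

```python
# Toy model of oplog visibility (hypothetical; not MongoDB's implementation).
class ToyOplog:
    def __init__(self):
        self.next_ts = 1
        self.uncommitted = set()  # timestamps reserved but not yet committed

    def reserve(self):
        """Generate an optime for a write that has not committed yet."""
        ts = self.next_ts
        self.next_ts += 1
        self.uncommitted.add(ts)
        return ts

    def commit(self, ts):
        self.uncommitted.remove(ts)

    def visible_through(self):
        """Largest timestamp T such that every entry <= T is committed."""
        if self.uncommitted:
            return min(self.uncommitted) - 1
        return self.next_ts - 1


oplog = ToyOplog()
t1 = oplog.reserve()   # step 1: the insert gets its optime but has not committed
t2 = oplog.reserve()   # another write gets a later optime...
oplog.commit(t2)       # ...and commits first, leaving a hole at t1

hole_visible = oplog.visible_through()    # nothing is visible past the hole
oplog.commit(t1)                          # the insert commits; the hole closes
closed_visible = oplog.visible_through()  # both entries are now visible
```

In this model a reader waiting for the entry at `t2` to become visible cannot make progress until the write at `t1` commits, which is the dependency of thread 2 on thread 1.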

Now the 'local' database DDL operation in thread 3 blocks behind the IS lock held by thread 2. Thread 2 can't complete until the insert in thread 1 completes. And the insert in thread 1 blocks when it goes to acquire an IX lock on the 'local' database in order to write its oplog entry: that acquisition queues behind the pending X lock request from thread 3.
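The cycle described above can be sketched as a wait-for graph. The model below is a simplification under stated assumptions: `ToyLock` and `has_cycle` are hypothetical helpers, the compatibility table is the standard one for intent locks, and a new request is assumed to queue behind any conflicting request that is already granted or pending (a fair, first-come policy). It is not the server's lock manager.

```python
# Standard multi-granularity compatibility: mode -> modes it can coexist with.
COMPATIBLE = {
    "IS": {"IS", "IX", "S"},
    "IX": {"IS", "IX"},
    "S": {"IS", "S"},
    "X": set(),
}


class ToyLock:
    """Hypothetical fair lock: requests queue behind conflicting earlier requests."""

    def __init__(self):
        self.granted = []  # (owner, mode)
        self.pending = []  # (owner, mode), in arrival order

    def request(self, owner, mode):
        """Return the owners blocking this request (empty set if granted)."""
        blockers = {
            o
            for o, m in self.granted + self.pending
            if o != owner and m not in COMPATIBLE[mode]
        }
        if blockers:
            self.pending.append((owner, mode))
        else:
            self.granted.append((owner, mode))
        return blockers


def has_cycle(waits_for):
    """True if any node in the wait-for graph can reach itself."""
    def reaches_itself(start):
        stack, seen = list(waits_for.get(start, ())), set()
        while stack:
            node = stack.pop()
            if node == start:
                return True
            if node not in seen:
                seen.add(node)
                stack.extend(waits_for.get(node, ()))
        return False
    return any(reaches_itself(n) for n in waits_for)


local_db = ToyLock()
waits_for = {}

# Thread 2: takes IS on 'local', then waits for thread 1's hole to close.
assert local_db.request("reader", "IS") == set()
waits_for["reader"] = {"inserter"}

# Thread 3: X conflicts with the granted IS, so the request is queued.
waits_for["ddl"] = local_db.request("ddl", "X")

# Thread 1: IX is compatible with IS but queues behind the pending X.
waits_for["inserter"] = local_db.request("inserter", "IX")

deadlocked = has_cycle(waits_for)
```

Here `waits_for` ends up as reader → inserter → ddl → reader, so `deadlocked` is true. Note that the IX request alone would be compatible with the granted IS lock; it is the queued X request in between that closes the cycle.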

This deadlock became far more likely to occur with SERVER-35365, which changed all mapReduce commands to perform DDL operations on the 'local' database.

One way to fix this would be for thread 1 to take the lock on the oplog before creating the oplog hole and hold that lock until the oplog entry is written and the hole is removed.
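A minimal sketch of why this ordering would break the cycle, under the same simplifying assumptions as the description (only IS, IX, and X are modeled, and a request queues behind any conflicting earlier request). The helper names are hypothetical, and this illustrates the idea proposed in this ticket rather than whatever change was eventually made.

```python
def conflicts(a, b):
    # Only IS, IX and X are modeled: the intent modes are mutually
    # compatible, while X conflicts with every mode.
    return "X" in (a, b)


def blockers(requests, mode):
    """Owners of earlier requests (granted or queued) that conflict with mode."""
    return {owner for owner, m in requests if conflicts(m, mode)}


requests = []  # requests on 'local', in arrival order

# Thread 1 takes IX *before* generating its optime (creating the hole).
inserter_blocked_by = blockers(requests, "IX")
requests.append(("inserter", "IX"))

# Thread 2 takes IS (compatible with IX), then waits for the hole to close.
reader_blocked_by = blockers(requests, "IS")
requests.append(("reader", "IS"))

# Thread 3 requests X and queues behind both intent locks.
ddl_blocked_by = blockers(requests, "X")
requests.append(("ddl", "X"))

waits_for = {
    "inserter": inserter_blocked_by,            # already holds its lock
    "reader": reader_blocked_by | {"inserter"},  # lock granted; waits on the hole
    "ddl": ddl_blocked_by,
}
```

Because the inserter already holds its lock when the X request arrives, it waits on nobody: it can write its oplog entry and commit, which unblocks the reader, which in turn unblocks the DDL operation. Every chain in `waits_for` ends at the inserter, so there is no cycle.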

is related to: SERVER-36534 Don't acquire locks on oplog when writing oplog entries (Closed)

related to: SERVER-35367 Hold locks in fewer callers of waitForAllEarlierOplogWritesToBeVisible() (Closed)

Assignee: Spencer Brody (Inactive)
Reporter: Spencer Brody (Inactive)
Participants: Eric Milkie, Gregory McKeon, Spencer Brody
Votes: 0
Watchers: 9

Created: Aug 07 2018 10:23:59 PM UTC
Updated: Sep 15 2018 02:48:55 PM UTC
Resolved: Aug 24 2018 08:19:15 PM UTC
Confidence Status Last Update: 13/Aug/18 6:49 PM
