Type: Bug
Resolution: Fixed
Priority: Major - P3
Fix Version/s: 7.1.0-rc0
Affects Version/s: 7.1.0-rc0
Component/s: None
Labels: None
Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: Execution NAMR Team 2023-07-24
Linked BF Score: 140
Confidence Status: None
Work Order: 3
CAR Domain/s: None
Aha! Reference: None
Tracking Level: None
Risk Status: None
Exec Notes: None
Goal Name(s): None
Goal Link: None

SERVER-73539 introduced replay protection for setAllowMigrations. As part of those changes (and the follow-up fix SERVER-78021), we create an AlternativeClientRegion in which a transaction with a majority write concern is performed. The JournalFlusher currently runs on an unkillable thread, and it tries to take the RSTL lock while waiting for all commits made before the call to become durable in the journal. As a result, if a stepdown occurs, the following scenario can happen on the config server:

1. A thread running setAllowMigrations (which checked out a session) waits for the metadata changes to be majority committed.
2. A stepdown thread takes the RSTL lock and tries to check out the session held by thread 1 in order to kill it.
3. The JournalFlusher thread tries to take the RSTL lock held by thread 2.

After step 3, thread 1 is waiting for majority commit. The JournalFlusher thread (3), which waits for the changes to become durable, is blocked on the RSTL lock held by the stepdown thread (2), and the stepdown thread is in turn waiting for the session held by thread 1 to be checked in. The result is a 3-way deadlock. Two stack traces showing the problem are attached to this ticket.
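
The circular wait can be made explicit with a small wait-for graph. The following is only an illustrative sketch, not MongoDB code: the thread labels and the `find_cycle` helper are hypothetical, and the edge from setAllowMigrations to the JournalFlusher assumes, as the description implies, that the majority commit cannot advance until the journal flush completes.

```python
# Hypothetical model of the wait-for relationships described in this ticket.
# Keys and values are illustrative labels, not MongoDB identifiers.
WAITS_FOR = {
    # Assumption: majority commit cannot advance until the flush completes.
    "setAllowMigrations": "JournalFlusher",
    # The flusher needs the RSTL lock held by the stepdown thread.
    "JournalFlusher": "stepdown",
    # Stepdown needs the session checked out by setAllowMigrations.
    "stepdown": "setAllowMigrations",
}


def find_cycle(waits_for, start):
    """Follow wait-for edges from `start`; return the cycle as a list, or None."""
    path = []
    seen = set()
    node = start
    while node is not None and node not in seen:
        seen.add(node)
        path.append(node)
        node = waits_for.get(node)
    if node is None:
        return None  # the chain ended, so nothing waits in a circle
    # `node` was seen before: the cycle runs from its first occurrence back to it.
    return path[path.index(node):] + [node]


cycle = find_cycle(WAITS_FOR, "setAllowMigrations")

# If the flusher's wait can be interrupted, its edge disappears and so does the cycle.
without_flusher_wait = {k: v for k, v in WAITS_FOR.items() if k != "JournalFlusher"}
no_cycle = find_cycle(without_flusher_wait, "setAllowMigrations")
```

In this model, `cycle` contains all three threads, while `no_cycle` is `None` once the JournalFlusher edge is removed.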

One way to solve this is to make the JournalFlusher thread killable as well, like the main operation (in this case, the setAllowMigrations thread).
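
As a rough illustration of what a killable wait could look like, the sketch below uses Python's `threading` module rather than MongoDB's actual locking or interruption machinery. Every name here (`Interrupted`, `acquire_interruptibly`, `run_demo`, the `rstl` lock and `killed` event) is a hypothetical stand-in: the waiter retries the lock with a short timeout and checks a kill flag between attempts, so the holder can interrupt it instead of being blocked behind it.

```python
import threading


class Interrupted(Exception):
    """Raised when a waiter is killed before it obtains the lock."""


def acquire_interruptibly(lock, killed, poll_interval=0.01):
    # Retry with a short timeout so the kill flag is checked between attempts.
    while not lock.acquire(timeout=poll_interval):
        if killed.is_set():
            raise Interrupted("killed while waiting for the lock")


def run_demo():
    rstl = threading.Lock()      # stand-in for the RSTL lock
    killed = threading.Event()   # stand-in for a stepdown kill signal
    outcome = []

    def flusher():
        try:
            acquire_interruptibly(rstl, killed)
            rstl.release()
            outcome.append("acquired")
        except Interrupted:
            outcome.append("interrupted")

    rstl.acquire()               # the "stepdown" side holds the lock
    worker = threading.Thread(target=flusher)
    worker.start()
    killed.set()                 # kill the waiter instead of waiting on it
    worker.join(timeout=5)
    rstl.release()
    return outcome


result = run_demo()
```

Because the lock is held for the whole demo, the waiter can only exit through the kill path, which is the behavior an interruptible flusher would need during stepdown.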


gdb_s0_n2.txt (1.18 MB, Jul 17 2023 05:58:34 PM UTC)
gdb.BFG-2016684.c_n1.txt (1.37 MB, Jul 17 2023 05:58:31 PM UTC)

is related to

SERVER-55745 The Fuzzer can run killOp on the JournalFlusher thread and cause it to throw an unexpected error

Closed

SERVER-73539 stopMigrations/resumeMigrations don't have replay protection

Closed

SERVER-78021 Retrying setAllowMigrations command may end up in a deadlock

Closed

SERVER-70127 Default system operations to be killable by stepdown

Closed

SERVER-74657 revisit if thread marked as unkillable is okay to be killable for storage execution related

Closed

related to

SERVER-79810 make JournalFlusher::waitForJournalFlush() interruptible when waiting for write concern

Closed

SERVER-79174 Improve journal flusher interruption handling

Closed


Assignee: Gregory Wlodarek
Reporter: Marcos José Grillo Ramirez
Participants: Benety Goh, Githook User, Gregory Wlodarek, Marcos José Grillo Ramirez
Votes: 0
Watchers: 6

Created: Jul 17 2023 06:00:20 PM UTC
Updated: Oct 29 2023 09:18:50 PM UTC
Resolved: Jul 18 2023 07:33:01 PM UTC
Confidence Status Last Update: 17/Jul/23 6:39 PM
