[SERVER-59297] Allow system threads to survive InterruptedDueToStorageChange Created: 11/Aug/21  Updated: 29/Oct/23  Resolved: 15/Nov/21

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 5.1.1

Type: Task Priority: Major - P3
Reporter: Matthew Russotto Assignee: Adi Zaimi
Resolution: Fixed Votes: 0
Labels: pm-1897-M2
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Backwards Compatibility: Fully Compatible
Sprint: Repl 2021-10-18, Repl 2021-11-01, Replication 2021-11-15, Replication 2021-11-29
Participants:

 Description   

When we switch out the storage engine at the end of File Copy Based Initial Sync (FCBIS), all opCtxs will be interrupted with InterruptedDueToStorageChange. There should be no user opCtxs (and if there are, it is OK to interrupt them), but there will also be system opCtxs, such as those used by FTDC and the watchdog thread. This ticket has two parts:

1) Determine which system opCtxs exist at the end of initial sync

2) Ensure the associated tasks can survive by creating a new opCtx
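
The second part can be illustrated with a minimal, standalone sketch. It is not the server's code: `OperationContext`, `Client`, `InterruptedDueToStorageChange`, and `runSystemTask` below are simplified, hypothetical stand-ins written only for this example. The assumption is that a system thread runs its work in a loop, and that catching the interruption and looping again (which creates a fresh opCtx) is enough for the task to survive.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <stdexcept>

// Hypothetical stand-in for the server's InterruptedDueToStorageChange error.
struct InterruptedDueToStorageChange : std::runtime_error {
    InterruptedDueToStorageChange()
        : std::runtime_error("interrupted due to storage change") {}
};

// Hypothetical stand-in for an operation context; the real class is far richer.
struct OperationContext {
    explicit OperationContext(int id) : id(id) {}
    int id;
};

// Hypothetical stand-in for a client that hands out operation contexts.
class Client {
public:
    std::unique_ptr<OperationContext> makeOperationContext() {
        return std::make_unique<OperationContext>(++_lastId);
    }
    int contextsCreated() const { return _lastId; }

private:
    int _lastId = 0;
};

// Runs `work` once per iteration, each time with a fresh opCtx.
// An iteration interrupted by a storage change is abandoned and the loop
// continues, so the task survives. Any other exception still propagates.
// Returns the number of interrupted iterations.
int runSystemTask(Client& client,
                  int iterations,
                  const std::function<void(OperationContext*)>& work) {
    assert(iterations >= 0);
    int interruptions = 0;
    for (int i = 0; i < iterations; ++i) {
        auto opCtx = client.makeOperationContext();
        try {
            work(opCtx.get());
        } catch (const InterruptedDueToStorageChange&) {
            // The old opCtx is discarded when it goes out of scope;
            // the next iteration creates a new one.
            ++interruptions;
        }
    }
    return interruptions;
}
```

In this sketch only the storage-change interruption is swallowed, so a genuine failure or shutdown signal would still stop the thread.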



 Comments   
Comment by Githook User [ 15/Nov/21 ]

Author:

{'name': 'Adi Zaimi', 'email': 'adizaimi@yahoo.com', 'username': 'adizaimi'}

Message: SERVER-59297 Catch InterruptedDueToStorageChange and continue
Branch: master
https://github.com/mongodb/mongo/commit/8d577f2fc233ef336b0ed6a55bdd9afc889bf153

Comment by Adi Zaimi [ 03/Nov/21 ]

Running a simple FCBIS test and logging each opCtx we kill produces the following list:

[js_test:fcbis_simple_case] d20021| {"t":{"$date":"2021-11-03T18:43:14.089+00:00"},"s":"I",  "c":"STORAGE",  "id":5781190, "ctx":"ReplCoordExtern-1","msg":"Killed OpCtx for storage change","attr":{"killedOperationId":0,"client":"monitoring-keys-for-HMAC"}}
[js_test:fcbis_simple_case] d20021| {"t":{"$date":"2021-11-03T18:43:14.089+00:00"},"s":"I",  "c":"STORAGE",  "id":5781190, "ctx":"ReplCoordExtern-1","msg":"Killed OpCtx for storage change","attr":{"killedOperationId":0,"client":"OplogCapMaintainerThread-local.oplog.rs"}}
[js_test:fcbis_simple_case] d20021| {"t":{"$date":"2021-11-03T18:43:14.089+00:00"},"s":"I",  "c":"STORAGE",  "id":5781190, "ctx":"ReplCoordExtern-1","msg":"Killed OpCtx for storage change","attr":{"killedOperationId":0,"client":"TopologyVersionObserver"}}
[js_test:fcbis_simple_case] d20021| {"t":{"$date":"2021-11-03T18:43:14.397+00:00"},"s":"I",  "c":"STORAGE",  "id":5781190, "ctx":"ReplCoordExtern-1","msg":"Killed OpCtx for storage change","attr":{"killedOperationId":32713,"client":"monitoring-keys-for-HMAC"}}
[js_test:fcbis_simple_case] d20021| {"t":{"$date":"2021-11-03T18:43:14.397+00:00"},"s":"I",  "c":"STORAGE",  "id":5781190, "ctx":"ReplCoordExtern-1","msg":"Killed OpCtx for storage change","attr":{"killedOperationId":32713,"client":"TopologyVersionObserver"}}
[js_test:fcbis_simple_case] d20021| {"t":{"$date":"2021-11-03T18:43:14.579+00:00"},"s":"I",  "c":"STORAGE",  "id":5781190, "ctx":"ReplCoordExtern-0","msg":"Killed OpCtx for storage change","attr":{"killedOperationId":0,"client":"monitoring-keys-for-HMAC"}}
[js_test:fcbis_simple_case] d20021| {"t":{"$date":"2021-11-03T18:43:14.579+00:00"},"s":"I",  "c":"STORAGE",  "id":5781190, "ctx":"ReplCoordExtern-0","msg":"Killed OpCtx for storage change","attr":{"killedOperationId":0,"client":"TopologyVersionObserver"}}
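
To reduce output like the above to the distinct set of affected clients (here monitoring-keys-for-HMAC, OplogCapMaintainerThread-local.oplog.rs and TopologyVersionObserver), a small helper along the following lines could be used. This is only a sketch: `extractClient` and `distinctKilledClients` are hypothetical names, and the plain substring matching assumes the log format shown above, with no escaped quotes inside the client name.

```cpp
#include <set>
#include <string>
#include <vector>

// Returns the value of the "client" attribute in a log line,
// or an empty string if the attribute is not present.
std::string extractClient(const std::string& line) {
    static const std::string key = "\"client\":\"";
    std::string::size_type start = line.find(key);
    if (start == std::string::npos) {
        return "";
    }
    start += key.size();
    const std::string::size_type end = line.find('"', start);
    if (end == std::string::npos) {
        return "";
    }
    return line.substr(start, end - start);
}

// Collects the distinct client names from "Killed OpCtx for storage change"
// lines, identified here by log id 5781190 as seen in the output above.
std::set<std::string> distinctKilledClients(const std::vector<std::string>& lines) {
    std::set<std::string> clients;
    for (const std::string& line : lines) {
        if (line.find("\"id\":5781190") == std::string::npos) {
            continue;  // not a storage-change kill message
        }
        const std::string client = extractClient(line);
        if (!client.empty()) {
            clients.insert(client);
        }
    }
    return clients;
}
```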
