The rollback tests for resumable index builds sometimes hang during the RollbackTest.transitionToSyncSourceOperationsBeforeRollback phase. The logs indicate that the replSetStepDown is failing to acquire the RSTL to complete the stepdown:
[js_test:rollback_resumable_index_build_drain_writes_phase] 2020-09-06T11:16:14.276+0000 d21280| 2020-09-06T11:16:14.276+00:00 I COMMAND 51803 [conn103] "Slow query","attr":{"type":"command","ns":"admin.$cmd","appName":"MongoDB Shell","command":{"replSetStepDown":1,"force":true,"lsid":{"id":{"$uuid":"b9fb9ad3-9c2d-4a97-90ae-7861250da89c"}},"$clusterTime":{"clusterTime":{"$timestamp":{"t":1599390943,"i":4}},"signature":{"hash":{"$binary":{"base64":"AAAAAAAAAAAAAAAAAAAAAAAAAAA=","subType":"0"}},"keyId":0}},"$readPreference":{"mode":"secondaryPreferred"},"$db":"admin"},"numYields":0,"ok":0,"errMsg":"Unable to acquire X lock on '{4611686018427387905: ReplicationStateTransition, 1}' within 1000ms. opId: 2497, op: conn103, connId: 103.","errName":"LockTimeout","errCode":24,"reslen":347,"locks":{"ReplicationStateTransition":{"acquireCount":{"W":1},"acquireWaitCount":{"W":1},"timeAcquiringMicros":{"W":1001442}}},"protocol":"op_msg","durationMillis":1002}
The resumable index build rollback test fixture is supposed to detect an interrupted index build and disable the fail point that is blocking the stepdown in this parallel shell script.
[js_test:rollback_resumable_index_build_drain_writes_phase] 2020-09-06T11:15:44.360+0000 d21280| 2020-09-06T11:15:44.359+00:00 W CONTROL 23829 [conn152] "Set failpoint","attr":{"failPointName":"hangIndexBuildDuringBulkLoadPhase","failPoint":{"mode":0,"data":{},"timesEntered":2}}
However, the lock manager diagnostic output shows the index build still waiting in the (disabled) fail point:
[js_test:rollback_resumable_index_build_drain_writes_phase] 2020-09-06T11:28:15.455+0000 d21280| 2020-09-06T11:28:15.454+00:00 I - 20521 [initandlisten] "lock manager dump","attr":{"addr":"0x7f157b3805d0","locks":[{"lockAddr":"0x7f1587ac7f00","resourceId":"{6917529027641081857: Global, 1}","granted":[{"lockRequest":"0x877","lockRequestAddr":"0x7f1588764380","thread":"139729275131648","mode":"IX","convertMode":"NONE","enqueueAtFront":false,"compatibleFirst":false,"debugInfo":"index build: e7e6b33d-5d1e-475b-93ff-556d37705231","clientInfo":{"desc":"IndexBuildsCoordinatorMongod-0","opid":2167}}],"pending":[]},{"lockAddr":"0x7f1587ac7e80","resourceId":"{4611686018427387905: ReplicationStateTransition, 1}","granted":[{"lockRequest":"0x877","lockRequestAddr":"0x7f1588764380","thread":"139729275131648","mode":"IX","convertMode":"NONE","enqueueAtFront":false,"compatibleFirst":false,"debugInfo":"index build: e7e6b33d-5d1e-475b-93ff-556d37705231","clientInfo":{"desc":"IndexBuildsCoordinatorMongod-0","opid":2167}}],"pending":[]},{"lockAddr":"0x7f15889d0100","resourceId":"{10123292395995783581: Database, 899920359141007773, test}","granted":[{"lockRequest":"0x877","lockRequestAddr":"0x7f1588764380","thread":"139729275131648","mode":"IX","convertMode":"NONE","enqueueAtFront":false,"compatibleFirst":false,"debugInfo":"index build: e7e6b33d-5d1e-475b-93ff-556d37705231","clientInfo":{"desc":"IndexBuildsCoordinatorMongod-0","opid":2167}}],"pending":[]},{"lockAddr":"0x7f15889d0400","resourceId":"{11723802635811401853: Collection, 194587589742932093, test.rollback_resumable_index_build_drain_writes_phase}","granted":[{"lockRequest":"0x877","lockRequestAddr":"0x7f1588764380","thread":"139729275131648","mode":"IX","convertMode":"NONE","enqueueAtFront":false,"compatibleFirst":false,"debugInfo":"index build: e7e6b33d-5d1e-475b-93ff-556d37705231","clientInfo":{"desc":"IndexBuildsCoordinatorMongod-0","opid":2167}}],"pending":[]}]}
- is related to
-
SERVER-48419 Extend rollback to recover resumable index builds efficiently
- Closed
-
SERVER-49847 Turn on resumable index build feature by default
- Closed