[SERVER-50519] resumable index build hangs waiting for majority commit point in replica_sets_kill_secondaries_jscore_passthrough Created: 25/Aug/20  Updated: 04/Dec/23  Resolved: 11/Sep/20

Status: Closed
Project: Core Server
Component/s: Storage
Affects Version/s: None
Fix Version/s: 4.8.0

Type: Bug Priority: Major - P3
Reporter: Benety Goh Assignee: Benety Goh
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-50539 extend resumable index build majority... Closed
is related to SERVER-50446 make index builds non-resumable when ... Closed
is related to SERVER-44563 Isolate isMajorityCommittedRead waite... Closed
is related to SERVER-48476 resumable index build should use majo... Closed
is related to SERVER-49847 Turn on resumable index build feature... Closed
is related to SERVER-48398 Writing config document to "local.sys... Backlog
is related to SERVER-48399 Writing config document to "local.sys... Closed
is related to SERVER-83857 Allow index builds to be resumable on... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: Execution Team 2020-09-21
Participants:

 Description   

When resumable index builds are enabled (see SERVER-49847), the replica_sets_kill_secondaries_jscore_passthrough test suite regularly times out. The logs indicate that there are resumable index builds stuck waiting for the majority commit point before scanning the collection.

The majority wait functionality was added in SERVER-48476 to support rolling back a resumable index build.
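The wait described above can be pictured as a condition-variable wait on the majority commit point. The following is a minimal sketch with hypothetical names (`OpTime`, `MajorityCommitTracker`), not the real server API: before scanning the collection, the resumable build blocks until the last optime written before its side-write interceptors were installed is majority committed, so a later rollback can never undo state the resume file depends on.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>

// Illustrative optime: (timestamp, term), ordered term-first.
struct OpTime {
    long long ts = 0;
    long long term = 0;
    bool operator<=(const OpTime& rhs) const {
        return term < rhs.term || (term == rhs.term && ts <= rhs.ts);
    }
};

class MajorityCommitTracker {
public:
    // Called by replication when the majority commit point advances.
    void advanceCommitPoint(OpTime newPoint) {
        {
            std::lock_guard<std::mutex> lk(_mutex);
            _commitPoint = newPoint;
        }
        _cv.notify_all();
    }

    // Called by the index builder before the collection scan: block until
    // the given optime is majority committed. This is the wait that hangs
    // in this ticket when the commit point cannot advance.
    void waitUntilMajorityCommitted(OpTime lastOpTimeBeforeInterceptors) {
        std::unique_lock<std::mutex> lk(_mutex);
        _cv.wait(lk, [&] { return lastOpTimeBeforeInterceptors <= _commitPoint; });
    }

private:
    std::mutex _mutex;
    std::condition_variable _cv;
    OpTime _commitPoint;
};
```

In the failing suite, `advanceCommitPoint` is effectively never called on the stuck secondary, so `waitUntilMajorityCommitted` blocks forever.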



 Comments   
Comment by Githook User [ 11/Sep/20 ]

Author:

{'name': 'Benety Goh', 'email': 'benety@mongodb.com', 'username': 'benety'}

Message: SERVER-50519 index build is resumable only if commit quorum is the default (i.e. all-voters) and node is a voter
Branch: master
https://github.com/mongodb/mongo/commit/6a031f67c7040b925a36e8092c4f5d36510d68f2

Comment by Benety Goh [ 10/Sep/20 ]

We can avoid the case where the primary generates the commitIndexBuild oplog entry before a secondary finishes its majority wait by restricting resumable index builds to voting nodes. This extends the conditions for resumable index builds added in SERVER-50446.
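The fix's eligibility condition can be sketched as a small predicate. The names below (`MemberConfig`, `isIndexBuildResumable`) are illustrative, not the real server code: a build is treated as resumable only when the commit quorum is the default `"votingMembers"` and this node is itself a voter; otherwise it falls back to the non-resumable path.

```cpp
#include <cassert>
#include <string>

// Illustrative stand-in for a replica set member's config entry.
struct MemberConfig {
    long long votes = 1; // 0 for a non-voting member
    bool isVoter() const { return votes > 0; }
};

// The default commit quorum is all voting members.
bool isDefaultCommitQuorum(const std::string& commitQuorum) {
    return commitQuorum == "votingMembers";
}

// Resumable only if both conditions hold; a non-voting node could otherwise
// see the primary's commitIndexBuild oplog entry before its own majority
// wait completes, producing the hang in this ticket.
bool isIndexBuildResumable(const MemberConfig& self,
                           const std::string& commitQuorum) {
    return self.isVoter() && isDefaultCommitQuorum(commitQuorum);
}
```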

Comment by Benety Goh [ 09/Sep/20 ]

The reconfig problem observed in this ticket might be alleviated by SERVER-48398 and SERVER-48399.

Comment by Samyukta Lanka [ 03/Sep/20 ]

After looking at the logs and hang analyzer output, this appears to be a situation that can only arise with a node that isn't required for the commit quorum: the primary is able to write the commitIndexBuild oplog entry before the secondary completes the majority wait that precedes starting the index build. The primary was recently restarted, so it is in the SECONDARY state when the non-voting secondary syncs from it, meaning the commit point isn't advancing at this time.

[ReplicaSetFixture:job10:primary] 2020-08-28T02:09:37.137+0000 | 2020-08-28T02:09:37.137+00:00 I  STORAGE  3856201 [conn40] "Index build: commit quorum satisfied","attr":{"indexBuildEntry":{"_id":{"$uuid":"33679498-1022-419b-8575-f1d5711463d5"},"collectionUUID":{"$uuid":"70d3c4dd-26cc-447b-bf33-7e2cb80a71b4"},"commitQuorum":"votingMembers","indexNames":["point_2d"],"commitReadyMembers":["localhost:22500"]}}
...
[ReplicaSetFixture:job10:primary] 2020-08-28T02:09:37.139+0000 | 2020-08-28T02:09:37.139+00:00 I  INDEX    20447   [conn36] "Index build: completed","attr":{"buildUUID":{"uuid":{"$uuid":"33679498-1022-419b-8575-f1d5711463d5"}}}
...
[ReplicaSetFixture:job10:primary] 2020-08-28T02:10:12.350+0000 | 2020-08-28T02:10:12.350+00:00 I  CONTROL  23377   [SignalHandler] "Received signal","attr":{"signal":15,"error":"Terminated"}
...
[ReplicaSetFixture:job10:primary] 2020-08-28T02:10:26.718+0000 | 2020-08-28T02:10:26.718+00:00 I  REPL     21358   [OplogApplier-0] "Replica set state transition","attr":{"newState":"SECONDARY","oldState":"RECOVERING"}
...
[ReplicaSetFixture:job10:secondary] 2020-08-28T02:10:27.501+0000 | 2020-08-28T02:10:27.499+00:00 I  REPL     21092   [BackgroundSync] "Scheduling fetcher to read remote oplog","attr":{"syncSource":"localhost:22500","lastOpTimeFetched":{"ts":{"$timestamp":{"t":1598580577,"i":4}},"t":1}}
...
[ReplicaSetFixture:job10:secondary] 2020-08-28T02:10:27.522+0000 | 2020-08-28T02:10:27.521+00:00 I  REPL     21832   [ReplCoordExtern-0] "Choosing new sync source. Our current sync source is not primary and does not have a sync source, so we require that it is ahead of us","attr":{"syncSource":"localhost:22500","lastOpTimeFetched":{"ts":{"$timestamp":{"t":1598580612,"i":1}},"t":1},"syncSourceLatestOplogOpTime":{"ts":{"$timestamp":{"t":1598580612,"i":1}},"t":1},"isPrimary":false}
...
[ReplicaSetFixture:job10:secondary] 2020-08-28T02:10:27.543+0000 | 2020-08-28T02:10:27.533+00:00 D2 REPL_HB  4615620 [ReplCoord-0] "Received response to heartbeat","attr":{"requestId":8,"target":"localhost:22500","response":{"ok":1,"state":2,"v":2,"configTerm":1,"set":"rs","term":1,"durableOpTime":{"ts":{"$timestamp":{"t":1598580612,"i":1}},"t":1},"durableWallTime":{"$date":"2020-08-28T02:10:12.174Z"},"opTime":{"ts":{"$timestamp":{"t":1598580612,"i":1}},"t":1},"wallTime":{"$date":"2020-08-28T02:10:12.174Z"},"$replData":{"term":1,"lastOpCommitted":{"ts":{"$timestamp":{"t":0,"i":0}},"t":-1},"lastCommittedWall":{"$date":"1970-01-01T00:00:00.000Z"},"lastOpVisible":{"ts":{"$timestamp":{"t":0,"i":0}},"t":-1},"configVersion":2,"configTerm":1,"replicaSetId":{"$oid":"5f48675ed265de5f69496b2f"},"syncSourceIndex":-1,"isPrimary":false},"$clusterTime":{"clusterTime":{"$timestamp":{"t":1598580612,"i":1}},"signature":{"hash":{"$binary":{"base64":"AAAAAAAAAAAAAAAAAAAAAAAAAAA=","subType":"0"}},"keyId":0}},"operationTime":{"$timestamp":{"t":1598580612,"i":1}}}}
...
[ReplicaSetFixture:job10:secondary] 2020-08-28T02:10:27.546+0000 | 2020-08-28T02:10:27.545+00:00 I  STORAGE  4847600 [IndexBuildsCoordinatorMongod-0] "Index build: waiting for last optime before interceptors to be majority committed","attr":{"buildUUID":{"uuid":{"$uuid":"33679498-1022-419b-8575-f1d5711463d5"}},"deadline":{"$date":{"$numberLong":"9223372036854775807"}},"timeoutMillis":9223372036854776000,"lastOpTime":{"ts":{"$timestamp":{"t":1598580577,"i":8}},"t":1}}

At this point, the former primary runs for election again and wins, causing the term to increment. An automatic reconfig from the stepUp causes the config term to increment. The secondary eventually learns of the new config via heartbeat and tries to persist the new config to disk.

[ReplicaSetFixture:job10:primary] 2020-08-28T02:10:32.328+0000 | 2020-08-28T02:10:32.326+00:00 I  ELECTION 21450   [ReplCoord-0] "Election succeeded, assuming primary role","attr":{"term":2}
...
[ReplicaSetFixture:job10:primary] 2020-08-28T02:10:32.329+0000 | 2020-08-28T02:10:32.328+00:00 I  REPL     4508103 [OplogApplier-0] "Increment the config term via reconfig"
...
[ReplicaSetFixture:job10:secondary] 2020-08-28T02:10:32.333+0000 | 2020-08-28T02:10:32.331+00:00 D1 REPL_HB  4615622 [ReplCoord-0] "Received new config via heartbeat","attr":{"newConfigVersionAndTerm":"{version: 2, term: 2}"}
[ReplicaSetFixture:job10:secondary] 2020-08-28T02:10:32.333+0000 | 2020-08-28T02:10:32.332+00:00 D2 REPL_HB  4615626 [ReplCoord-2] "Config validated for reconfig; persisting to disk","attr":{"newConfigVersionAndTerm":"{version: 2, term: 2}"}

From the hang analyzer output, we see that the secondary is stuck persisting the new config while trying to acquire the DBLock on the "local" database. This is because the oplog applier is holding the ParallelBatchWriterMode (PBWM) lock in X mode while trying to apply the commitIndexBuild oplog entry, which is itself stuck waiting for the majority wait to unblock the index build.

Thread 28 (Thread 0x7f5ed5077700 (LWP 62791)):
...
#12 mongo::Interruptible::waitForConditionOrInterruptFor<std::unique_lock<mongo::latch_detail::Latch>, mongo::CondVarLockGrantNotification::wait(mongo::OperationContext*, mongo::Milliseconds)::<lambda()> > (waitTimer=0x0, pred=..., ms=..., m=..., cv=..., this=0x7f5eff786600) at src/mongo/util/interruptible.h:480
#13 mongo::CondVarLockGrantNotification::wait (this=0x7f5efdbf4728, opCtx=0x7f5eff786600, timeout=...) at src/mongo/db/concurrency/lock_state.cpp:249
#14 0x00007f5ef13869ee in mongo::LockerImpl::_lockComplete (this=0x7f5efdbf4680, opCtx=0x7f5eff786600, resId=..., mode=mongo::MODE_IS, deadline=...) at src/mongo/db/concurrency/lock_state.cpp:939
#15 0x00007f5ef136771b in mongo::Lock::ResourceLock::lock (this=0x7f5ed5075588, opCtx=<optimized out>, mode=<optimized out>, deadline=...) at src/mongo/db/concurrency/d_concurrency.cpp:306
#16 0x00007f5ef13678fb in mongo::Lock::GlobalLock::GlobalLock (this=0x7f5ed5075578, opCtx=0x7f5eff786600, lockMode=mongo::MODE_IX, deadline=..., behavior=<optimized out>) at src/mongo/db/concurrency/d_concurrency.cpp:152
#17 0x00007f5ef13679eb in mongo::Lock::DBLock::DBLock (this=0x7f5ed5075560, opCtx=0x7f5eff786600, db="local", mode=<optimized out>, deadline=...) at src/mongo/db/concurrency/lock_manager_defs.h:106
#18 0x00007f5ef44dc569 in mongo::repl::ReplicationCoordinatorExternalStateImpl::<lambda()>::operator()(void) const (__closure=0x7f5ed50756a0) at src/mongo/base/string_data.h:82
#19 0x00007f5ef44dc9b7 in mongo::writeConflictRetry<mongo::repl::ReplicationCoordinatorExternalStateImpl::storeLocalConfigDocument(mongo::OperationContext*, const mongo::BSONObj&, bool)::<lambda()> > (f=..., ns=..., opStr=..., opCtx=0x7f5eff786600) at /opt/mongodbtoolchain/revisions/e5348beb43e147b74a40f4ca5fb05a330ea646cf/stow/gcc-v3.Fno/include/c++/8.2.0/bits/unique_ptr.h:342
#20 mongo::repl::ReplicationCoordinatorExternalStateImpl::storeLocalConfigDocument (this=this@entry=0x7f5ef90cb600, opCtx=<optimized out>, config=owned BSONObj 669 bytes @ 0x7f5efdc47d88 - unprintable or invalid, writeOplog=<optimized out>, writeOplog@entry=false) at src/mongo/db/repl/replication_coordinator_external_state_impl.cpp:585
#21 0x00007f5ef41ed15b in mongo::repl::ReplicationCoordinatorImpl::_heartbeatReconfigStore (this=0x7f5ef9147800, cbd=..., newConfig=...) at /opt/mongodbtoolchain/revisions/e5348beb43e147b74a40f4ca5fb05a330ea646cf/stow/gcc-v3.Fno/include/c++/8.2.0/bits/unique_ptr.h:342
 
Thread 52 (Thread 0x7f5ec4c5d700 (LWP 62832)):
...
#10 mongo::Interruptible::waitForConditionOrInterrupt<std::unique_lock<mongo::latch_detail::Latch>, mongo::future_details::SharedStateBase::wait(mongo::Interruptible*)::{lambda()#1}>(mongo::stdx::condition_variable&, std::unique_lock<mongo::latch_detail::Latch>&, mongo::future_details::SharedStateBase::wait(mongo::Interruptible*)::{lambda()#1}, mongo::AtomicWord<long>*) (waitTimer=0x0, pred=..., m=..., cv=..., this=0x7f5efe5ee980) at src/mongo/util/interruptible.h:466
#11 mongo::future_details::SharedStateBase::wait (this=<optimized out>, interruptible=0x7f5efe5ee980) at src/mongo/util/future_impl.h:402
#12 0x00007f5ef1d592a5 in mongo::future_details::SharedStateHolder<mongo::ReplIndexBuildState::IndexCatalogStats>::waitNoThrow (interruptible=0x7f5efe5ee980, this=<synthetic pointer>) at src/mongo/util/future_impl.h:652
#13 mongo::SharedSemiFuture<mongo::ReplIndexBuildState::IndexCatalogStats>::waitNoThrow (interruptible=0x7f5efe5ee980, this=<synthetic pointer>) at src/mongo/util/future.h:938
#14 mongo::IndexBuildsCoordinator::applyCommitIndexBuild (this=<optimized out>, opCtx=opCtx@entry=0x7f5efe5ee980, oplogEntry=...) at src/mongo/db/index_builds_coordinator.cpp:976

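The lock conflict between the two stack traces above can be reduced to a timed-acquisition sketch. This is illustrative only (the lock types and function names are hypothetical, not the server's lock manager): the oplog applier holds the PBWM exclusively while blocked on the index build's future, so the heartbeat-reconfig thread's conflicting acquisition in `storeLocalConfigDocument` can never succeed.

```cpp
#include <cassert>
#include <chrono>
#include <mutex>
#include <thread>

// Stand-in for the PBWM resource: while the stuck oplog applier holds it
// in X mode, a conflicting lock request cannot be granted. A bounded
// try_lock_for makes the conflict observable without hanging the caller.
bool reconfigCanPersistConfig(std::timed_mutex& pbwm) {
    // storeLocalConfigDocument needs a DBLock on "local", which conflicts
    // with the PBWM held by the applier in the real server.
    if (pbwm.try_lock_for(std::chrono::milliseconds(50))) {
        pbwm.unlock(); // acquired and released: the reconfig could proceed
        return true;
    }
    return false; // still blocked, as in the hang analyzer output
}
```

In the actual hang there is no timeout: the reconfig thread waits indefinitely, the applier waits on the majority commit point, and the commit point cannot advance while the applier is stalled.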
Since the replica set now has a primary, we would expect the commit point to advance and propagate to the secondary, unblocking the index build. However, because the secondary is stuck applying oplog entries, its last applied optime is from a previous term, which means it cannot advance its commit point via heartbeats. The node also never chooses a new sync source for the rest of the test (the other way the commit point would propagate), although I'm still not sure why; I suspect something is making the primary an ineligible sync source.
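The heartbeat restriction described above reduces to a one-line rule. The helper below is hypothetical (not the real ReplicationCoordinator code): a node only adopts a heartbeat-supplied commit point when its own last applied optime is from the current term, since otherwise it could mark as committed writes it has not replicated. Here the stuck secondary's last applied is from term 1 while the new primary is in term 2, so heartbeats can never advance its commit point.

```cpp
// Illustrative rule: advance the commit point from a heartbeat only when
// this node's last applied optime is from the current term.
bool canAdvanceCommitPointFromHeartbeat(long long lastAppliedTerm,
                                        long long currentTerm) {
    return lastAppliedTerm == currentTerm;
}
```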

Generated at Thu Feb 08 05:22:52 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.