[SERVER-46558] Bgsync stops all index builds even before transitioning to rollback state and causes secondary replication to hang Created: 03/Mar/20 Updated: 29/Oct/23 Resolved: 18/Mar/20 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Storage |
| Affects Version/s: | None |
| Fix Version/s: | 4.4.0-rc0, 4.7.0 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Suganthi Mani | Assignee: | Louis Williams |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||||||||||||||||||
| Backwards Compatibility: | Fully Compatible | ||||||||||||||||||||||||
| Operating System: | ALL | ||||||||||||||||||||||||
| Backport Requested: |
v4.4
|
||||||||||||||||||||||||
| Sprint: | Execution Team 2020-03-23 | ||||||||||||||||||||||||
| Participants: | |||||||||||||||||||||||||
| Linked BF Score: | 24 | ||||||||||||||||||||||||
| Description |
|
Since bgsync aborts the index build even before transitioning to rollback state, side effect of that is really bad, as the node is still eligible to run election and become primary. One notable consequence of that behavior is that, consider a case where we have 3 node replica set. (node A is the primary and node B secondary1 and node C is secondary2) and the thread pool size is 1. 1) node A (primary for term 10) starts the index Build 'x_1', uses indexbuildCoordinator thread pool and generates startIndexBuild oplog entries to both secondaries. As a result of step 5, node A will no longer run the real rollback step. This is because, on node A becoming primary, it stops the oplog fetcher service, so this check or [this|https://github.com/mongodb/mongo/blob/17984db6c531594c00bf226804d9ab7ed6225643/src/mongo/db/repl/rollback_impl.cpp#L190 check might fails making the node not to rollback any oplog entries. Problems: Solution: We should abort index build only when the node transitioned its state to rollback and we are sure that the entries are going to get rolled back. And, it applies to both rollback via recoverToStableTimestamp and rollback via refetch. P.S: I noticed this failure frequently in my patch build. And, currently, since the index build is generating high volumes of timeout error. The BF stating this issue is lost. |
| Comments |
| Comment by Githook User [ 18/Mar/20 ] |
|
Author: {'name': 'Louis Williams', 'username': 'louiswilliams', 'email': 'louis.williams@mongodb.com'}Message: (cherry picked from commit 03dc2fefa0fb1c77d2caeb6dd166166276bc5b15) |
| Comment by Githook User [ 18/Mar/20 ] |
|
Author: {'email': 'louis.williams@mongodb.com', 'name': 'Louis Williams', 'username': 'louiswilliams'}Message: |