[SERVER-55574] Migration distlock acquisition fails to catch status Created: 26/Mar/21  Updated: 29/Oct/23  Resolved: 28/Jun/21

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 4.9.0
Fix Version/s: 5.0.2, 5.1.0-rc0

Type: Bug Priority: Major - P3
Reporter: Blake Oler Assignee: Kaloian Manassiev
Resolution: Fixed Votes: 0
Labels: bkp, sharding-csrs-stepdown-also, sharding-wfbf-day
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Problem/Incident
is caused by SERVER-53118 Make DistLock resilient to step downs... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v5.0
Sprint: Sharding 2021-04-05, Sharding EMEA 2021-05-03, Sharding EMEA 2021-05-17, Sharding EMEA 2021-05-31, Sharding EMEA 2021-06-14, Sharding EMEA 2021-06-28
Participants:
Linked BF Score: 127

 Description   

While scheduling a migration, we attempt to acquire the local distlock for the namespace. This acquisition isn't exception-safe: in particular, it can throw LockBusy.

We don't attempt to catch this exception (unlike in the next dist-lock acquisition, where we check the returned status and simply return on failure).

As a result of this uncaught exception, the balancer thread will crash, terminating the whole process.
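
To illustrate the failure mode described above, here is a minimal, self-contained C++ sketch of converting a throwing lock acquisition into a returned status. It is not the server's code: `Status`, `LockBusyError`, `acquireLocalDistLock`, and `scheduleMigration` are all hypothetical stand-ins invented for this example, and the real fix may be structured differently.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a status object; not MongoDB's Status class.
struct Status {
    bool ok;
    std::string reason;
};

// Hypothetical exception type modelling a LockBusy failure.
struct LockBusyError : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Hypothetical acquisition that reports failure by throwing.
void acquireLocalDistLock(const std::string& ns, bool alreadyHeld) {
    if (alreadyHeld) {
        throw LockBusyError("LockBusy: " + ns);
    }
}

// Scheduling entry point: exceptions from the acquisition are caught here and
// converted to a Status, so they do not propagate into the calling thread.
Status scheduleMigration(const std::string& ns, bool alreadyHeld) {
    try {
        acquireLocalDistLock(ns, alreadyHeld);
        return {true, ""};
    } catch (const std::exception& ex) {
        return {false, ex.what()};
    } catch (...) {
        return {false, "unknown error"};
    }
}

// Small self-check: a busy lock yields a failed Status rather than an exception.
void selfCheck() {
    assert(!scheduleMigration("test.busy", true).ok);
    assert(scheduleMigration("test.free", false).ok);
}
```

Without the `try`/`catch` in `scheduleMigration`, the `LockBusyError` would escape to the caller; on a thread with no handler further up, that ends in `std::terminate`, which matches the crash described in this ticket.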
 



 Comments   
Comment by Vivian Ge (Inactive) [ 06/Oct/21 ]

Updating the fixversion since branching activities occurred yesterday. This ticket will be in rc0 when it’s been triggered. For more active release information, please keep an eye on #server-release. Thank you!

Comment by Githook User [ 26/Jul/21 ]

Author:

{'name': 'Kaloian Manassiev', 'email': 'kaloian.manassiev@mongodb.com', 'username': 'kaloianm'}

Message: SERVER-55574 Catch all exceptions during scheduling of migrations

(cherry picked from commit ac39058711acfa3639eac3b237cdd2bb80b43a1c)
Branch: v5.0
https://github.com/mongodb/mongo/commit/82830528103db49176c2a6dc886e941b90c9bbc5

Comment by Githook User [ 28/Jun/21 ]

Author:

{'name': 'Kaloian Manassiev', 'email': 'kaloian.manassiev@mongodb.com', 'username': 'kaloianm'}

Message: SERVER-55574 Catch all exceptions during scheduling of migrations
Branch: master
https://github.com/mongodb/mongo/commit/ac39058711acfa3639eac3b237cdd2bb80b43a1c

Comment by Kaloian Manassiev [ 26/Jun/21 ]

https://mongodbcr.appspot.com/798940001

Comment by Tommaso Tocci [ 27/Mar/21 ]

This bug was introduced by SERVER-53118, which added the unguarded distLock acquisition call.

Comment by Max Hirschhorn [ 26/Mar/21 ]

kaloian.manassiev, tommaso.tocci, is there a more holistic change we could make to the way the Balancer thread handles exceptions? SERVER-53973 is another case where an exception left uncaught by the Balancer thread caused the server to crash.
