[SERVER-76829] Calling `setAllowMigrations` with a session while the balancer is enabled may result in deadlock Created: 04/May/23  Updated: 04/May/23  Resolved: 04/May/23

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Critical - P2
Reporter: Pierlauro Sciarelli Assignee: Pierlauro Sciarelli
Resolution: Won't Do Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Operating System: ALL
Sprint: Sharding EMEA 2023-05-15
Participants:
Linked BF Score: 110

 Description   

SERVER-73539 introduced replay protection when calling `stopMigrations` from the rename coordinator.

The flow of `setAllowMigrations` is the following:

  1. Shard (DDL coordinator) asks the CSRS to stop migrations
  2. CSRS updates the collection's metadata and then asks every shard to:
    1. Abort current migrations
    2. Refresh the routing table
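The flow above can be sketched as a toy, single-process model. Everything in it (`Shard`, `ConfigServer`, `stop_migrations`, and the method names) is hypothetical and only mirrors the numbered steps. It is not the server's actual implementation:

```python
from dataclasses import dataclass, field


@dataclass
class Shard:
    """Hypothetical stand-in for a shard that receives commands from the CSRS."""
    name: str
    aborted: list = field(default_factory=list)
    refreshed: list = field(default_factory=list)

    def abort_migrations(self, nss):
        # Step 2.1: abort in-flight migrations for the namespace.
        self.aborted.append(nss)

    def refresh_routing_table(self, nss):
        # Step 2.2: refresh the cached routing table for the namespace.
        self.refreshed.append(nss)


@dataclass
class ConfigServer:
    """Hypothetical stand-in for the CSRS."""
    shards: list
    allow_migrations: dict = field(default_factory=dict)

    def set_allow_migrations(self, nss, allow):
        # Step 2: persist the flag in the collection's metadata first...
        self.allow_migrations[nss] = allow
        # ...then ask every shard to abort migrations and refresh.
        for shard in self.shards:
            shard.abort_migrations(nss)
            shard.refresh_routing_table(nss)


def stop_migrations(csrs, nss):
    # Step 1: the DDL coordinator running on a shard asks the CSRS to stop migrations.
    csrs.set_allow_migrations(nss, False)


csrs = ConfigServer(shards=[Shard("shard0"), Shard("shard1")])
stop_migrations(csrs, "db.coll")
```

Note that step (2) fans out to every shard, including the one that issued step (1).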

Step (1) is performed with a session checked out, and it may run while a migration of one of the collections involved in the rename is in progress on the same node that runs the DDL coordinator. In that case, the lsid is propagated all the way down to step (2.1), which gets stuck trying to check out the same session. That session is still checked out by the coordinator, which is itself waiting for step (1) to complete, so neither side can make progress.
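The hang can be illustrated with a minimal sketch, assuming a simplified session catalog in which checking out a session is exclusive and non-reentrant (modelled here as one `threading.Lock` per lsid). `SessionCatalog`, `abort_migrations_on_shard` and `stop_migrations_from_coordinator` are hypothetical names, and the rule that the abort only needs the session while a migration is active is a simplifying assumption. A timeout replaces the indefinite wait so that the stuck checkout becomes observable:

```python
import threading


class SessionCatalog:
    """Hypothetical, simplified catalog: one non-reentrant lock per lsid."""

    def __init__(self):
        self._locks = {}
        self._guard = threading.Lock()

    def _lock_for(self, lsid):
        with self._guard:
            return self._locks.setdefault(lsid, threading.Lock())

    def check_out(self, lsid, timeout):
        # Returns False on timeout instead of blocking forever.
        return self._lock_for(lsid).acquire(timeout=timeout)

    def check_in(self, lsid):
        self._lock_for(lsid).release()


def abort_migrations_on_shard(catalog, lsid, migration_active, timeout):
    # Step 2.1 on the coordinator's own node, carrying the propagated lsid.
    # Assumption: the session is only needed when a migration is running.
    if not migration_active:
        return "nothing to abort"
    if not catalog.check_out(lsid, timeout):
        return "stuck: session already checked out"
    try:
        return "aborted"
    finally:
        catalog.check_in(lsid)


def stop_migrations_from_coordinator(catalog, lsid, migration_active, timeout=0.1):
    # Step 1: the coordinator checks out its session and keeps it checked out
    # while the request travels to the CSRS and back to this shard (step 2.1).
    if not catalog.check_out(lsid, timeout):
        raise RuntimeError("coordinator could not check out its session")
    try:
        return abort_migrations_on_shard(catalog, lsid, migration_active, timeout)
    finally:
        catalog.check_in(lsid)
```

With no migration in progress the call completes. With a migration in progress, step (2.1) tries to check out the lsid that the coordinator is still holding, which is the deadlock described above.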

 Comments   
Comment by Pierlauro Sciarelli [ 04/May/23 ]

It actually turns out that the underlying issue is SERVER-76720, so closing as "incorrect".

Generated at Thu Feb 08 06:33:44 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.