[SERVER-51834] Race in moveChunk tests Created: 26/Oct/20  Updated: 29/Oct/23  Resolved: 17/Nov/20

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: 4.9.0, 4.4.3

Type: Bug Priority: Major - P3
Reporter: Misha Tyulenev Assignee: Sergi Mateo Bellido
Resolution: Fixed Votes: 0
Labels: sharding-wfbf-day
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v4.4
Participants:
Linked BF Score: 11

 Description   

Sequential moveChunk commands sent to a host can fail with the error "Unable to start new migration because this shard is currently donating chunk".
For example, this test and this test have shown this error.

The error is raised in the moveChunk code.



 Comments   
Comment by Githook User [ 19/Nov/20 ]

Author:

{'name': 'Sergi Mateo Bellido', 'email': 'sergi.mateo-bellido@mongodb.com', 'username': 'smateo'}

Message: SERVER-51834 Race in moveChunk tests
Branch: v4.4
https://github.com/mongodb/mongo/commit/5bcf61fcaadc7f1385c21d21e37b85e5d4c46eea

Comment by Githook User [ 17/Nov/20 ]

Author:

{'name': 'Sergi Mateo Bellido', 'email': 'sergi.mateo-bellido@mongodb.com', 'username': 'smateo'}

Message: SERVER-51834 Race in moveChunk tests
Branch: master
https://github.com/mongodb/mongo/commit/9b0e366a75a9cc25705969932b3374d21d4d13c9

Comment by Sergi Mateo Bellido [ 16/Nov/20 ]

The issue is that this code wrongly assumes that, once the future is ready, the objects captured in the functor have already been destroyed.
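
The lifetime assumption described above can be illustrated with a minimal, single-threaded sketch. This is not the server code: ScopedRegistration and donatingWhenReady are hypothetical names, and a plain boolean stands in for the future becoming ready. It only shows that a functor's captures live until the functor itself is destroyed, which can be later than the moment completion is signalled from inside its body.

```cpp
#include <cassert>
#include <functional>
#include <memory>

// Hypothetical stand-in for a migration registration: while an instance is
// alive, the shard counts as "currently donating a chunk".
class ScopedRegistration {
public:
    explicit ScopedRegistration(bool& donating) : _donating(donating) { _donating = true; }
    ~ScopedRegistration() { _donating = false; }
    ScopedRegistration(const ScopedRegistration&) = delete;
    ScopedRegistration& operator=(const ScopedRegistration&) = delete;

private:
    bool& _donating;
};

// Runs one simulated migration task and reports whether the shard still looked
// "donating" at the moment completion was signalled.
bool donatingWhenReady(bool releaseBeforeSignal) {
    bool donating = false;
    bool ready = false;  // assumption: stands in for the future becoming ready

    auto registration = std::make_shared<ScopedRegistration>(donating);
    std::function<void()> task = [registration, &ready, releaseBeforeSignal]() mutable {
        if (releaseBeforeSignal) {
            registration.reset();  // drop the capture before signalling
        }
        ready = true;  // a waiter may proceed from this point on
    };
    registration.reset();  // the functor now holds the only reference
    assert(donating);

    task();
    assert(ready);
    const bool observed = donating;  // what a waiter would see right now

    task = nullptr;  // only here are the functor and its captures destroyed
    assert(!donating);
    return observed;
}
```

In this sketch, donatingWhenReady(false) reports that the registration is still active when completion is signalled, which corresponds to the window in which a second moveChunk could be rejected. Calling donatingWhenReady(true) releases the capture before signalling, so the waiter never observes the stale registration. Whether the actual fix takes this shape is not stated in the ticket.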

Comment by Sergi Mateo Bellido [ 16/Nov/20 ]

I managed to reproduce this issue deterministically by adding a sleep of a few seconds inside this if statement.

Generated at Thu Feb 08 05:26:34 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.