[SERVER-51306] Asynchronous operations spawned during lookup should not assume that the CatalogCache is alive Created: 02/Oct/20  Updated: 29/Oct/23  Resolved: 08/Oct/20

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: 4.9.0

Type: Bug Priority: Major - P3
Reporter: Sergi Mateo Bellido Assignee: Alexander Taskov (Inactive)
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Related
Backwards Compatibility: Fully Compatible
Operating System: ALL
Steps To Reproduce:

1. Add a sleep of one second before this line

2. Run 'build/install/bin/db_s_shard_server_test --filter TestGet'

 

Sprint: Sharding 2020-10-19
Participants:
Linked BF Score: 45

 Description   

When we catch an ErrorCodes::ShardInvalidatedForTargeting in here, we end up executing the getCollectionRoutingInfoWithRefresh function. Part of this function's work is done asynchronously, and it doesn't check whether the CatalogCache is still alive.
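
To illustrate the general hazard, here is a minimal sketch of background work that checks whether its owner is still alive before touching it. It is not the server's code: SketchCache and scheduleRefresh are hypothetical names, and the liveness check uses std::weak_ptr, which is an assumption made for illustration rather than how CatalogCache is managed. The task is launched with std::launch::deferred only so that the ordering is deterministic in this sketch.

```cpp
#include <cassert>
#include <future>
#include <memory>

// Hypothetical stand-in for a cache; NOT the real mongo::CatalogCache.
struct SketchCache {
    int version = 0;
};

// Schedules work that holds only a weak reference to the cache.
// The future yields true if the cache was still alive and was updated,
// and false if the cache had already been destroyed.
std::future<bool> scheduleRefresh(const std::shared_ptr<SketchCache>& cache) {
    std::weak_ptr<SketchCache> weak = cache;
    // Deferred launch: the lambda runs when get() is called, which keeps
    // the ordering in this sketch deterministic.
    return std::async(std::launch::deferred, [weak] {
        if (auto alive = weak.lock()) {
            ++alive->version;  // Safe: 'alive' keeps the cache alive here.
            return true;
        }
        return false;  // Cache is gone; skip the work instead of racing.
    });
}
```

With this shape, work that runs after the cache has been destroyed becomes a no-op instead of a use-after-free.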

The failure happened when executing the resharding_destined_recipient_test.cpp unit tests. More specifically, the data race was between these two points: tearDown and assertThrow (https://github.com/mongodb/mongo/blob/45637f4d481c8badd6d5a2d95dcb8ae947c78c92/src/mongo/db/s/resharding_destined_recipient_test.cpp#L266).

We should change the test so that it waits until all the asynchronous work has completed before tearing everything down.
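
The proposed test change can be sketched roughly as follows. This is a self-contained illustration under stated assumptions, not the actual fix: FakeCatalogCache and FakeTestFixture are hypothetical names, and the real test uses the server's own fixture and executor types rather than std::async.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <future>
#include <memory>
#include <string>
#include <thread>

// Hypothetical cache whose refresh runs on a background thread.
class FakeCatalogCache {
public:
    // Starts a background refresh and returns a future the caller can wait on.
    std::future<int> refreshAsync(const std::string& ns) {
        return std::async(std::launch::async, [this, ns] {
            // Simulate slow background work (compare the sleep in the repro steps).
            std::this_thread::sleep_for(std::chrono::milliseconds(50));
            // Touches a member, so the cache must still be alive at this point.
            return ++_refreshCount;
        });
    }

private:
    std::atomic<int> _refreshCount{0};
};

// Hypothetical test fixture that owns the cache.
class FakeTestFixture {
public:
    void setUp() {
        _cache = std::make_unique<FakeCatalogCache>();
    }

    void triggerRefresh() {
        _pending = _cache->refreshAsync("test.coll");
    }

    // Waits for outstanding background work, then destroys the cache.
    // Returns the refresh count observed, or -1 if nothing was pending.
    int tearDown() {
        int count = -1;
        if (_pending.valid()) {
            count = _pending.get();  // Blocks until the refresh has finished.
        }
        _cache.reset();  // Only now is it safe to destroy the cache.
        return count;
    }

    bool cacheAlive() const {
        return _cache != nullptr;
    }

private:
    std::unique_ptr<FakeCatalogCache> _cache;
    std::future<int> _pending;
};
```

Without the wait in tearDown, the background lambda could still be running against a cache that has already been destroyed, which is the kind of race described above.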
 



 Comments   
Comment by Alexander Taskov (Inactive) [ 08/Oct/20 ]

Yes! I'll go through them now.

Comment by Kelly Lewis [ 08/Oct/20 ]

Hi alex.taskov, with this complete, are you able to close the 3 BFs depending on this ticket?

Comment by Githook User [ 08/Oct/20 ]

Author:

{'name': 'Alex Taskov', 'email': 'alex.taskov@mongodb.com', 'username': 'alextaskov'}

Message: SERVER-51306 Wait for refresh to complete before test exits
Branch: master
https://github.com/mongodb/mongo/commit/bf6db0deab2e700390f3ec3bf11d04eb2f68cb4b

Comment by Kaloian Manassiev [ 07/Oct/20 ]

alex.taskov, assigning it to you since these tests came from this commit.

Comment by Gregory Wlodarek [ 07/Oct/20 ]

Can we prioritize this fix? It's causing commit queue merges to fail.

Generated at Thu Feb 08 05:25:07 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.