[SERVER-46032] needsRefresh flag is not checked on catalogue cache's onStaleShardVersion Created: 07/Feb/20  Updated: 29/Oct/23  Resolved: 26/Mar/20

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: 4.4.0-rc0, 4.7.0

Type: Bug Priority: Major - P3
Reporter: Marcos José Grillo Ramirez Assignee: Blake Oler
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v4.4
Sprint: Sharding 2020-03-09, Sharding 2020-03-23, Sharding 2020-04-06
Participants:
Linked BF Score: 9

 Description   

There is a check in the catalogue cache that marks a collection's RoutingInfoCache as needsRefresh.

Before setting the flag, the function makes sure that the collection hasn't been dropped; however, this is not enough. Another thread could have already marked the cache as needsRefresh, which causes the server to crash, as we have seen in some build failures.

Besides checking if the collection was dropped, we should check if the needsRefresh flag was already set.
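
A minimal sketch of the proposed double check, under stated assumptions: CatalogCacheSketch, CollectionEntrySketch, markNeedsRefresh and the helper methods are hypothetical stand-ins, not the server's actual classes or signatures. The assert only models "setting the flag twice is fatal"; the ticket does not say which invariant actually fires.

```cpp
#include <cassert>
#include <map>
#include <mutex>
#include <string>

// Hypothetical, simplified stand-in for a collection's routing info cache entry.
struct CollectionEntrySketch {
    bool needsRefresh = false;
};

// Hypothetical, simplified stand-in for the catalogue cache (not the real class).
class CatalogCacheSketch {
public:
    void addCollection(const std::string& ns) {
        std::lock_guard<std::mutex> lk(_mutex);
        _collections[ns] = CollectionEntrySketch{};
    }

    void dropCollection(const std::string& ns) {
        std::lock_guard<std::mutex> lk(_mutex);
        _collections.erase(ns);
    }

    // Returns true only if this call is the one that set the flag.
    bool onStaleShardVersion(const std::string& ns) {
        std::lock_guard<std::mutex> lk(_mutex);
        auto it = _collections.find(ns);
        if (it == _collections.end()) {
            return false;  // Existing check: the collection was dropped.
        }
        if (it->second.needsRefresh) {
            return false;  // Proposed check: another thread already set the flag.
        }
        markNeedsRefresh(it->second);
        return true;
    }

    bool needsRefresh(const std::string& ns) {
        std::lock_guard<std::mutex> lk(_mutex);
        auto it = _collections.find(ns);
        return it != _collections.end() && it->second.needsRefresh;
    }

private:
    // Assumption: setting an already-set flag is treated as a fatal error.
    static void markNeedsRefresh(CollectionEntrySketch& entry) {
        assert(!entry.needsRefresh);
        entry.needsRefresh = true;
    }

    std::mutex _mutex;
    std::map<std::string, CollectionEntrySketch> _collections;
};
```

With both checks in place, a second onStaleShardVersion call for the same namespace returns early instead of reaching the assert. Without the second check, the same sequence of calls would trip it.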



 Comments   
Comment by Githook User [ 25/Mar/20 ]

Author: Blake Oler (BlakeIsBlake) <blake.oler@mongodb.com>

Message: SERVER-46032 Check if collection has stale epoch before setting shard stale

(cherry picked from commit c02679b143b4635ca0ed5fd542520c3e594fe204)
Branch: v4.4
https://github.com/mongodb/mongo/commit/2503e0fc03e910eddc097e1d35fbf8f325e584d2

Comment by Githook User [ 23/Mar/20 ]

Author: Blake Oler (BlakeIsBlake) <blake.oler@mongodb.com>

Message: SERVER-46032 Check if collection has stale epoch before setting shard stale
Branch: master
https://github.com/mongodb/mongo/commit/c02679b143b4635ca0ed5fd542520c3e594fe204

Comment by Blake Oler [ 10/Feb/20 ]

Ah, so this confirms that the extra branch where we check whether needsRefresh is already true was not, in fact, redundant. Since needsRefresh is gone, my initial hunch is that checking needsFullRefresh before marking shards as stale in onStaleShardVersion will solve this problem. I will poke around some more and update the ticket if necessary.

Comment by Esha Maharishi (Inactive) [ 10/Feb/20 ]

CC blake.oler
