[SERVER-31428] Poor performance when many concurrent ops refresh sharding metadata
Created: 05/Oct/17  Updated: 30/Oct/23  Resolved: 20/Oct/17

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 3.4.9, 3.6.0-rc0
Fix Version/s: 3.4.10, 3.6.0-rc1

Type: Bug
Priority: Major - P3
Reporter: Kaloian Manassiev
Assignee: Kevin Pulo
Resolution: Fixed
Votes: 1
Labels: SWCW
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Related
related to SERVER-31595 Generate shardMaps outside MODE_X col... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v3.4

 Description   

Consider a shard node that has just started and/or become primary and has no sharding metadata cached.

If many threads running sharded operations (i.e., operations containing a non-UNSHARDED version) arrive at the same time, all of them will get StaleConfigException and enter the metadata refresh code. Of these threads, only one will perform the refresh from the config server, but all of them will eventually make the same follow-up call. That call does nothing if the metadata is already fresh, yet every thread still acquires the collection X-lock to make it, which causes stalls on an already overloaded server.

In addition, all threads will redundantly process the new metadata.

The complete fix would be to serialize collection refreshes on the shard itself, outside of the synchronization that already happens through the catalog cache.
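
The serialization idea can be illustrated with a "single-flight" helper: the first thread to ask for a refresh of a namespace does the work, and every thread that arrives while it is running waits for that result instead of repeating it. This is only a minimal sketch in Python, not the server's C++ code. CollectionRefreshSerializer, _Flight and the fetch_and_install callback are hypothetical names, and the callback stands in for "fetch from the config server, process the metadata and install it".

```python
import threading


class _Flight:
    """Bookkeeping for one in-progress refresh (hypothetical helper)."""

    def __init__(self):
        self.done = threading.Event()
        self.version = None
        self.error = None
        self.waiters = 0  # threads blocked behind this refresh


class CollectionRefreshSerializer:
    """Allows at most one refresh per namespace to run at a time."""

    def __init__(self, fetch_and_install):
        # Assumption: fetch_and_install(namespace) does the expensive work
        # and returns the version that ends up installed.
        self._fetch_and_install = fetch_and_install
        self._mutex = threading.Lock()
        self._in_flight = {}  # namespace -> _Flight

    def refresh(self, namespace):
        with self._mutex:
            flight = self._in_flight.get(namespace)
            leader = flight is None
            if leader:
                flight = _Flight()
                self._in_flight[namespace] = flight
            else:
                flight.waiters += 1

        if not leader:
            # Followers neither fetch nor install; they reuse the result.
            flight.done.wait()
            if flight.error is not None:
                raise flight.error
            return flight.version

        try:
            flight.version = self._fetch_and_install(namespace)
            return flight.version
        except Exception as exc:
            flight.error = exc
            raise
        finally:
            # Unregister first so later callers start a fresh refresh,
            # then wake everyone who waited on this one.
            with self._mutex:
                del self._in_flight[namespace]
            flight.done.set()
```

With this shape, only the leader thread would ever reach the step that needs the collection X-lock, so followers add no lock traffic. A call that arrives after the refresh has completed starts a new one, which matches the usual behaviour of a stale-version retry.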

A quick solution to the MODE_X aspect would be to add a check (under the collection IS lock) just before the X lock is acquired: re-check whether the version obtained from the CatalogCache differs from the version the shard already has for the collection, and skip acquiring the X-lock if it does not.
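
The quick fix amounts to a check-then-lock pattern, sketched below in Python under stated assumptions. CollectionLock is a toy readers-writer lock standing in for the collection IS/X lock, and ShardCollectionState and install_refreshed_metadata are hypothetical names. Versions are modelled as (epoch, major, minor) tuples so that plain equality also compares the epoch; the real server's comparison rules may differ.

```python
import threading
from contextlib import contextmanager


class CollectionLock:
    """Toy readers-writer lock standing in for the collection IS/X lock."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0
        self._writer = False
        self.exclusive_acquisitions = 0  # counted for illustration only

    @contextmanager
    def shared(self):
        with self._cond:
            while self._writer:
                self._cond.wait()
            self._readers += 1
        try:
            yield
        finally:
            with self._cond:
                self._readers -= 1
                self._cond.notify_all()

    @contextmanager
    def exclusive(self):
        with self._cond:
            while self._writer or self._readers:
                self._cond.wait()
            self._writer = True
            self.exclusive_acquisitions += 1
        try:
            yield
        finally:
            with self._cond:
                self._writer = False
                self._cond.notify_all()


class ShardCollectionState:
    """Hypothetical per-collection state held by the shard."""

    def __init__(self):
        self.lock = CollectionLock()
        self.installed_version = None  # (epoch, major, minor) or None


def install_refreshed_metadata(state, refreshed_version):
    """Install refreshed_version; return True only if a write happened."""
    # Cheap pre-check under the shared lock: most threads stop here.
    with state.lock.shared():
        if state.installed_version == refreshed_version:
            return False

    with state.lock.exclusive():
        # Re-check: another thread may have installed the same version
        # between releasing the shared lock and acquiring this one.
        if state.installed_version == refreshed_version:
            return False
        state.installed_version = refreshed_version
        return True
```

In this sketch, repeated calls with a version that is already installed never touch the exclusive lock, while a version with a different epoch is treated as different even when its major and minor numbers match.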



 Comments   
Comment by Githook User [ 20/Oct/17 ]

Author: Kevin Pulo (devkev) <kevin.pulo@mongodb.com>

Message: SERVER-31428 avoid redundant concurrent generation of new chunkMaps
Branch: master
https://github.com/mongodb/mongo/commit/f26be76fa76156fa13878c7eedb1cc53e79fd029

Comment by Githook User [ 18/Oct/17 ]

Author: Kevin Pulo (devkev) <kevin.pulo@mongodb.com>

Message: SERVER-31428 correctly check collVersion epoch when short-circuiting metadata refresh
Branch: v3.4
https://github.com/mongodb/mongo/commit/998863e3f1af804063a2f6a0c1633b6df40a3350

Comment by Githook User [ 18/Oct/17 ]

Author: Kevin Pulo (devkev) <kevin.pulo@mongodb.com>

Message: SERVER-31428 avoid redundant concurrent generation of new chunkMaps
Branch: v3.4
https://github.com/mongodb/mongo/commit/c00aaae0843f14296b2c08411475f62847737ae7
