[SERVER-39756] Sharding a very large collection can result in a long stall of writes against this collection
Created: 22/Feb/19  Updated: 29/Oct/23  Resolved: 16/May/19

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 4.0.6
Fix Version/s: 4.1.12, 4.0.11

Type: Bug Priority: Major - P3
Reporter: Kaloian Manassiev Assignee: Blake Oler
Resolution: Fixed Votes: 0
Labels: sharding-wfbf-day
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Related
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v4.0
Sprint: Sharding 2019-04-08, Sharding 2019-04-22, Sharding 2019-05-06, Sharding 2019-05-20
Participants:

 Description   

Starting in 4.0.6, the initial collection splits performed when sharding a collection are driven by the database's primary shard. That shard takes the critical section for the collection, which blocks writes. If the collection is not empty, the shard then calls splitVector, while still holding the critical section, to determine the initial split points for the collection.

For very large collections, the splitVector call can take a very long time, during which the collection is unavailable for writes.

In order to improve this, we should rearrange the code so that the splitVector operation is run outside of the critical section.
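
The proposed rearrangement can be illustrated with a minimal Python sketch. This is not the server's code: `critical_section`, `compute_split_points`, `build_chunks`, and `shard_collection` are hypothetical names, a plain `threading.Lock` stands in for the collection critical section, and the split-point logic is a toy substitute for splitVector. It also assumes that split points computed before the critical section is taken are still acceptable once it is entered.

```python
import threading

# Hypothetical stand-in for the collection critical section:
# while this lock is held, writers would be blocked.
critical_section = threading.Lock()


def compute_split_points(sorted_keys, max_chunk_size):
    """Toy stand-in for splitVector: pick every max_chunk_size-th key."""
    return [
        sorted_keys[i]
        for i in range(max_chunk_size, len(sorted_keys), max_chunk_size)
    ]


def build_chunks(split_points):
    """Turn split points into (min, max) ranges; None marks an open bound."""
    bounds = [None] + list(split_points) + [None]
    return list(zip(bounds[:-1], bounds[1:]))


def shard_collection(sorted_keys, max_chunk_size, events):
    # The expensive scan runs first, while writes are still allowed.
    events.append(("scan", critical_section.locked()))
    split_points = compute_split_points(sorted_keys, max_chunk_size)

    # Only the short metadata step runs while writes are blocked.
    with critical_section:
        events.append(("commit", critical_section.locked()))
        chunks = build_chunks(split_points)
    return chunks
```

In this sketch the `events` list records whether the lock was held at each step, so a caller can check that the scan ran with writes unblocked and only the chunk creation ran inside the critical section.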



 Comments   
Comment by Githook User [ 11/Jun/19 ]

Author: Blake Oler (email: blake.oler@mongodb.com, username: BlakeIsBlake)

Message: SERVER-39756 Rearrange shardCollection code to allow critical section to be exited earlier

(cherry picked from commit 2c3bfbb890c76f21e1531ea8af81dccd034b37cc)
Branch: v4.0
https://github.com/mongodb/mongo/commit/07958f6b07f91b55781cbde694e82ca4e575f38e

Comment by Githook User [ 16/May/19 ]

Author: Blake Oler (email: blake.oler@mongodb.com, username: BlakeIsBlake)

Message: SERVER-39756 Rearrange shardCollection code to allow critical section to be exited earlier
Branch: master
https://github.com/mongodb/mongo/commit/2c3bfbb890c76f21e1531ea8af81dccd034b37cc

Comment by Kaloian Manassiev [ 13/May/19 ]

Most likely we will do a backport, but it is dependent on the complexity of the fix.

Comment by Gregory McKeon (Inactive) [ 08/May/19 ]

blake.oler, does this need a 4.0 backport as well?

Generated at Thu Feb 08 04:53:01 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.