Type: Task
Resolution: Fixed
Priority: Major - P3
Fix Version/s: 5.0.0-rc0
Affects Version/s: None
Component/s: Sharding
Labels:
- PM-1965-Milestone-1

Backwards Compatibility:
Fully Compatible
Sprint:
Sharding 2021-04-05, Sharding EMEA 2021-05-03
Confidence Status:
None
Work Order:
3
CAR Domain/s:
None

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name(s):
None
Goal Link:
None

If we are creating a collection, the following scenario might happen:

We start sharding an empty collection with a hashed shard key index.
A stepdown occurs.
A write sneaks in before the new primary resumes the coordinator.
The new primary starts phase two, but a collection cannot be sharded with a hashed index once it already contains data, so shardCollection fails.
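The race above can be illustrated with a small toy model. This is a minimal sketch in Python, not MongoDB server code: `Collection`, `phase_one`, `phase_two` and `ShardCollectionError` are hypothetical names, and the emptiness check simply encodes the ticket's statement that the hashed path cannot proceed once the collection holds data.

```python
from dataclasses import dataclass, field


class ShardCollectionError(Exception):
    """Raised when the (toy) phase two cannot shard the collection."""


@dataclass
class Collection:
    # Hypothetical stand-in for a collection: its documents and index names.
    docs: list = field(default_factory=list)
    indexes: set = field(default_factory=set)


def phase_one(coll, key):
    # Phase one: create the hashed shard key index on the still-empty collection.
    coll.indexes.add(key + "_hashed")


def phase_two(coll, key):
    # Phase two (as described in the ticket): sharding with the hashed index
    # is only allowed while the collection holds no data.
    if key + "_hashed" not in coll.indexes:
        raise ShardCollectionError("missing shard key index")
    if coll.docs:
        raise ShardCollectionError("collection is no longer empty")
    return "sharded"


def run(write_during_stepdown):
    coll = Collection()
    phase_one(coll, "x")
    # A stepdown happens here; no coordinator is running.
    if write_during_stepdown:
        coll.docs.append({"x": 1})  # a write sneaks in before the coordinator resumes
    # The new primary resumes the coordinator at phase two.
    try:
        return phase_two(coll, "x")
    except ShardCollectionError as exc:
        return "failed: " + str(exc)


print(run(False))  # sharded
print(run(True))   # failed: collection is no longer empty
```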

To ensure that no leftover data remains when the command terminates, a dropIndex step was added to the create collection path. It runs only if the index did not already exist on the collection and we are recovering from a stepdown. However, this can lead to the following scenario:

Imagine we have a 10TB collection and we try to shard it:

The request blocks for 15 minutes in the critical section while the index is created.
The server is overwhelmed by parked network requests that block on the critical section (CS), and it crashes.
The server comes back up and resumes shardCollection, and the same thing happens over and over, because we drop the index and then recreate it.
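The crash loop can be sketched the same way. In this hypothetical model, each attempt spends `build_minutes` inside the critical section if the index is missing, requests arrive at `requests_per_minute` and park on the critical section, and the node crashes when more than `max_parked` requests are parked. All parameters are illustrative assumptions, and the model assumes the index build itself finishes before the crash; the point is only that dropping the index on every recovery forces every attempt to pay the full build again.

```python
def simulate(max_attempts, build_minutes, requests_per_minute, max_parked,
             drop_on_recovery):
    """Return the outcome ("crash" or "done") of each shardCollection attempt."""
    index_exists = False
    outcomes = []
    for attempt in range(max_attempts):
        recovering = attempt > 0
        if recovering and drop_on_recovery:
            # Current behaviour: the index was not there before the command,
            # so recovery drops it.
            index_exists = False
        # Time spent in the critical section: a full build if the index is missing.
        cs_minutes = 0 if index_exists else build_minutes
        # Toy assumption: the build finishes, but the node may crash before the
        # critical section is released.
        index_exists = True
        parked = cs_minutes * requests_per_minute
        if parked > max_parked:
            outcomes.append("crash")  # node restarts; the coordinator resumes on step-up
            continue
        outcomes.append("done")
        break
    return outcomes


print(simulate(3, 15, 1000, 5000, drop_on_recovery=True))   # ['crash', 'crash', 'crash']
print(simulate(3, 15, 1000, 5000, drop_on_recovery=False))  # ['crash', 'done']
```

With the drop in place the loop never converges; without it, the second attempt finds the index already built and leaves the critical section immediately.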

We must remove this dropIndex operation. To do so, we could, for example, use the same approach as resharding and prevent writes on the collection on step up.
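A minimal sketch of the proposed direction, assuming writes can be blocked as soon as the node steps up (the idea the ticket borrows from resharding; compare SERVER-55494 below). `Node`, `step_up` and `resume_create` are hypothetical names, not server APIs. Because no user write can land between step-up and the coordinator resuming, recovery no longer needs to drop the index.

```python
class Node:
    """Toy replica set member (hypothetical; not the server's classes)."""

    def __init__(self):
        self.docs = []
        self.indexes = {"x_hashed"}  # phase one built the index before the stepdown
        self.writes_blocked = False

    def write(self, doc):
        # While the pending create collection holds writes off, reject them.
        if self.writes_blocked:
            return False
        self.docs.append(doc)
        return True


def step_up(node, pending_create):
    # Proposed idea: block writes on step-up, before any user write can land.
    if pending_create:
        node.writes_blocked = True


def resume_create(node, key):
    index = key + "_hashed"
    if index not in node.indexes:
        node.indexes.add(index)
    # No dropIndex on recovery: writes were blocked, so the collection is still empty.
    result = "sharded" if not node.docs else "failed"
    node.writes_blocked = False  # release once the DDL has finished
    return result


def scenario():
    node = Node()
    step_up(node, pending_create=True)
    accepted = node.write({"x": 1})  # the write that used to sneak in
    result = resume_create(node, "x")
    return accepted, result, sorted(node.indexes)


print(scenario())  # (False, 'sharded', ['x_hashed'])
```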

causes

SERVER-56342 Throw an exception if the update on config.collections of the shardCollection operation doesn't succeed

Closed

depends on

SERVER-54587 Make create collection resilient to stepdowns

Closed

SERVER-55494 Retake the collection critical section on step up (in drain mode)

Closed

is depended on by

SERVER-55969 Stop checking for DisableIncompleteShardingDDLSupport future flag in create collection

Closed

is duplicated by

SERVER-55485 Add create collection parameters when reporting for current op

Closed

Assignee: Marcos José Grillo Ramirez
Reporter: Marcos José Grillo Ramirez
Participants: Githook User, Marcos José Grillo Ramirez
Votes: 0
Watchers: 1

Created: Mar 26 2021 12:24:18 PM UTC
Updated: Oct 29 2023 09:55:43 PM UTC
Resolved: Apr 19 2021 05:28:51 PM UTC
Confidence Status Last Update: 29/Mar/21 4:02 PM
