Core Server / SERVER-99357

Concurrent migration and drop indexes could leave inconsistent indexes in the cluster

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major - P3
    • Affects Version/s: 8.1.0-rc0, 8.0.5, 7.0.17
    • Component/s: Sharding
    • Assigned Teams: Catalog and Routing
    • Operating System: ALL
    • Steps to Reproduce:

      1. Apply the attached patch.
      2. Run it with buildscripts/resmoke.py run --suites=sharding jstests/sharding/inconsistent_indexes_fail_migrations.js

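      For illustration only, a minimal sketch of the two racing operations (not the attached patch, which uses failpoints to force the exact interleaving described below; plain concurrent commands do not guarantee it):

      // Sketch only: the attached patch controls the interleaving with failpoints.
      import {funWithArgs} from "jstests/libs/parallel_shell_helpers.js";

      const st = new ShardingTest({shards: 2});
      const ns = "test.coll";
      assert.commandWorked(st.s.adminCommand({enableSharding: "test"}));
      st.ensurePrimaryShard("test", st.shard0.shardName);
      assert.commandWorked(st.s.adminCommand({shardCollection: ns, key: {x: 1}}));
      assert.commandWorked(st.s.getCollection(ns).createIndex({y: 1}));
      assert.commandWorked(st.s.getCollection(ns).insert({x: 0, y: 0}));

      // Migration of the only chunk from shard0 to shard1, run in parallel.
      const awaitMoveChunk = startParallelShell(
          funWithArgs(function(ns, toShard) {
              // May abort if the concurrent drop commits on the source shard.
              db.adminCommand({moveChunk: ns, find: {x: 0}, to: toShard});
          }, ns, st.shard1.shardName), st.s.port);

      // Concurrent dropIndexes, fanned out by the router to all data-bearing shards.
      st.s.getCollection(ns).dropIndex({y: 1});

      awaitMoveChunk();
      st.stop();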

      Currently, a dropIndexes command that runs concurrently with a migration aborts the migration when the drop commits. This works in the majority of scenarios. However, because dropIndexes currently uses a shard version retry loop to send the command to all data-bearing shards, the following operation interleaving can happen:

      • A dropIndexes command is received by a router, reaches the primary shard for the database, and is then sent throughout the cluster to the data-bearing shards.
      • A migration starts and reaches the point where both a source and a destination migration manager are instantiated.
      • The drop is received and executed on the recipient shard.
      • The migration destination manager copies the indexes from the source shard, recreating the dropped index on the recipient.
      • The drop is received and executed on the source shard, aborting the migration.

      This generates an index inconsistency in the cluster: the recipient keeps the recreated index while the other shards have dropped it. If every shard had a chunk before this happened, and there were documents on the source but none on the recipient, then any subsequent migration going the other way (from the former destination shard back to the former source shard) will fail, because the former source shard (now the destination in this example) will find that it has documents but inconsistent indexes.
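      One way to observe the resulting inconsistency from a mongos is to compare index names across shards with $indexStats, which on sharded clusters tags each returned document with the reporting shard (this detection snippet is illustrative, not part of the ticket; on 7.0+ clusters db.checkMetadataConsistency({checkIndexes: true}) can also surface it):

      // Run against the router: group index names by the shards reporting them.
      // An index listed by only a subset of the data-bearing shards is inconsistent.
      db.coll.aggregate([
          {$indexStats: {}},
          {$group: {_id: "$name", shards: {$addToSet: "$shard"}}}
      ]).forEach(printjson);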

      In the field, a customer might not be able to drain a shard being removed with removeShard, and by extension might not be able to finish a transition to a dedicated config server, since transitionToDedicatedConfigServer uses the removeShard machinery.
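      A stuck drain would show up as removeShard progress that stops advancing; for example (shard01 is a placeholder shard name, and this snippet is illustrative rather than from the ticket):

      // Against the router: repeated calls report draining progress. With the
      // inconsistency above, migrations off the shard fail and remaining.chunks
      // stops decreasing.
      const status = db.adminCommand({removeShard: "shard01"});
      printjson({state: status.state, remaining: status.remaining});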

            Assignee:
            Unassigned
            Reporter:
            Marcos José Grillo Ramirez (marcos.grillo@mongodb.com)
            Votes:
            0
            Watchers:
            4
