Type: Bug
Resolution: Unresolved
Priority: Major - P3
Fix Version/s: None
Affects Version/s: 8.1.0-rc0, 8.0.5, 7.0.17
Component/s: Sharding
Labels:
None

Assigned Teams:

Catalog and Routing
Operating System:
ALL
Steps To Reproduce:

1. Apply the attached patch.
2. Run it with buildscripts/resmoke.py run --suites=sharding jstests/sharding/inconsistent_indexes_fail_migrations.js

For the createIndexes reproduction:
1. Apply the attached patch.
2. Run it with buildscripts/resmoke.py run --suites=sharding jstests/sharding/inconsistent_indexes_fail_migrations_create.js

Linked BF Score:
0
CAR Domain/s:
None

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name(s):
None
Goal Link:
None

Currently, a dropIndex command that runs concurrently with a migration aborts the migration when committing the drop. This works in the majority of scenarios. However, because the drop indexes command uses a shard version retry loop to send the command to all data-bearing shards, the following operation interleaving might happen:

1. A drop indexes command is received by a router, reaches the primary db shard, and is then sent throughout the cluster to the data-bearing shards.
2. A migration starts and reaches the point where both a source manager and a destination manager are instantiated.
3. The drop index is received and executed on the recipient shard.
4. The migration destination manager copies the indexes from the source shard.
5. The drop index is received and executed on the source shard, aborting the migration.

This generates an index inconsistency in the cluster. If every shard had a chunk before this happened, and there were documents in the source but none in the recipient, then any subsequent migration going the other way (that is, from the former destination shard to the former source shard) will fail, because the former source shard (now the destination shard) will find that it has documents but inconsistent indexes.
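The drop-index interleaving can be sketched as a toy model. This is not MongoDB code: the `Shard` class, the helper names, the index name `x_1`, and the document counts are all hypothetical, and the rule that a non-empty recipient refuses a migration when it is missing indexes is an assumption taken from the failure described above.

```python
from dataclasses import dataclass, field


@dataclass
class Shard:
    """Toy stand-in for one shard's view of a single sharded collection."""
    name: str
    indexes: set = field(default_factory=lambda: {"_id_"})
    doc_count: int = 0


def drop_index(shard, index_name):
    """Model of the drop being executed locally on one shard."""
    shard.indexes.discard(index_name)


def copy_indexes(donor, recipient):
    """Model of the recipient copying the donor's indexes during a migration.

    Assumption: a recipient that already holds documents and is missing
    indexes refuses the migration instead of building them.
    """
    missing = donor.indexes - recipient.indexes
    if missing and recipient.doc_count > 0:
        raise RuntimeError(
            f"{recipient.name} has documents but is missing indexes {sorted(missing)}"
        )
    recipient.indexes |= missing


def simulate_drop_interleaving():
    # Every shard owns a chunk and starts with the same indexes;
    # only the source currently holds documents.
    source = Shard("source", {"_id_", "x_1"}, doc_count=100)
    recipient = Shard("recipient", {"_id_", "x_1"}, doc_count=0)

    drop_index(recipient, "x_1")     # drop executes on the recipient
    copy_indexes(source, recipient)  # migration copies x_1 back onto the recipient
    drop_index(source, "x_1")        # drop executes on the source; migration aborts
    return source, recipient


source, recipient = simulate_drop_interleaving()
print(sorted(source.indexes), sorted(recipient.indexes))

# A later migration in the opposite direction now fails in this model.
try:
    copy_indexes(donor=recipient, recipient=source)
except RuntimeError as err:
    print("migration failed:", err)
```

In this model the recipient ends up holding an index that the source no longer has, and the former source cannot accept chunks back because it is non-empty.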

This same situation can also happen with createIndexes:

1. A create index command is received by the router.
2. A migration starts and reaches the point where both a source manager and a destination manager are instantiated.
3. The create index is executed on the recipient shard.
4. The migration destination manager copies the indexes from the source shard.
5. The migration concludes successfully.
6. The create index finishes on the source shard.

In the field, a customer might not be able to drain a shard being removed with removeShard and, by extension, might not be able to finish a transition to a dedicated config server, because transitionToDedicatedConfigServer uses the remove shard machinery. Additionally, a mongosync run would fail due to the index inconsistency.


SERVER_99357_repro.patch
Aug 18 2025 06:03:41 PM UTC
8 kB
Marcos José Grillo Ramirez

Assignee: Unassigned
Reporter: Marcos José Grillo Ramirez
Participants: Marcos José Grillo Ramirez
Votes: 0
Watchers: 6

Created: Jan 14 2025 02:38:33 PM UTC
Updated: Aug 18 2025 06:05:14 PM UTC
