Type: Bug
Resolution: Unresolved
Priority: Major - P3
Fix Version/s: None
Affects Version/s: None
Component/s: Querying, Sharding
Labels: None
Assigned Teams: Cluster Scalability
Operating System: ALL
Confidence Status: None
Work Order: 3
CAR Domain/s: None
Aha! Reference: None
Tracking Level: None
Risk Status: None
Exec Notes: None
Goal Name(s): None
Goal Link: None

Observed in mr_output_options.js, where coll.remove({}) completed successfully but coll.find().itcount() was then non-zero, even though there were no concurrent inserts. This does not happen when running coll.remove({}) on a standalone or replica set, or on a sharded cluster in the absence of a concurrent chunk migration.

The sequence of events is:

1. A chunk migration starts from shardA to shardB.
2. After the range deletion on the recipient (shardB), but before the clone starts, the mongos receives coll.remove({}) and broadcasts it unversioned to both shards.
3. shardB finishes that deletion quickly. shardB now has 0 documents in coll.
4. Meanwhile, shardA has started processing the multi-delete, but is working on other documents, not those in the chunk range being moved.
5. The clone of documents from shardA to shardB now starts and completes. shardB now has a non-zero number of documents (the contents of the chunk being moved).
6. The migration enters the critical section to commit, interrupting the multi-delete on shardA with StaleConfig "migration commit in progress for dbname.collname".
7. The migration gets the final xfermods from the donor's OpObserver inside the critical section, but because the multi-delete on shardA has not yet reached any of the chunk range documents, there are no mods to apply. The migration finishes normally.
8. In the meantime, the mongos has received StaleConfig from the multi-delete on shardA, so it has resent the multi-delete, but only to shardA. The resent delete blocks until the critical section exits, then runs normally to successful completion. The mongos multi-delete command now also completes successfully. shardA now has 0 documents, but shardB still has the documents from the migrated chunk.
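
The interleaving described above can be sketched as a toy, in-memory model. This is only an illustration under stated assumptions, not server code: it makes no MongoDB calls, and the shard key x, the chunk range [10, 20), the sample documents, and the names simulateRace, inChunk and deleteOnDonor are all hypothetical.

```javascript
// Toy in-memory model of the race; it does not talk to MongoDB.
// Shard key "x", chunk range [10, 20) and all names are hypothetical.
function simulateRace() {
  const inChunk = (doc) => doc.x >= 10 && doc.x < 20;
  let shardA = [{ x: 1 }, { x: 2 }, { x: 11 }, { x: 12 }]; // donor
  let shardB = [{ x: 15 }, { x: 31 }]; // recipient; x=15 is a leftover in the incoming range
  const xferMods = []; // in-chunk deletes recorded on the donor during the migration

  // Delete matching documents on the donor, recording in-chunk deletes as xferMods.
  const deleteOnDonor = (matches) => {
    shardA = shardA.filter((doc) => {
      if (!matches(doc)) return true;
      if (inChunk(doc)) xferMods.push(doc.x);
      return false;
    });
  };

  // Recipient range deletion: shardB clears the incoming range before the clone.
  shardB = shardB.filter((doc) => !inChunk(doc));

  // mongos broadcasts remove({}) unversioned; shardB finishes immediately.
  shardB = [];

  // shardA's multi-delete has so far only reached documents outside the chunk.
  deleteOnDonor((doc) => !inChunk(doc));

  // Clone: the chunk's documents are copied from shardA to shardB.
  shardB = shardA.filter(inChunk).map((doc) => ({ ...doc }));

  // Critical section: shardA's multi-delete is interrupted (StaleConfig) and the
  // final xferMods are applied on shardB. None were recorded, so nothing changes.
  const modsApplied = xferMods.length;
  shardB = shardB.filter((doc) => !xferMods.includes(doc.x));

  // Migration commits; mongos resends the multi-delete to shardA only.
  shardA = [];

  return { shardA, shardB, modsApplied };
}

const result = simulateRace();
console.log(result.shardA.length, result.shardB.length, result.modsApplied); // 0 2 0
```

In this model the delete reports success while shardB still holds the two cloned documents, which mirrors the non-zero itcount() seen in the test.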

is related to

SERVER-46211 Chunk migration concurrent with multi-delete can cause matching documents to not be deleted

Closed

Assignee: [DO NOT USE] Backlog - Cluster Scalability
Reporter: Kevin Pulo
Participants: [DO NOT USE] Backlog - Cluster Scalability, Alexey Maltsev, Kevin Pulo
Votes: 0
Watchers: 10

Created: Apr 06 2020 09:20:11 PM UTC
Updated: Apr 11 2025 06:27:19 PM UTC