
Type: Bug
Resolution: Fixed
Priority: Major - P3
Fix Version/s: 4.4.0-rc0, 4.7.0
Affects Version/s: None
Component/s: Querying, Sharding
Labels:
None

Backwards Compatibility:
Fully Compatible
Operating System:
ALL
Backport Requested:

v4.4
Linked BF Score:
20
Confidence Status:
None
Work Order:
3
CAR Domain/s:
None

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name(s):
None
Goal Link:
None

Observed in mr_output_options.js: coll.remove({}) completed successfully, but a subsequent coll.find().itcount() returned a non-zero count, even though there were no concurrent inserts. This does not happen when coll.remove({}) runs on a standalone or replica set, or on a sharded cluster with no concurrent chunk migration.

The sequence of events is:

1. A chunk migration starts from shardA to shardB.
2. After the range deletion on the recipient (shardB), but before the clone starts, the mongos receives coll.remove({}) and broadcasts it unversioned to both shards.
3. shardB finishes that deletion quickly. shardB now has 0 documents in coll.
4. Meanwhile, shardA has started processing the multi-delete, but is working on other documents, not those in the chunk range being moved.
5. The clone of documents from shardA to shardB now happens (starts and completes). shardB now has a non-zero number of documents (the contents of the chunk being moved).
6. The migration enters the critical section to commit, which interrupts the multi-delete on shardA with StaleConfig "migration commit in progress for dbname.collname".
7. Inside the critical section, the migration fetches the final xfermods from the donor's OpObserver. Because the multi-delete on shardA has not yet reached any of the chunk range documents, there are no mods to apply. The migration finishes normally.
8. In the meantime, the mongos has received StaleConfig from the multi-delete on shardA, so it resends the multi-delete, but only to shardA. The retry blocks until the critical section exits, then runs normally to successful completion. The mongos multi-delete command now also completes successfully. shardA now has 0 documents, but shardB still has the documents from the migrated chunk.
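The interleaving above can be replayed deterministically to show why the documents survive. This sketch is a simplified, single-threaded model and not server code. It assumes that each shard is a JavaScript Set of integer _ids, that the donor (shardA) starts with _id 0 to 299, and that the migrating chunk is the range [100, 200). The names replayRace, CHUNK, inChunk, and deleteOnA are illustrative only. Versioning, networking, and the real OpObserver are reduced to plain function steps.

```javascript
// Simplified, single-threaded replay of the race (illustrative model, not server code).
// Assumptions: each shard is a Set of integer _ids, the donor (shardA) starts
// with _id 0..299, and the migrating chunk is the hypothetical range [100, 200).

const CHUNK = { min: 100, max: 200 }; // hypothetical chunk bounds
const inChunk = (id) => id >= CHUNK.min && id < CHUNK.max;

function replayRace() {
  const shardA = new Set(Array.from({ length: 300 }, (_, i) => i)); // donor
  const shardB = new Set(); // recipient, already empty after its range deletion
  const xferMods = []; // chunk-range deletes the donor records for the recipient
  let tracking = false; // true while the migration is capturing donor writes

  // Donor-side delete: while the migration is active, deletes of documents
  // in the chunk range are recorded so they can be replayed on the recipient.
  const deleteOnA = (id) => {
    shardA.delete(id);
    if (tracking && inChunk(id)) xferMods.push(id);
  };

  // mongos broadcasts remove({}) unversioned; shardB (empty) finishes at once.
  shardB.clear();

  // shardA's multi-delete is working on documents outside the chunk range.
  for (let id = 299; id >= 250; id--) deleteOnA(id);

  // The clone copies the chunk's documents from shardA to shardB.
  tracking = true;
  for (const id of shardA) {
    if (inChunk(id)) shardB.add(id);
  }

  // Critical section: shardA's multi-delete is interrupted with StaleConfig.
  // The final xfermods are applied, but no chunk document was deleted yet.
  for (const id of xferMods) shardB.delete(id);
  tracking = false; // migration commits; the chunk is now owned by shardB

  // mongos retries remove({}) on shardA only, and it runs to completion.
  for (const id of [...shardA]) deleteOnA(id);

  return { shardA: shardA.size, shardB: shardB.size, xferMods: xferMods.length };
}

console.log(replayRace()); // { shardA: 0, shardB: 100, xferMods: 0 }
```

In this model the retried delete targets only shardA, and it runs after the migration has committed. Nothing therefore removes the 100 cloned documents on shardB, which matches the coll.find().itcount() != 0 symptom.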

related to

SERVER-47371 Chunk migration concurrent with multi-delete can cause matching documents to not be deleted

Backlog

Assignee:: Randolph Tan
Reporter:: Kevin Pulo
Participants:: Githook User, Kevin Pulo, Max Hirschhorn, Randolph Tan
Votes:: 0
Watchers:: 6

Created:: Feb 17 2020 01:39:31 AM UTC
Updated:: Oct 29 2023 10:12:06 PM UTC
Resolved:: Apr 06 2020 09:26:05 PM UTC
Confidence Status Last Update:: 17/Mar/20 8:45 PM
