- Type: Bug
- Status: Closed
- Priority: Major - P3
- Resolution: Fixed
- Affects Version/s: 3.2.13, 3.4.4, 3.5.7
- Component/s: Sharding
- Labels: None
- Backwards Compatibility: Fully Compatible
- Operating System: ALL
- Backport Completed:
- Backport Requested: v3.4
- Steps To Reproduce:
- Sprint: Sharding 2017-05-29
We have a sharded cluster. One of our primaries had several RangeDeletes queued up for chunks that had been moved off it by chunk migration. Typically, after a chunk has been migrated to a new primary, the log shows the following sequence for deleting it:
1. Deleter starting delete for: <namespace> from {<begin-range-of-chunk>} -> {<end-range-of-chunk>}, with opId: xxxxxxxx
2. Some time later... Helpers::removeRangeUnlocked time spent waiting for replication: x ms
3. rangeDeleter deleted n documents for <namespace> from {<begin-range-of-chunk>} -> {<end-range-of-chunk>}
However, occasionally we see:
1. Deleter starting delete for: ... (normal log statement as above)
2. Some time later... Error encountered while trying to delete range: Error encountered while deleting range: ns<namespace> from {<begin-range-of-chunk>} -> {<end-range-of-chunk>}, cause by: :: caused by :: 112 WriteConflict
3. No further log statements from the RangeDeleter for the chunk range that hit the write conflict.
I can only assume that the WriteConflict was not handled properly and that the documents in that range were never deleted?
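To see which ranges are affected, a log can be scanned for deletes that started but never logged completion. The following is only a rough sketch: it assumes the log statements have the shapes quoted above (including the fused `ns<namespace>` and the literal `cause by:` text), and the function name and regular expressions are illustrative, not part of MongoDB.

```python
import re

# Patterns modeled on the log statements quoted in this report (assumption:
# real lines carry extra prefixes such as timestamps, so we search, not match).
START = re.compile(
    r"Deleter starting delete for: (?P<ns>\S+) from (?P<range>\{.*\}), with opId")
DONE = re.compile(
    r"rangeDeleter deleted \d+ documents for (?P<ns>\S+) from (?P<range>\{.*\})")
ERROR = re.compile(
    r"Error encountered while deleting range: ns\s*(?P<ns>\S+) "
    r"from (?P<range>\{.*\}), cause by:")


def find_unfinished_deletes(lines):
    """Return {(namespace, range): status} for deletes with no completion line.

    Hypothetical helper: status is "error logged" if an error line was seen
    for the range, otherwise "no completion logged".
    """
    pending = {}
    for line in lines:
        m = START.search(line)
        if m:
            pending[(m["ns"], m["range"])] = "no completion logged"
            continue
        m = ERROR.search(line)
        if m:
            pending[(m["ns"], m["range"])] = "error logged"
            continue
        m = DONE.search(line)
        if m:
            # A completion line clears the range, even after an earlier error.
            pending.pop((m["ns"], m["range"]), None)
    return pending
```

Passing an open log file to `find_unfinished_deletes` returns the ranges worth checking for leftover documents. It only reports what the log says, so a range listed as "error logged" still needs to be verified against the data itself.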