- Type: Bug
- Resolution: Fixed
- Priority: Major - P3
- Affects Version/s: 7.0.0, 8.1.0-rc0, 8.0.0
- Component/s: None
- Labels: None
- Assigned Teams: Cluster Scalability
- Backwards Compatibility: Fully Compatible
- Operating System: ALL
- Backport Requested: v8.0, v7.0
In QueryAnalysisWriter::_flush, the logic that updates the invalid set with the index of the bad document appears to be incorrect: it can add a valid, an invalid, or a garbage index value. This stems from an incorrect update of the baseIndex value. The scenario is walked through below; a minimal sketch of the arithmetic follows the walkthrough.
Consider 6 docs in the buffer with maxBatchSize of 2:
[D0 (throws BSONObjectTooLarge when inserting), D1 (Duplicate of D4), D2, D3 (Duplicate of D4), D4, D5]
Init:
lastIndex = 6
baseIndex = 5
Iteration 1:
docsToInsert: [D5, D4] (We read from back of the buffer)
tmpBuffer: [D0, D1, D2, D3]
lastIndex = 4
baseIndex = 1
Iteration 2:
docsToInsert: [D3, D2]
tmpBuffer: [D0, D1]
lastIndex = 2
D3 fails with DuplicateKey (err.getIndex() = 0), so invalid.insert(baseIndex - err.getIndex()) = invalid.insert(1 - 0) => invalid = {1}; index 1 (D1) is recorded instead of 3 (D3)
baseIndex = 1 - 2 = 18446744073709551615 (unsigned underflow)
Iteration 3:
docsToInsert: [D1, D0]
lastIndex = 0
tmpBuffer: [D1] => This document, which has a duplicate ID, is added back to the buffer.
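The arithmetic above can be reproduced in isolation. The following standalone sketch mocks the buffer and the write error with plain standard-library types; the names maxBatchSize, lastIndex, baseIndex and invalid are taken from the walkthrough, while everything else is assumed for illustration and is not the real QueryAnalysisWriter code. It prints baseIndex = 1 after iteration 1, records 1 instead of 3 for D3's DuplicateKey error, and then underflows baseIndex in iteration 2:

```cpp
#include <cstddef>
#include <iostream>
#include <set>
#include <string>
#include <vector>

int main() {
    // D0..D5 as in the walkthrough above. In iteration 2, D3 (a duplicate of
    // D4, which was inserted in iteration 1) fails with DuplicateKey at
    // position 0 of its batch.
    std::vector<std::string> tmpBuffer{"D0", "D1", "D2", "D3", "D4", "D5"};
    const std::size_t maxBatchSize = 2;

    std::set<std::size_t> invalid;
    std::size_t baseIndex = tmpBuffer.size() - 1;  // 5

    int iteration = 0;
    while (!tmpBuffer.empty()) {
        ++iteration;

        // Carve the next batch off the back of the buffer.
        std::size_t lastIndex = tmpBuffer.size();
        std::vector<std::string> docsToInsert;
        while (lastIndex > 0 && docsToInsert.size() < maxBatchSize) {
            docsToInsert.push_back(tmpBuffer[lastIndex - 1]);
            --lastIndex;
        }

        // Mock the DuplicateKey write error for D3: err.getIndex() is the
        // document's position within docsToInsert.
        for (std::size_t errIndex = 0; errIndex < docsToInsert.size(); ++errIndex) {
            if (docsToInsert[errIndex] == "D3") {
                // Records 1 (D1) instead of 3 (D3), because baseIndex is
                // already wrong at this point.
                invalid.insert(baseIndex - errIndex);
            }
        }

        tmpBuffer.resize(lastIndex);

        // The questionable update: subtracting the count of documents still
        // left in the buffer instead of the number taken into the batch.
        baseIndex -= lastIndex;

        std::cout << "iteration " << iteration << ": lastIndex=" << lastIndex
                  << " baseIndex=" << baseIndex << "\n";
    }

    std::cout << "invalid contains:";
    for (std::size_t i : invalid) {
        std::cout << " " << i;
    }
    std::cout << "\n";
    return 0;
}
```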
Reproducer is attached.
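For contrast, here is a sketch of index bookkeeping that stays consistent with the walkthrough; it only illustrates the intended mapping and is not taken from the actual patch, and the helper name bufferIndexForError is hypothetical. Deriving the buffer index from lastIndex, the batch size, and the error's position gives 3 for D3 in iteration 2 and 1 for D1 in iteration 3, with no underflow:

```cpp
#include <cassert>
#include <cstddef>

// lastIndex is the number of documents still in the buffer after the batch
// was carved off its back, and errIndex is err.getIndex(), i.e. the failed
// document's position within docsToInsert (both as in the walkthrough).
std::size_t bufferIndexForError(std::size_t lastIndex,
                                std::size_t batchSize,
                                std::size_t errIndex) {
    return lastIndex + batchSize - 1 - errIndex;
}

int main() {
    // Iteration 2: two docs ([D0, D1]) remain, batch is [D3, D2], and D3
    // fails at position 0 -> buffer index 3.
    assert(bufferIndexForError(2, 2, 0) == 3);
    // Iteration 3: no docs remain, batch is [D1, D0], and D1 fails at
    // position 0 -> buffer index 1.
    assert(bufferIndexForError(0, 2, 0) == 1);
    return 0;
}
```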