- Type: Bug
- Resolution: Unresolved
- Priority: Major - P3
- Affects Version/s: None
- Component/s: None
- Storage Execution
- ALL
- Storage Execution 2025-08-04, Storage Execution 2025-08-18, Storage Execution 2025-09-01

When a createIndexes command is a no-op, and this is detected while attempting to start the index build rather than during the pre-start checks, the command returns an operationTime that is earlier than the commitIndexBuild timestamp, even though it correctly waits for the index build thread to finish.
In general, the SEP (ServiceEntryPoint) has a mechanism that bumps the current operation's opTime to the system's last opTime when it detects that the operation is a write that resulted in a no-op.
This mechanism detects a no-op by comparing the operation's opTime after running with the opTime before running, and it bumps only when the two are equal. However, createIndexes itself bumps the current operation's opTime when IndexBuildsCoordinator::startIndexBuild fails, and it does so before waiting for the index build thread to finish.
After waiting for the build to finish, the code is structured so that the same function runs again to generate the reply, but this second pass does not take the code path that bumps the opTime. Afterwards, when the createIndexes command goes through the SEP code that usually bumps the operationTime for a no-op, lastOpAfterRun already differs from lastOpBeforeRun, so nothing is done. The command therefore returns an operationTime that predates the commit timestamp of the index.
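
The sequence can be illustrated with a small toy model. This is only a sketch and not the server code: opTimes are plain integers, and the names `Node`, `Client`, `sep_run`, and `simulate` are hypothetical. The starting values (client at 5, system at 10) are arbitrary. The `early_bump` flag models createIndexes bumping the opTime after the failed startIndexBuild, before the wait.

```python
from dataclasses import dataclass


@dataclass
class Node:
    # Hypothetical stand-in for the system's last opTime.
    system_last_optime: int = 0


@dataclass
class Client:
    # Hypothetical stand-in for the current operation's last opTime.
    last_op: int = 0


def sep_run(node, client, command):
    """Model of the SEP no-op handling: bump only if the opTime did not change."""
    last_op_before_run = client.last_op
    command(node, client)
    last_op_after_run = client.last_op
    if last_op_after_run == last_op_before_run:
        client.last_op = node.system_last_optime
    return client.last_op  # reported as operationTime in this model


def simulate(early_bump):
    node = Node(system_last_optime=10)
    client = Client(last_op=5)
    commit = {}

    def create_indexes_noop(node, client):
        if early_bump:
            # Bump made when starting the build fails, before waiting.
            client.last_op = node.system_last_optime
        # While the command waits, the other build commits at a later time.
        node.system_last_optime += 1
        commit["ts"] = node.system_last_optime
        # The reply is generated again here without a second bump.

    operation_time = sep_run(node, client, create_indexes_noop)
    return operation_time, commit["ts"]


print(simulate(early_bump=True))   # operationTime is behind the commit
print(simulate(early_bump=False))  # SEP bump catches up to the commit
```

In this model the early bump is what defeats the SEP check: the opTime has already moved, so the later comparison no longer looks like a no-op.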
As a result, waiting for write concern may be done with an incorrect timestamp, and causally consistent sessions that rely on the operationTime might not work as expected.
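
To make the causal consistency impact concrete, here is a simplified sketch, assuming a node serves a read with afterClusterTime once it has applied at least that time. The function names and the integer timestamps are hypothetical and chosen only for illustration.

```python
def can_serve_causal_read(applied_optime, after_cluster_time):
    # Simplified rule: the node must have applied at least afterClusterTime.
    return applied_optime >= after_cluster_time


def index_commit_applied(applied_optime, commit_ts):
    # The index commit is applied once the node reaches the commit timestamp.
    return applied_optime >= commit_ts


# Hypothetical values: the command reported operationTime 10,
# but the index was committed at timestamp 11.
operation_time = 10
commit_ts = 11
secondary_applied = 10

# The secondary is allowed to serve the read...
print(can_serve_causal_read(secondary_applied, operation_time))
# ...although it has not applied the index commit yet.
print(index_commit_applied(secondary_applied, commit_ts))
```

A session that uses the returned operationTime as afterClusterTime could therefore read from a node that has not yet applied the index commit.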