- Type: Bug
- Resolution: Fixed
- Priority: Major - P3
- Affects Version/s: 6.0.0, 7.0.0, 7.3.0, 8.1.0-rc0, 8.0.0-rc10
- Component/s: None
- Catalog and Routing
- Fully Compatible
- ALL
- v8.0, v7.0, v6.0, v5.0
- CAR Team 2024-09-16, CAR Team 2024-09-30
Context
The test does the following:
- Creates a TTL index with a 20s expiry (which deletes all the documents after 20s)
- Inserts 100 docs on shard0
- Runs a moveChunk from shard0 to shard1 (which makes the 100 docs orphans on shard0)
- Waits for the TTL to expire
- Verifies the orphans are not deleted by the TTL
Problem
If the moveChunk takes longer than 20s (which can happen on slow variants), shard0 won't have any orphans: the TTL index has already deleted the documents before the move could complete. The final check therefore fails.
The test fails non-deterministically because it relies on the moveChunk completing before the 20s expire.
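The race can be illustrated with a toy Python model. This is not the actual jstest: the function name and the all-or-nothing simplification are assumptions made purely for illustration. It treats the TTL expiry and the moveChunk commit as two competing deadlines and reports how many orphans shard0 ends up with.

```python
def orphans_left_on_donor(move_chunk_secs, ttl_secs=20, num_docs=100):
    """Toy model: number of orphaned docs left on shard0 after the migration.

    Assumption: if the TTL expires before the migration commits, shard0 still
    owns the documents, so the TTL delete removes all of them. Otherwise the
    documents become orphans first and the TTL delete skips them.
    """
    if move_chunk_secs < ttl_secs:
        return num_docs  # migration won the race: all docs are orphans
    return 0  # TTL won the race: nothing is left to become an orphan


# The final check of the test only passes when the result is non-zero.
for secs in (5, 19, 25):
    print(secs, orphans_left_on_donor(secs))
```

In this model the outcome depends only on `move_chunk_secs`, which the test does not control. That is the source of the flakiness.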
Refactoring proposal
The test would be deterministic if the createIndex ran after the moveChunk, which would guarantee that orphans exist.
Currently the test relies on a single chunk, so the moveChunk moves the entire collection from shard0 to shard1. However, if the test creates 2 chunks and moves only 1 to shard1, then shard0 would still have 1 chunk plus some orphans:
- Leaving 1 chunk on shard0 ensures the createIndex creates a TTL index on shard0
- Having orphans for the same collection guarantees the TTL delete is attempted and not performed on orphans.
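The proposed ordering can be sketched with a small Python model of shard0's state. This is a simplified illustration, not the real test: `simulate_proposed_test`, the integer ids standing in for shard key values, and the split point of 50 are all hypothetical. It assumes range deletion of orphans has not run yet, and that a TTL pass deletes every document shard0 owns while skipping orphans.

```python
def simulate_proposed_test(num_docs=100, split_at=50):
    """Return (orphans created by the move, orphans surviving the TTL pass)."""
    # shard0 initially stores and owns every document.
    shard0_docs = set(range(num_docs))
    owned_by_shard0 = set(range(num_docs))

    # Split into two chunks and move only the upper one to shard1.
    # Its documents stay on shard0 as orphans (range deletion not modeled).
    moved = {doc for doc in shard0_docs if doc >= split_at}
    owned_by_shard0 -= moved
    orphans = shard0_docs - owned_by_shard0

    # Only now is the TTL index created. shard0 still owns one chunk, so it
    # gets the index. The TTL pass deletes owned documents and skips orphans.
    shard0_docs -= owned_by_shard0

    return len(orphans), len(shard0_docs & orphans)


print(simulate_proposed_test())
```

The duration of the move does not appear anywhere in this model: because the index is created after the migration, the orphans exist before any TTL delete can run.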