[SERVER-77831] [test-only bug] CheckRoutingTableConsistency may be executing while sessions collection is being sharded Created: 06/Jun/23 Updated: 29/Oct/23 Resolved: 01/Sep/23 |
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | 7.2.0-rc0 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Pierlauro Sciarelli | Assignee: | Pierlauro Sciarelli |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | car-71-backport-declined, shardingemea-qw |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: |
| Assigned Teams: | Sharding EMEA |
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Sprint: | Sharding EMEA 2023-06-26, Sharding EMEA 2023-07-10, Sharding EMEA 2023-08-21, Sharding EMEA 2023-09-04 |
| Participants: |
| Linked BF Score: | 5 |
| Story Points: | 2 |
| Description |
The CheckRoutingTableConsistency hook can sporadically fail when it runs exactly while the sessions collection is being sharded. The logicalSessionRefreshMillis parameter (default: 5 minutes) drives the logical session cache refresh, which, when no sessions have been used during the test, triggers the creation and sharding of the sessions collection. Because the refresh is asynchronous, it can overlap with the execution of the teardown hooks. The failing flow is the following, occurring roughly 5 minutes after the sharded cluster has been started for testing:
| Comments |
| Comment by Pierlauro Sciarelli [ 01/Sep/23 ] |
Closing as "Gone away" because |
| Comment by Pierlauro Sciarelli [ 30/Aug/23 ] |
There is no compelling reason for insertChunks not to happen within the same transaction that inserts the collection and placement history entries.
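A minimal, hypothetical Python model of the idea above: the collection, placement history and chunk entries are staged in one transaction and published together on commit, so a consistency check running at any point sees either none of them or all of them. `ConfigMetadata`, `Transaction`, `is_consistent` and the document fields are illustrative stand-ins, not server code or the real config schema; the check is a simplified substitute for CheckRoutingTableConsistency, and the commit is atomic only because the model is single-threaded.

```python
from dataclasses import dataclass, field
from typing import Dict, List

SESSIONS_NS = "config.system.sessions"


@dataclass
class ConfigMetadata:
    """Hypothetical in-memory stand-ins for collection, placement history and chunk entries."""

    collections: Dict[str, dict] = field(default_factory=dict)
    placement_history: List[dict] = field(default_factory=list)
    chunks: List[dict] = field(default_factory=list)


@dataclass
class Transaction:
    """Buffers writes; readers of `target` see nothing until commit()."""

    target: ConfigMetadata
    staged: ConfigMetadata = field(default_factory=ConfigMetadata)

    def commit(self) -> None:
        # Publish every staged write in one step. This is atomic only because
        # the model is single-threaded; it does not model real isolation.
        self.target.collections.update(self.staged.collections)
        self.target.placement_history.extend(self.staged.placement_history)
        self.target.chunks.extend(self.staged.chunks)


def is_consistent(meta: ConfigMetadata) -> bool:
    """Simplified check: every collection entry has at least one chunk."""
    return all(
        any(chunk["collection"] == ns for chunk in meta.chunks)
        for ns in meta.collections
    )


def shard_sessions_transactionally(meta: ConfigMetadata) -> List[bool]:
    """Stage all three entries in one transaction, running the check after each write."""
    observations: List[bool] = []
    txn = Transaction(meta)
    txn.staged.collections[SESSIONS_NS] = {"key": {"_id": 1}}
    observations.append(is_consistent(meta))  # check runs mid-transaction
    txn.staged.placement_history.append(
        {"collection": SESSIONS_NS, "shards": ["shard0"]}  # hypothetical shard name
    )
    observations.append(is_consistent(meta))
    txn.staged.chunks.append(
        {"collection": SESSIONS_NS, "min": "MinKey", "max": "MaxKey"}
    )
    observations.append(is_consistent(meta))
    txn.commit()
    observations.append(is_consistent(meta))  # check runs after commit
    return observations


if __name__ == "__main__":
    meta = ConfigMetadata()
    print(shard_sessions_transactionally(meta))  # [True, True, True, True]
```

Because nothing reaches `target` before `commit()`, every observation is consistent, whereas writing the chunks outside the transaction would reopen the window between the collection entry and its chunks.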
| Comment by Pierlauro Sciarelli [ 06/Jun/23 ] |
Correct, this is not possible for other collections because config.system.sessions is the only collection that can be sharded "by the system" without a client requesting it. That's why its sharding can run during teardown (after the test has finished but before the cluster shuts down). I believe a possible solution is to insert the collection and chunk entries transactionally. We may even consider doing so for all collections, considering we never have to shard with "too many" chunks after |
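A minimal, hypothetical Python model of why inserting the collection and chunk entries separately is a problem: an asynchronous sharding of config.system.sessions performs two writes, and a check that runs between them sees a collection entry with no chunks. The dict and list stand in for config.collections and config.chunks with simplified, made-up fields; the write order (collection entry first) is an assumption, and `routing_table_is_consistent` is a simplified stand-in for the real hook, not its actual logic.

```python
from typing import Dict, Iterator, List

SESSIONS_NS = "config.system.sessions"


def shard_sessions_non_atomically(
    collections: Dict[str, dict], chunks: List[dict]
) -> Iterator[str]:
    """Perform the two metadata writes, yielding after each one."""
    # Assumed order: the collection entry becomes visible first...
    collections[SESSIONS_NS] = {"key": {"_id": 1}}
    yield "collection entry written"
    # ...and the chunk entries are inserted in a separate write.
    chunks.append({"collection": SESSIONS_NS, "min": "MinKey", "max": "MaxKey"})
    yield "chunks written"


def routing_table_is_consistent(
    collections: Dict[str, dict], chunks: List[dict]
) -> bool:
    """Simplified check: every collection entry owns at least one chunk."""
    return all(
        any(chunk["collection"] == ns for chunk in chunks) for ns in collections
    )


def run_hook_after_every_step() -> List[bool]:
    """Run the check before sharding starts and after each write."""
    collections: Dict[str, dict] = {}
    chunks: List[dict] = []
    results = [routing_table_is_consistent(collections, chunks)]
    for _step in shard_sessions_non_atomically(collections, chunks):
        # Models a teardown hook that happens to run right after this write.
        results.append(routing_table_is_consistent(collections, chunks))
    return results


if __name__ == "__main__":
    print(run_hook_after_every_step())  # [True, False, True]
```

The middle `False` is the sporadic failure: it only appears if the check lands in the gap between the two writes, which matches a failure that is timing-dependent rather than deterministic.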
| Comment by Max Hirschhorn [ 06/Jun/23 ] |
Is a similar spurious failure not possible for other collections as they are in the midst of being sharded? If so, then what is making the config.system.sessions collection special in how it becomes sharded? |