[SERVER-70784] Create TTL index for config.sampledQueries and config.sampledQueriesDiff on stepup Created: 23/Oct/22 Updated: 29/Oct/23 Resolved: 25/Jan/23 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | 6.3.0-rc0 |
| Type: | Task | Priority: | Major - P3 |
| Reporter: | Cheahuychou Mao | Assignee: | Israel Hsu |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||
| Backwards Compatibility: | Fully Compatible | ||||
| Sprint: | Sharding NYC 2022-11-28, Sharding 2022-12-12, Sharding NYC 2022-12-26, Sharding NYC 2023-01-09, Sharding NYC 2023-01-23, Sharding NYC 2023-02-06 | ||||
| Participants: | |||||
| Linked BF Score: | 22 | ||||
| Description |
|
| Comments |
| Comment by Githook User [ 25/Jan/23 ] |
|
Author: {'name': 'Israel Hsu', 'email': 'israel.hsu@mongodb.com', 'username': 'israelhsu'}
Message:
| Comment by Israel Hsu [ 18/Jan/23 ] |
|
Instead of a JavaScript test, I implemented unit tests that use the failCommand failpoint to make createIndexes return specific errors, such as IndexAlreadyExists and PrimarySteppedDown.
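A minimal standalone sketch of this testing idea, assuming no MongoDB test framework. The failCommand failpoint is replaced by a hypothetical `ScriptedCreateIndexes` fake that returns a queued list of errors and then succeeds. `ensureTtlIndexes` is a hypothetical stand-in for the code under test and follows the retry policy from the 12/Jan comment (retry until the indexes exist, stop when the node is no longer primary). The `Code` enum is an illustrative subset, not the server's error codes.

```cpp
#include <cassert>
#include <deque>
#include <initializer_list>

// Illustrative subset of error codes (hypothetical, not the server's ErrorCodes).
enum class Code { OK, IndexAlreadyExists, PrimarySteppedDown, NetworkTimeout };

// Stand-in for the failCommand failpoint: each call returns the next scripted
// error; once the script is exhausted, every call succeeds.
class ScriptedCreateIndexes {
public:
    ScriptedCreateIndexes(std::initializer_list<Code> script) : script_(script) {}

    Code operator()() {
        ++calls_;
        if (script_.empty()) return Code::OK;
        Code next = script_.front();
        script_.pop_front();
        return next;
    }

    int calls() const { return calls_; }

private:
    std::deque<Code> script_;
    int calls_ = 0;
};

// Hypothetical code under test. Returns true if the indexes are known to exist.
bool ensureTtlIndexes(ScriptedCreateIndexes& createIndexes) {
    while (true) {
        switch (createIndexes()) {
            case Code::OK:
            case Code::IndexAlreadyExists:
                return true;   // Either outcome means the indexes are in place.
            case Code::PrimarySteppedDown:
                return false;  // Give up; the next primary retries on step-up.
            default:
                break;         // Any other error: try again.
        }
    }
}
```

Scripting `{Code::NetworkTimeout, Code::NetworkTimeout}` makes `ensureTtlIndexes` call the fake three times and then report success, while scripting `{Code::PrimarySteppedDown}` makes it give up after a single call.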
| Comment by Israel Hsu [ 12/Jan/23 ] |
|
Our case is simpler than the one Pierlauro described. I implemented the following: the thread that creates the TTL indexes retries until it succeeds (either the indexes are created or they already exist) or until a NotPrimaryError is caught. If the indexes have not been created by then, the new primary will attempt to create them in onStepUpComplete. I'm working on a JavaScript test that might cover this. There is a test in `internal_transactions_reap_service_test.js`, DoesNotReapAsSecondaryAndClearsSessionOnStepdown, that manipulates the replica's state in order to test the associated behaviors.
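The retry policy described above can be sketched as follows. This is an illustration under assumptions, not the server code: `createIndexes` is passed in as a callable, `isNotPrimaryError` is a hypothetical stand-in for the NotPrimaryError category check, the codes it lists are an assumed subset, and the fixed backoff between attempts is an added assumption not mentioned in the comment.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Illustrative subset of error codes (hypothetical, not the server's ErrorCodes).
enum class Code {
    OK,
    IndexAlreadyExists,
    NotWritablePrimary,
    PrimarySteppedDown,
    InterruptedDueToReplStateChange,
    WriteConflict,
};

enum class Outcome { IndexesExist, NotPrimary };

// Hypothetical stand-in for a "NotPrimaryError" category check.
bool isNotPrimaryError(Code c) {
    return c == Code::NotWritablePrimary || c == Code::PrimarySteppedDown ||
           c == Code::InterruptedDueToReplStateChange;
}

// Retry until the indexes exist or this node is no longer primary.
Outcome createTtlIndexesWithRetry(const std::function<Code()>& createIndexes,
                                  std::chrono::milliseconds backoff) {
    for (;;) {
        const Code c = createIndexes();
        if (c == Code::OK || c == Code::IndexAlreadyExists) {
            return Outcome::IndexesExist;  // Created now, or already there.
        }
        if (isNotPrimaryError(c)) {
            return Outcome::NotPrimary;    // Leave it to the next primary's step-up.
        }
        std::this_thread::sleep_for(backoff);  // Other errors: wait, then retry.
    }
}
```

A fixed backoff keeps the sketch short; any other backoff schedule would fit the same loop. The only exits are success or a not-primary error, so a long-running attempt on a node that stays primary keeps retrying.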
| Comment by Israel Hsu [ 11/Jan/23 ] |
|
Chou rightly points out that the TTL index creation started asynchronously in onStepUpComplete() might take time, during which the server could step down again. If that happens, the index-creation thread should be stopped or canceled rather than allowed to retry forever. Pierlauro mentioned that this execution pattern (starting a thread on step-up and stopping it on step-down) is common and provided some hints for implementing it. I copied his comment into the PR in which I'm implementing this for QueryAnalysisWriter.
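A sketch of the step-up/step-down pattern, assuming plain standard-library threads rather than the server's executors and cancellation tokens. `StepUpTask` is a hypothetical class that starts a retrying worker in `onStepUp` and cancels and joins it in `onStepDown`. The condition variable lets a pending backoff wait end immediately on step-down; an attempt that is already in progress still runs to completion before the worker notices the cancellation.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical helper: retries `attempt` on a background thread started at
// step-up, until it reports success or step-down cancels it.
class StepUpTask {
public:
    ~StepUpTask() { onStepDown(); }

    void onStepUp(std::function<bool()> attempt, std::chrono::milliseconds backoff) {
        onStepDown();  // Make sure at most one worker is running.
        {
            std::lock_guard<std::mutex> lk(mu_);
            cancelled_ = false;
            succeeded_ = false;
        }
        worker_ = std::thread([this, attempt = std::move(attempt), backoff] {
            std::unique_lock<std::mutex> lk(mu_);
            while (!cancelled_) {
                lk.unlock();
                const bool done = attempt();  // Runs without holding the lock.
                lk.lock();
                if (done) {
                    succeeded_ = true;
                    return;
                }
                // Back off, but wake up immediately if a step-down cancels us.
                cv_.wait_for(lk, backoff, [this] { return cancelled_; });
            }
        });
    }

    void onStepDown() {
        {
            std::lock_guard<std::mutex> lk(mu_);
            cancelled_ = true;
        }
        cv_.notify_all();
        if (worker_.joinable()) worker_.join();  // Wait for the worker to exit.
    }

    bool succeeded() const {
        std::lock_guard<std::mutex> lk(mu_);
        return succeeded_;
    }

private:
    mutable std::mutex mu_;
    std::condition_variable cv_;
    bool cancelled_ = false;
    bool succeeded_ = false;
    std::thread worker_;
};
```

Joining in `onStepDown` guarantees that no retry is still running once the step-down hook returns, at the cost of blocking until any in-flight attempt finishes.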
| Comment by Israel Hsu [ 21/Dec/22 ] |
|
@Jack Mulrow advised checking with the replication team about making local writes (in our case, creating the TTL indexes) in the onStepUpComplete hook. In ReplicationCoordinatorImpl::signalDrainComplete(), onStepUpComplete is invoked before _canAcceptNonLocalWrites is set, so index creation may fail. Possible solutions include using an executor to create the indexes a little later and/or retrying on failure, or using a lower-level API than DBClient to avoid the _canAcceptNonLocalWrites check.
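A toy model of the ordering problem, assuming a hypothetical `ToyNode` that is not the real `ReplicationCoordinatorImpl`. Its `signalDrainComplete` runs the step-up hook before setting `canAcceptNonLocalWrites`, so a write attempted synchronously inside the hook is rejected. The same write handed to another thread with retries (the first option above, with a plain `std::thread` standing in for an executor; `deferWithRetry` is a hypothetical name) eventually succeeds.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>
#include <utility>

// Toy model only; NOT the real ReplicationCoordinatorImpl.
struct ToyNode {
    std::atomic<bool> canAcceptNonLocalWrites{false};

    // Stand-in for a DBClient-style createIndexes call: rejected until the
    // node accepts non-local writes.
    bool createIndexes() const { return canAcceptNonLocalWrites.load(); }

    // Mirrors the described ordering: the hook runs first, and only
    // afterwards are non-local writes enabled.
    void signalDrainComplete(const std::function<void()>& onStepUpComplete) {
        onStepUpComplete();
        canAcceptNonLocalWrites.store(true);
    }
};

// "Create the indexes a little later and retry on failure": hand the attempt
// to another thread that retries with a fixed backoff until it succeeds.
// Unbounded and not canceled on step-down; illustration only.
std::thread deferWithRetry(std::function<bool()> attempt, std::chrono::milliseconds backoff) {
    return std::thread([attempt = std::move(attempt), backoff] {
        while (!attempt()) {
            std::this_thread::sleep_for(backoff);
        }
    });
}
```

The lower-level-API option would instead bypass the write check entirely, which this toy model does not attempt to represent.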