[SERVER-77828] listIndexes should report expireAfterSeconds as a float Created: 06/Jun/23 Updated: 11/Jan/24 Resolved: 11/Jan/24 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | 7.3.0-rc0 |
| Type: | Bug | Priority: | Minor - P4 |
| Reporter: | Felipe Gasper | Assignee: | Daotang Yang |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||||||||||
| Backwards Compatibility: | Fully Compatible | ||||||||||||||||
| Operating System: | ALL | ||||||||||||||||
| Sprint: | Execution NAMR Team 2023-07-24, Execution NAMR Team 2023-08-21, Execution NAMR Team 2023-09-04, Execution Team 2023-12-11, Execution Team 2023-12-25, Execution Team 2024-01-08, Execution Team 2024-01-22 | ||||||||||||||||
| Participants: | |||||||||||||||||
| Linked BF Score: | 135 | ||||||||||||||||
| Description |
|
Per a conversation with Max just now, it seems chunk migration uses `listIndexes` to fetch index stats, which (again, per Max) reports expireAfterSeconds as an int. This value, though, is internally a floating-point, which can trigger rounding problems. This seems to have caused REP-2567, in which the source cluster’s shards built the `x_before_epoch` index with different expireAfterSeconds values. That happened because ttl_index_options.js sends a UNIX timestamp. Assumedly that value, these days, is just high enough that rounding errors will happen, where previously they may not have. |
| Comments |
| Comment by Githook User [ 11/Jan/24 ] | ||||||||||||
|
Author: {'name': 'mongodt', 'email': '146988481+mongodt@users.noreply.github.com', 'username': 'mongodt'}Message: GitOrigin-RevId: ee7b5dbe682a4381d1445b7fdb2dc5f3766dd199 | ||||||||||||
| Comment by Githook User [ 12/Sep/23 ] | ||||||||||||
|
Author: {'name': 'Dan Larkin-York', 'email': 'dan.larkin-york@mongodb.com', 'username': 'dhly-etc'}Message: Revert " This reverts commit 07d91ec750e7ad0942f7bd535606f486a3647118. | ||||||||||||
| Comment by Githook User [ 12/Sep/23 ] | ||||||||||||
|
Author: {'name': 'Dan Larkin-York', 'email': 'dan.larkin-york@mongodb.com', 'username': 'dhly-etc'}Message: Revert " This reverts commit 07d91ec750e7ad0942f7bd535606f486a3647118. | ||||||||||||
| Comment by Githook User [ 25/Aug/23 ] | ||||||||||||
|
Author: {'name': 'Dan Larkin-York', 'email': 'dan.larkin-york@mongodb.com', 'username': 'dhly-etc'}Message: | ||||||||||||
| Comment by Dan Larkin-York [ 02/Aug/23 ] | ||||||||||||
|
max.hirschhorn@mongodb.com I'll be digging into this in the next few weeks. The plan you suggested (coerce on create + upgrade rewrite) sounds like it might be an option, but I'll have to chase down the details to be sure. As to why it cares about the current time value, there's an explanation in the comment with the code you linked. The restriction only applies to clustered collections, to handle the fact that time series collections are clustered by OID, which has a 32-bit unsigned timestamp. It's definitely janky, and could almost certainly be handled another way (e.g. by doing the restriction at TTL-delete time rather than index creation time. I suspect though that changing this doesn't have a great deal of value for most customers, as it would certainly be uncommon to have an expireAfterSeconds value that large. | ||||||||||||
| Comment by Max Hirschhorn [ 02/Aug/23 ] | ||||||||||||
|
dan.larkin-york@mongodb.com, connie.chen@mongodb.com, has a path forward on My expectation for
Separately it would be nice to receive an explanation for why the expireAfterSeconds validation cares about the current Unix epoch time value. It doesn't make immediate sense to me why a create command at time T can fail with InvalidOptions yet at time T+1 can succeed without issue. What am I missing? | ||||||||||||
| Comment by Felipe Gasper [ 09/Jun/23 ] | ||||||||||||
|
Actually, int32 should round-trip safely through an IEEE 754 double:
So, as long as `Math.floor(Date.now() / 1000) + 1000` (from ttl_index_options.js) fits within an int32, it shouldn’t explain the disparity seen in REP-2567. | ||||||||||||
| Comment by Max Hirschhorn [ 06/Jun/23 ] | ||||||||||||
I prefer we change the listIndexes command to return the true value stored in the local mdb catalog and have asked this ticket be assigned to the Storage Execution team for this reason. If we aren't going to have the createIndexes command use the NewIndexSpec struct to coerce the index specification to have an integral expireAfterSeconds then I don't see a reason for the listIndexes command to do the coercion. The implemented behavior is misleading. For example, running a createIndexes command with the output of the listIndexes command can return an ok:0 error response rather than an ok:1 "all indexes already exist" response. |