[DRIVERS-1492] Clustered indexes for Time-series collections Created: 08/Jan/21  Updated: 15/Feb/23  Resolved: 16/Feb/21

Status: Closed
Project: Drivers
Component/s: None
Fix Version/s: None

Type: Epic
Priority: Major - P3
Reporter: Backlog - Core Eng Program Management Team
Assignee: Unassigned
Resolution: Won't Do
Votes: 0
Labels: init-53-m1.0
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
depends on CDRIVER-3898 Clustered indexes for Time-series col... Closed
depends on CSHARP-3401 Clustered indexes for Time-series col... Closed
depends on CXX-2189 Clustered indexes for Time-series col... Closed
depends on GODRIVER-1872 Clustered indexes for Time-series col... Closed
depends on JAVA-3998 Clustered indexes for Time-series col... Closed
depends on MOTOR-670 Clustered indexes for Time-series col... Closed
depends on NODE-3096 Clustered indexes for Time-series col... Closed
depends on PHPC-1767 Clustered indexes for Time-series col... Closed
depends on PYTHON-2558 Clustered indexes for Time-series col... Closed
depends on RUBY-2530 Clustered indexes for Time-series col... Closed
depends on RUST-667 Clustered indexes for Time-series col... Closed
Initiative
Driver Changes: Needed
Server Compat: 5.0
Quarter: FY22Q2
Driver Compliance:
Key Status/Resolution FixVersion
CDRIVER-3898 Won't Do
CXX-2189 Won't Do
CSHARP-3401 Won't Do
GODRIVER-1872 Won't Do
JAVA-3998 Won't Do
NODE-3096 Won't Do
MOTOR-670 Duplicate
PYTHON-2558 Won't Do
PHPC-1767 Won't Do
RUBY-2530 Won't Do
RUST-667 Won't Do
SWIFT-1109 Won't Do

 Description   
Downstream Change Summary

There may be downstream impacts for this project. We will update with potential impacts after we move to design.

Description of Linked Ticket

Epic Summary

Make the RecordStore for a collection a mapping from _id key to BSON document, instead of a mapping from RecordId to BSON document. This will then allow us to remove the separate _id index.
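
The difference between the two layouts can be illustrated with a toy model. This is only a sketch under stated assumptions: the class names IndexedStore and ClusteredStore, the in-memory dicts, and the read counter are hypothetical and do not reflect the server's actual RecordStore interface.

```python
from typing import Any, Dict, Optional

Document = Dict[str, Any]


class IndexedStore:
    """Hypothetical model of the current layout:
    RecordId -> document, plus a separate _id -> RecordId index."""

    def __init__(self) -> None:
        self.records: Dict[int, Document] = {}  # RecordId -> document
        self.id_index: Dict[Any, int] = {}      # _id -> RecordId
        self.next_record_id = 1
        self.reads = 0                          # number of lookups performed

    def insert(self, doc: Document) -> None:
        rid = self.next_record_id
        self.next_record_id += 1
        self.records[rid] = doc                 # write 1: the record itself
        self.id_index[doc["_id"]] = rid         # write 2: the _id index entry

    def find_by_id(self, _id: Any) -> Optional[Document]:
        self.reads += 1                         # read 1: _id index lookup
        rid = self.id_index.get(_id)
        if rid is None:
            return None
        self.reads += 1                         # read 2: fetch by RecordId
        return self.records[rid]


class ClusteredStore:
    """Hypothetical model of the proposed layout: _id -> document, no _id index."""

    def __init__(self) -> None:
        self.records: Dict[Any, Document] = {}  # _id -> document
        self.reads = 0

    def insert(self, doc: Document) -> None:
        self.records[doc["_id"]] = doc          # single write, no index to maintain

    def find_by_id(self, _id: Any) -> Optional[Document]:
        self.reads += 1                         # single read
        return self.records.get(_id)
```

In this model, a successful find_by_id costs two lookups on IndexedStore and one on ClusteredStore.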

Motivation

  • Queries that currently use the _id index will only need to do one read instead of two.
  • Inserts and deletes will also have one fewer index to update.
  • Range queries on _id will be able to use a collection scan rather than an index scan with a fetch stage. This allows far more efficient sequential storage access where random-order access is required today.
  • Range removes on _id allow for efficient truncations that avoid reading the data to be removed when there are no other indexes.
  • Significantly speed up chunk migrations on sharded clusters.
  • Improve usability of MongoDB for timeseries data.
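
The range-query and range-remove points above can be sketched with records kept in _id order. This is a minimal illustration, assuming a sorted in-memory list stands in for the storage engine; the ClusteredRecords class and its method names are hypothetical, not part of the server.

```python
import bisect
from typing import Any, Dict, List

Document = Dict[str, Any]


class ClusteredRecords:
    """Hypothetical store that keeps documents physically ordered by _id."""

    def __init__(self) -> None:
        self._keys: List[Any] = []       # sorted _id values
        self._docs: List[Document] = []  # documents, in the same order as _keys

    def insert(self, doc: Document) -> None:
        key = doc["_id"]
        pos = bisect.bisect_left(self._keys, key)
        if pos < len(self._keys) and self._keys[pos] == key:
            raise KeyError(f"duplicate _id: {key!r}")
        self._keys.insert(pos, key)
        self._docs.insert(pos, doc)

    def range_scan(self, low: Any, high: Any) -> List[Document]:
        """Return documents with low <= _id < high as one contiguous slice."""
        start = bisect.bisect_left(self._keys, low)
        end = bisect.bisect_left(self._keys, high)
        return self._docs[start:end]

    def range_remove(self, low: Any, high: Any) -> int:
        """Drop documents with low <= _id < high without reading their contents."""
        start = bisect.bisect_left(self._keys, low)
        end = bisect.bisect_left(self._keys, high)
        del self._keys[start:end]
        del self._docs[start:end]
        return end - start


store = ClusteredRecords()
for i in (5, 1, 3, 2, 4):
    store.insert({"_id": i})

scanned_ids = [d["_id"] for d in store.range_scan(2, 5)]
removed = store.range_remove(1, 3)
remaining_ids = [d["_id"] for d in store.range_scan(0, 10)]
```

Because the documents are stored in key order, both operations only need to locate the two ends of the range; no per-document index lookup followed by a separate fetch is involved.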

Doing a smaller, internal-only project first, one that leaves full support for sharded collections with shard keys other than _id as future work, helps evaluate performance gains before committing to the more expensive future work, and allows design questions for that work to be answered.



 Comments   
Comment by Esha Bhargava [ 16/Feb/21 ]

No driver changes needed.

Generated at Thu Feb 08 08:23:29 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.