[SERVER-38879] TTL Index-Only Removal Created: 07/Jan/19  Updated: 05/Feb/19  Resolved: 05/Feb/19

Status: Closed
Project: Core Server
Component/s: TTL
Affects Version/s: None
Fix Version/s: None

Type: New Feature Priority: Minor - P4
Reporter: Tommy Sullivan Assignee: Asya Kamsky
Resolution: Won't Fix Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Participants:

 Description   

Today, when the TTL condition is reached, we remove the old records from the collection and the index.

With this feature, a configuration option would exist to "remove from index only after expiry".

Benefit to customer: the full history stays in the TTL collection, while recent records still get fast indexed queries.



 Comments   
Comment by Asya Kamsky [ 05/Feb/19 ]

Closing this as this use case can be handled via a partial index. The server does not have another mechanism to remove only a subset of data from an index, and the TTL feature is intended to affect the documents in the collection, not their index entries.

Comment by Tommy Sullivan [ 09/Jan/19 ]

Right. It's cool for us to do something like aging it out by updating the predicate field. I just think it could be a powerful feature for you guys to do that, and you may have most of it done already!

Dynamic expressions in partial index predicates would be really powerful but may come with some complexities, depending on how elaborate the expressions can be! So if that did get released, it would probably obviate the need for this feature. But this feature is probably much simpler to release because of its limited complexity (the condition is always just a date aging out, as opposed to more complex expressions).

I think it would enable your sales guys and gals to make a stronger case for stream-like use cases, especially "durable streams".

Comment by Asya Kamsky [ 08/Jan/19 ]

> instead we have a boolean field "expired" as our partial index predicate

This is exactly what we currently recommend: have a job that "ages out" documents by updating a field ("active"), and then the partial index can have the predicate active:true. I thought there was already a server ticket to support a more "dynamic" type of filter on partial indexes, but I may have been thinking of SERVER-14784, which is for expression indexes (which is a bit different).
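A minimal sketch of the recommendation above, assuming a PyMongo-style collection object passed in as `coll`. The field names `active` and `createdAt` come from this thread; the index name, the two-week window, and the function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a two-week window, as in the use case described in this ticket.
WINDOW = timedelta(days=14)


def ensure_active_index(coll):
    """Index createdAt only for documents that are still flagged active."""
    return coll.create_index(
        [("createdAt", 1)],
        name="createdAt_active",  # hypothetical index name
        partialFilterExpression={"active": True},
    )


def age_out(coll, now=None):
    """Set active to False on documents older than WINDOW.

    The update takes those documents out of the partial index but leaves
    them in the collection. Returns the number of documents modified.
    """
    now = now or datetime.now(timezone.utc)
    result = coll.update_many(
        {"active": True, "createdAt": {"$lt": now - WINDOW}},
        {"$set": {"active": False}},
    )
    return result.modified_count
```

`age_out` would be run on a schedule (cron or similar). Queries only use this index if their filter also includes the index predicate, for example `coll.find({"active": True, "createdAt": {"$gte": since}})`.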

Comment by Tommy Sullivan [ 08/Jan/19 ]

By the way, for context: we are just summing up a bunch of numbers, but only for the most recent couple of weeks of data. We want those sums to be really quick to compute, so we want that data indexed. But since we have years of data in that collection, we don't need everything in the index slowing it down, so what we really want is an index with a moving window of documents in it. TTL is like a collection with a moving window of documents in it that is controlled by an index, so we just want a TTL that leaves the collection alone and only applies the moving window to the index.

 

Come to think of it, you all may be indexing based on everything in the collection, and that may ultimately be why you delete from the collection (so that the index needn't keep the records). Depending on how y'all did it, the feature may be easier or tougher to implement on top of the existing TTL implementation.

Comment by Tommy Sullivan [ 08/Jan/19 ]

From a business perspective, you could sell this feature as a way to model durable streams for companies that are already on Mongo and thinking of switching to a durable stream technology. Instead of switching, they could speedily read from recent records but also get to older records without any copying, reindexing, or scheduled jobs. I don't know how much it would involve in terms of technical changes on the Mongo side. Hope that helps. Thanks again, appreciate it.

Comment by Tommy Sullivan [ 08/Jan/19 ]

Asya,

Thank you kindly for your reply. Partial Index, from what I understand, requires that we define a predicate condition. For us, the logical predicate would be something like "document createdAt date field is within the last 2 weeks".

The thing is that the meaning of "within the last two weeks" changes over time. As I understand it, we cannot put a dynamic date expression into our partial index predicate.

Of course, we have means by which to work around this.

We could create a partial index today, let that work for a while, and then periodically rebuild the index with an updated date. But that would be taxing on our Mongo cluster and would impact users. It would also be a little clunky. We may indeed do that if need be, but we are trying to avoid it.
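The periodic rebuild described above might look roughly like the following. This is only a sketch, assuming a PyMongo-style collection object passed in as `coll`; the index name and function names are hypothetical, and the drop-then-create order is one possible choice rather than a recommendation.

```python
from datetime import datetime, timedelta, timezone

WINDOW = timedelta(days=14)      # "within the last 2 weeks"
INDEX_NAME = "createdAt_recent"  # hypothetical index name


def recent_index_options(now):
    """Options for a partial index covering documents newer than now - WINDOW."""
    return {
        "name": INDEX_NAME,
        "partialFilterExpression": {"createdAt": {"$gte": now - WINDOW}},
    }


def rebuild_recent_index(coll, now=None):
    """Drop the old partial index (if present) and recreate it with a fresh cutoff.

    Queries on createdAt have no index to use between the drop and the end
    of the build, which is part of the cost mentioned above.
    """
    now = now or datetime.now(timezone.utc)
    if INDEX_NAME in coll.index_information():
        coll.drop_index(INDEX_NAME)
    return coll.create_index([("createdAt", 1)], **recent_index_options(now))
```

The cutoff is a fixed date baked into the index definition at build time, so the window only moves when the job runs again. Queries would also need a `createdAt` condition that falls inside the indexed range for the partial index to be eligible.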

We could also create a partial index whose predicate is not dependent on anything dynamic, such as "today's date". Perhaps instead we have a boolean field "expired" as our partial index predicate. Then, code in our application could regularly calculate the age of the document and update the field at the appropriate time, causing mongo to take it out of the partial index. Unfortunately, in this solution we have to do a bunch of computation on a schedule just to really update a date counter in order to TTL something. As you can see, this invites us to consider whether the existing TTL feature could be modified to achieve this objective, since it seems to do very close to what we need already.

The TTL solution actually has exactly what we want, if only it would do slightly less than what it already does. Suppose it did everything it already does, except that when it decides an item has expired, it removes it from the index and not from the collection. That would perfectly solve the use case.

Let me know if that is a reasonable and correct line of thought

thanks

 

 

Comment by Asya Kamsky [ 08/Jan/19 ]

tommyatclassdojo, what you are describing sounds like the partial index feature: you're talking about indexing only a subset of (unexpired) documents?
