[SERVER-211] TTL collections Created: 05/Aug/09  Updated: 12/Jul/16  Resolved: 27/May/12

Status: Closed
Project: Core Server
Component/s: Usability
Affects Version/s: None
Fix Version/s: 2.1.2

Type: New Feature Priority: Major - P3
Reporter: Michael Dirolf Assignee: Eliot Horowitz (Inactive)
Resolution: Done Votes: 144
Labels: rn
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Related
related to DOCS-204 TTL Collections Closed
related to SERVER-2654 Allows sharded capped collections Closed
is related to SERVER-6700 Allow updates to expireAfterSeconds f... Closed
is related to SERVER-6701 TTL expiration using _id index Closed
Participants:

 Description   

For each collection, allow specifying a filter such that any document matching that filter is automatically deleted.

This is not for capped collections, but for regular collections.
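
For reference, the feature as eventually implemented (see the 2.1.2 fix version and the commits below) is narrower than a free-form filter: a special index option expires documents a fixed number of seconds after an indexed date field. A minimal mongo shell sketch; the collection and field names here are illustrative, not from the ticket:

 // Documents become eligible for removal ~3600 seconds after their createdAt value.
 db.events.ensureIndex({ createdAt: 1 }, { expireAfterSeconds: 3600 })
 db.events.insert({ msg: "hello", createdAt: new Date() })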



 Comments   
Comment by auto [ 06/Aug/12 ]

Author: Kevin Matulef <matulef@10gen.com> (2012-08-06T09:03:20-07:00)

Message: SERVER-211 more tests for TTL collections
Branch: master
https://github.com/mongodb/mongo/commit/0d89bf4b199ababfa63b72ef848146769cee81b4

Comment by Eliot Horowitz (Inactive) [ 27/May/12 ]

@pablo - the background task runs once a minute; once it deletes, it's immediate.
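
In other words (an illustrative reading of the above, with assumed names and numbers):

 // A session expiring 1800s after lastActivity:
 db.sessions.ensureIndex({ lastActivity: 1 }, { expireAfterSeconds: 1800 })
 // A document with lastActivity = 10:00:00 becomes eligible at 10:30:00 and is
 // removed by the first monitor pass after that, i.e. by roughly 10:31:00.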

Comment by Pablo Molnar [ 23/May/12 ]

@Eliot what I mean is: how much delay is expected between a document exceeding its time and it actually being deleted? Seconds? Milliseconds? thx

Comment by Eliot Horowitz (Inactive) [ 22/May/12 ]

@pablo - not sure what you mean

Comment by Pablo Molnar [ 22/May/12 ]

@Eliot, what expiration accuracy is expected?

Comment by Hasan Tayyar BESIK [ 22/May/12 ]

This is the most useful feature in Redis.

Comment by Eliot Horowitz (Inactive) [ 21/May/12 ]

The reasons why that may have failed are numerous, and I don't think this will be similar in performance.
For example, if you did that daily and it had to delete ~40M records, it may have been trying to do that in a single transaction, which is incredibly expensive.

So this should work well for your case.

Remember, if you're using purely time-ordered data (like logs), you can use a capped collection.
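
For example (name and size assumed for illustration), a capped collection reclaims the oldest entries automatically, with no delete pass at all:

 // Fixed 100 MB collection; the oldest documents are overwritten in
 // insertion order once the size limit is reached.
 db.createCollection("log", { capped: true, size: 100 * 1024 * 1024 })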

Comment by João Pedro Ataíde Fonseca [ 21/May/12 ]

@Eliot - Of course you know how the engine will query and delete the old records, so apologies in advance; I'm just trying to understand how this TTL feature will behave in terms of performance.

I once built a system using a relational database - think of it as a massive log file, where you could make ad hoc queries based on some of the fields. The system was writing 600 records/second, about 2 million per hour, 50 million per day.

Disk space was limited, so I had to delete data older than 15 days. Simple approach: "DELETE FROM my_table WHERE timestamp < ?", but try doing that in a table with 700 million records, being written at a rate of 600/sec...

So, I was hoping for something a little more clever - space used by expired records would automatically be made available, without any background "DELETE FROM ..." operations.

Do you think your method will handle these kinds of numbers?

Thanks for taking the time to clarify this.

Comment by Ammo Goettsch [ 21/May/12 ]

@Eliot - Ok, I probably just don't understand how the system works well enough. Sorry if I am off base here. I had assumed that some sort of lock contention would happen if there is a cleaner thread constantly submitting delete queries into the same sharded collection that all my application threads are constantly writing into.

Comment by Eliot Horowitz (Inactive) [ 21/May/12 ]

@ammo - as currently implemented, this works when sharded, using resources on each mongod.
No mongos resources are used whatsoever for this.

Comment by Ammo Goettsch [ 21/May/12 ]

@Eliot - I had the same expectation as João. In the provided implementation, don't you incur the very same BTree operations when you run your "what is expired" query?

I had expected this would be some low-level "storage engine" type functionality that would run independently on the mongods, and would not compete for resources on the mongos processes once this grows up into a sharded feature. There must be something clever that can be done at steady state, when the mongods are essentially asked to write continuously by the application and records expire at that same rate. Using an entire mongod thread to remove records that you are about to overwrite does not seem optimal.

Comment by Eliot Horowitz (Inactive) [ 21/May/12 ]

@joao - I think the amount of work the system has to do is almost identical in both approaches.
For example, consider deleting a document vs. re-using its space: the slow part is cleaning up all the old index entries, which is exactly the same in both methods.
The only difference is that in yours the document never gets put on the free list, but in a high-write system that doesn't matter much.

To tell if something could be re-used efficiently, you'd basically have to check the low end of the btree for every single write, which is actually slower than the free list.
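
For illustration, a rough shell-level approximation of what one TTL monitor pass does - not the actual server code, just inferred from the query visible in the buildbot log further down:

 // For each TTL index, remove every document whose indexed date field is
 // older than now - expireAfterSeconds. (system.indexes is the pre-3.0
 // index catalog.)
 db.system.indexes.find({ expireAfterSeconds: { $exists: true } }).forEach(function (idx) {
     var field = null;
     for (var k in idx.key) field = k;          // the single indexed field
     var cutoff = new Date(Date.now() - idx.expireAfterSeconds * 1000);
     var collName = idx.ns.substr(idx.ns.indexOf(".") + 1);
     var query = {};
     query[field] = { $lt: cutoff };
     db.getCollection(collName).remove(query);
 });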

Comment by João Pedro Ataíde Fonseca [ 21/May/12 ]

The above implementation seems to be a simple background process continuously querying for and deleting old records. For write-intensive applications (as most MongoDB deployments probably are) this can be inefficient. Are you sure the process can delete data faster than the writers can add it? The delete process may also compete for resources with the writers (both when reading and when deleting).

I don't know much about the internals of your storage implementation, but in my head I was seeing TTL collections implemented in the following way:
1. For each record stored on disk, there should be a fast, simple way of telling whether it has expired.
2. When MongoDB is looking for somewhere to write new data, it treats expired records as free space and overwrites them with the new information.

With the above method, the TTL housekeeping would seamlessly be integrated into the system, even if writing massive amounts of data. Expired records automatically become free disk space, ready to be used.

Comment by auto [ 17/May/12 ]

Author: Eliot Horowitz (erh) <eliot@10gen.com>

Message: don't try ttl work on secondaries SERVER-211
Branch: master
https://github.com/mongodb/mongo/commit/a60ef31c4981f49523908c7b237f2cba0b751814

Comment by Kristina Chodorow (Inactive) [ 17/May/12 ]

Seems to be running the monitor on secondaries, causing lots of errors on bb:

 m31001| Thu May 17 03:09:16 [TTLMonitor] assertion 13435 not master and slaveOk=false ns:bar.system.indexes query:{ expireAfterSeconds: { $exists: true } }
 m31001| Thu May 17 03:09:16 [TTLMonitor] { $err: "not master and slaveOk=false", code: 13435 }
 m31001| Thu May 17 03:09:16 [TTLMonitor] ERROR: backgroundjob TTLMonitorerror: invalid parameter: expected an object ()
 m31001| Thu May 17 03:09:16 ERROR: Client::shutdown not called: TTLMonitor

Comment by auto [ 11/May/12 ]

Author: Eliot Horowitz (erh) <eliot@10gen.com>

Message: SERVER-211 - simple ttl collections where we can expire based on age
Branch: master
https://github.com/mongodb/mongo/commit/25bdc679a0e559d64ec7f22b0468cf5b1671c4e7

Comment by Yuriy O'Donnell [ 27/Apr/12 ]

@Eliot - that's what I thought. Can you recommend a good way to do it, though? Our current solution is a nightly cron job.
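
For what it's worth, a minimal sketch of such a job in the mongo shell (the 14-day window is assumed; fs.files/fs.chunks are the default GridFS collections). Chunks must be removed alongside their file documents, which is one reason a plain TTL index on fs.files alone wouldn't be enough:

 var cutoff = new Date(Date.now() - 14 * 24 * 3600 * 1000);
 db.fs.files.find({ uploadDate: { $lt: cutoff } }, { _id: 1 }).forEach(function (f) {
     db.fs.chunks.remove({ files_id: f._id });  // delete the file's data chunks
     db.fs.files.remove({ _id: f._id });        // then its metadata document
 });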

Comment by Bob Patterson Jr [ 27/Apr/12 ]

When might this make it into a Development Release?

Comment by Eliot Horowitz (Inactive) [ 27/Apr/12 ]

@yuriy - basic ttl would not work with gridfs

Comment by Yuriy O'Donnell [ 25/Apr/12 ]

Would TTL collection feature work with GridFS?
Or, perhaps, there is already a recommended way of removing old entries from GridFS?

Comment by Bob Patterson Jr [ 10/Apr/12 ]

This would be great. When could it actually get into an unstable branch, at least?

Comment by Izzy [ 23/Mar/12 ]

+1 towards implementing this, especially TTL to replace memcached

Comment by Chris Clarke [ 21/Mar/12 ]

+1 towards implementing this on regular as well as capped collections

Comment by Chris Nagele [ 08/Mar/12 ]

+1

This would solve a lot of problems for us since we constantly have to delete records.

Comment by Matthias Felsche [ 22/Feb/12 ]

Yeah, me too.
Please consider this feature for future releases.

Comment by Harald Lapp [ 22/Jan/12 ]

I would love to have this, too. It would simplify things a lot if MongoDB is used as session storage.

Comment by Travis Laborde [ 17/Jan/12 ]

As I understand it, this is like "SETEX" in Redis - another thing Mongo can do for us, greatly simplifying our plans. I agree with the people above that this should be added to all collections, capped or not.

Comment by Zeph Wang [ 10/Nov/11 ]

Look forward to the TTL collection as well. This will make it possible to get rid of memcached.

Comment by James Gosnell [ 26/Oct/11 ]

I'm really looking forward to this from our telco standpoint. Since capped sharded collections are not available and won't be anytime soon, the Law of Large Numbers should work in our favor: attaching a reasonable TTL should keep us from exceeding our drive space. I'll be able to present TTL to management as something MongoDB can do that other DBs can't (it saves programming time, since at a certain DB size one currently must issue one delete for every insert), as we look for a NoSQL (no JOIN) solution to our large single-table, high-insertion problem that still needs to be queryable. We're at the point where a vertical system has NUMA issues with I/O, yet sharding currently doesn't allow capped collections. No other (free) database comes close to such a horizontal solution, as far as I'm aware.

TTL plus capped sharded collections would make MongoDB stand out in the market.

Comment by Ilya [ 03/Aug/11 ]

Any updates on this?
It could really simplify our current architecture.

Comment by Martin Lazarov [ 03/Aug/11 ]

This feature would certainly be really useful for normal collections.

Comment by free [ 20/Jul/11 ]

Should the time field be at the first level? If the doc is { a: 1, task: [ { t: Date1 }, { t: Date2 } ] }, then it could not fit.
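
The case the implemented feature clearly covers is an indexed date field, most simply at the top level (a sketch; the names and TTL value are illustrative):

 db.tasks.ensureIndex({ t: 1 }, { expireAfterSeconds: 86400 })
 db.tasks.insert({ a: 1, t: new Date() })   // expires ~1 day after t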

Comment by Kou Zuyang [ 20/Jul/11 ]

Would be great if we could set an expiry time on standard collections.

Comment by Oded Maimon [ 03/Feb/11 ]

Would be great for normal collections. Capped collections are nice but not shardable; also, with a capped collection it won't really be possible to use this feature for a session store.

Comment by Rama Roberts [ 24/Jan/11 ]

Would be a perfect fit for what I'm trying to do: capture log records for the previous N minutes.

Comment by Justin Smestad [ 15/Dec/10 ]

This is a really powerful feature for anyone doing analytics. +1

Comment by Ted Underhill [ 15/Nov/10 ]

This feature would certainly be useful for normal collections.

Comment by Eliot Horowitz (Inactive) [ 02/Jul/10 ]

@tim - not really appropriate, I think. Easier just to do a separate implementation.

Comment by Tim Hawkins [ 02/Jul/10 ]

Could this be a special case of a capped collection, where if the current timestamp is greater than (doc._id.getTimestamp() + the collection's TTL constant), then find, findOne, etc. return nothing instead of the document? The capped collection mechanics would be responsible for flushing out the dead contents.

This is not 100% the same as memcache, since you would have to use a separate collection for each TTL, but it's close enough.
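
A hedged sketch of the same _id-timestamp idea as an explicit cleanup (the TTL value and collection name are assumed): since every ObjectId embeds its creation time in its first four bytes, "older than TTL" can be expressed as an _id range.

 var ttlSeconds = 14 * 24 * 3600;           // assumed TTL
 var cutoffSecs = Math.floor(Date.now() / 1000) - ttlSeconds;
 // Build the smallest ObjectId carrying the cutoff timestamp
 // (8 hex digits of seconds + 16 zero digits):
 var cutoffId = ObjectId(cutoffSecs.toString(16) + "0000000000000000");
 db.tweets.remove({ _id: { $lt: cutoffId } });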

Comment by Colin Steele [ 10/Jun/10 ]

This issue is definitely rising in urgency for us. Our production deployments would be much simplified if we could ditch memcache and use the collection expiry feature described here. Please move this to "things we're really QUITE SURE OF!"


Colin Steele
CTO, hotelicopter.com

Comment by Dmitry [ 14/Mar/10 ]

Am I right that this issue is about the auto-removal of "expired" objects from a collection?
If so, are there any plans to move this feature from "not sure of" to a specific version, taking into account the number of votes for the issue?

Comment by Phillip Oldham [ 06/Aug/09 ]

I'm not sure this should just be applied to capped collections - it would be useful in normal collections also. Thinking in terms of a Twitter clone: you only want to keep documents (tweets) that are under two weeks old. After that you don't care what happens to them, as they're "old" and probably irrelevant. Capping the collection would mean some messages get kicked out before they lose their relevance, which under heavy write load would render the app useless.
