[SERVER-211] TTL collections  Created: 05/Aug/09  Updated: 12/Jul/16  Resolved: 27/May/12
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Usability |
| Affects Version/s: | None |
| Fix Version/s: | 2.1.2 |
| Type: | New Feature | Priority: | Major - P3 |
| Reporter: | Michael Dirolf | Assignee: | Eliot Horowitz (Inactive) |
| Resolution: | Done | Votes: | 144 |
| Labels: | rn |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: | |
| Participants: | Ammo Goettsch, auto, Bob Patterson Jr, Chris Clarke, Chris Nagele, Colin Steele, Dmitry, Eliot Horowitz, free, Harald Lapp, Hasan Tayyar BESIK, Ilya, Izzy, James Gosnell, João Pedro Ataíde Fonseca, Justin Smestad, Kou Zuyang, Kristina Chodorow, Martin Lazarov, Matthias Felsche, Michael Dirolf, Oded Maimon, Pablo Molnar, Phillip Oldham, Rama Roberts, Ted Underhill, Tim Hawkins, Travis Laborde, Yuriy O'Donnell, Zeph Wang |
| Description |
For each collection, allow a filter such that any document matching that filter is deleted. This is for regular collections, not capped collections.
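For context, the feature as ultimately delivered (TTL indexes in the 2.1.x development series) declares expiry per collection via an index option rather than an arbitrary filter. A minimal mongo shell sketch; the collection and field names are illustrative:

```javascript
// TTL sketch (mongo shell). "log_events" and "createdAt" are illustrative.
// A background task (roughly once a minute) removes documents whose
// indexed date field is older than expireAfterSeconds.
db.log_events.ensureIndex(
    { createdAt: 1 },              // must hold BSON Date values
    { expireAfterSeconds: 3600 }   // expire ~1 hour after createdAt
);

// Documents are written normally; nothing per-document is required.
db.log_events.insert({ msg: "user login", createdAt: new Date() });
```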
| Comments |
| Comment by auto [ 06/Aug/12 ] |
Author: Kevin Matulef <matulef@10gen.com> (2012-08-06T09:03:20-07:00). Message:
| Comment by Eliot Horowitz (Inactive) [ 27/May/12 ] |
@pablo - the background task runs once a minute; once it deletes, it's immediate.
| Comment by Pablo Molnar [ 23/May/12 ] |
@Eliot, what I mean is: how much delay is expected between a document's time being exceeded and it actually getting deleted? In terms of seconds? Milliseconds? Thx
| Comment by Eliot Horowitz (Inactive) [ 22/May/12 ] |
@pablo - not sure what you mean
| Comment by Pablo Molnar [ 22/May/12 ] |
@Eliot, what expiry accuracy is expected?
| Comment by Hasan Tayyar BESIK [ 22/May/12 ] |
This is the most useful feature in Redis.
| Comment by Eliot Horowitz (Inactive) [ 21/May/12 ] |
The reasons why that may have failed are numerous, and I don't think this will be similar in performance, so this should work well for your case. Remember, if you're using pure time data (like logs) you can use a capped collection.
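For the pure time-series case mentioned above, a capped collection bounds disk use by size instead of by age, so no delete workload is ever issued. A minimal mongo shell sketch; the collection name and size are illustrative:

```javascript
// Capped collections preallocate a fixed size and overwrite the oldest
// documents once full -- expired space is reused automatically.
db.createCollection("logs", { capped: true, size: 100 * 1024 * 1024 });
db.logs.insert({ msg: "request served", ts: new Date() });
```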
| Comment by João Pedro Ataíde Fonseca [ 21/May/12 ] |
@Eliot - Of course you know how the engine will query and delete the older records, so apologies in advance; I'm just trying to understand how this TTL feature will behave in terms of performance. I once built a system using a relational database - think of it as a massive log file where you could make ad hoc queries based on some of the fields. The system was writing 600 records/second, about 2 million per hour, 50 million per day. Disk space was limited, so I had to delete data older than 15 days. Simple approach: "DELETE FROM my_table WHERE timestamp < ?", but try doing that in a table with 700 million records being written to at a rate of 600/sec... So, I was hoping for something a little more clever - space used by expired records would automatically be made available, without any background "DELETE FROM ..." operations. Do you think your method will handle these kinds of numbers? Thanks for taking the time to clarify this.
| Comment by Ammo Goettsch [ 21/May/12 ] |
@Eliot - Ok, I probably just don't understand how the system works well enough. Sorry if I am off base here. I had assumed that some sort of lock contention would happen if there is a cleaner thread constantly submitting delete queries into the same sharded collection that all my application threads are constantly writing into.
| Comment by Eliot Horowitz (Inactive) [ 21/May/12 ] |
@ammo - this works sharded as currently implemented, with resources used per mongod.
| Comment by Ammo Goettsch [ 21/May/12 ] |
@Eliot - I had the same expectation as João. In the provided implementation, don't you incur the very same B-tree operations when you run your "what is expired" query? I had expected this would be some low-level "storage engine" type functionality that would run independently on the mongods, so it would not compete for resources in the mongos processes once this grows up into a sharded feature. There must be something clever that can be done at steady state, when the mongods are essentially asked to write continuously by the application and records expire at that same rate. Using an entire mongod thread to remove records that you are about to overwrite does not seem optimal.
| Comment by Eliot Horowitz (Inactive) [ 21/May/12 ] |
@joao - I think the amount of work the system has to do is almost identical in both approaches. To tell if something could be re-used efficiently, you'd basically have to check the low end of the btree for every single write, which is actually slower than the free list.
| Comment by João Pedro Ataíde Fonseca [ 21/May/12 ] |
The above implementation seems to be a simple background process continuously querying and deleting old records. For write-intensive applications (as most MongoDB deployments probably are) this can be inefficient. Are you sure that the process can delete data faster than the writers produce it? The delete process may also compete for resources with the writers (when reading, and when deleting). I don't know much about the internals of your storage implementation, but in my head I was seeing TTL housekeeping seamlessly integrated into the system, even when writing massive amounts of data: expired records would automatically become free disk space, ready to be used.
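For illustration only, the straightforward query-and-delete approach being debated here might look like the following mongo shell sketch; this is not the server's internal implementation, and all names are illustrative:

```javascript
// Naive background reaper: periodically delete documents whose
// "expireAt" has passed. It competes with writers for the same
// collection and indexes, which is exactly the concern raised above.
function reapExpired() {
    db.events.remove({ expireAt: { $lt: new Date() } });
}
reapExpired();  // would be scheduled, e.g., once a minute
```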
| Comment by auto [ 17/May/12 ] |
Author: Eliot Horowitz (erh) <eliot@10gen.com>. Message: don't try ttl work on secondaries
| Comment by Kristina Chodorow (Inactive) [ 17/May/12 ] |
Seems to be running the monitor on secondaries, causing lots of errors on bb:
| Comment by auto [ 11/May/12 ] |
Author: Eliot Horowitz (erh) <eliot@10gen.com>. Message:
| Comment by Yuriy O'Donnell [ 27/Apr/12 ] |
@Eliot - that's what I thought
| Comment by Bob Patterson Jr [ 27/Apr/12 ] |
When might this make it into a Development Release?
| Comment by Eliot Horowitz (Inactive) [ 27/Apr/12 ] |
@yuriy - basic ttl would not work with gridfs
| Comment by Yuriy O'Donnell [ 25/Apr/12 ] |
Would the TTL collection feature work with GridFS?
| Comment by Bob Patterson Jr [ 10/Apr/12 ] |
This would be great. When could it actually get into an unstable branch, at least?
| Comment by Izzy [ 23/Mar/12 ] |
+1 towards implementing this, especially TTL to replace memcached
| Comment by Chris Clarke [ 21/Mar/12 ] |
+1 towards implementing this on regular as well as capped collections
| Comment by Chris Nagele [ 08/Mar/12 ] |
+1 This would solve a lot of problems for us since we constantly have to delete records.
| Comment by Matthias Felsche [ 22/Feb/12 ] |
Yeah, me too.
| Comment by Harald Lapp [ 22/Jan/12 ] |
I would love to have this, too. It would simplify things a lot if MongoDB is used as session storage.
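A hypothetical session-store setup along these lines would refresh a timestamp on every request and let TTL expiry reap idle sessions; the field names and the 30-minute window are assumptions:

```javascript
// Hypothetical session store using TTL expiry (mongo shell sketch).
db.sessions.ensureIndex({ lastActivity: 1 }, { expireAfterSeconds: 1800 });

// On each request, touch the session so it stays alive:
var sessionId = "abc123";  // illustrative id supplied by the application
db.sessions.update(
    { _id: sessionId },
    { $set: { lastActivity: new Date() } },
    true  // upsert: create the session document if missing
);
```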
| Comment by Travis Laborde [ 17/Jan/12 ] |
As I understand it, this is like "SETEX" in Redis - another thing Mongo can do for us, greatly simplifying our plans. I agree with the people above that this should be added to all collections, capped or not.
| Comment by Zeph Wang [ 10/Nov/11 ] |
Looking forward to the TTL collection as well. This will make it possible to get rid of memcached.
| Comment by James Gosnell [ 26/Oct/11 ] |
I'm really looking forward to this from our telco standpoint. Since capped sharded collections are not available and won't be anytime soon, the Law of Large Numbers should work in our favor: with a reasonable TTL attached, we should not exceed our drive space. I'll be able to present TTL to management as something MongoDB can do that other DBs can't (it saves programming time, since at a certain db size one currently must issue one delete for every insert), as we look for a noSQL (no JOIN) solution to our large single-table, high-insertion problem that still needs to be queryable. We're at the point where a vertical system has a NUMA issue with I/O, yet sharding currently doesn't allow capped collections. No other (free) database comes close to such a horizontal solution that I'm aware of. TTL plus capped sharded collections would make MongoDB stand out in the market.
| Comment by Ilya [ 03/Aug/11 ] |
Any updates on this?
| Comment by Martin Lazarov [ 03/Aug/11 ] |
This feature would certainly be really useful to have for normal collections.
| Comment by free [ 20/Jul/11 ] |
Should the time be at the first level? If it's nested, like , { t : Date2 }]}, then it could not fit.
| Comment by Kou Zuyang [ 20/Jul/11 ] |
Would be great if we could set an expiry time on standard collections.
| Comment by Oded Maimon [ 03/Feb/11 ] |
Would be great for normal collections. Capped collections are nice but not shardable; plus, with a capped collection it won't really be possible to use this feature for a session store.
| Comment by Rama Roberts [ 24/Jan/11 ] |
Would be a perfect fit for what I'm trying to do: capture log records for the previous N minutes.
| Comment by Justin Smestad [ 15/Dec/10 ] |
This is a really powerful feature for anyone doing analytics. +1
| Comment by Ted Underhill [ 15/Nov/10 ] |
This feature would certainly be useful for normal collections.
| Comment by Eliot Horowitz (Inactive) [ 02/Jul/10 ] |
@tim - not really appropriate, I think. Easier just to do a separate implementation.
| Comment by Tim Hawkins [ 02/Jul/10 ] |
Could this be a special case of a capped collection, where if the current timestamp is greater than (doc._id.getTimestamp() + collection TTL constant value), then find, findOne, etc. return false instead of the data in a document? The capped collection mechanics would be responsible for flushing out the dead contents. This is not 100% the same as memcache, since you would have to use a separate collection for each TTL, but it's close enough.
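A rough mongo shell sketch of the read-side filtering Tim describes, using the creation time embedded in every ObjectId; the collection name and TTL value are illustrative:

```javascript
// ObjectIds begin with a 4-byte creation timestamp, so an _id range
// query can exclude documents older than the collection's TTL.
var TTL_SECONDS = 3600;  // illustrative per-collection TTL
var cutoffSecs = Math.floor(Date.now() / 1000) - TTL_SECONDS;
var minId = ObjectId(cutoffSecs.toString(16) + "0000000000000000");
db.messages.find({ _id: { $gte: minId } });  // only unexpired documents
```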
| Comment by Colin Steele [ 10/Jun/10 ] |
This issue is definitely rising in urgency for us. Our production deployments would be much simplified if we could ditch memcache and use the collection expiry feature described here. Please move this to "things we're really QUITE SURE OF!"
| Comment by Dmitry [ 14/Mar/10 ] |
Am I right that this issue is about the auto-removal of "expired" objects from the collection?
| Comment by Phillip Oldham [ 06/Aug/09 ] |
I'm not sure whether this should just be applied to capped collections - it would be useful in normal collections also. Thinking in terms of a Twitter clone: you only want to keep documents (tweets) that are under two weeks old. After that you don't care what happens to them, as they're "old" and probably irrelevant. Capping the collection would mean some messages get kicked out before they lose their relevancy, and under heavy write load that would render the app useless.