[SERVER-16735] capped document fields Created: 06/Jan/15  Updated: 24/Jan/15  Resolved: 09/Jan/15

Status: Closed
Project: Core Server
Component/s: GridFS, Sharding, Storage
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: guipulsar Assignee: Unassigned
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-16779 kind of store / batch query procedure Closed
Backwards Compatibility: Fully Compatible
Participants:

 Description   

Hi,
I don't see any drawback to this feature. Use case:
I have a pretty big collection in which only one field (optionally several, as an extra) can grow forever.

So it's essentially the same logic as a capped collection, except the trashed data comes only from one field in each document.



 Comments   
Comment by Stennie Steneker (Inactive) [ 09/Jan/15 ]

Hi pulsar,

As I noted on your follow-up issue SERVER-16779, this sort of use-case discussion is best suited to the mongodb-user community support forum, where we can develop a better understanding of your requirements and provide recommendations or suggestions.

Fields with unbounded growth are a definite performance and scaling anti-pattern (see: Why shouldn't I embed large arrays in my documents?), and raising the document limit or trying to cap the array size are hackier workarounds than addressing the underlying schema design.

I look forward to your post in mongodb-users so we can work out appropriate solutions.

Thanks,
Stephen

Comment by guipulsar [ 08/Jan/15 ]

In addition, I would say that this discussion http://blog.mongolab.com/2013/04/thinking-about-arrays-in-mongodb/
summarizes the general problem I mention here, and perhaps unfortunately touches the capability limits of MongoDB.
My blacklist case seems a perfect example: you can't embed thousands of items in an array for WRITE reasons, especially when there is no growth limit, and you can't put your blacklist ids in another collection and then pass thousands of ref ids in a $in list for READ / post-size reasons. So now I'm very open to hearing your point of view, and if you know a better NoSQL alternative for my use case,
please give me a name.

Comment by guipulsar [ 08/Jan/15 ]

Thanks, I understand your point, but I'm not convinced.
My blacklist is denormalized in a big profile collection; basically it's a collection for finding people with common interests. In each doc I have other long lists besides the blacklist, you know, like favorites, etc.

I've thought hard about this, but in the end the only alternatives I found were not acceptable.
From your point of view, I guess it would be better to split into several collections, with blacklist refs, favorite refs, and so on.

The problem is, this kind of schema simply doesn't work in my case. Let's say you want to retrieve all the profiles that are in your blacklist:

1) first you get all the _id refs from the blacklist collection
2) then you make a second query and post 3546 reference _ids in a pathetically big request?!

A query of this kind:

query : { idprofile : { $in : [ 75010, 75020, 75011, 75006, 75007, /* ...thousands of _ids; I'm a little concerned about the post size for performance reasons */ ] } }

Really, if I'm missing something, tell me, because I've done a lot of reading, and this kind of two-query or multi-query step is pretty hidden/unclear and not well enough documented, in my opinion.
Yes, I've read the MongoDB schema-design docs and so on, but they never show a concrete example of a two-step query. I query from PHP for a basic app; the request-size concern is always omitted, but in real life you cannot put 4450 ref ids in one request...
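
(For illustration only: the two-step pattern under discussion might look like the pymongo sketch below. The collection names blacklists and profiles and the field names are assumptions, not the reporter's actual schema; chunking the $in list is one common way to bound the request size.)

from pymongo import MongoClient

db = MongoClient()["app"]

# Step 1: fetch the blacklisted ids for one profile (hypothetical schema).
doc = db.blacklists.find_one({"profile_id": 1}, {"blacklisted_ids": 1})
ids = (doc or {}).get("blacklisted_ids", [])

# Step 2: fetch the blacklisted profiles. A single $in with thousands of ids
# makes one very large query document (the poster's concern), so chunk it.
BATCH = 1000
for i in range(0, len(ids), BATCH):
    for profile in db.profiles.find({"_id": {"$in": ids[i:i + BATCH]}}):
        print(profile["_id"])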

Comment by Ramon Fernandez Marina [ 07/Jan/15 ]

pulsar, the usage of arrays described above is far from ideal schema design, and it will give you very bad performance in the long run. Allowing documents to grow past 16MB to accommodate this usage will only make things worse, so I'd strongly encourage you to revisit your schema design if you want to avoid performance problems down the line.

I'd recommend you consider using a capped collection to hold the blacklist information, and maybe even a capped collection per _id. For further discussion of schema design, please post on the mongodb-user group or Stack Overflow with the mongodb tag, where your question will reach a larger audience.
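
(A minimal sketch of the capped-collection idea in pymongo; the collection name, byte size, and one-document-per-entry layout are assumptions for illustration:)

from pymongo import MongoClient

db = MongoClient()["app"]

# A capped collection preserves insertion order and, once it reaches its
# byte limit, overwrites the oldest documents automatically.
db.create_collection("blacklist", capped=True, size=16 * 1024 * 1024)

# One small document per blacklist entry instead of one ever-growing array.
db.blacklist.insert_one({"profile_id": 1, "blocked_id": 75010})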

Adding this feature to the server may encourage the use of sub-optimal schemas, so even if it seems useful now it may not be very attractive when performance degrades later on.

Comment by guipulsar [ 07/Jan/15 ]

Thanks for this other trick. It's not really a solution, because the solution should be this feature request itself, but it buys me some time. An alternative would also be to increase the doc size limit to 64MB... no more worries then.

Comment by Ramon Fernandez Marina [ 06/Jan/15 ]

I believe this could be best handled at the application level, as I think the use case is very specific. In the scenario described in this ticket, one could try updating the document, and if the update fails then use $pop repeatedly until the update succeeds; in python-like pseudocode:

while True:
   success = update({_id : 5}, {$push : {blacklist : <value>}})
   if success:
      break
   else:
      # push failed (document at the size limit): drop the oldest entry and retry
      update({_id : 5}, {$pop : {blacklist : -1}})

pulsar, would this meet your needs?
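
(For reference, a runnable version of this loop using pymongo might look like the following; the collection handle is an assumption, and the over-size failure surfaces as a WriteError rather than a boolean return value:)

from pymongo import MongoClient
from pymongo.errors import WriteError

coll = MongoClient()["app"]["profiles"]

def push_with_evict(doc_id, value):
    while True:
        try:
            # Append the new entry to the blacklist array.
            coll.update_one({"_id": doc_id}, {"$push": {"blacklist": value}})
            return
        except WriteError:
            # The document hit the 16MB limit: drop the oldest entry and retry.
            coll.update_one({"_id": doc_id}, {"$pop": {"blacklist": -1}})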

Comment by guipulsar [ 06/Jan/15 ]

Thanks, I understand your tip now, but it's more or less a hack rather than a solution. Your query says "keep the last 1000 items", but that doesn't do the job: I want to keep the maximum number of items, and I don't know what that maximum is or when there will no longer be enough space to contain them.
Nobody can say how many blacklist items will fit in 16MB; let's say 300K for some five-digit numeric items. I don't want to cap my field if I don't need to; I want to keep my blacklist items as long as possible. When that's no longer possible, then each update should delete the oldest items. That's the "capped collection" logic, if I'm not missing something.

My feature request makes sense to me. It can be summed up in one phrase: if my document reaches the size limit, make some space by deleting data from the blacklist field only. I don't understand why you don't think it's a good and needed feature. Am I missing something here?

Comment by Eliot Horowitz (Inactive) [ 06/Jan/15 ]

How do you add items to the blacklist field?
If you do it like this:

update( { _id : 5 } , { $push : { "blacklist" : "foo" } } )

You could instead do

update( { _id : 5 } , { $push : { blacklist : { $each : [ "foo" ], $slice : -1000 } } } )

which will keep only the newest 1000 entries in that field (note that $each takes an array, and the negative $slice keeps the last N elements).

Comment by guipulsar [ 06/Jan/15 ]

By "capped field" I mean: I don't mind losing some of the older blacklist field data, but I cannot lose any other data.
Currently, with a capped collection, older data is deleted to make space for newer data without any consideration of "what data to delete".
I need a capped collection with a capped-field option, so that only blacklist field data is removed and I keep the important data. I hope my problem and feature request are more understandable now; excuse my bad English.

Comment by guipulsar [ 06/Jan/15 ]

So my English is very poor, I guess... Here is my structure. The blacklist field grows day after day; in 6 months I would be happy to have an autodelete (capped field) on this field, so that new entries replace the older ones; otherwise the 16MB limit will cause trouble. I have thousands of items in each blacklist field.

{
    "_id" : 1,
    "blacklist" : [],
    "code_postal" : 67110,
    "loc" : {
        "type" : "Point",
        "coordinates" : [ 7.72, 48.91 ]
    }
}

{
    "_id" : 2,
    "blacklist" : [ 18, 1982, 939, 1982, 98716, 7611, 983838, /* ...and thousands of others... */ ],
    "code_postal" : 67110,
    "loc" : {
        "type" : "Point",
        "coordinates" : [ 7.63, 48.91 ]
    }
}

Comment by Eliot Horowitz (Inactive) [ 06/Jan/15 ]

I don't think I understand what you're trying to do.
Can you post a sample document and how it grows?

Comment by guipulsar [ 06/Jan/15 ]

Hmm, it seems I made a mistake; the limit is in fact 2MB... so my need for this feature is even bigger now... lol

Comment by guipulsar [ 06/Jan/15 ]

Not really; perhaps with some kind of hack, I'm not sure. My need is simple: I have only one field, "temp", in each document of my collection, which can grow toward the 16MB limit in the future. I'd like to avoid GridFS, and I'd like the "temp" field to auto-trash (auto-delete) entries, replacing old entries with new ones, when the 16MB limit is reached.
That seems like a useful feature, no?!

Comment by Eliot Horowitz (Inactive) [ 06/Jan/15 ]

I think $push with $slice is what you want.
See http://docs.mongodb.org/manual/reference/operator/update/slice/#up._S_slice
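
(A minimal pymongo illustration of the $push with $slice pattern; the profiles collection and the cap of 1000 entries are assumptions for the sketch:)

from pymongo import MongoClient

coll = MongoClient()["app"]["profiles"]

# Push a new id and keep only the newest 1000 blacklist entries, atomically.
coll.update_one(
    {"_id": 2},
    {"$push": {"blacklist": {"$each": [75010], "$slice": -1000}}},
)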
