[SERVER-9506] initial or new document size flag to avoid manual document padding Created: 29/Apr/13  Updated: 01/May/13  Resolved: 01/May/13

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Task Priority: Minor - P4
Reporter: MediaMath Mongo Assignee: Stennie Steneker (Inactive)
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
duplicates SERVER-1810 ability to set minimum allocation siz... Closed
Participants:

 Description   

usePowerOf2Sizes is great, but it still requires us to pad new documents manually. It should not be complicated to implement a flag that does the padding for you. We have hacked together something called initialdocsize in our internal build of mongo, but would like to see this feature implemented by 10gen. Padding factor is a good idea but not as practical as the combination of initialdocsize and usePowerOf2Sizes.
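For context, the manual padding mentioned above is usually done by inserting each new document with a throwaway filler field and then removing that field, so the record keeps its larger on-disk allocation. A minimal sketch follows; the helper `withPadding` and the field name `_padding` are hypothetical and used only for illustration, and the shell calls are shown as comments because they need a running mongod.

```javascript
// Hypothetical helper: return a copy of `doc` with a throwaway filler field.
// `_padding` is an illustrative field name, not something MongoDB defines.
function withPadding(doc, padBytes) {
  var padded = {};
  for (var key in doc) {
    if (Object.prototype.hasOwnProperty.call(doc, key)) {
      padded[key] = doc[key];
    }
  }
  // ASCII filler: one byte per character in the stored string.
  padded._padding = new Array(padBytes + 1).join("x");
  return padded;
}

var doc = { _id: 1, lmt: 0 };
var padded = withPadding(doc, 200);

// In the mongo shell, the two-step pattern would then look roughly like:
//   db.Users.insert(padded);
//   db.Users.update({ _id: padded._id }, { $unset: { _padding: "" } });
// The intent is that the $unset shrinks the document while the record
// keeps the space that was allocated at insert time.
```

Every writer has to repeat this for every new document, which is the overhead a server-side flag would remove.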



 Comments   
Comment by Eliot Horowitz (Inactive) [ 01/May/13 ]

See SERVER-1810

Comment by MediaMath Mongo [ 01/May/13 ]
  • We actually don't find padding factor useful at all, except as a gauge of how much document sizes vary in a particular collection. If I remember correctly, dump/restore does not preserve padding factor. Especially if you use
  • Space is usually not an issue; I/O and fragmentation are big problems for us. The worst part is that compact/repairdb does not work on individual data files and instead requires you to have 100% extra free space.
  • Our documents start small but receive updates quite frequently, so in order to utilize update-in-place and avoid fragmentation, we would rather have mongo pre-allocate a certain amount of space when documents are created.

Basically, we just want 10gen to do the padding instead of padding manually.

"docInitialSize" : 256,

So this means that when a document is first inserted, upserted or created, mongod will allocate 256 bytes on disk even if there is only _id in the document. Users should be able to change the value at any time, as this flag only affects newly created documents. We actually have our own patched mongo in production that does this and would like to see 10gen implement this officially.
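The behavior described above can be sketched as a simple floor on the space reserved for a new record. This is only an illustration of the idea, not the patched build's code: the function names are hypothetical, record headers are ignored, and treating docInitialSize as a plain minimum for documents that are already larger is an assumption.

```javascript
// Assumption: docInitialSize acts as a floor on the space reserved
// for a newly created record; larger documents are left unchanged.
function initialAllocation(docBytes, docInitialSize) {
  return Math.max(docBytes, docInitialSize);
}

// An update can stay in place only while the grown document still fits
// in the space that was reserved when the record was created.
function fitsInPlace(allocatedBytes, newDocBytes) {
  return newDocBytes <= allocatedBytes;
}

// A document holding only an ObjectId _id is 22 bytes of BSON.
var allocated = initialAllocation(22, 256);   // 256
var staysPut = fitsInPlace(allocated, 200);   // true: 200 <= 256
var mustMove = !fitsInPlace(allocated, 300);  // true: 300 > 256
```

With a floor of 256, a document can grow more than tenfold from its initial 22 bytes before an update forces a move.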

db.getSisterDB("udb_prod").printCollectionStats()
Users
{
	"ns" : "udb_prod.Users",
	"count" : 398788692,
	"size" : 185653612416,
	"avgObjSize" : 465.54382343419104,
	"storageSize" : 193898130064,
	"numExtents" : 111,
	"nindexes" : 2,
	"lastExtentSize" : 2146426864,
	"paddingFactor" : 1.000000004875417,
	"docInitialSize" : 256,
	"systemFlags" : 1,
	"userFlags" : 1,
	"totalIndexSize" : 55315905344,
	"indexSizes" : {
		"_id_" : 37450135296,
		"lmt_1" : 17865770048
	},
	"ok" : 1
}

Comment by Stennie Steneker (Inactive) [ 01/May/13 ]

The usePowerOf2Sizes option is currently implemented as an alternative to the paddingFactor (see 2.2.4 code reference), and rounds up the document allocation to the next power of 2. Depending on the size of your documents and their expected growth, this can already leave significant extra room for updates. In MongoDB 2.4 there was a further refinement to quantize the allocations (see SERVER-7159) for more efficient allocation of space.
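As a simplified illustration of the rounding (not the server code), the sketch below rounds a requested size up to the next power of 2. The function name is hypothetical, record headers and the 2.4 quantization are ignored, and keeping a size that is already a power of 2 unchanged is an assumption.

```javascript
// Round a requested size up to the next power of 2.
// Assumption: a size that is already a power of 2 is kept unchanged.
function roundUpToPowerOf2(bytes) {
  var size = 1;
  while (size < bytes) {
    size *= 2;
  }
  return size;
}

var typical = roundUpToPowerOf2(466);   // 512, leaving 46 bytes of headroom
var justOver = roundUpToPowerOf2(257);  // 512, leaving 255 bytes of headroom
var exact = roundUpToPowerOf2(256);     // 256, leaving no headroom
```

The headroom depends entirely on where the document size falls between two powers of 2, so it cannot be tuned to the expected growth of a collection.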

Where would you suggest an initialdocsize value would be set? Do you have a proposed patch based on your internal build of MongoDB?

Given that documents in a collection normally vary in size, it seems this feature request could be either an option to apply the padding factor before the usePowerOf2Sizes calculation or an option to set a minimumDocumentSize.
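To make the difference between the two options concrete, here is a rough sketch that computes the allocation under each. All function names are hypothetical, the power-of-2 rounding is simplified (no record headers, no 2.4 quantization), and the 1.5 padding factor and 256-byte minimum are example values only.

```javascript
// Simplified power-of-2 rounding; an exact power of 2 is kept unchanged.
function roundUpToPowerOf2(bytes) {
  var size = 1;
  while (size < bytes) {
    size *= 2;
  }
  return size;
}

// Option A: scale by the padding factor first, then round up.
function paddedThenRounded(docBytes, paddingFactor) {
  return roundUpToPowerOf2(Math.ceil(docBytes * paddingFactor));
}

// Option B: enforce a minimum size first, then round up.
function minimumThenRounded(docBytes, minimumDocumentSize) {
  return roundUpToPowerOf2(Math.max(docBytes, minimumDocumentSize));
}

var smallA = paddedThenRounded(40, 1.5);    // 60 rounds up to 64
var smallB = minimumThenRounded(40, 256);   // 256
var largeA = paddedThenRounded(300, 1.5);   // 450 rounds up to 512
var largeB = minimumThenRounded(300, 256);  // 300 rounds up to 512
```

In this example the two options differ only for small documents: the padding factor scales with the current size, while a minimum gives every new document the same starting allocation.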

Cheers,
Stephen

Generated at Thu Feb 08 03:20:37 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.