[DOCS-905] Preheating Cache/Database Created: 18/Dec/12  Updated: 30/Oct/23  Resolved: 27/Jul/16

Status: Closed
Project: Documentation
Component/s: manual
Affects Version/s: None
Fix Version/s: Server_Docs_20231030

Type: Task Priority: Major - P3
Reporter: Sam Kleinman (Inactive) Assignee: Unassigned
Resolution: Won't Fix Votes: 1
Labels: newwriter
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
is duplicated by DOCS-488 Add manual section with information o... Closed
Related
is related to SERVER-9415 MongoDB only uses small amount of RAM Closed
Participants:
Days since reply: 7 years, 29 weeks ago

 Description   

http://blog.mongodb.org/post/10407828262/cache-reheating-not-to-be-ignored



 Comments   
Comment by Emily Hall [ 27/Jul/16 ]

Closed for housekeeping on 7/27/2016 by Emily Hall.
If you require additional support, please open a new ticket for prioritization.
Thanks,
Emily

Comment by Adam Comerford [ 27/Feb/14 ]

I wrote most of this up as an e-mail when discussing the touch command (which should have a link to this doc when it is done). Here's the mail, reformatted for Jira:

First, notes about the touch command:

1. Touch does not change the resident memory stat; it reads data into the filesystem (FS) cache. The data is in there, but it does not yet belong to the mongod process, which can cause confusion.
2. It's a blunt tool: it loads a collection's data and/or indexes wholesale, with no control over what stays in memory if the data does not fit.
3. If you want more control over what is in memory, use a different method, such as the pre-heating techniques below.
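For reference, a touch invocation is just a command document naming the collection and whether to load its data, its indexes, or both. The sketch below builds one in plain JavaScript; the helper name `touchCommand` is hypothetical, and the `touch`/`data`/`index` fields are those of the MMAPv1-era command, so check whether your server version still supports it before relying on it.

```javascript
// Hypothetical helper (not part of any driver): builds a touch command document.
// touch works on a whole collection: it cannot target a subset of documents
// or one specific index, which is why it is described above as a blunt tool.
function touchCommand(collection, { data = true, index = true } = {}) {
  if (!data && !index) {
    // touch expects at least one of data/index to be true.
    throw new Error("touch requires data and/or index to be true");
  }
  return { touch: collection, data: data, index: index };
}

// In the mongo shell, after defining the helper:
//   db.runCommand(touchCommand("blogs", { data: false }))
console.log(JSON.stringify(touchCommand("blogs", { data: false })));
// {"touch":"blogs","data":false,"index":true}
```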

There are various methods for pre-heating data in memory, and which one fits depends on your requirements. For example:

  • You have one (or more) very important indexes that must be in memory; the rest can be paged in as needed
  • You have an easily identifiable subset of data that represents your working set; this is what you want in memory, and it is a mix of index and data

Pre-Heating an Entire Specific Index

The easiest way to explain this is to use a simple example. Imagine there is an index such as

{"posts" : 1, "date" : 1}

that must be in memory. To load this index into memory, we need to force the database to read the entire index. Fortunately that is relatively easy: run a covered query, use hint() to remove any doubt about which index is used, and call explain() to ensure the entire index is traversed (explain forces this because the whole query must execute before it can report a completion time):

> db.blogs.find({}, {"_id" : 0, "posts" : 1, "date" : 1}).hint({"posts" : 1, "date" : 1}).explain()

A couple of other noteworthy points about the command above:

  • The query criteria are an empty document, so it matches everything; for those familiar with SQL, this is the equivalent of a SELECT with no WHERE clause
  • The second document is a projection that makes this a covered query, returning only the fields in the index. If you leave off the projection, the documents themselves are also read into memory (which may be what you want).
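The projection in the command above follows mechanically from the index key pattern: include every indexed field and exclude _id unless it is part of the index. A minimal sketch of that rule in plain JavaScript, using a hypothetical helper `coveredProjection`; it assumes none of the indexed fields hold arrays, since a multikey index generally cannot cover such a query.

```javascript
// Hypothetical helper: derives a covered-query projection from an index key pattern.
// Every indexed field is included (1 means "include", even for descending keys);
// _id is excluded unless it is indexed, because returning it would otherwise
// require fetching each document and the query would no longer be covered.
function coveredProjection(keyPattern) {
  const projection = {};
  for (const field of Object.keys(keyPattern)) {
    projection[field] = 1;
  }
  if (!("_id" in keyPattern)) {
    projection._id = 0;
  }
  return projection;
}

const key = { posts: 1, date: 1 };
console.log(JSON.stringify(coveredProjection(key)));
// {"posts":1,"date":1,"_id":0}
```

In the mongo shell, the pre-heat then becomes `db.blogs.find({}, coveredProjection(key)).hint(key).explain()`, which is the same command as above with the projection derived rather than typed by hand.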

Loading a Working Set

Every database has a different working set that should be in memory for optimal performance, but a common pattern is that the most recent data is the most likely to be part of that working set. So, rather than an entire index or an entire collection, how do you load just the recent data into memory?

The first requirement is that your data has a field (or fields) indicating a creation or update date (or similar). The methods described here assume that at least one such field exists. If you do not have one, you will need a substitute field to select a subset of data in a similar way; otherwise the methodology is the same.
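With such a date field, "recent" is just a range filter. The sketch below builds that filter in plain JavaScript; the helper `recentFilter` and the field name `updated` are hypothetical stand-ins for whatever field your documents actually carry, and the date is assumed to be stored as a BSON date.

```javascript
// Hypothetical helper: builds a range filter matching documents whose date
// field is at or after the cutoff (nowMs minus `days` days).
function recentFilter(field, days, nowMs = Date.now()) {
  const msPerDay = 24 * 60 * 60 * 1000; // JavaScript dates count milliseconds
  const cutoff = new Date(nowMs - days * msPerDay);
  return { [field]: { $gte: cutoff } };
}

// Assuming an index on { updated: 1 }, the mongo shell pre-heat would be:
//   db.foo.find(recentFilter("updated", 5)).hint({ updated: 1 }).explain()
const f = recentFilter("updated", 5, Date.UTC(2014, 1, 27));
console.log(f.updated.$gte.toISOString()); // 2014-02-22T00:00:00.000Z
```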

Note: The default value of the _id field is an ObjectId, and per the specification the first 4 bytes of an ObjectId are a timestamp (seconds since the Unix epoch).
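To see where that timestamp lives, the sketch below decodes it from an ObjectId's 24-character hex string in plain JavaScript. The helper name `objectIdTimestamp` is hypothetical; inside the mongo shell, `ObjectId(...).getTimestamp()` returns the same Date.

```javascript
// Hypothetical helper: reads the creation time embedded in an ObjectId hex string.
// The first 4 bytes (8 hex characters) are a big-endian count of seconds since
// the Unix epoch; the remaining 8 bytes do not encode time.
function objectIdTimestamp(hexId) {
  if (!/^[0-9a-fA-F]{24}$/.test(hexId)) {
    throw new Error("expected a 24-character hex ObjectId string");
  }
  const seconds = parseInt(hexId.slice(0, 8), 16);
  return new Date(seconds * 1000); // Date expects milliseconds
}

console.log(objectIdTimestamp("530f1f2b0000000000000000").toISOString());
// 2014-02-27T11:19:07.000Z
```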

Load recently created documents into memory

Using the ObjectId specification, it is relatively easy to construct an ObjectId that is, essentially, 5 days old and then use it to query for all data newer than that time.

> last5Days = Math.round((new Date().getTime() - (5 * 24 * 60 * 60 * 1000)) / 1000)
1393499947

new Date().getTime() returns the current time in milliseconds since the epoch, so the 5-day offset must also be expressed in milliseconds, and the result is divided by 1000 to get seconds; change the 5 to the number of days back you wish to go. The rounding removes the fractional seconds. Next we need to turn that into a proper ObjectId. Thankfully this is straightforward: convert to hexadecimal and pad the string with zeroes:

// convert to hexadecimal
> hex5Days = last5Days.toString(16)
530f1f2b
// pad the string and pass it to the ObjectId constructor
> historicId = ObjectId(hex5Days+"0000000000000000")
ObjectId("530f1f2b0000000000000000")
// Now query for it:
> db.foo.find({"_id" : {"$gt" : historicId}}).explain()

This loads all of the documents from the last 5 days (and the rightmost branches of the "_id" index) into memory.
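The shell steps above can be collected into one function. A minimal sketch in plain JavaScript, assuming a runtime with `String.prototype.padStart` (Node or mongosh); the helper name `objectIdHexForDaysAgo` is hypothetical. It uses `Math.floor` rather than rounding, since ObjectId timestamps are whole seconds, and pads the hex timestamp to 8 characters explicitly instead of relying on it already being that long.

```javascript
// Hypothetical helper: builds the 24-hex-character string for an ObjectId whose
// embedded timestamp is `days` days before nowMs. Every other byte is zero, so
// any ObjectId created during or after that second compares >= this value.
function objectIdHexForDaysAgo(days, nowMs = Date.now()) {
  const msPerDay = 24 * 60 * 60 * 1000;
  // Date arithmetic is in milliseconds; ObjectId timestamps are whole seconds.
  const seconds = Math.floor((nowMs - days * msPerDay) / 1000);
  // 8 hex characters (4 bytes) of timestamp, then 16 zeros for the other 8 bytes.
  return seconds.toString(16).padStart(8, "0") + "0".repeat(16);
}

const hex = objectIdHexForDaysAgo(5, 1393931947000);
console.log(hex); // 530f1f2b0000000000000000
// In the mongo shell: db.foo.find({ _id: { $gt: ObjectId(hex) } }).explain()
```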

Generated at Thu Feb 08 07:39:47 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.