[SERVER-7287] --tinyfiles (32M) option for effective GridFS datafile caching on large file server Created: 07/Oct/12 Updated: 15/Feb/13 Resolved: 10/Oct/12 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | GridFS, Performance, Storage |
| Affects Version/s: | 2.2.0 |
| Fix Version/s: | None |
| Type: | New Feature | Priority: | Major - P3 |
| Reporter: | Alex Yam | Assignee: | Mathias Stearn |
| Resolution: | Done | Votes: | 0 |
| Labels: | Cache, GridFS, ZFS, chunks, datafiles | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
ZFS |
||
| Participants: |
| Description |
|
Currently the --smallfiles option caps MongoDB datafiles at 512MB; this is not small enough for effective read caching on ZFS and is causing problems on our ZFS+GridFS file server:

1. Our GridFS server runs on ZFS; total data size is ~30TB, RAM is 64GB.
2. With the --smallfiles option, GridFS saves the ~30TB (~12 million files) as chunks in ~60,000 x 512MB datafiles.
3. ZFS first uses RAM as its file read cache; when RAM fills up, ZFS uses an SSD as a second-level read cache (L2ARC).
4. Our application has ~30,000 hot files that are fetched frequently at random hours.
5. GridFS spreads the chunks of these 30,000 files across 1,000+ 512MB datafiles.
6. Using a 256GB SSD as the ZFS read cache, ZFS only has enough space to cache ~460 GridFS datafiles of 512MB each.
7. Excessive mechanical disk seeks are caused by the application requesting chunks that live in datafiles not cached by ZFS.
8. Adding more RAM does almost nothing to help the situation; a bigger (512GB) SSD is not cost effective and will still cause disk seeks once chunks fall outside the ~921 datafiles it can hold.
9. If --tinyfiles let GridFS save chunks into 32MB datafiles, a 256GB SSD could cache ~7,400 datafiles, which would hold all the hot chunks and reduce mechanical disk seeks to almost zero (see the back-of-the-envelope sketch below).

Bottom line: a tiny (32MB) datafile size may not make a difference for normal MongoDB data, but it is critical for large GridFS servers on ZFS. |
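A quick back-of-the-envelope sketch of the cache arithmetic above. The SSD and datafile sizes come from the description; the ~10% headroom factor is an assumption chosen to roughly reproduce the reporter's figures, not something stated in the ticket.

```python
# How many whole GridFS datafiles fit in an L2ARC read-cache SSD.
# Sizes are taken from the ticket description; the 10% headroom is an assumption.

GB = 1024 ** 3
MB = 1024 ** 2

def cacheable_datafiles(ssd_bytes, datafile_bytes, headroom=0.10):
    """Whole datafiles that fit in the SSD read cache after reserving headroom."""
    return int(ssd_bytes * (1 - headroom) // datafile_bytes)

for ssd_gb, datafile_mb in [(256, 512), (512, 512), (256, 32)]:
    n = cacheable_datafiles(ssd_gb * GB, datafile_mb * MB)
    print(f"{ssd_gb}GB SSD, {datafile_mb}MB datafiles -> ~{n} cacheable datafiles")

# 256GB SSD, 512MB datafiles -> ~460 cacheable datafiles
# 512GB SSD, 512MB datafiles -> ~921 cacheable datafiles
# 256GB SSD, 32MB datafiles  -> ~7372 cacheable datafiles
```

The point is simply that shrinking datafiles 16x (512MB to 32MB) lets the same SSD hold 16x as many whole datafiles, which is what would let the ~30,000-file hot set stay entirely in cache.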
| Comments |
| Comment by Alex Yam [ 10/Oct/12 ] |
|
After a few days of head scratching we finally found the problem: the HDD seeks were caused by Nginx deleting old cache files and writing new ones (LRU replacement when the cache is full). We missed this setting when we merged the Nginx config files from different servers, and as a result the large files were sharing the same small cache zone as the avatars/images.

It took a while to find the problem because the blinking was inconsistent; there were 4 different caches in play, some of which get flushed after a reboot and some of which don't.

The problem was solved by placing images/avatars and large files in different Nginx cache zones. After the change, there are no more disk seeks for images/avatars when hammering the large-file server. Hope this ticket can help others who come across the same situation. |
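For anyone landing here with the same symptom, below is a minimal sketch of the "separate cache zones" fix described above, assuming a plain Nginx proxy_cache setup. The paths, zone names, sizes, and backend are hypothetical placeholders, not the reporter's actual configuration (their stack uses PHP-FPM, which would use the analogous fastcgi_cache directives instead).

```nginx
# Inside the http {} block: give avatars/images and large files their own
# cache zones so large-file traffic cannot evict the hot small objects.
# All paths, names, and sizes here are illustrative only.
proxy_cache_path /var/cache/nginx/avatars    levels=1:2 keys_zone=avatars:50m    max_size=10g  inactive=7d;
proxy_cache_path /var/cache/nginx/largefiles levels=1:2 keys_zone=largefiles:50m max_size=200g inactive=1d;

upstream gridfs_backend { server 127.0.0.1:9000; }  # hypothetical file-serving backend

server {
    listen 80;

    location /avatars/ {
        proxy_cache avatars;        # small, hot objects keep their own LRU
        proxy_pass  http://gridfs_backend;
    }

    location /files/ {
        proxy_cache largefiles;     # large GridFS downloads only evict each other
        proxy_pass  http://gridfs_backend;
    }
}
```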
| Comment by Alex Yam [ 07/Oct/12 ] |
|
If the ZFS cache is block based then we must have misconfigured something. We recently merged multiple GridFS DBs onto a new ZFS server to improve reliability, and when we stress tested the GridFS server we discovered that the HDD lights on the hot-swap bays were blinking nonstop.

Stress tests were done using http_load and 2 URL lists. The OS is FreeBSD 9, HTTP is served by Nginx+PHP-FPM, and the ZFS zpool contains a raidz3 vdev of 11 x 3TB disks, 8 of which are connected to an HBA and 3 to the onboard SATA ports.

Our test procedures:

Running "zpool iostat -v" from the shell shows the read-cache SSD has only 16M of free space left, so the cache is working. We are new to ZFS, so when the HDD lights blinked at step 4, we assumed this was caused by ZFS caching entire GridFS datafiles at step 3 after clearing the cache at step 2.

The blinking HDD lights tell us that if the system goes live, the random avatar/image fetches will wear out our disks more than necessary; this is where we are stuck at the moment. Are there tools we can use to pinpoint exactly which files are being accessed? |
| Comment by Eliot Horowitz (Inactive) [ 07/Oct/12 ] |
|
Filesystems (ZFS included) don't have to cache the entire file; they can cache only the pages they need. |