Currently the --smallfiles option caps the datafiles that hold GridFS chunks at 512MB. This is not small enough for effective read caching on ZFS, and it is causing problems on our ZFS+GridFS file server:
1. Our GridFS server runs on ZFS; total data size is ~30TB and RAM is 64GB.
2. Using the --smallfiles option, GridFS stores ~30TB (~12 million files) as chunks in ~60,000 datafiles of 512MB each.
3. ZFS first uses RAM as its file read cache (ARC); when RAM fills up, ZFS spills over to an SSD-backed read cache (L2ARC).
4. Our application has ~30,000 hot files that are fetched frequently at random times.
5. GridFS spreads the chunks of these ~30,000 files across 1,000+ of the 512MB datafiles.
6. With a 256GB SSD as the ZFS read cache, there is only enough space to cache ~460 of the 512MB GridFS datafiles.
7. Requests to GridFS for chunks in datafiles not cached by ZFS cause excessive mechanical disk seeks.
8. Adding more RAM does almost nothing to help; a bigger (512GB) SSD is not cost-effective and would still incur disk seeks whenever a chunk lives beyond the ~921st cacheable datafile.
9. If a --tinyfiles option enabled GridFS to save chunks into 32MB datafiles, a 256GB SSD could cache ~7,400 datafiles, which would contain all of the hot chunks and reduce mechanical disk seeks to almost zero.
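The arithmetic behind the steps above can be sketched as follows. This is a minimal back-of-envelope check, assuming roughly 90% of the SSD is usable for caching (the exact usable fraction depends on the ZFS/L2ARC configuration, so treat the 90% figure as an illustrative assumption, not a measured value):

```python
GB_IN_MB = 1024  # MB per GB

def cacheable_datafiles(ssd_gb, datafile_mb, usable=0.9):
    """How many whole datafiles fit in an SSD read cache.

    `usable` is an assumed fraction of the SSD available for
    cached data; the rest is treated as overhead.
    """
    return int(ssd_gb * GB_IN_MB * usable // datafile_mb)

# ~30TB of data in 512MB datafiles:
total_datafiles = 30 * 1024 * GB_IN_MB // 512
print(total_datafiles)                    # 61440, i.e. ~60,000 datafiles

print(cacheable_datafiles(256, 512))      # 460  -> ~460 cacheable 512MB files
print(cacheable_datafiles(512, 512))      # 921  -> bigger SSD still falls short
print(cacheable_datafiles(256, 32))       # 7372 -> ~7,400 cacheable 32MB files
```

With 32MB datafiles, the same 256GB SSD covers roughly 16x as many datafiles, which is why the hot set of ~1,000+ datafiles fits entirely in cache.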
Bottom line: A tiny (32MB) datafile size may not make a difference for normal MongoDB data, but it is critical for large GridFS servers on ZFS.