[SERVER-26131] MongoDB, XFS, and SSDs Created: 15/Sep/16 Updated: 20/Sep/16 Resolved: 16/Sep/16 |
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Performance, Storage |
| Affects Version/s: | 2.6.12, 3.0.12, 3.2.9 |
| Fix Version/s: | None |
| Type: | Question | Priority: | Major - P3 |
| Reporter: | Gregory Banks | Assignee: | Unassigned |
| Resolution: | Done | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Participants: |
| Description |
We have run into an issue with XFS’s FITRIM ioctl implementation (see: https://github.com/torvalds/linux/blob/master/fs/xfs/xfs_discard.c#L155), which is used by the fstrim command (see: https://github.com/karelzak/util-linux/blob/master/sys-utils/fstrim.c#L87), when running against local SSDs. It is severely impacting IO in general and MongoDB specifically. Essentially, every time this ioctl is called, XFS iterates over every allocation group and issues TRIMs for all free extents. Because Linux’s interface to the TRIM command is synchronous and does not support a vectorized list of ranges (see: https://github.com/torvalds/linux/blob/3fc9d690936fb2e20e180710965ba2cc3a0881f8/block/blk-lib.c#L112), this results in a large number of extraneous TRIM commands, each of which has been observed to be slow (see: http://oss.sgi.com/archives/xfs/2011-12/msg00311.html), being issued to the disk for ranges that both the filesystem and the disk already know to be free.

In practice, we have seen IO disruptions of up to 2 minutes. I realize that the duration of these disruptions may be controller dependent; unfortunately, when running on a platform like AWS, one does not have the luxury of choosing specific hardware.

EXT4, on the other hand, tracks blocks that have been deleted since the previous FITRIM ioctl and targets subsequent TRIMs at the appropriate block ranges (see: http://blog.taz.net.au/2012/01/07/fstrim-and-xfs/). In real-world tests this significantly reduces the impact of fstrim, to the point that it is unnoticeable to the database or application. We are currently switching back to EXT4 as a result.

Alternatively, we could mount the filesystem with the discard option (as AWS suggests here: http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ssd-instance-store.html); however, we are not confident that this would perform better, given XFS developer comments on the subject (see: http://oss.sgi.com/archives/xfs/2014-08/msg00465.html).
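The difference between the two FITRIM strategies described above can be illustrated with a toy model that simply counts discard commands. This is a simplification under stated assumptions: each free extent handed to the device costs one discard, and EXT4's tracking is modelled per group (a group with no frees since the last trim is skipped), which is roughly, not exactly, how EXT4 implements it. `ToyFs`, `xfs_style_trim` and `ext4_style_trim` are hypothetical names, not kernel functions.

```python
# Toy model only (hypothetical names): free space is a dict of group id ->
# list of free extents, and each free extent sent to the device costs one
# discard command. Real XFS and EXT4 code paths are far more involved.


class ToyFs:
    def __init__(self, free_extents_by_group):
        # group id -> list of (start_block, length_in_blocks) free extents
        self.free = {g: list(exts) for g, exts in free_extents_by_group.items()}
        # Groups with blocks freed since the last tracked trim. A fresh
        # filesystem has never been trimmed, so every group starts dirty.
        self.dirty = set(self.free)

    def free_extent(self, group, extent):
        """Record that an extent in `group` was freed (e.g. a file delete)."""
        self.free[group].append(extent)
        self.dirty.add(group)


def xfs_style_trim(fs):
    """Rescan every group and discard every free extent, on every call."""
    return sum(len(exts) for exts in fs.free.values())


def ext4_style_trim(fs):
    """Discard only in groups that saw frees since the previous call."""
    issued = sum(len(fs.free[g]) for g in fs.dirty)
    fs.dirty.clear()
    return issued


# Four groups with 1000 small free extents each.
fs = ToyFs({g: [(i * 16, 8) for i in range(1000)] for g in range(4)})
print(xfs_style_trim(fs), ext4_style_trim(fs))  # 4000 4000
fs.free_extent(2, (16000, 8))  # one delete in group 2
print(xfs_style_trim(fs), ext4_style_trim(fs))  # 4001 1001
```

After the first pass, the full-scan strategy keeps paying for every free extent on the filesystem, while the tracked strategy pays only for groups touched since the previous call.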
I am aware that MongoDB strongly recommends using XFS (see: https://docs.mongodb.com/manual/administration/production-notes/#kernel-and-file-systems), and that this is because EXT4 journaling could impact WiredTiger checkpointing under heavy write load (https://groups.google.com/forum/#!msg/mongodb-user/diGdooN_2Sw/4H7t5JTDcpAJ). Can you elaborate on this? Is this the only concern that drove the strong recommendation to use XFS and, in MongoDB’s opinion, is it still valid given the performance issues with TRIM on Linux when running XFS on SSDs?

We are currently running the MMAPv1 storage engine on MongoDB 2.6 and, as mentioned above, we have reverted to EXT4 without apparent consequence. Any more information you could provide would really help us weigh the pros and cons while we work toward WiredTiger. Also, any more general recommendations for mitigating the disruption caused by running fstrim would be very welcome.
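On the request for mitigations: one option to try, sketched here under assumptions rather than as anything recommended in this thread, is to split a single whole-filesystem FITRIM into several smaller byte ranges with a pause between them, and to raise `minlen` so that small free extents are skipped. `FITRIM` and `struct fstrim_range` come from `<linux/fs.h>`; the numeric ioctl value assumes the common `_IOC` bit layout; `chunk_ranges`, `trim_in_chunks`, the chunk size and the pause are hypothetical. Whether this actually shortens the stalls on a given kernel and SSD controller is an assumption to measure, not a given.

```python
import fcntl
import os
import struct
import time

# FITRIM is _IOWR('X', 121, struct fstrim_range) in <linux/fs.h>. The value
# below assumes the common _IOC bit layout (x86-64, arm64); some
# architectures (e.g. powerpc, mips) encode ioctl numbers differently.
FITRIM = 0xC0185879

# struct fstrim_range { __u64 start; __u64 len; __u64 minlen; };
FSTRIM_RANGE = struct.Struct("=QQQ")

# ULLONG_MAX as a length means "to the end"; the kernel clamps it.
TO_END = 2**64 - 1


def chunk_ranges(total_bytes, chunk_bytes):
    """Split [0, total_bytes) into consecutive (start, length) byte ranges."""
    if chunk_bytes <= 0:
        raise ValueError("chunk_bytes must be positive")
    return [(start, min(chunk_bytes, total_bytes - start))
            for start in range(0, total_bytes, chunk_bytes)]


def trim_in_chunks(mountpoint, chunk_bytes, minlen=0, pause_s=1.0):
    """Hypothetical helper: FITRIM one byte range at a time, pausing between.

    Needs root (CAP_SYS_ADMIN). Returns the sum of the byte counts the
    kernel writes back into range.len after each call.
    """
    st = os.statvfs(mountpoint)
    # Approximate size: f_blocks may not cover the filesystem's whole
    # address space, so the last range is left open-ended below.
    ranges = chunk_ranges(st.f_blocks * st.f_frsize, chunk_bytes)
    fd = os.open(mountpoint, os.O_RDONLY)
    trimmed = 0
    try:
        for i, (start, length) in enumerate(ranges):
            is_last = i == len(ranges) - 1
            if is_last:
                length = TO_END
            buf = bytearray(FSTRIM_RANGE.pack(start, length, minlen))
            fcntl.ioctl(fd, FITRIM, buf)  # kernel rewrites buf in place
            trimmed += FSTRIM_RANGE.unpack(buf)[1]
            if not is_last:
                time.sleep(pause_s)  # let foreground IO proceed
    finally:
        os.close(fd)
    return trimmed
```

For example, `trim_in_chunks("/data", 64 * 1024**3, minlen=1024**2, pause_s=5)` would trim in 64 GiB ranges and skip free extents smaller than 1 MiB; the path and numbers are placeholders, not tuned values. Raising `minlen` trades fewer discard commands for leaving small free extents untrimmed. util-linux's fstrim exposes the same three fields as `--offset`, `--length` and `--minimum`, so a shell loop over fstrim would do the same thing without custom code.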
| Comments |
| Comment by Gregory Banks [ 20/Sep/16 ] |
https://groups.google.com/forum/#!topic/mongodb-user/Mj0x6m-02Ms
| Comment by Gregory Banks [ 16/Sep/16 ] |
Thanks Thomas. I'll move the discussion over to the group. Cheers,
| Comment by Kelsey Schubert [ 16/Sep/16 ] |
Hi gregbanks,

Thank you for the detailed question. We recommend XFS because we have observed long pauses related to EXT4. However, if you have tested your workload with WiredTiger on EXT4 and see better results, then I don't see a reason why you can't move forward with it.

Please note that the SERVER project is for reporting bugs or feature suggestions for the MongoDB server. For MongoDB-related support discussion, please post on the mongodb-user group or Stack Overflow with the mongodb tag. A question like this, involving more discussion, would be best posted on the mongodb-user group.

Kind regards,