[SERVER-13344] Pad journal writes to 4k not 8k Created: 25/Mar/14 Updated: 06/Dec/22 Resolved: 14/Sep/18 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | MMAPv1, Storage |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Minor - P4 |
| Reporter: | Mark Callaghan | Assignee: | Backlog - Storage Execution Team |
| Resolution: | Won't Fix | Votes: | 1 |
| Labels: | journal | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Assigned Teams: | Storage Execution |
| Description |
|
Journal writes are padded to 8 KB; see the use of Alignment in Journal::journal and "namespace dur { const unsigned Alignment = 8192;". With buffered IO the minimum transfer size on Linux is 4 KB, so padding to 8 KB means that up to twice as much data can be written to the journal in the worst case for workloads with j:1. This can hurt performance and also cause more wear on flash devices. Note that MongoDB already writes a lot more to disk for write-heavy workloads than its peers – http://smalldatum.blogspot.com/2014/03/redo-logs-in-mongodb-and-innodb.html This might help with |
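To make the worst case concrete, here is a minimal sketch of the round-up arithmetic. The helper name `padTo` is hypothetical and is not taken from the MongoDB source; only the 8192 constant comes from the description above.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not MongoDB code): round len up to the next
// multiple of alignment. alignment must be a nonzero power of two.
inline std::size_t padTo(std::size_t len, std::size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (len + alignment - 1) & ~(alignment - 1);
}
```

Under these assumptions, a small j:1 commit of 100 bytes pads to 4096 bytes with 4 KB alignment and to 8192 bytes with 8 KB alignment, which is the 2x worst case. A 4097-byte commit pads to 8192 bytes either way, so the saving shrinks as commits grow.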
| Comments |
| Comment by Andy Schwerin [ 15/Apr/14 ] |
|
Sounds reasonable. You'd have to be careful to ensure you did the write to a file on the correct file system, but that shouldn't be onerous. |
| Comment by Mark Callaghan [ 15/Apr/14 ] |
|
From a co-worker who has XFS and Linux internals experience... Unfortunately, I have not seen alignment/size restrictions for O_DIRECT I/Os documented anywhere in a filesystem-independent manner. I don't believe this is part of POSIX either. XFS used to expose the alignment restriction via an (XFS-specific) ioctl(), from what I recall, but I don't think there is a filesystem-independent way to retrieve it. In practice, local filesystems require 512-byte (or larger) alignment, so 4KB should certainly work for local filesystems. NFS requires PAGE_SIZE (4KB) alignment. One way to do this is to write a fragment of code (in the MongoDB init path) that tests 4KB alignment (which should work in the majority of cases) and falls back to 8KB if that does not work (EINVAL). Would that be very contentious? |
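A rough sketch of what such a probe might look like on Linux follows. Everything here is an assumption for illustration: `probeJournalAlignment` is a hypothetical name, the temporary file name is made up, and this is not how MongoDB implements anything. It creates a scratch file in the journal directory (so the test hits the correct file system), turns on O_DIRECT, and attempts one write whose buffer address, length, and file offset are 4KB-aligned but not 8KB-aligned. Any failure, including EINVAL, yields the conservative 8KB.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <fcntl.h>
#include <string>
#include <unistd.h>
#include <vector>

// Hypothetical probe (not MongoDB code): returns 4096 if a 4KB-aligned
// O_DIRECT write succeeds in dir, otherwise falls back to 8192.
std::size_t probeJournalAlignment(const std::string& dir) {
    const std::size_t kSmall = 4096;
    const std::size_t kFallback = 8192;
#ifdef O_DIRECT
    // mkstemp needs a mutable, NUL-terminated template.
    std::string path = dir + "/alignment_probe.XXXXXX";
    std::vector<char> tmpl(path.begin(), path.end());
    tmpl.push_back('\0');
    int fd = mkstemp(tmpl.data());
    if (fd < 0) return kFallback;
    unlink(tmpl.data());  // the file disappears once fd is closed

    // Enable direct IO; file systems without O_DIRECT support fail here.
    int flags = fcntl(fd, F_GETFL);
    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_DIRECT) < 0) {
        close(fd);
        return kFallback;
    }

    // Allocate 8KB aligned to 8KB, then use the second half so the
    // buffer address is 4KB-aligned but not 8KB-aligned.
    void* base = nullptr;
    if (posix_memalign(&base, kFallback, kFallback) != 0) {
        close(fd);
        return kFallback;
    }
    std::memset(base, 0, kFallback);
    char* buf = static_cast<char*>(base) + kSmall;

    // Length 4096 at offset 4096: all three values are 4KB-aligned only.
    ssize_t n = pwrite(fd, buf, kSmall, static_cast<off_t>(kSmall));
    std::free(base);
    close(fd);
    return n == static_cast<ssize_t>(kSmall) ? kSmall : kFallback;
#else
    (void)dir;  // no O_DIRECT on this platform: stay conservative
    return kFallback;
#endif
}
```

The probe treats every error the same way, not only EINVAL, on the theory that falling back to 8KB is always safe. It would be called once at startup with the journal directory path, and the result used in place of the fixed Alignment constant.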
| Comment by Dwight Merriman [ 03/Apr/14 ] |
|
I'm not sure what it should round to, but I can give background on why it is 8KB. The page http://linux.die.net/man/2/open says:
Given that comment, the idea was to be conservative and aim high. Is the 4KB figure documented somewhere? |