[SERVER-13344] Pad journal writes to 4k not 8k Created: 25/Mar/14  Updated: 06/Dec/22  Resolved: 14/Sep/18

Status: Closed
Project: Core Server
Component/s: MMAPv1, Storage
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Minor - P4
Reporter: Mark Callaghan Assignee: Backlog - Storage Execution Team
Resolution: Won't Fix Votes: 1
Labels: journal
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-9802 Single-threaded journal compression b... Closed
Assigned Teams:
Storage Execution

 Description   

Journal writes are padded to 8KB; see the use of Alignment in Journal::journal and "namespace dur { const unsigned Alignment = 8192;".

With buffered I/O, the minimum transfer size on Linux is 4KB, so padding to 8KB means that, in the worst case, workloads using j:1 write up to twice as much to the journal as they need to.
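
As a rough illustration of that worst case, the sketch below rounds a journal write up to each padding size. The helper names padTo and paddingBytes are hypothetical and are not taken from the MongoDB source; only the 8192 value comes from the quoted constant.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not MongoDB code): round `len` up to the next
// multiple of `alignment`, which must be a power of two.
constexpr std::size_t padTo(std::size_t len, std::size_t alignment) {
    return (len + alignment - 1) & ~(alignment - 1);
}

// Hypothetical helper: number of padding bytes added to a `len`-byte write.
constexpr std::size_t paddingBytes(std::size_t len, std::size_t alignment) {
    return padTo(len, alignment) - len;
}

// A small 100-byte commit occupies one 4KB block with 4KB padding,
// but a full 8KB with the current 8192-byte alignment.
static_assert(padTo(100, 4096) == 4096, "100 bytes pads to 4096 at 4KB");
static_assert(padTo(100, 8192) == 8192, "100 bytes pads to 8192 at 8KB");

// A write just over 4KB pads to 8192 bytes under either alignment,
// so the doubling only applies to writes of 4KB or less.
static_assert(padTo(4097, 4096) == 8192, "4097 bytes pads to 8192 at 4KB");
static_assert(padTo(4097, 8192) == 8192, "4097 bytes pads to 8192 at 8KB");
```

The factor of two is therefore an upper bound: it applies when each j:1 commit carries at most 4KB of journal data, and shrinks as commits grow.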

This can hurt performance and also generate more wear on flash devices.

Note that MongoDB can already write far more to disk than its peers for write-heavy workloads; see http://smalldatum.blogspot.com/2014/03/redo-logs-in-mongodb-and-innodb.html

This might help with SERVER-9802.



 Comments   
Comment by Andy Schwerin [ 15/Apr/14 ]

Sounds reasonable. You'd have to be careful to ensure you did the write to a file on the correct file system, but that shouldn't be onerous.

Comment by Mark Callaghan [ 15/Apr/14 ]

From a co-worker who has XFS & Linux internals experience...

Unfortunately, I have not seen alignment/size restrictions for O_DIRECT I/Os documented anywhere in a filesystem-independent manner. I don't believe this is part of POSIX either. XFS used to expose the alignment restriction via an (XFS-specific) ioctl(), from what I recall. But I don't think there is a filesystem-independent way to retrieve it.

In practice, local filesystems require 512-byte (or better) alignment. So 4KB should certainly work for local filesystems. NFS requires PAGE_SIZE (4KB) alignment.

One way to do this is to write a fragment of code (in the MongoDB init path) that tests 4KB alignment (which should work in the majority of cases) and falls back to 8KB if that does not work (EINVAL). Would that be very contentious?
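
A minimal sketch of such a probe, assuming Linux and a compiler that exposes O_DIRECT (g++ defines _GNU_SOURCE by default). The function names directWriteWorks and chooseJournalAlignment and the probe-file handling are hypothetical, not MongoDB code. For simplicity it falls back to 8KB on any failure, not only EINVAL.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <fcntl.h>
#include <unistd.h>

// Hypothetical helper: attempt one O_DIRECT write of `size` bytes at file
// offset `size`, from a buffer aligned to `size`. Offset `size` (rather
// than 0) is used so that a 4096-byte probe exercises an offset that is
// not a multiple of 8192. Returns true only if the full write succeeds.
bool directWriteWorks(const char* path, std::size_t size) {
    int fd = ::open(path, O_WRONLY | O_CREAT | O_DIRECT, 0600);
    if (fd < 0) {
        return false;  // e.g. EINVAL where O_DIRECT is not supported
    }
    void* buf = nullptr;
    bool ok = false;
    if (::posix_memalign(&buf, size, size) == 0) {
        std::memset(buf, 0, size);
        ssize_t n = ::pwrite(fd, buf, size, static_cast<off_t>(size));
        ok = (n == static_cast<ssize_t>(size));
        std::free(buf);
    }
    ::close(fd);
    ::unlink(path);  // remove the scratch file created for the probe
    return ok;
}

// Hypothetical init-path check: prefer 4KB padding, otherwise keep 8KB.
std::size_t chooseJournalAlignment(const char* probePath) {
    return directWriteWorks(probePath, 4096) ? 4096 : 8192;
}
```

The probe path should name a fresh scratch file inside the journal directory, so that the test runs against the same file system the journal will use.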

Comment by Dwight Merriman [ 03/Apr/14 ]

I'm not sure what it should round to, but I can give background on why it is 8KB. The page:

http://linux.die.net/man/2/open

says:

The O_DIRECT flag may impose alignment restrictions on the length and address of user-space buffers and the file offset of I/Os. In Linux alignment restrictions vary by file system and kernel version and might be absent entirely. However there is currently no file system-independent interface for an application to discover these restrictions for a given file or file system.

Given that comment, the idea was to be conservative and aim high. Is the 4KB figure documented somewhere?
