Details

    • Type: Improvement
    • Status: Open
    • Priority: Minor - P4
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: Backlog
    • Component/s: MMAPv1, Storage
    • Labels:

      Description

      Journal writes are padded to 8 KB; see the use of Alignment in Journal::journal and "namespace dur { const unsigned Alignment = 8192;"

      With buffered IO the minimum transfer size on Linux is 4 KB, so padding to 8 KB means that, in the worst case, twice as much is written to the journal for workloads with j:1.
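
To make the worst case concrete, here is a minimal sketch of the round-up arithmetic. `paddedSize` is a hypothetical helper written for illustration, not the code in Journal::journal, and it assumes the alignment is a power of two.

```cpp
#include <cstddef>

// Hypothetical helper: round `len` up to the next multiple of `alignment`.
// Assumes `alignment` is a power of two (true for 4096 and 8192).
constexpr std::size_t paddedSize(std::size_t len, std::size_t alignment) {
    return (len + alignment - 1) & ~(alignment - 1);
}

// A small j:1 commit (100 bytes here, an arbitrary example size) costs a full
// 8192-byte write with the current alignment, but only 4096 bytes with 4 KB.
static_assert(paddedSize(100, 8192) == 8192, "padded to one 8 KB block");
static_assert(paddedSize(100, 4096) == 4096, "padded to one 4 KB block");

// Once a commit exceeds 4 KB the two alignments can cost the same.
static_assert(paddedSize(5000, 8192) == 8192, "one 8 KB block");
static_assert(paddedSize(5000, 4096) == 8192, "two 4 KB blocks");
```

The 2x difference only shows up when the payload fits in the first 4 KB of an 8 KB block, which is the common case for small, frequent j:1 commits.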

      This can hurt performance and also generate more wear on flash devices.

      Note that MongoDB already can write a lot more to disk for write-heavy workloads compared to its peers – http://smalldatum.blogspot.com/2014/03/redo-logs-in-mongodb-and-innodb.html

      This might help with SERVER-9802

          Activity

          dwight_10gen Dwight Merriman added a comment -

          i'm not sure what it should round to, but i can give background on why it is 8KB. the page:

          http://linux.die.net/man/2/open

          says:

          The O_DIRECT flag may impose alignment restrictions on the length and address of user-space buffers and the file offset of I/Os. In Linux alignment restrictions vary by file system and kernel version and might be absent entirely. However there is currently no file system-independent interface for an application to discover these restrictions for a given file or file system.

          Given that comment, the idea was to be conservative and aim high. Is the 4KB # documented somewhere?

          mdcallag Mark Callaghan added a comment -

          From a co-worker who has XFS & Linux internals experience...

          Unfortunately, I have not seen alignment/size restrictions for O_DIRECT IO’s documented anywhere, in a filesystem independent manner. I don’t believe this is part of POSIX either. XFS used to expose the alignment restriction via an (xfs-specific) ioctl(), from what I recall. But I don’t think there is a filesystem independent way to retrieve it.

          In practice, local filesystems require 512 (or better) alignment. So 4KB should certainly work for local filesystems. NFS requires PAGE_SIZE (4KB) alignment.

          One way to do this is to write a fragment of code (in the MongoDB init path) that tests 4KB alignment (which should work in the majority of cases) and falls back to 8KB if that does not work (EINVAL). Would that be very contentious?

          schwerin Andy Schwerin added a comment -

          Sounds reasonable. You'd have to be careful to ensure you did the write to a file on the correct file system, but that shouldn't be onerous.
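
A minimal sketch of the startup probe discussed above, assuming Linux and glibc. The function names, the scratch file name, and the choice to fall back to 8192 on any failure (not only EINVAL) are assumptions made for illustration; none of this is existing MongoDB code. The probe writes into the journal directory itself so that it exercises the correct file system.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // exposes O_DIRECT in <fcntl.h> on glibc
#endif
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>
#include <cerrno>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: attempt one O_DIRECT write whose buffer address,
// length, and file offset are multiples of `align` but not of 2 * align.
// Returns 0 on success, otherwise an errno value.
int tryDirectWrite(const std::string& dir, std::size_t align) {
    const std::string path = dir + "/o_direct_probe.tmp";  // assumed scratch name
    int fd = ::open(path.c_str(), O_CREAT | O_TRUNC | O_WRONLY | O_DIRECT, 0600);
    if (fd < 0) return errno;

    void* raw = nullptr;
    int err = ::posix_memalign(&raw, 2 * align, 2 * align);
    if (err == 0) {
        // Offset the pointer so it is aligned to `align` only.
        char* buf = static_cast<char*>(raw) + align;
        std::memset(buf, 0, align);
        ssize_t n = ::pwrite(fd, buf, align, static_cast<off_t>(align));
        if (n < 0) {
            err = errno;  // EINVAL here means the alignment was rejected
        } else if (static_cast<std::size_t>(n) != align) {
            err = EIO;    // treat a short write as a failed probe
        }
        ::free(raw);
    }
    ::close(fd);
    ::unlink(path.c_str());
    return err;
}

// Hypothetical policy: use 4 KB if the probe succeeds, otherwise stay at the
// conservative 8 KB that the journal uses today.
std::size_t chooseJournalAlignment(const std::string& journalDir) {
    return tryDirectWrite(journalDir, 4096) == 0 ? 4096 : 8192;
}
```

Falling back on every error, rather than only on EINVAL, keeps the current behavior whenever the probe cannot run at all, for example when the directory is missing or the file system rejects O_DIRECT at open time.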


            People

            • Votes: 1
            • Watchers: 9
