While observing mongod behavior using strace and perf with dbpath (wt home) "/mnt/db" and journal (wt log) directory "journal" I noticed that it was calling fdatasync apparently unnecessarily on "/mnt/db" and never calling fdatasync on "/mnt/db/journal".
The reason is because __wt_log_open initializes log->log_dir_fh by calling __wt_open(session, "journal", ...), which expands the path to "/mnt/db/journal" before passing it to __open_directory. But __open_directory appears to expect the path to a file in the directory rather than the path of the directory, as it strips the final component of the path and opens "/mnt/db". The result is that the wt log system calls fdatasync on "/mnt/db" and not on "/mnt/db/journal".
There are two consequences to this:
- Performance - the unnecessary fdatasync calls on /mnt/db can get stuck behind a lengthy call to fdatasync on a .wt file, or otherwise be hampered by i/o to .wt files; whereas an fdatsync call to /mnt/db/journal should be unimpeded by any activity in /mnt/db if the /mnt/db/journal has been placed on a separate drive, per our recommended best practices.
- Durability - I believe durability of journaled writes depends on the fdatasync of /mnt/db/journal, in order to ensure that newly created journal files are durable.