[SERVER-8966] DataFileSync thread and default value of syncdelay option don't work effectively on Linux. Created: 13/Mar/13  Updated: 28/Mar/16  Resolved: 28/Mar/16

Status: Closed
Project: Core Server
Component/s: Storage
Affects Version/s: 2.4.0-rc0
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Hiroaki Assignee: Daniel Pasette (Inactive)
Resolution: Won't Fix Votes: 0
Labels: srl
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Linux centos6 2.6.32-220.el6.x86_64


Operating System: ALL

 Description   

This issue concerns only the behavior of mongod on Linux.

As you know, mongod's storage engine maps its data files with mmap(), and the "DataFileSync" thread normally calls msync() to write modified pages back to the physical disk.

The interval between these msync() calls is controlled by the "--syncdelay" option.

The default value of syncdelay is 60 seconds, which means msync() is called once per minute.

However, on recent Linux kernels, modified (dirty) mapped pages are written back automatically by the kernel within roughly 30 seconds, even without msync().

These settings can be checked with the following commands:

$ sysctl vm.dirty_expire_centisecs
vm.dirty_expire_centisecs = 3000
$ sysctl vm.dirty_writeback_centisecs
vm.dirty_writeback_centisecs = 500

These values are in hundredths of a second (centiseconds): 3000 means 30 seconds and 500 means 5 seconds.
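A small sketch that reads the same two knobs directly from /proc/sys/vm (the files that `sysctl vm.*` reports) and converts the centisecond values to seconds. The helper name `centisecs_to_seconds` is hypothetical, introduced only for this illustration.

```bash
#!/usr/bin/env bash
# Hypothetical helper: convert centiseconds (hundredths of a second) to seconds.
centisecs_to_seconds() {
  local cs=$1
  printf '%d.%02d\n' $((cs / 100)) $((cs % 100))
}

# Print the current writeback knobs; skip any that are not readable
# (for example when not running on Linux).
for knob in dirty_expire_centisecs dirty_writeback_centisecs; do
  path=/proc/sys/vm/$knob
  [ -r "$path" ] || continue
  value=$(cat "$path")
  echo "vm.$knob = $value ($(centisecs_to_seconds "$value") s)"
done
```

With the default values shown above, the two lines report 30.00 s and 5.00 s.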

My proposal

  1. Change the default value of syncdelay from 60 to 0.
    A value of 0 leaves writeback of the data files to the operating system's settings.
  2. Revise the manual accordingly.

Supplementary explanations

Details of how the kernel handles mapped (mmap()) pages:

  1. Start mongod with syncdelay=0.
  2. When mongod modifies part of its mapped memory, the kernel marks the affected pages as dirty.
  3. Once a page has been dirty for longer than vm.dirty_expire_centisecs (30 seconds by default), it is considered expired and must be written back at the next opportunity.
  4. The next opportunity comes when the kernel wakes its writeback (flusher) threads, which happens every vm.dirty_writeback_centisecs (5 seconds by default).
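Taken together, steps 2 to 4 suggest a rough upper bound on how long a modified page can stay in memory before age-based writeback picks it up: the expiry age plus one flusher interval (about 30 + 5 = 35 seconds with the defaults). The sketch below computes that bound and prints it next to a syncdelay value. It is a simplification, assuming only the age-based path applies; other kernel triggers (for example vm.dirty_background_ratio) can start writeback earlier, and the I/O itself takes extra time. The function name `max_dirty_age_cs` and the `SYNCDELAY` variable are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: rough worst-case age (in centiseconds) of a dirty page
# before age-based writeback picks it up = expiry age + one flusher interval.
# Ignores other triggers (dirty ratios, memory pressure) and I/O time.
# Note: dirty_writeback_centisecs=0 disables periodic writeback, so the bound
# does not apply in that case.
max_dirty_age_cs() {
  local expire_cs=$1 writeback_cs=$2
  echo $((expire_cs + writeback_cs))
}

SYNCDELAY=${SYNCDELAY:-60}   # assumed mongod --syncdelay value in seconds

# Fall back to the defaults quoted above if procfs is unavailable.
expire=$(cat /proc/sys/vm/dirty_expire_centisecs 2>/dev/null || echo 3000)
writeback=$(cat /proc/sys/vm/dirty_writeback_centisecs 2>/dev/null || echo 500)
bound_cs=$(max_dirty_age_cs "$expire" "$writeback")

echo "kernel age-based writeback: within ~$((bound_cs / 100)) s"
echo "mongod DataFileSync (syncdelay): every $SYNCDELAY s"
```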

How to confirm

  1. Insert one document per second into a collection.

    # insert one document per second into testdb.testcol
    for i in {0..10000}; do
      mongo --quiet 127.0.0.1:27017/testdb --eval "db.testcol.save({key: $i})"
      sleep 1
    done

  2. Check the modification time of the DB file.

     stat -c '%y' data/testdb/testdb.0

    Modification times of the DB file (default kernel parameters)

    sysctl vm.dirty_expire_centisecs=3000
    sysctl vm.dirty_writeback_centisecs=500

    2013-03-13 00:20:57.441599513 -0700
    2013-03-13 00:21:04.838052872 -0700
    2013-03-13 00:21:29.125808067 -0700
    2013-03-13 00:21:34.408625047 -0700
    2013-03-13 00:21:42.857941700 -0700
    2013-03-13 00:21:59.761572768 -0700
    2013-03-13 00:22:00.817927798 -0700
    2013-03-13 00:22:28.286594599 -0700
          :

    Modification times of the DB file (kernel parameters changed so that dirty pages are written back about every 5 seconds)

    sysctl vm.dirty_expire_centisecs=500
    sysctl vm.dirty_writeback_centisecs=100

    2013-03-13 00:23:09.481541037 -0700
    2013-03-13 00:23:14.760541494 -0700
    2013-03-13 00:23:20.040029026 -0700
    2013-03-13 00:23:24.267557606 -0700
    2013-03-13 00:23:29.551242097 -0700
    2013-03-13 00:23:34.832939305 -0700
    2013-03-13 00:23:35.888281140 -0700
    2013-03-13 00:23:40.109627939 -0700
    2013-03-13 00:23:45.389313622 -0700
    2013-03-13 00:23:50.671000401 -0700
    2013-03-13 00:23:51.726337540 -0700
       :

    Modification times of the DB file (kernel parameters changed so that dirty pages are written back about every second)

    sysctl vm.dirty_expire_centisecs=100
    sysctl vm.dirty_writeback_centisecs=100

    2013-03-13 00:28:44.298270234 -0700
    2013-03-13 00:28:45.353080480 -0700
    2013-03-13 00:28:46.409875411 -0700
    2013-03-13 00:28:47.467720763 -0700
    2013-03-13 00:28:47.467720763 -0700
    2013-03-13 00:28:48.527613888 -0700
    2013-03-13 00:28:49.583300450 -0700
    2013-03-13 00:28:50.640146646 -0700
    2013-03-13 00:28:51.697976987 -0700
    2013-03-13 00:28:52.754726703 -0700
    2013-03-13 00:28:53.809552595 -0700
      :
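The report does not say how the timestamps above were collected. One way, assuming GNU coreutils `stat` and bash, is to poll the file's modification time and print it whenever it changes, then compute the gaps between consecutive updates. Both function names are hypothetical; the interval calculation assumes the timestamp format printed by `stat -c '%y'` and ignores midnight rollover.

```bash
#!/usr/bin/env bash
# Hypothetical helper: poll FILE's mtime and print each new value.
# Usage: watch_mtime data/testdb/testdb.0 [iterations]
watch_mtime() {
  local file=$1 iterations=${2:-600} prev="" cur i
  for ((i = 0; i < iterations; i++)); do
    cur=$(stat -c '%y' "$file") || return 1
    if [[ $cur != "$prev" ]]; then
      echo "$cur"
      prev=$cur
    fi
    sleep 0.2   # poll 5 times per second (GNU sleep accepts fractions)
  done
}

# Hypothetical helper: read "YYYY-MM-DD HH:MM:SS.nnnnnnnnn +ZZZZ" lines on
# stdin and print the seconds elapsed between consecutive lines.
# Only the time-of-day field is used, so a midnight rollover is not handled.
intervals() {
  awk '{
    split($2, t, ":")
    secs = t[1] * 3600 + t[2] * 60 + t[3]
    if (NR > 1) printf "%.2f\n", secs - prev
    prev = secs
  }'
}
```

For example, `watch_mtime data/testdb/testdb.0 | tee mtimes.txt` records the updates while the insert loop runs, and `intervals < mtimes.txt` then prints the gap before each update.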



 Comments   
Comment by Ian Whalen (Inactive) [ 28/Mar/16 ]

The OS sync and the mmap sync are slightly different: the latter includes journal syncing. We are not planning to change the defaults.

The OS sync is non-blocking and helps reduce the delay in syncing, so it is useful to have it happen more often than MongoDB's own syncing.

Comment by Hiroaki [ 13/Mar/13 ]

Sorry, I made a mistake.

The following line
1. Start mongod with syncdelay=0.

should be moved to become the first step of the "how to confirm" section.

Generated at Thu Feb 08 03:18:56 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.