[SERVER-24746] High disk io on secondary servers on wal volume Created: 23/Jun/16  Updated: 14/Jul/16  Resolved: 23/Jun/16

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Question
Priority: Major - P3
Reporter: Aleksey Shirokih
Assignee: Unassigned
Resolution: Done
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Participants:

 Description   

Hi!
Several days ago I configured a classic replica set with 3 servers.
Today I finished setting up monitoring and saw that all my secondary servers are very busy.
On the primary, whichever server holds that role (I ran rs.stepDown() several times), I see very little load:

[db02] 12:33:19 /wal # iostat /dev/mapper/wal-wal 1
Linux 3.10.0-327.10.1.el7.x86_64 (db02)         06/23/2016      _x86_64_        (24 CPU)
 
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           3.59    0.00    0.70    0.18    0.00   95.52
 
Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
dm-2            514.33         0.00      1535.53       2824 1054280829
 
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           6.18    0.00    0.43    0.00    0.00   93.39
 
Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
dm-2             26.00         0.00       825.00          0        825
 
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           7.54    0.00    0.39    0.00    0.00   92.07
 
Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
dm-2             30.00         0.00       914.00          0        914
 
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           9.29    0.00    0.61    0.04    0.00   90.06
 
Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
dm-2             32.00         0.00      1027.00          0       1027
 
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          11.42    0.00    1.46    0.00    0.00   87.12
 
Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
dm-2             29.00         0.00       954.00          0        954
 
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          11.61    0.00    0.65    0.00    0.00   87.74
 
Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
dm-2             30.00         0.00      1078.00          0       1078
 
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          10.51    0.00    0.30    0.04    0.00   89.14
 
Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
dm-2             30.00         0.00       962.00          0        962

but at the same time, on another server:

[db03] 14:00:17 /home/shirokih #  iostat /dev/mapper/wal-wal 1
Linux 3.10.0-327.18.2.el7.x86_64 (db03)         06/23/2016      _x86_64_        (24 CPU)
 
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           2.85    0.00    0.70    0.31    0.00   96.14
 
Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
dm-3            549.64         0.79      2181.69       2821    7749610
 
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.34    0.00    0.63    0.42    0.00   97.62
 
Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
dm-3            632.00         0.00      2344.00          0       2344
 
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           3.01    0.00    0.46    0.29    0.00   96.24
 
Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
dm-3            511.00         0.00      2082.50          0       2082
 
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           3.05    0.00    0.54    0.29    0.00   96.12
 
Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
dm-3            522.00         0.00      2044.00          0       2044
 
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.50    0.00    0.63    0.25    0.00   97.62
 
Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
dm-3            651.00         0.00      2367.00          0       2367
 
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.55    0.00    2.38    0.33    0.00   95.73
 
Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
dm-3            647.00         0.00      2350.00          0       2350
 
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.84    0.00    0.88    0.17    0.00   97.12
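
To put numbers on the difference between the two iostat samples, here is a minimal Python sketch (not part of the original report) that averages the per-second device rows copied from the output above. The first report of each run is left out because iostat's first report covers the time since boot rather than the last second. The names DB02, DB03 and summarize are made up for this illustration.

```python
# Per-second device rows copied from the iostat output above
# (columns: Device, tps, kB_read/s, kB_wrtn/s, kB_read, kB_wrtn).
DB02 = [
    "dm-2 26.00 0.00 825.00 0 825",
    "dm-2 30.00 0.00 914.00 0 914",
    "dm-2 32.00 0.00 1027.00 0 1027",
    "dm-2 29.00 0.00 954.00 0 954",
    "dm-2 30.00 0.00 1078.00 0 1078",
    "dm-2 30.00 0.00 962.00 0 962",
]
DB03 = [
    "dm-3 632.00 0.00 2344.00 0 2344",
    "dm-3 511.00 0.00 2082.50 0 2082",
    "dm-3 522.00 0.00 2044.00 0 2044",
    "dm-3 651.00 0.00 2367.00 0 2367",
    "dm-3 647.00 0.00 2350.00 0 2350",
]


def summarize(device_lines):
    """Return (average tps, average kB_wrtn/s, average kB per transfer)."""
    rows = [line.split() for line in device_lines]
    tps = [float(row[1]) for row in rows]    # transfers per second
    wrtn = [float(row[3]) for row in rows]   # kB written per second
    avg_tps = sum(tps) / len(tps)
    avg_wrtn = sum(wrtn) / len(wrtn)
    return avg_tps, avg_wrtn, avg_wrtn / avg_tps


if __name__ == "__main__":
    for host, lines in (("db02", DB02), ("db03", DB03)):
        avg_tps, avg_wrtn, kb_per_io = summarize(lines)
        print(f"{host}: {avg_tps:.1f} tps, {avg_wrtn:.1f} kB/s written, "
              f"{kb_per_io:.1f} kB per transfer")
```

With these samples, db02 averages about 29.5 tps at roughly 32.5 kB per transfer, while db03 averages about 593 tps at roughly 3.8 kB per transfer. So db03 issues about 20 times as many write operations for only about 2.3 times the data.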

I use RHEL 7.2, and the server is installed from repo.mongodb.org.
My rs.config() is as simple as can be:

:PRIMARY> rs.config()
{
        "_id" : "sova",
        "version" : 8,
        "protocolVersion" : NumberLong(1),
        "members" : [
                {
                        "_id" : 1,
                        "host" : "db01:27017",
                        "arbiterOnly" : false,
                        "buildIndexes" : true,
                        "hidden" : false,
                        "priority" : 1,
                        "tags" : {
 
                        },
                        "slaveDelay" : NumberLong(0),
                        "votes" : 1
                },
                {
                        "_id" : 2,
                        "host" : "db02:27017",
                        "arbiterOnly" : false,
                        "buildIndexes" : true,
                        "hidden" : false,
                        "priority" : 4,
                        "tags" : {
 
                        },
                        "slaveDelay" : NumberLong(0),
                        "votes" : 1
                },
                {
                        "_id" : 3,
                        "host" : "db03:27017",
                        "arbiterOnly" : false,
                        "buildIndexes" : true,
                        "hidden" : false,
                        "priority" : 1,
                        "tags" : {
 
                        },
                        "slaveDelay" : NumberLong(0),
                        "votes" : 1
                }
        ],
        "settings" : {
                "chainingAllowed" : true,
                "heartbeatIntervalMillis" : 2000,
                "heartbeatTimeoutSecs" : 10,
                "electionTimeoutMillis" : 10000,
                "getLastErrorModes" : {
 
                },
                "getLastErrorDefaults" : {
                        "w" : 1,
                        "wtimeout" : 0
                },
                "replicaSetId" : ObjectId("57238520fd55a3bec9c1030b")
        }
}

All servers are configured with the same config file:

# mongod.conf
 
 
storage:
  dbPath: /var/lib/mongo/
  journal:
    enabled: true
  engine: wiredTiger
  wiredTiger:
    engineConfig:
      directoryForIndexes: true
 
systemLog:
  verbosity: 0
  destination: file
  logAppend: true
  path: /var/log/mongodb/mongod.log
 
net:
  port: 27017
  bindIp: 0.0.0.0
 
processManagement:
  fork: true
  pidFilePath: /var/run/mongodb/mongod.pid
 
security:
  keyFile: /opt/noc/var/etc/mongo/mongo.key
  clusterAuthMode: keyFile
  authorization: enabled
 
replication:
  replSetName: sova
 
#operationProfiling:
 
#sharding:

As recommended in the production notes, I moved the journal to a separate disk:

/dev/mapper/wal-wal                  12G  344M   12G   3% /wal
/dev/mapper/vg00-mongo_storage_idx   20G  1.5G   19G   8% /data/mongo_index
/dev/mapper/vg00-mongo               50G  5.6G   45G  12% /data/mongo

and made symlinks in the /var/lib/mongo directory pointing to the appropriate locations.
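
For readers who want to reproduce a layout like this, the following Python sketch shows one way the symlinks could be created. The report does not give the exact mapping, so the targets here are assumptions: the journal, index and collection subdirectories that WiredTiger uses under dbPath (the latter two because directoryForIndexes is enabled) are each pointed at a directory on one of the mounts listed above. The function link_storage_dirs and the demo are hypothetical. The demo runs in a temporary directory, and on a real host mongod would have to be stopped and any existing data moved first.

```python
import tempfile
from pathlib import Path


def link_storage_dirs(db_path, targets):
    """Create db_path/<name> symlinks that point at the given target directories."""
    db_path = Path(db_path)
    for name, target in targets.items():
        target = Path(target)
        target.mkdir(parents=True, exist_ok=True)
        link = db_path / name
        if link.is_symlink() or link.exists():
            # Refuse to overwrite: existing data must be moved by hand first.
            raise FileExistsError(f"{link} already exists")
        link.symlink_to(target, target_is_directory=True)


def demo():
    """Build the assumed layout under a temporary root and verify each link."""
    with tempfile.TemporaryDirectory() as root:
        root = Path(root)
        db_path = root / "var/lib/mongo"
        db_path.mkdir(parents=True)
        # Assumed mapping; the original report does not spell it out.
        targets = {
            "journal": root / "wal/journal",
            "index": root / "data/mongo_index",
            "collection": root / "data/mongo",
        }
        link_storage_dirs(db_path, targets)
        return {
            name: (db_path / name).resolve() == target.resolve()
            for name, target in targets.items()
        }


if __name__ == "__main__":
    print(demo())
```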



 Comments   
Comment by Ramon Fernandez Marina [ 23/Jun/16 ]

Thanks for your report freeseacher. Please note that the SERVER project is for reporting bugs or feature suggestions for the MongoDB server. For MongoDB-related support discussion please post on the mongodb-user group or Stack Overflow with the mongodb tag, where your question will reach a larger audience. A question like this involving more discussion would be best posted on the mongodb-user group. See also our Technical Support page for additional support resources.

Thanks,
Ramón.

Comment by Aleksey Shirokih [ 23/Jun/16 ]

According to mongostat, the load is not very high:

insert query update delete getmore command % dirty % used flushes vsize  res qr|qw ar|aw netIn netOut conn  set repl                      time
    *0    *0   *830     *1     126   208|0     0.4   11.8       0  7.7G 6.8G   0|0   0|0   93k   518k  163 sova  SEC 2016-06-23T14:23:19+03:00
    *2     2   *942     *1     126   235|0     0.4   11.8       0  7.7G 6.8G   0|0   0|0   96k   587k  163 sova  SEC 2016-06-23T14:23:20+03:00
    *0    *0   *845     *1     129   216|0     0.4   11.8       0  7.7G 6.8G   0|0   0|0   96k   564k  163 sova  SEC 2016-06-23T14:23:21+03:00
    *0    *0   *895     *2     121   215|0     0.4   11.8       0  7.7G 6.8G   0|0   0|0   93k   526k  163 sova  SEC 2016-06-23T14:23:22+03:00
    *0    *0   *881     *3     149   276|0     0.4   11.8       0  7.7G 6.8G   0|0   0|1  112k   593k  163 sova  SEC 2016-06-23T14:23:23+03:00
    *2    *0   *798     *1     126   227|0     0.4   11.8       0  7.7G 6.8G   0|0   0|0   95k   506k  163 sova  SEC 2016-06-23T14:23:24+03:00
    *3    *0   *882     *0     336   444|0     0.5   11.8       0  7.7G 6.8G   0|0   0|0  210k   624k  163 sova  SEC 2016-06-23T14:23:25+03:00
    *0    *0   *820     *0     363   515|0     0.5   11.8       0  7.7G 6.8G   0|0   0|0  236k   618k  163 sova  SEC 2016-06-23T14:23:26+03:00
    *0    *0   *879     *0     366   492|0     0.5   11.8       0  7.7G 6.8G   0|0   0|0  232k   633k  163 sova  SEC 2016-06-23T14:23:27+03:00
    *7    *0   *915     *2     378   510|0     0.5   11.8       0  7.7G 6.8G   0|0   0|0  241k   722k  163 sova  SEC 2016-06-23T14:23:28+03:00
insert query update delete getmore command % dirty % used flushes vsize  res qr|qw ar|aw netIn netOut conn  set repl                      time
    *0    *0   *860     *0     234   337|0     0.5   11.8       0  7.7G 6.8G   0|0   0|0  157k   582k  163 sova  SEC 2016-06-23T14:23:29+03:00
    *2     2   *831     *1     247   394|0     0.5   11.8       0  7.7G 6.8G   0|0   0|0  172k   577k  163 sova  SEC 2016-06-23T14:23:30+03:00
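
In mongostat output, a leading `*` on an insert, query, update or delete count marks an operation applied through replication. As a rough illustration (not part of the original report), the Python sketch below extracts those replicated counts and averages the update column. SAMPLE holds only the first six columns of the first three rows above, and the helper names are made up for this example.

```python
# First six columns (insert query update delete getmore command)
# of the first three mongostat rows above.
SAMPLE = [
    "insert query update delete getmore command",
    "*0 *0 *830 *1 126 208|0",
    "*2 2 *942 *1 126 235|0",
    "*0 *0 *845 *1 129 216|0",
]

OP_COLUMNS = ("insert", "query", "update", "delete")


def replicated_ops(line):
    """Return the replicated (starred) op counts from one mongostat row."""
    fields = line.split()
    ops = {}
    for name, value in zip(OP_COLUMNS, fields):
        # Values without '*' are local operations, counted here as 0 replicated.
        ops[name] = int(value.lstrip("*")) if value.startswith("*") else 0
    return ops


def average_updates(lines):
    """Average the replicated update count, skipping repeated header rows."""
    rows = [replicated_ops(l) for l in lines if not l.lstrip().startswith("insert")]
    return sum(row["update"] for row in rows) / len(rows)


if __name__ == "__main__":
    print(f"{average_updates(SAMPLE):.1f} replicated updates per second")
```

The same functions accept the full mongostat lines, since only the leading columns are read. Over all twelve rows shown above, the replicated update column averages roughly 865 per second.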

and my secondary servers are in sync, with no significant lag:

sova:PRIMARY> rs.printSlaveReplicationInfo()
source: db01:27017
        syncedTo: Thu Jun 23 2016 14:28:02 GMT+0300 (MSK)
        1 secs (0 hrs) behind the primary
source: db03:27017
        syncedTo: Thu Jun 23 2016 14:28:02 GMT+0300 (MSK)
        1 secs (0 hrs) behind the primary
