[SERVER-32252] Secondary Disk Utilization Higher Issue! Created: 11/Dec/17  Updated: 09/Feb/18  Resolved: 18/Jan/18

Status: Closed
Project: Core Server
Component/s: WiredTiger
Affects Version/s: 3.4.3, 3.4.4, 3.4.10
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: kang Assignee: Mark Agarunov
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: Microsoft Word AWS EC2 server for testing.xlsx     PNG File counters.png     PNG File ftdc.png     PNG File hidden secondary.png     Zip Archive mongos__mongo.zip     PNG File secondary1.png     File xxx_logd1rs1__diagnostic.data.tar.gz     Zip Archive xxx_logd1rs1__mongo.zip     File xxx_logd1rs2__diagnostic.data.tar.gz     Zip Archive xxx_logd1rs2__mongo.zip     File xxx_logd1rs3__diagnostic.data.tar.gz     Zip Archive xxx_logd1rs3__mongo.zip     File xxx_logd2rs1__diagnostic.data.tar.gz     Zip Archive xxx_logd2rs1__mongo.zip     File xxx_logd2rs2__diagnostic.data.tar.gz     Zip Archive xxx_logd2rs2__mongo.zip     File xxx_logd2rs3__diagnostic.data.tar.gz     Zip Archive xxx_logd2rs3__mongo.zip    
Operating System: ALL
Steps To Reproduce:

Always occurs

Participants:

 Description   

The repl.executor.cancels value among the replication counters on the first secondary increases steadily.

The disk utilization of that secondary is about 10% higher than on the primary and the hidden secondary.

test language: Java (mongo java driver 2.13.2 & 3.5), Python (pymongo 3.4)
filesystem: xfs

config = {
  _id: "rs1",
  protocolVersion: 1,
  writeConcernMajorityJournalDefault: true,
  members: [
    { _id: 0, host: "mongod1rs1:31001", priority: 3 },
    { _id: 1, host: "mongod1rs2:31001", priority: 3 },
    { _id: 2, host: "mongod1rs3:31001", priority: 0, hidden: 1 }
  ],
  settings: {
    chainingAllowed: 1
  }
};
rs.initiate(config);

--> Even if the chainingAllowed option is set to 0, the behavior is the same.
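
For reference, a minimal pymongo sketch (illustrative only; the hostname is assumed from the config above, and credentials must be added if authorization is enabled) to confirm the settings the replica set is actually running with:

# Illustrative sketch: read the active replica set configuration with pymongo.
# Assumes a member of rs1 is reachable on mongod1rs1:31001.
from pymongo import MongoClient

client = MongoClient("mongodb://mongod1rs1:31001/")
conf = client.admin.command("replSetGetConfig")["config"]
print("protocolVersion:", conf.get("protocolVersion"))
print("chainingAllowed:", conf.get("settings", {}).get("chainingAllowed"))
for m in conf["members"]:
    print(m["_id"], m["host"], "priority:", m.get("priority"), "hidden:", m.get("hidden", False))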



 Comments   
Comment by Mark Agarunov [ 18/Jan/18 ]

Hello dongho,

Thank you for the information. Unfortunately, the tools we use to analyze the diagnostic data are not currently publicly available. Alternatively, this data can be collected in JSON format, which can then be used with other tooling.
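
For example, a minimal pymongo sketch along those lines (illustrative only; the host, output file name, and sampling interval are assumptions, not part of the diagnostic.data tooling) that periodically dumps serverStatus as JSON lines:

# Illustrative sketch: periodically capture db.serverStatus() output as JSON.
# bson.json_util handles BSON-specific types (timestamps, dates) that the
# plain json module cannot serialize.
import time
from bson import json_util
from pymongo import MongoClient

client = MongoClient("mongodb://mongod1rs2:31001/")  # adjust host/credentials as needed
with open("serverstatus.jsonl", "a") as out:
    for _ in range(60):  # one sample per minute for an hour
        status = client.admin.command("serverStatus")
        out.write(json_util.dumps(status) + "\n")
        time.sleep(60)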

Please note that the SERVER project is for reporting bugs or feature suggestions for the MongoDB server. For MongoDB-related support discussion, please post on the mongodb-user group or Stack Overflow with the mongodb tag. A question like this, which involves more discussion, would be best posted on the mongodb-user group.

Thanks,
Mark

Comment by kang [ 13/Dec/17 ]

Hello Mark

Is it normal for the cancels counter to increase with protocolVersion = 1? I wonder why.
Also, could you share the tool you use to analyze the diagnostic.data directory?

P.S. Secondary disk utilization is also high on our production servers.

Thanks~

Comment by Mark Agarunov [ 12/Dec/17 ]

Hello dongho,

Thank you for providing this information. After looking over the diagnostic data, it seems that disk utilization is roughly equal across all nodes, with both primaries at slightly lower utilization than the secondaries, but with no meaningful difference between nodes of the same type:

Additionally, the cancel counters seem consistent among the nodes and within normal ranges:

Please let me know if there is something I missed in this and if this is causing any adverse effects for you.

Thanks,
Mark

Comment by kang [ 12/Dec/17 ]

mongod config file information (xxx_logd1rs2):

storage:
  dbPath: "/dbdata/dbfile_log1"
  engine: "wiredTiger"
  directoryPerDB: true
  syncPeriodSecs: 60
  journal:
    enabled: true
  wiredTiger:
    engineConfig:
      cacheSizeGB: 4
      #configString: "cache_size=8GB,config_base=true,eviction_dirty_target=20,eviction_dirty_trigger=40,eviction_target=70,eviction_trigger=95,eviction=(threads_min=1,threads_max=2),checkpoint_sync=true,checkpoint=(wait=60,log_size=0),log=(enabled=false,archive=false)"
      statisticsLogDelaySecs: 0
      directoryForIndexes: true
    collectionConfig:
      blockCompressor: snappy
    indexConfig:
      prefixCompression: true
systemLog:
  destination: syslog
  logAppend: true
  timeStampFormat: iso8601-utc
processManagement:
  fork: false
operationProfiling:
  slowOpThresholdMs: 100
  mode: "slowOp"
sharding:
  clusterRole: "shardsvr"
replication:
  oplogSizeMB: 1024
  replSetName: "rs1"
net:
  port: 31001
  wireObjectCheck: false
  unixDomainSocket:
    enabled: true
security:
  keyFile: "/mongodb/dbconf/Key_xxx"
  authorization: "enabled"

Comment by kang [ 12/Dec/17 ]

I ran a new test.
Please see the attached mongod logs and "diagnostic.data".

AWS EC2 server for testing.xlsx

mongos__mongo.zip xxx_logd1rs1__mongo.zip xxx_logd1rs2__mongo.zip xxx_logd1rs3__mongo.zip xxx_logd2rs1__mongo.zip xxx_logd2rs2__mongo.zip xxx_logd2rs3__mongo.zip xxx_logd1rs1__diagnostic.data.tar.gz xxx_logd1rs2__diagnostic.data.tar.gz xxx_logd1rs3__diagnostic.data.tar.gz xxx_logd2rs1__diagnostic.data.tar.gz xxx_logd2rs2__diagnostic.data.tar.gz xxx_logd2rs3__diagnostic.data.tar.gz

Comment by Mark Agarunov [ 11/Dec/17 ]

Hello dongho,

Thank you for the report. To get a better idea of what may be causing this, could you please provide the following:

  • The complete logs from all affected mongod nodes
  • Please archive (tar or zip) the dbpath/diagnostic.data directory from all affected nodes and upload the archives to this ticket

This should provide some insight into why you're seeing increased disk usage on the secondary.

Thanks,
Mark

Comment by kang [ 11/Dec/17 ]

Reference:
If protocolVersion = 0, the "db.serverStatus().metrics.repl.executor.cancels" value is not incremented.
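
A minimal pymongo sketch (illustrative; the hostname is assumed from the replica set config above) to watch this counter over time on a given member:

# Illustrative sketch: sample the counter every 10 seconds, mirroring
# db.serverStatus().metrics.repl.executor.cancels from the mongo shell.
import time
from pymongo import MongoClient

client = MongoClient("mongodb://mongod1rs1:31001/")
for _ in range(6):
    status = client.admin.command("serverStatus")
    print(status["metrics"]["repl"]["executor"]["cancels"])
    time.sleep(10)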
