[SERVER-30660] Replica set fsync blocks secondary reads significantly Created: 15/Aug/17  Updated: 09/Oct/17  Resolved: 15/Sep/17

Status: Closed
Project: Core Server
Component/s: Performance
Affects Version/s: 3.4.6
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Denis Orlov Assignee: Kelsey Schubert
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Linux version 2.6.32-431.el6.x86_64 (gcc version 4.4.7 20120313 (Red Hat 4.4.7-4) (GCC) )


Operating System: ALL
Steps To Reproduce:

Steps that work fine:
1. Set up a Java client with read preference "primaryPreferred"
2. Check read performance (approx. 2200 rps with avg. time 0.3 ms)
3. During the fsync operation (every minute), response time rises to 0.5 ms

Swapping primary and secondary via rs.stepDown() gives the same timing pattern.

Steps that behave strangely:
1. Set up a Java client with read preference "secondaryPreferred"
2. Check read performance (approx. 2200 rps with avg. time 0.3 ms)
3. During the fsync operation (every minute), response time rises to 10 ms (20 times slower)

Swapping primary and secondary via rs.stepDown() gives the same timing pattern.

It looks like the secondary node behaves incorrectly during fsync events and blocks most read operations.
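For reference, the two read-preference modes compared above can be selected via the driver connection string; the hostnames below are placeholders, and "arb" is the replSetName from the posted config:

```
mongodb://host1:27017,host2:27017/?replicaSet=arb&readPreference=primaryPreferred
mongodb://host1:27017,host2:27017/?replicaSet=arb&readPreference=secondaryPreferred
```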

Participants:

 Description   

I have the following MongoDB configuration:

1. 2 data nodes in replica set
2. 1 arbiter for the replica set
3. Java based client

Both data nodes are configured as follows:

replication:
  oplogSizeMB: 1024
  replSetName: arb

storage:
  dbPath: /mnt/raid10/mongo
  journal:
    enabled: true
    commitIntervalMs: 500
  directoryPerDB: true
  syncPeriodSecs: 60
  engine: wiredTiger
  wiredTiger:
    engineConfig:
      directoryForIndexes: true



 Comments   
Comment by Kelsey Schubert [ 15/Sep/17 ]

Hi dorlov,

Sorry for the delay getting back to you. I've examined the diagnostic.data and believe that the behavior you're observing is expected. During checkpoints, the system becomes I/O bound and queries are impacted. Due to the nature of replication, this bottleneck may have larger impact as reads queue behind the oplog applier.
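For context, the once-per-minute cadence of the stalls lines up with the checkpoint interval in the config posted in the description (60 seconds is also the server default); this fragment simply restates that setting:

```yaml
storage:
  # WiredTiger checkpoint/fsync interval in seconds; matches the
  # one-minute latency spikes reported above
  syncPeriodSecs: 60
```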

From the provided data, I do not see anything to indicate a bug in the MongoDB server. For MongoDB-related support, please post on the mongodb-user group or on Stack Overflow with the mongodb tag; a question like this, which involves broader discussion, is best suited to the mongodb-user group.

Kind regards,
Kelsey

Comment by Denis Orlov [ 16/Aug/17 ]

The files have been uploaded.

Comment by Kelsey Schubert [ 15/Aug/17 ]

I've created a secure upload portal for you to use. Files uploaded to this portal are only visible to MongoDB employees investigating this issue and are routinely deleted after some time.

Comment by Kelsey Schubert [ 15/Aug/17 ]

Hi dorlov,

Thanks for reporting this behavior. So we can continue to investigate, would you please provide an archive of the diagnostic.data in the $dbpath? Please be sure to include diagnostic.data for both primary and secondary nodes so we can compare.
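A minimal sketch of collecting such an archive, run on each node; the DBPATH value is an assumption taken from the storage.dbPath in the posted config, so adjust it to your deployment:

```shell
# Hypothetical path: point DBPATH at the node's storage.dbPath
# (the config in the description uses /mnt/raid10/mongo).
DBPATH="${DBPATH:-/mnt/raid10/mongo}"
if [ -d "$DBPATH/diagnostic.data" ]; then
  # Archive the diagnostic.data directory, tagged with this node's hostname.
  tar czf "$(hostname)-diagnostic.tar.gz" -C "$DBPATH" diagnostic.data
fi
```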

Thank you,
Thomas

Generated at Thu Feb 08 04:24:35 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.