[SERVER-25249] High I/O load in secondary nodes with WiredTiger engine (caused by journaling) Created: 25/Jul/16 Updated: 08/Jan/21 Resolved: 19/Aug/16
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | WiredTiger |
| Affects Version/s: | 3.2.7 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | stronglee | Assignee: | Kelsey Schubert |
| Resolution: | Done | Votes: | 0 |
| Labels: | RF |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: |
|
| Operating System: | ALL |
| Steps To Reproduce: | Simply insert data heavily. |
| Participants: | |
| Description |
|
Hi, Primary node:
Secondary node:
From the above we know that the pwrite calls on the secondary node are nearly twice those of the primary, and the fdatasync calls are as numerous as the pwrite calls and far exceed the primary's. Is this the reason why the secondaries' I/O load increases by 500%? Is this a bug or by design? |
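For context, a workload along the lines of the "insert data heavily" reproduction step can be generated with a short script. The sketch below uses Python with pymongo; the connection string, replica set name, database/collection names, and document size are illustrative assumptions, not details taken from this ticket.

```python
# Minimal load-generator sketch (URI, replica set name, and namespace are assumed).
import os
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017/?replicaSet=rs0")  # assumed URI
coll = client.journaltest.docs  # throwaway namespace, assumed

# Pre-build a batch of ~1 KB documents and insert it repeatedly so the primary's
# oplog stays busy and the secondaries replay a comparable write stream through
# the WiredTiger journal.
batch = [{"payload": os.urandom(1024).hex()} for _ in range(1000)]
for _ in range(10_000):
    # Copy each document so every round gets fresh _id values.
    coll.insert_many([dict(doc) for doc in batch])
```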
| Comments |
| Comment by Kelsey Schubert [ 19/Aug/16 ] |
|
Hi strlee, Thanks for the confirmation. We expect that the secondary would self-throttle if there were additional I/O load present. If the secondaries are falling behind and you do not see them self-throttle, please open a new ticket and we will investigate. Kind regards, |
| Comment by stronglee [ 19/Aug/16 ] |
|
Hi Thomas Schubert, |
| Comment by Kelsey Schubert [ 18/Aug/16 ] |
|
Hi strlee, Thanks for the detailed report of this behavior. It is by design that replication on secondaries syncs the journal faster than it would for the same load on a primary. The syncing is done asynchronously, so it should not affect replication throughput unless the disk is saturated. My understanding is that you are not observing any adverse effects as a result of the increased I/O load on the secondaries. Is this correct? Kind regards, |
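One hedged way to compare the journal (WiredTiger log) activity described in this comment is to poll serverStatus on each member and diff the wiredTiger.log counters over an interval. The sketch below assumes direct connections to a primary on port 27017 and a secondary on port 27018; the ports and the 10-second sampling window are assumptions, and the individual counter names vary by server version, so the whole subsection is diffed rather than named keys.

```python
# Sketch: sample the wiredTiger.log section of serverStatus on two members
# and print the per-interval change in its integer counters.
import time
from pymongo import MongoClient

MEMBERS = {                      # host/port values are assumptions
    "primary": "mongodb://localhost:27017",
    "secondary": "mongodb://localhost:27018",
}

def wt_log(uri):
    client = MongoClient(uri)
    return client.admin.command("serverStatus").get("wiredTiger", {}).get("log", {})

before = {name: wt_log(uri) for name, uri in MEMBERS.items()}
time.sleep(10)                   # assumed sampling window
after = {name: wt_log(uri) for name, uri in MEMBERS.items()}

for name in MEMBERS:
    deltas = {k: after[name][k] - before[name][k]
              for k in after[name]
              if isinstance(after[name].get(k), int)
              and isinstance(before[name].get(k), int)}
    print(name, deltas)
```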
| Comment by stronglee [ 27/Jul/16 ] |
|
I tried setting journalCommitInterval=500, but the pwrite/fdatasync calls did not go down much (approximately from 300 to 200). |
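For reference, the journalCommitInterval value mentioned above can be supplied at mongod startup; whether it is also adjustable at runtime via setParameter depends on the server version, so the runtime form below is only a hedged sketch and may be rejected by the server. The host and port are assumptions.

```python
# Sketch: attempt the journalCommitInterval change at runtime via setParameter.
# Assumes a direct connection to the target mongod on localhost:27017; if the
# parameter is startup-only on this build, the server rejects the command
# (pymongo raises OperationFailure) and the value has to go into the mongod
# configuration or startup options instead.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # assumed host/port
result = client.admin.command({"setParameter": 1, "journalCommitInterval": 500})
print(result)
```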