[SERVER-28733] running a 3-member replicaset with --nojournal and wired tiger makes operations that request (w:2 or w:3) and j=false very slow
Created: 11/Apr/17 Updated: 21/Jun/17 Resolved: 24/May/17
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication, WiredTiger |
| Affects Version/s: | 3.2.12 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Tudor Aursulesei | Assignee: | Kelsey Schubert |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Operating System: | ALL |
| Description |
I'm running this on a 3-member replica set, with each member on a different machine.
If I run all of them with --nojournal, the following operation takes a few seconds to run:
Running db.currentOp() at the right moment shows the following information:
However, if I remove the --nojournal flag, the operation runs instantly, even with w=3. I've also noticed that if the primary is running with --nojournal and the two secondaries are running without it, the operation still runs fine. I haven't managed to reproduce this behaviour on a single machine with 3 mongod instances on different ports, so I can't completely rule out a network issue, but I'm not sure how to investigate further.
| Comments |
| Comment by Kelsey Schubert [ 24/May/17 ] |
Hi thestick613,

We haven’t heard back from you for some time, so I’m going to mark this ticket as resolved.

Regards,
| Comment by Kelsey Schubert [ 11/Apr/17 ] |
Hi thestick613,

So that we can confirm or rule out my hypothesis, would you please upload the diagnostic.data directory and the complete log files of the affected primary, and identify exactly when the slow operation is recorded?

Thank you,
| Comment by Tudor Aursulesei [ 11/Apr/17 ] |
Are you sure? I'm only doing two operations, and only the second one is slow.
| Comment by Kelsey Schubert [ 11/Apr/17 ] |
Hi thestick613,

When journaling is disabled, the primary node is expected to execute a checkpoint whenever there are multiple write threads, because this checkpoint ensures that writes to the replica set are durable. This likely explains the slowness you are seeing. Please review

Thank you,