[SERVER-28733] Running a 3-member replica set with --nojournal and WiredTiger makes operations that request w:2 or w:3 with j=false very slow Created: 11/Apr/17  Updated: 21/Jun/17  Resolved: 24/May/17

Status: Closed
Project: Core Server
Component/s: Replication, WiredTiger
Affects Version/s: 3.2.12
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Tudor Aursulesei Assignee: Kelsey Schubert
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
duplicates SERVER-27546 Wiredtiger checkpoint is placed too o... Closed
Operating System: ALL

 Description   

I'm running this on a 3-member replica set, with each member on a different machine.

/usr/bin/mongod --replSet rs --bind_ip 192.168.1.101 --dbpath /ramcache --port 35001 --logpath /var/log/mongodb/minimongo.log --storageEngine wiredTiger --logappend --wiredTigerCacheSizeGB 1 --nojournal

If I run all members with --nojournal, the following update takes a few seconds to complete:

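from pymongo import MongoClient

# Seed list of replica set members (port 35001, as in the command line above);
# the driver discovers any remaining member from the "rs" replica set config.
pm = MongoClient('192.168.1.101:35001,192.168.1.102:35001', replicaset='rs',
                 w=3, wtimeout=480000, j=False)
pm.testdb.testcol.insert_one({"asdf": "123"})
# This update is the slow operation
print(pm.testdb.testcol.update_one({"asdf": "123"}, {"$set": {"value": 1}}))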
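To put a number on "a few seconds", the update can be timed directly. The sketch below is a hypothetical helper (`timed` is not part of PyMongo); it only wraps a call with `time.perf_counter()` and returns the call's result together with the elapsed wall-clock time.

```python
import time
from typing import Any, Callable, Tuple


def timed(fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Any, float]:
    """Call fn(*args, **kwargs) and return (result, elapsed wall-clock seconds).

    Hypothetical helper for illustration; it does not depend on PyMongo.
    """
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed
```

For example, `timed(pm.testdb.testcol.update_one, {"asdf": "123"}, {"$set": {"value": 1}})` measures the slow update. Repeating it on `pm.testdb.testcol.with_options(write_concern=WriteConcern(w=1))` (with `WriteConcern` imported from `pymongo.write_concern`) should give a baseline that does not wait on the secondaries.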

Running db.currentOp() while the update is in progress shows the following:

{
...
        "secs_running" : 2,
...
        "msg" : "waiting for write concern",
...
}
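The same check can be scripted instead of catching db.currentOp() by hand. This is a minimal sketch assuming the document returned by the `currentOp` command (for example `client.admin.command("currentOp")` in PyMongo), which lists in-progress operations under `inprog`; the helper name and its `min_secs` parameter are hypothetical.

```python
from typing import Any, Dict, List


def waiting_for_write_concern(current_op: Dict[str, Any], min_secs: int = 0) -> List[Dict[str, Any]]:
    """Return in-progress ops whose msg is "waiting for write concern".

    Assumes current_op is the result of the currentOp command, with the
    operations listed under "inprog" (hypothetical helper, not a driver API).
    """
    return [
        op
        for op in current_op.get("inprog", [])
        if op.get("msg") == "waiting for write concern"
        and op.get("secs_running", 0) >= min_secs
    ]
```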

However, if I remove the --nojournal flag, the update returns instantly, even with w=3. I've also noticed that if the primary runs with --nojournal and the two secondaries run without it, the update is still fast. I haven't been able to reproduce this on a single machine running 3 mongod instances on different ports, so I can't completely rule out a network issue, but I'm not sure how to investigate further.
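The setups compared above differ only in which members get --nojournal. The hypothetical helper below rebuilds the command line from the report for a given bind address and toggles --nojournal per member. Which member becomes primary is decided by the replica set election, so the `primary_nojournal_only` entry assumes the named host actually wins it.

```python
from typing import Dict, List


def mongod_argv(bind_ip: str, journal: bool) -> List[str]:
    """Rebuild the report's mongod command line for one member (hypothetical helper)."""
    argv = [
        "/usr/bin/mongod", "--replSet", "rs", "--bind_ip", bind_ip,
        "--dbpath", "/ramcache", "--port", "35001",
        "--logpath", "/var/log/mongodb/minimongo.log",
        "--storageEngine", "wiredTiger", "--logappend",
        "--wiredTigerCacheSizeGB", "1",
    ]
    if not journal:
        argv.append("--nojournal")
    return argv


def configurations(primary_ip: str, secondary_ips: List[str]) -> Dict[str, Dict[str, List[str]]]:
    """Map each tested setup to {bind_ip: argv}; assumes primary_ip wins the election."""
    members = [primary_ip] + list(secondary_ips)
    return {
        "all_nojournal": {ip: mongod_argv(ip, journal=False) for ip in members},
        "all_journaled": {ip: mongod_argv(ip, journal=True) for ip in members},
        # Only the (assumed) primary runs without a journal.
        "primary_nojournal_only": {
            ip: mongod_argv(ip, journal=(ip != primary_ip)) for ip in members
        },
    }
```

For example, `configurations("192.168.1.101", ["192.168.1.102", "192.168.1.103"])`, where 192.168.1.103 is a placeholder because the report does not give the third member's address.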



 Comments   
Comment by Kelsey Schubert [ 24/May/17 ]

Hi thestick613,

We haven’t heard back from you for some time, so I’m going to mark this ticket as resolved.

Regards,
Thomas

Comment by Kelsey Schubert [ 11/Apr/17 ]

Hi thestick613,

So that we can confirm or rule out my hypothesis, would you please upload the diagnostic.data directory and the complete log files from the affected primary, and identify exactly when the slow operation is recorded?
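Locating the slow operation in the primary's log can be done with a small filter. This sketch assumes the 3.2-era plain-text log format, in which each line starts with an ISO-8601 timestamp and a slow-operation entry ends with its duration (for example `... 2013ms`); the function name and the 1000 ms default threshold are assumptions, not MongoDB tooling.

```python
import re
from typing import Iterable, List, Tuple

# Assumption: slow-operation log lines end with "<N>ms" and begin with a
# timestamp token, as in 3.2-era plain-text mongod logs.
_DURATION = re.compile(r"(\d+)ms\s*$")


def slow_ops(lines: Iterable[str], threshold_ms: int = 1000) -> List[Tuple[str, int, str]]:
    """Return (timestamp, duration_ms, line) for entries at or above threshold_ms."""
    hits = []
    for line in lines:
        match = _DURATION.search(line)
        if match is None:
            continue  # not an entry that reports a duration
        duration = int(match.group(1))
        if duration >= threshold_ms:
            timestamp = line.split(" ", 1)[0]
            hits.append((timestamp, duration, line.rstrip("\n")))
    return hits
```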

Thank you,
Thomas

Comment by Tudor Aursulesei [ 11/Apr/17 ]

Are you sure? I'm only doing two operations, and only the second one is slow.

Comment by Kelsey Schubert [ 11/Apr/17 ]

Hi thestick613,

When journaling is disabled, the primary node is expected to execute a checkpoint whenever there are multiple write threads, because this checkpoint ensures that writes to the replica set are durable. This likely explains the slowness you are seeing. Please review SERVER-27546 for additional details.

Thank you,
Thomas

Generated at Thu Feb 08 04:18:55 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.