[SERVER-24429] Initial sync from a WiredTiger instance locks the server Created: 07/Jun/16  Updated: 15/Oct/16  Resolved: 15/Oct/16

Status: Closed
Project: Core Server
Component/s: WiredTiger
Affects Version/s: 3.0.12
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Ralf Kistner Assignee: Ramon Fernandez Marina
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: Zip Archive logs.zip    
Operating System: ALL
Participants:

 Description   

We're currently migrating all our servers to WiredTiger on MongoDB 3.0.12, and are running into performance/locking issues with the initial sync from a WT server.

Our setup:

  • A replica set with 4 members (1 hidden, 1 with no votes).
  • All running MongoDB 3.0.12.
  • Database size is over 1TB with MMAPv1, around 300GB with WT.
  • No sharding.

We've found that performing the initial sync from an MMAP server to a new WT server completes without issue. However, the initial sync from a WT server to another WT server causes significant performance issues.

The issue seems to occur specifically while syncing large collections (around 100GB in size). After a while, the server being synced from becomes completely unresponsive for multiple hours. During this time, if it's a secondary, its replication lag builds up (it doesn't seem to be replicating at all anymore), most queries to it are completely unresponsive, and it's often not even possible to log into the mongo shell. It seems like the initial sync query is holding a global lock and not yielding for a very long time. During this time there is also essentially no network traffic on the server.
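
For reference, a check along these lines shows the lock queue building up on the source node (purely illustrative; field names as reported by db.serverStatus() on 3.0.x, and it assumes the shell still responds, e.g. via a monitoring user):

    # Print the global lock queue on the affected node (run from a bash shell)
    mongo --eval "var gl = db.serverStatus().globalLock;
        print('queued readers: ' + gl.currentQueue.readers);
        print('queued writers: ' + gl.currentQueue.writers);
        print('active clients: ' + gl.activeClients.total);"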

Since most of our application does not actively use the secondaries (which we perform the initial sync from), this does not affect the majority of our system. However, a few queries that we do run on our secondaries are affected by this.

When doing the initial sync from an MMAP server, we do not experience these issues at all.

We have not tested with MongoDB 3.2.x yet.



 Comments   
Comment by Ramon Fernandez Marina [ 15/Oct/16 ]

Thanks for the update, ralf.kistner@gmail.com; glad to hear that 3.2.10 solves your issue.

The team has been working hard chasing down eviction-related issues in recent versions, and I'm happy to report those have been fixed in 3.2.10 – you can read the details in this comment. Sorry your setup was affected, and thanks for your patience while we investigated and fixed these.

I'm going to close this ticket now. If you find any other issues please feel free to open a new SERVER ticket.

Thanks,
Ramón.

Comment by Ralf Kistner [ 03/Oct/16 ]

This seems to be related to https://jira.mongodb.org/browse/SERVER-26055. The issue was generally caused by queries filling up the cache, which consistently happened while replicating, as well as in some other cases. We had the same (or worse) issues with 3.2.9, but 3.2.10 seems to resolve them.

Comment by Ralf Kistner [ 08/Jun/16 ]

We managed to sync successfully with both primary and secondary MMAPv1 nodes as the source for the initial sync. Conversely, we have had trouble with both primary and secondary WT nodes as the source.

We also have three other similar replica sets in other datacenters that did not show the issue. The main differences are the volume of data (the others have less) and the speed of the disks (the others have SSDs, while the replica set with issues is still using spinning disks).

We'll need to do some work before we are able to upgrade to MongoDB 3.2 (mostly lots of testing to make sure everything is compatible), so we're not able to do that immediately, but I'll let you know when we do.

Comment by Ramon Fernandez Marina [ 08/Jun/16 ]

Thanks for the additional information. Can you specify what kind of node (primary or secondary) the MMAPv1 node you successfully synced from was? I'd like to use that as a control in a reproduction attempt...

Please note that, in my experience, sometimes an issue is triggered by a specific data distribution, so even if everything "looks right" in my reproducer I may not trigger the same behavior. Since you appear to be stuck because of this issue, I'd recommend you consider the following way forward:

  • Upgrade a node to MongoDB 3.2.7 with WT
  • Sync that node from an MMAPv1 node
  • Upgrade a second node to MongoDB 3.2.7 and sync it from the WT node (see the sketch after this list for pinning the sync source)
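
One way to make sure each node pulls from the intended source during these steps is replSetSyncFrom; a minimal sketch (host names and ports below are placeholders for your deployment):

    # Run against the newly upgraded node to tell it which member to sync from
    # (the host:port values are placeholders)
    mongo --host new-wt-node.example.com --eval "rs.syncFrom('existing-wt-node.example.com:27017')"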

There was a lot of work done in the WiredTiger engine for 3.2, and we've seen much more stable and performant behavior so far. However, if the problem persists, these 3.2 nodes will have recorded diagnostic data that will make troubleshooting a lot easier. If you'd like to explore this route, please let us know how it goes.

Thanks,
Ramón.

Comment by Ralf Kistner [ 08/Jun/16 ]

The source node is mostly unresponsive and shows the replication lag (the source is a secondary, and it doesn't seem to be able to replicate from the primary anymore). There are no issues on the sink node. The logs I attached are all from the source node.

It's been unresponsive for the last 30 hours or so, but we're waiting a little longer to see if the replication can complete.

Comment by Ramon Fernandez Marina [ 08/Jun/16 ]

Thanks for the logs ralf.kistner@gmail.com. Unfortunately they don't contain any useful information, since they don't include data from the beginning of an initial sync.

I want to confirm that the issue you're seeing is that, when a WT node initial syncs from another WT node:

  • the source node for the sync becomes unresponsive
  • the sink node shows increased replication lag

Is this correct?

However, when the source node for the initial sync is running MMAPv1, this problem doesn't appear. The question is: is that MMAPv1 node a primary or also a secondary? I'm trying to see whether the issue is related to the storage engine or to the state of the source node in the replica set.

Thanks,
Ramón.

Comment by Ralf Kistner [ 07/Jun/16 ]

Thanks for the response.

I attached the logs for the last 10 minutes or so. The server seems to have been locked for the last 8 hours, so I don't have any stats from before that.

I also included the output of currentOp(). There are a few hundred ops, all of which seem to have been waiting for the lock for hours.
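
For reference, the waiting ops can be pulled out with a filter along these lines (illustrative; field names as they appear in the 3.0 currentOp output):

    # Count the ops queued behind the lock and print a few of them
    mongo --eval "var waiting = db.currentOp({waitingForLock: true}).inprog;
        print('ops waiting for a lock: ' + waiting.length);
        waiting.slice(0, 5).forEach(function(op) {
            print(op.opid + ' ' + op.op + ' ' + (op.ns || '') + ' secs_running=' + op.secs_running);
        });"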

I also noticed that I am able to log in with the mongo shell (and get these stats) using our monitoring user, but not our "root" user.

Comment by Ramon Fernandez Marina [ 07/Jun/16 ]

Thanks for your report ralf.kistner@gmail.com; the behavior you describe is not expected and hasn't been reported before, so we'll need to gather more information to investigate. There are two options:

  1. Collect diagnostic information on the WT node that locks up, while the problem is happening; this can be done by running the following two lines from a bash shell (assuming you're using Linux):

    mongo --eval "while(true) {print(JSON.stringify(db.serverStatus({tcmalloc:1}))); sleep(1000)}" >ss.log &
    iostat -k -t -x 1 >iostat.log &
    

    You'd need to start the data collection process on the node used as a source for initial sync, then run the initial sync from another node, and continue to collect this information for 5-10 minutes after the source node becomes blocked. This will create two files, iostat.log and ss.log, that you'd need to upload here.

  2. Upgrade to 3.2.6 (3.2.7 is expected before noon EDT, if you can wait) and try again. In MongoDB 3.2 the diagnostic data is collected automatically, so if the problem persists you'd need to upload the contents of the diagnostic.data directory inside the dbpath of the affected node (see the packaging sketch below).
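
    Packaging that directory for upload can be as simple as the following (the dbpath here is a placeholder; substitute your own):

    # Archive the diagnostic data from the affected node's dbpath
    # (/var/lib/mongodb is a placeholder path)
    tar -czf diagnostic-data.tar.gz -C /var/lib/mongodb diagnostic.data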

In the meantime, if you have logs for the two nodes involved in the initial sync please upload them and we'll start looking for clues.

Thanks,
Ramón.
