[SERVER-32703] Secondary can take a couple minutes to re-establish sync source after network partition Created: 15/Jan/18 Updated: 21/Feb/20 Resolved: 21/Feb/20 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | JavaScript, Replication |
| Affects Version/s: | 3.6.1 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Denis Rystsov | Assignee: | Evin Roesle |
| Resolution: | Duplicate | Votes: | 4 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Attachments: | good-full.data.tar.gz, good-full.tar.gz, slow-full.data.tar.gz, slow-full.tar.gz |
| Issue Links: |
|
| Operating System: | ALL |
| Steps To Reproduce: | Please see the description and the doc here: https://github.com/rystsov/perseus/tree/master/mongodb |
| Participants: | |
| Case: | (copied to CRM) |
| Description |
|
Hi folks, I was testing how a 3-node MongoDB cluster behaves (from a client's perspective) when one of the nodes is isolated from the rest of the cluster, and I observed an anomaly.

TL;DR: When a node is isolated, all clients hang for almost two minutes (unexpected behaviour; the downtime is too long). The client running on the isolated node stays down (expected behaviour), but the clients on the healthy nodes become unexpectedly slow (RPS drops from 282 before the isolation to less than one operation per second).

Repo with docker+javascript scripts to reproduce the problem: https://github.com/rystsov/perseus/tree/master/mongodb

Testing environment

OS: Ubuntu 17.10

Three nodes (mongo1, mongo2, mongo3) form a MongoDB cluster running mongod in a single replica set. Besides mongod, each node runs a NodeJS tester app (described below) which continuously reads and updates a document in a loop and keeps counters of successful iterations and errors.
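For reference, a replica set like this is typically initiated with rs.initiate from the mongo shell. A minimal sketch, assuming default ports (the actual init script lives in the repo's docker setup):

```javascript
// Hypothetical replica set initiation, run once from the mongo shell on mongo1.
// The set name "rs0" matches the replicaSet parameter in the testers' connection
// URL; the hostnames/ports are assumptions based on the node names above.
rs.initiate({
  _id: "rs0",
  members: [
    { _id: 0, host: "mongo1:27017" },
    { _id: 1, host: "mongo2:27017" },
    { _id: 2, host: "mongo3:27017" }
  ]
});
```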
There is also a fourth node (client1) which fetches the stats from each tester and dumps them every second, one line per tick, in the format described below.
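The aggregator itself can be very small; a minimal sketch, assuming each tester exposes its per-second counters as JSON over HTTP (the /stat endpoint, port 8080, and the { ok, err } response shape are assumptions; the real aggregator is in the repo):

```javascript
// Hypothetical aggregating client (client1): polls every tester once per
// second and prints one tab-separated line per tick.
const http = require('http');

const nodes = ['mongo1', 'mongo2', 'mongo3'];
let tick = 0;

function fetchStat(host) {
  return new Promise((resolve) => {
    http.get({ host, port: 8080, path: '/stat' }, (res) => {
      let body = '';
      res.on('data', (chunk) => (body += chunk));
      res.on('end', () => resolve(JSON.parse(body))); // assumed: { ok, err }
    }).on('error', () => resolve({ ok: 0, err: 0 }));
  });
}

setInterval(async () => {
  const stats = await Promise.all(nodes.map(fetchStat));
  console.log([
    ++tick,                     // seconds since the start of the experiment
    ...stats.map((s) => s.ok),  // successful iterations per node, last second
    ...stats.map((s) => s.err), // errors per node, last second
    new Date().toISOString()    // wall-clock time
  ].join('\t'));
}, 1000);
```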
The first column is the number of seconds elapsed since the beginning of the experiment; the next three columns contain the per-second counts of successful iterations for the clients located on each node; the following three columns are the corresponding error counters; and the last column is the wall-clock time.

Client (db/remote-tester)

The client is a NodeJS app using the following connection URL: mongodb://mongo1,mongo2,mongo3/?replicaSet=rs0&autoReconnect=false&socketTimeoutMS=10000&connectTimeoutMS=10000. Each write operation uses writeConcern "majority", and each read operation uses readConcern "linearizable". When a client runs into a problem, it closes the current connection and opens a new one. Each client uses a different key to avoid collisions.
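To make the client behaviour concrete, here is a minimal sketch of such a loop, assuming the ~3.x Node.js driver (the database/collection names and the increment logic are illustrative; the real client is db/remote-tester in the repo):

```javascript
// One tester loop (illustrative). Reads a document with readConcern
// "linearizable", then upserts it back with writeConcern "majority";
// on any failure it drops the connection and starts over with a new one.
const { MongoClient } = require('mongodb');

const url = 'mongodb://mongo1,mongo2,mongo3/?replicaSet=rs0' +
  '&autoReconnect=false&socketTimeoutMS=10000&connectTimeoutMS=10000';

async function runTester(key) {
  for (;;) {
    let client;
    try {
      client = await MongoClient.connect(url);
      const coll = client.db('test')
        .collection('kv', { readConcern: { level: 'linearizable' } });
      for (;;) {
        const doc = await coll.findOne({ _id: key });           // reading
        const value = doc ? doc.value + 1 : 0;
        await coll.updateOne({ _id: key }, { $set: { value } }, // creating|updating
          { upsert: true, w: 'majority' });
        // ... bump the successful-iteration counter here ...
      }
    } catch (err) {
      // Close the broken connection and reconnect from scratch.
      if (client) { try { await client.close(); } catch (e) { /* ignore */ } }
    }
  }
}

runTester('mongo1'); // each tester uses its own key to avoid collisions
```

Note that autoReconnect=false in the URL means recovery is handled entirely by the catch branch, which matches the close-and-reopen behaviour described above.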
Isolation

The network isolation is simulated with iptables: during the isolation, traffic is blocked between mongo1 and (mongo2 & mongo3), while client1 can still reach all nodes.

Results

Let's take a look at the log of the aggregating client (client1.log):
As you can see, the clients on mongo1, mongo2, and mongo3 stopped making progress after the isolation. 117 seconds later mongo2 and mongo3 "recovered", but they became very slow. Once connectivity was restored, it took 38s for mongo2 and mongo3 to get back to their previous pace, while mongo1 remained silent (unexpected behaviour).

Logs

Please see the attached "slow-full.tar.gz" archive for logs.

client1.log

Log of the aggregating client. Its format was described above.

mongo(1|2|3).tester.log

Logs of the actual clients. In the happy case a log looks like:
Each successful iteration is represented by a sequence of lines (reading, creating|updating, ok). During a (re)connection you'll see "connecting" and "ok" records. A "stat" record corresponds to a fetch request from the aggregating client and helps to estimate how long the previous operation has been hanging. Any errors encountered during execution are also dumped into this log, like:
mongo(1|2|3).log

Logs of the mongod processes.

Why do I think it's a bug rather than a misconfiguration? The issue isn't always reproducible: sometimes the pace of mongo2 and mongo3 doesn't degrade, and once connectivity is restored mongo1's client starts working again. See the attached "good-full.tar.gz".

I hope this report will help you! |
| Comments |
| Comment by Evin Roesle [ 21/Feb/20 ] |
|
Closing this ticket as we believe this issue has been fixed by
|
| Comment by Spencer Brody (Inactive) [ 17/Apr/18 ] |
|
We already don't propose a node as a sync source candidate if we believe it to be down: https://github.com/mongodb/mongo/blob/e0562dc07a6aaf2993f0236cc443ee9b83b5e522/src/mongo/db/repl/topology_coordinator.cpp#L303

The fix that is needed is in the retry logic built into the OplogFetcher, which continues to run even after the consensus system has determined the sync source to be down. |
| Comment by Siyuan Zhou [ 30/Jan/18 ] |
|
|
| Comment by Denis Rystsov [ 16/Jan/18 ] |
|
By the way, this is the host system I used: |
| Comment by Denis Rystsov [ 16/Jan/18 ] |
|
I managed to reproduce the good case: good-full.data.tar.gz |
| Comment by Denis Rystsov [ 16/Jan/18 ] |
|
Hi Bruce, I reproduced the "slow" case, but the "good" case is rarer, so I haven't managed to catch it yet; I'll keep trying. Please find the logs and the data dir attached: slow-full.data.tar.gz |
| Comment by Bruce Lucas (Inactive) [ 15/Jan/18 ] |
|
Hi Denis, Thanks for the detailed report including logs. Would it be possible for you to reproduce the "good" and "slow" cases again and include both logs and the archived content of the $dbpath/diagnostic.data directory from each node? This will give us more detailed information about the internal operation of the server. Note: to ensure that all information has been written to diagnostic.data, please shut down all nodes before copying the files. Thanks, |