[SERVER-32034] Replica Set primary becomes unresponsive with adaptive Service Executor Created: 19/Nov/17  Updated: 30/Oct/23  Resolved: 05/Jan/18

Status: Closed
Project: Core Server
Component/s: Networking, Stability
Affects Version/s: 3.6.0-rc4
Fix Version/s: 3.6.4, 3.7.1

Type: Bug Priority: Major - P3
Reporter: Josef Ahmad Assignee: Jonathan Reams
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: Text File app-output.txt     PNG File dd.png     Java Source File load.java     HTML File load.java.html     File metrics.2017-11-18T17-47-56Z-00000     File metrics.2017-11-18T17-48-01Z-00000     File metrics.2017-11-18T17-48-09Z-00000     Text File primary.log     Text File secondary1.log     Text File secondary2.log     PNG File successful adaptive.png     PNG File threads started succesful.png     PNG File threads started unsuccessful.png     PNG File unsuccessful.png    
Issue Links:
Backports
Depends
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v3.6
Steps To Reproduce:
  • Start a three-node replica set (PSS) with the mongod options --transportLayer asio --serviceExecutor adaptive
  • $ java -cp load:mongo-java-driver/mongo-java-driver/build/libs/mongo-java-driver-3.5.0.jar load host=172.31.34.92 op=ramp_update docs=1000000 threads=500 tps=10 qps=1000 dur=150 size=100 updates=10
Sprint: Platforms 2017-12-18, Platforms 2018-01-01, Platforms 2018-01-15
Participants:
Linked BF Score: 0

 Description   

Reproduced on version r3.6.0-rc4-41-ge608b8b349

On a three-node replica set, I generated traffic via the attached load.java. (Note: the original load.java comes from SERVER-30613 - https://jira.mongodb.org/secure/attachment/163112/load.java. I've modified the script to set a write concern of 3.)
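The modified script itself is attached; as a minimal sketch, setting a write concern of 3 with the 3.5 Java driver could look roughly like the following (host, database, and collection names are placeholders, not the values used in the test):

import com.mongodb.MongoClient;
import com.mongodb.MongoClientOptions;
import com.mongodb.ServerAddress;
import com.mongodb.WriteConcern;
import com.mongodb.client.MongoCollection;
import org.bson.Document;

public class WriteConcernSketch {
    public static void main(String[] args) {
        // Client-wide write concern of w:3, so every write waits for an
        // acknowledgement from all three replica set members.
        MongoClientOptions opts = MongoClientOptions.builder()
                .writeConcern(new WriteConcern(3))
                .build();
        MongoClient client = new MongoClient(new ServerAddress("172.31.34.92", 27017), opts);

        MongoCollection<Document> coll = client.getDatabase("test").getCollection("load");

        // This update does not return until it has replicated to all members.
        coll.updateOne(new Document("_id", 1),
                new Document("$inc", new Document("counter", 1)));

        client.close();
    }
}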

$ java -cp load:mongo-java-driver/mongo-java-driver/build/libs/mongo-java-driver-3.5.0.jar load host=172.31.34.92 op=ramp_update docs=1000000 threads=500 tps=10 qps=1000 dur=150 size=100 updates=10

The above runs successfully with the default (synchronous) settings.
With the adaptive (asynchronous) service executor enabled, the primary becomes unresponsive as the update workload ramps up. This results in the secondaries starting an election and the primary stepping down.

For the test I used four AWS EC2 m4.4xlarge instances: one per replica set member and a fourth for the traffic generator.

The attached snapshot shows that the update workload starts at point A and then performance degrades as the number of connections increases.

From the log of the secondary that transitions to primary:

2017-11-18T17:51:39.122+0000 I ASIO     [NetworkInterfaceASIO-RS-0] Ending connection to host 172.31.34.92:27017 due to bad connection status; 1 connections to that host remain open
2017-11-18T17:51:39.122+0000 I REPL     [replication-1] Restarting oplog query due to error: ExceededTimeLimit: error in fetcher batch callback: Operation timed out. Last fetched optime (with hash): { ts: Timestamp(1511027488, 17), t: 1 }[1005646539139698744]. Restarts remaining: 3
2017-11-18T17:51:39.122+0000 I ASIO     [NetworkInterfaceASIO-RS-0] Connecting to 172.31.34.92:27017
2017-11-18T17:51:39.122+0000 I REPL     [replication-1] Scheduled new oplog query Fetcher source: 172.31.34.92:27017 database: local query: { find: "oplog.rs", filter: { ts: { $gte: Timestamp(1511027488, 17) } }, tailable: true, oplogReplay: true, awaitData: true, maxTimeMS: 60000, batchSize: 13981010, term: 1, readConcern: { afterClusterTime: Timestamp(1511027488, 17) } } query metadata: { $replData: 1, $oplogQueryData: 1, $readPreference: { mode: "secondaryPreferred" } } active: 1 findNetworkTimeout: 65000ms getMoreNetworkTimeout: 10000ms shutting down?: 0 first: 1 firstCommandScheduler: RemoteCommandRetryScheduler request: RemoteCommand 160994 -- target:172.31.34.92:27017 db:local cmd:{ find: "oplog.rs", filter: { ts: { $gte: Timestamp(1511027488, 17) } }, tailable: true, oplogReplay: true, awaitData: true, maxTimeMS: 60000, batchSize: 13981010, term: 1, readConcern: { afterClusterTime: Timestamp(1511027488, 17) } } active: 1 callbackHandle.valid: 1 callbackHandle.cancelled: 0 attempt: 1 retryPolicy: RetryPolicyImpl maxAttempts: 1 maxTimeMillis: -1ms
2017-11-18T17:51:39.632+0000 I REPL     [replexec-0] Starting an election, since we've seen no PRIMARY in the past 10000ms

Attached logs, diagnostic data and application output.



 Comments   
Comment by Githook User [ 01/Mar/18 ]

Author:

{'email': 'jbreams@mongodb.com', 'name': 'Jonathan Reams', 'username': 'jbreams'}

Message: SERVER-32034 Pending threads shouldn't block starvation response in controller

(cherry picked from commit d65dde869399fe13d440be986209517ceea9efa3)
Branch: v3.6
https://github.com/mongodb/mongo/commit/7684df3b050d5d0ecf353ac93ffb44922bdcdb44

Comment by Githook User [ 05/Jan/18 ]

Author:

{'name': 'Jonathan Reams', 'username': 'jbreams', 'email': 'jbreams@mongodb.com'}

Message: SERVER-32034 Fix lint
Branch: master
https://github.com/mongodb/mongo/commit/3c71b635958e2e769c47fcabc714546917c34d10

Comment by Githook User [ 05/Jan/18 ]

Author:

{'name': 'Jonathan Reams', 'username': 'jbreams', 'email': 'jbreams@mongodb.com'}

Message: SERVER-32034 Pending threads shouldn't block starvation response in controller
Branch: master
https://github.com/mongodb/mongo/commit/d65dde869399fe13d440be986209517ceea9efa3

Comment by Jonathan Reams [ 18/Dec/17 ]

I believe what's going on here is that, under heavy load, pending threads are preventing the starvation avoidance in the executor controller from ever kicking in. Also, spurious wakeups were preventing stuck thread detection from running at its required interval. Therefore we aren't able to spawn threads quickly enough to deal with the many long-running tasks in a majority-write load test.
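To illustrate the failure mode, here is a rough Java-flavoured sketch of the controller decision (the real code is the server's C++ ServiceExecutorAdaptive; the class, counters, and structure below are illustrative assumptions only):

import java.util.concurrent.atomic.AtomicInteger;

// Illustrative-only sketch of an adaptive controller loop, not the actual server code.
class AdaptiveControllerSketch {
    private final AtomicInteger runningThreads = new AtomicInteger();  // workers actually running
    private final AtomicInteger pendingThreads = new AtomicInteger();  // requested but not yet started
    private final AtomicInteger executingTasks = new AtomicInteger();  // tasks currently being run

    // Called periodically by the controller thread.
    void controllerTick() {
        // Starvation: every running worker is busy with a task.
        boolean starved = executingTasks.get() >= runningThreads.get();

        // Buggy behaviour described above: pending threads suppress the
        // starvation response, so while threads are slow to start under heavy
        // load, no additional workers get spawned at all.
        boolean shouldSpawn = starved && pendingThreads.get() == 0;

        // After the fix, pending threads no longer block the response,
        // i.e. effectively shouldSpawn = starved.

        if (shouldSpawn) {
            spawnWorker();
        }
    }

    private void spawnWorker() {
        pendingThreads.incrementAndGet();
        new Thread(() -> {
            pendingThreads.decrementAndGet();
            runningThreads.incrementAndGet();
            // ... pull tasks from the queue and run them ...
        }).start();
    }
}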

In an unsuccessful run, we see enough threads started to handle the insert load, followed by rapid thread starvation while trying to handle the update load; we also see the number of pending threads spike just before the failure:

In a successful run, we see many more threads started by starvation avoidance, no pending threads at all, and far fewer stuck threads during the update portion of the test than during the insert portion:
