[SERVER-30475] Fix non-determinism in service_executor_adaptive_test Created: 02/Aug/17  Updated: 06/Dec/22  Resolved: 02/Nov/21

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Jonathan Reams Assignee: Backlog - Service Architecture
Resolution: Won't Do Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Assigned Teams:
Service Arch
Operating System: ALL
Backport Requested:
v4.0, v3.6
Sprint: Platforms 2017-09-11, Platforms 2017-11-13, Platforms 2017-12-04, Platforms 2017-12-18, Platforms 2018-01-01, Platforms 2018-04-23
Participants:
Linked BF Score: 15

 Description   

The TestStuckThreads test in the service_executor_adaptive_test unit test is flaky on Windows. Stuck thread detection does appear to happen, but the executor doesn't launch the right number of threads in time:

[cpp_unit_test:service_executor_adaptive_test] 2017-08-02T07:43:55.918+0000 2017-08-02T07:43:55.919+0000 I -        [main] 	 going to run test: TestStuckThreads
[cpp_unit_test:service_executor_adaptive_test] 2017-08-02T07:43:55.918+0000 2017-08-02T07:43:55.919+0000 I -        [main] wait for executor to finish starting
[cpp_unit_test:service_executor_adaptive_test] 2017-08-02T07:43:55.918+0000 2017-08-02T07:43:55.919+0000 I EXECUTOR [worker-1] Starting new database worker thread 1
[cpp_unit_test:service_executor_adaptive_test] 2017-08-02T07:43:55.918+0000 2017-08-02T07:43:55.919+0000 I -        [worker-1] Ran callback
[cpp_unit_test:service_executor_adaptive_test] 2017-08-02T07:43:55.918+0000 2017-08-02T07:43:55.919+0000 I -        [main] Scheduling 6 blocked tasks
[cpp_unit_test:service_executor_adaptive_test] 2017-08-02T07:43:55.918+0000 2017-08-02T07:43:55.919+0000 I -        [worker-1] waiting on blocked mutex
[cpp_unit_test:service_executor_adaptive_test] 2017-08-02T07:43:55.918+0000 2017-08-02T07:43:55.919+0000 I -        [main] Waiting for executor to start new threads
[cpp_unit_test:service_executor_adaptive_test] 2017-08-02T07:43:55.918+0000 2017-08-02T07:43:55.919+0000 I -        [worker-1] Ran callback
[cpp_unit_test:service_executor_adaptive_test] 2017-08-02T07:43:55.924+0000 2017-08-02T07:43:55.924+0000 I EXECUTOR [worker-controller] Starting worker thread to avoid starvation.
[cpp_unit_test:service_executor_adaptive_test] 2017-08-02T07:43:55.924+0000 2017-08-02T07:43:55.924+0000 I EXECUTOR [worker-2] Starting new database worker thread 2
[cpp_unit_test:service_executor_adaptive_test] 2017-08-02T07:43:55.924+0000 2017-08-02T07:43:55.924+0000 I -        [worker-2] waiting on blocked mutex
[cpp_unit_test:service_executor_adaptive_test] 2017-08-02T07:43:55.924+0000 2017-08-02T07:43:55.924+0000 I -        [worker-2] Ran callback
[cpp_unit_test:service_executor_adaptive_test] 2017-08-02T07:43:56.233+0000 2017-08-02T07:43:56.234+0000 I EXECUTOR [worker-controller] Detected blocked worker threads, starting new reserve threads to unblock service executor
[cpp_unit_test:service_executor_adaptive_test] 2017-08-02T07:43:56.233+0000 2017-08-02T07:43:56.234+0000 I EXECUTOR [worker-3] Starting new database worker thread 3
[cpp_unit_test:service_executor_adaptive_test] 2017-08-02T07:43:56.233+0000 2017-08-02T07:43:56.234+0000 I -        [worker-3] waiting on blocked mutex
[cpp_unit_test:service_executor_adaptive_test] 2017-08-02T07:43:56.233+0000 2017-08-02T07:43:56.234+0000 I -        [worker-3] Ran callback
[cpp_unit_test:service_executor_adaptive_test] 2017-08-02T07:43:56.434+0000 2017-08-02T07:43:56.434+0000 E -        [main] Throwing exception: Expected exec->threadsRunning() == waitFor + 1 (3 == 4) @src\mongo\transport\service_executor_adaptive_test.cpp:197
[cpp_unit_test:service_executor_adaptive_test] 2017-08-02T07:43:56.434+0000 2017-08-02T07:43:56.434+0000 I -        [worker-1] Ran callback
[cpp_unit_test:service_executor_adaptive_test] 2017-08-02T07:43:56.434+0000 2017-08-02T07:43:56.434+0000 I -        [worker-2] Ran callback
[cpp_unit_test:service_executor_adaptive_test] 2017-08-02T07:43:56.434+0000 2017-08-02T07:43:56.434+0000 I -        [worker-3] Ran callback
[cpp_unit_test:service_executor_adaptive_test] 2017-08-02T07:43:56.920+0000 2017-08-02T07:43:56.920+0000 I -        [main] FAIL: TestStuckThreads	Expected exec->threadsRunning() == waitFor + 1 (3 == 4) @src\mongo\transport\service_executor_adaptive_test.cpp:197

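In the run above, the assertion expects exec->threadsRunning() to equal waitFor + 1 (4), but only 3 worker threads are running when the check fires. Currently the test compiles out the failing assertion on Windows with an #ifdef. We need to figure out why this timing issue occurs and fix it.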
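
One way to make a check like this less sensitive to scheduling delays is to poll for the expected state up to a deadline rather than asserting once after a fixed wait. The sketch below is only an illustration under that assumption: waitUntil is a hypothetical helper, not something in the MongoDB test suite. It does not explain why the thread starts are slow on Windows.

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical helper (an assumption for illustration): polls `pred` until it
// returns true or `timeout` elapses. Returns true if `pred` became true in time.
inline bool waitUntil(const std::function<bool()>& pred,
                      std::chrono::milliseconds timeout,
                      std::chrono::milliseconds interval = std::chrono::milliseconds(5)) {
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    while (!pred()) {
        if (std::chrono::steady_clock::now() >= deadline) {
            return pred();  // one final check once the deadline has passed
        }
        std::this_thread::sleep_for(interval);
    }
    return true;
}
```

In the test this could replace the single equality check with something like ASSERT(waitUntil([&] { return exec->threadsRunning() == waitFor + 1; }, std::chrono::seconds(10))). The test would still depend on a timeout, so this narrows the window for spurious failures without making the test fully deterministic.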



 Comments   
Comment by Lauren Lewis (Inactive) [ 02/Nov/21 ]

The Service Arch team is in the process of cleaning up tickets in the backlog. This ticket has not been updated in two years so we are closing it. Please reopen if you think this change is valuable.

Comment by Benjamin Caimano (Inactive) [ 28/Mar/18 ]

It looks like maybe this is resolved?

Comment by Githook User [ 01/Mar/18 ]

Author:

{'email': 'henrik.edin@mongodb.com', 'name': 'Henrik Edin', 'username': 'henrikedin'}

Message: SERVER-30475 Fix race in service executor adaptive where we might terminate threads below the reserved threshold.
Fix so the controller routine in the adaptive executor is protected against spurious wakeups.

(cherry picked from commit 56362631390536f111f86a384e90e76631312f25)
Branch: v3.6
https://github.com/mongodb/mongo/commit/0954f10a878c2157a1ca325ddbcafd7b75a39dc3

Comment by Githook User [ 09/Jan/18 ]

Author:

{'email': 'henrik.edin@mongodb.com', 'name': 'Henrik Edin', 'username': 'henrikedin'}

Message: SERVER-30475 Improvements to service_executor_adaptive_test, tests are still non deterministic, timing dependant and failing from time to time so they are still disabled.
Branch: master
https://github.com/mongodb/mongo/commit/3bbd4109dd368f59280f174e37f77131f932b52b

Comment by Githook User [ 20/Dec/17 ]

Author:

{'name': 'Henrik Edin', 'email': 'henrik.edin@mongodb.com', 'username': 'henrikedin'}

Message: SERVER-30475 Fix race in service executor adaptive where we might terminate threads below the reserved threshold.
Fix so the controller routine in the adaptive executor is protected against spurious wakeups.
Branch: master
https://github.com/mongodb/mongo/commit/56362631390536f111f86a384e90e76631312f25
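
For readers unfamiliar with the spurious wakeup problem named in the commit message above, the usual guard is to pair the condition variable with an explicit flag and wait on a predicate. The following is a minimal sketch of that general technique, not the code from the commit; ControllerSignal and its members are hypothetical names.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical stand-in for a controller thread's wakeup state.
// This is an assumption for illustration, not the adaptive executor's code.
class ControllerSignal {
public:
    // Called by workers to wake the controller.
    void notify() {
        {
            std::lock_guard<std::mutex> lk(_mutex);
            _pending = true;
        }
        _cv.notify_one();
    }

    // Returns true only if notify() was called, false if the timeout expired.
    // The predicate makes wait_for ignore spurious wakeups: it goes back to
    // waiting unless _pending is actually set.
    bool waitFor(std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lk(_mutex);
        const bool signaled = _cv.wait_for(lk, timeout, [this] { return _pending; });
        _pending = false;  // consume the signal so the next wait blocks again
        return signaled;
    }

private:
    std::mutex _mutex;
    std::condition_variable _cv;
    bool _pending = false;
};
```

With this shape, a controller loop can tell a real notification apart from a timeout or a spurious wakeup, and so avoids acting on thread counts (for example, terminating workers) when nothing has actually signaled it.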

Comment by Githook User [ 09/Aug/17 ]

Author:

{'username': 'acmorrow', 'email': 'acm@mongodb.com', 'name': 'Andrew Morrow'}

Message: SERVER-30475 Disable the adaptive SE tests until they are deterministic
Branch: master
https://github.com/mongodb/mongo/commit/b1fd8b5773821dee8b62d1e9a4d2595b29849b54
