[SERVER-24753] The balancer thread initialization is not interruptible Created: 23/Jun/16  Updated: 05/Jul/16  Resolved: 24/Jun/16

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 3.3.9
Fix Version/s: 3.3.9

Type: Bug
Priority: Critical - P2
Reporter: Kaloian Manassiev
Assignee: Kaloian Manassiev
Resolution: Done
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: Sharding 17 (07/15/16)
Participants:
Linked BF Score: 0

 Description   

When the balancer thread is started, it tries to read the list of shards, talk to the shards, and acquire the balancer distributed lock. If any of these operations fails, it sleeps for up to 60 seconds.

Because this sleep cannot be interrupted, it prevents replication stepdown from running and causes stepdown to fail with the following error:

[js_test:replmonitor_bad_seed] 2016-06-16T20:19:19.945-0500 assert: command failed: {
[js_test:replmonitor_bad_seed] 2016-06-16T20:19:19.945-0500 	"ok" : 0,
[js_test:replmonitor_bad_seed] 2016-06-16T20:19:19.945-0500 	"errmsg" : "Could not acquire the global shared lock within the amount of time specified that we should step down for",
[js_test:replmonitor_bad_seed] 2016-06-16T20:19:19.945-0500 	"code" : 50
[js_test:replmonitor_bad_seed] 2016-06-16T20:19:19.945-0500 } : undefined

The call stacks show this thread, blocked in mongo::Balancer::joinThread():

 [2016/06/23 10:51:26.123] Thread 39 (Thread 0x7fcf8c9d5700 (LWP 4803)):
 [2016/06/23 10:51:26.123] #0  0x00007fcfc38162fd in pthread_join () from /lib64/libpthread.so.0
 [2016/06/23 10:51:26.123] #1  0x00007fcfc7ec9a37 in std::thread::join() ()
 [2016/06/23 10:51:26.123] #2  0x00007fcfc72745f8 in mongo::Balancer::joinThread() ()
 [2016/06/23 10:51:26.123] #3  0x00007fcfc6f73b50 in mongo::repl::ReplicationCoordinatorExternalStateImpl::shardingOnDrainingStateHook(mongo::OperationContext*) ()
 [2016/06/23 10:51:26.123] #4  0x00007fcfc6f894e0 in mongo::repl::ReplicationCoordinatorImpl::signalDrainComplete(mongo::OperationContext*) ()
 [2016/06/23 10:51:26.123] #5  0x00007fcfc6ffc9b8 in mongo::repl::SyncTail::oplogApplication() ()
 [2016/06/23 10:51:26.124] #6  0x00007fcfc6fe6d95 in mongo::repl::runSyncThread(mongo::repl::BackgroundSync*) ()
 [2016/06/23 10:51:26.124] #7  0x00007fcfc7ec9af0 in execute_native_thread_routine ()
 [2016/06/23 10:51:26.124] #8  0x00007fcfc3815aa1 in start_thread () from /lib64/libpthread.so.0
 [2016/06/23 10:51:26.124] #9  0x00007fcfc3562aad in clone () from /lib64/libc.so.6

waiting on the balancer initialization thread, which is asleep in mongo::sleepmicros():

 [2016/06/23 10:51:33.020] Thread 7 (Thread 0x7f03bf243700 (LWP 5701)):
 [2016/06/23 10:51:33.020] #0  0x00007f03fd5af00d in nanosleep () from /lib64/libpthread.so.0
 [2016/06/23 10:51:33.020] #1  0x00007f040120c4d0 in mongo::sleepmicros(long long) ()
 [2016/06/23 10:51:33.021] #2  0x00007f040100d5af in mongo::Balancer::_mainThread() ()
 [2016/06/23 10:51:33.021] #3  0x00007f0401c5baf0 in execute_native_thread_routine ()
 [2016/06/23 10:51:33.021] #4  0x00007f03fd5a7aa1 in start_thread () from /lib64/libpthread.so.0
 [2016/06/23 10:51:33.021] #5  0x00007f03fd2f4aad in clone () from /lib64/libc.so.6
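
One common way to make a back-off sleep like this interruptible is to wait on a condition variable with a timeout instead of calling a plain sleep. The following is only a minimal sketch of that general technique and is not taken from the actual fix. The class InterruptibleSleeper and its methods sleepFor() and interrupt() are hypothetical names introduced for illustration and do not exist in the MongoDB code base.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical helper (an assumption for illustration, not MongoDB code).
// It replaces an uninterruptible sleep with a timed wait that another
// thread can cut short.
class InterruptibleSleeper {
public:
    // Blocks for up to `timeout`. Returns true if interrupt() was called
    // (before or during the wait), false if the full timeout elapsed.
    bool sleepFor(std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lock(_mutex);
        return _cv.wait_for(lock, timeout, [this] { return _interrupted; });
    }

    // Wakes any thread blocked in sleepFor() and makes later calls
    // return immediately.
    void interrupt() {
        {
            std::lock_guard<std::mutex> lock(_mutex);
            _interrupted = true;
        }
        _cv.notify_all();
    }

private:
    std::mutex _mutex;
    std::condition_variable _cv;
    bool _interrupted = false;
};
```

Under this assumption, the thread that wants to join the balancer thread would call interrupt() first, so the join would return promptly instead of waiting out the remainder of the 60-second sleep.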



 Comments   
Comment by Githook User [ 24/Jun/16 ]

Author: Kaloian Manassiev (kaloianm) <kaloian.manassiev@mongodb.com>

Message: SERVER-24753 Make the balancer thread initialization interruptible
Branch: master
https://github.com/mongodb/mongo/commit/89bc462da9bd7c5edd6c54da58615a0cc8542ebd
