[SERVER-27078] Race in ShardRegistry initialization causes it to not update the config server connection string
Created: 16/Nov/16  Updated: 05/Apr/17  Resolved: 07/Dec/16

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 3.4.0-rc3
Fix Version/s: 3.4.1, 3.5.1

Type: Bug
Priority: Major - P3
Reporter: Randolph Tan
Assignee: Misha Tyulenev
Resolution: Done
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Completed:
Steps To Reproduce:

Build a custom mongos with one change: add a sleepsecs(5) call right before this line:

https://github.com/mongodb/mongo/blob/r3.4.0-rc2/src/mongo/s/client/shard_registry.cpp#L187

  1. Deploy a config server replica set with 3 members
  2. Run mongos, but pass only one of the members in the --configdb parameter. Example:

    ./mongos --port 20005 --configdb test-configRS/ren-desktop:20001
    

Logs like these will begin to show up:

2016-11-16T17:10:16.199-0500 I ASIO     [NetworkInterfaceASIO-ShardRegistry-0] Failed to connect to ren-desktop:20003 - ShardNotFound: No shard found for host: ren-desktop:20003
2016-11-16T17:10:16.199-0500 D -        [shard registry reload] User Assertion: 70:could not get updated shard list from config server due to No shard found for host: ren-desktop:20003 src/mongo/s/client/shard_registry.cpp 325

Sprint: Sharding 2016-12-12
Participants:
Linked BF Score: 0

 Description   

Description of race:

  1. Start mongos with just a single node specified in --configdb
  2. ShardRegistry::init() gets called.
  3. ShardFactory::createShard gets called for "config" and the initial config server seed string (ref).
  4. The ReplicaSetMonitor for the config replica set gets created.
  5. The ReplicaSetMonitor (RSM) reload thread discovers the other nodes in the CSRS.
  6. RSM tries to update the ShardRegistry via the synchronous update hook. However, it fails to update because the "config" entry hasn't been created yet (ref).
  7. ShardRegistry populates the "config" shard entry (ref), but with the initial seed string, so the nodes discovered in step 5 are never recorded.
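
The ordering problem in steps 6 and 7 can be illustrated with a small, single-threaded sketch. This is not MongoDB code and not the committed fix: ToyRegistry, populateConfigEntry, updateConnString, and runScenario are hypothetical names, and the host ren-desktop:20002 is assumed for illustration (only ports 20001 and 20003 appear in this ticket). The sketch only shows why an update that arrives before the "config" entry exists is lost.

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy stand-in for a shard registry (hypothetical; not MongoDB's ShardRegistry).
class ToyRegistry {
public:
    // Mirrors step 7: create the "config" entry from the seed connection string.
    void populateConfigEntry(const std::string& seedConnString) {
        _entries["config"] = seedConnString;
    }

    // Mirrors step 6: the monitor's update hook. The update is dropped
    // (returns false) when no entry exists yet for the given id.
    bool updateConnString(const std::string& id, const std::string& newConnString) {
        auto it = _entries.find(id);
        if (it == _entries.end()) {
            return false;
        }
        it->second = newConnString;
        return true;
    }

    // Returns the stored connection string, or an empty string if absent.
    std::string connString(const std::string& id) const {
        auto it = _entries.find(id);
        return it == _entries.end() ? std::string() : it->second;
    }

private:
    std::map<std::string, std::string> _entries;
};

// Replays the two possible orderings and returns the final "config" string.
std::string runScenario(bool populateBeforeDiscovery) {
    const std::string seed = "test-configRS/ren-desktop:20001";
    // Assumed full member list; port 20002 is illustrative only.
    const std::string discovered =
        "test-configRS/ren-desktop:20001,ren-desktop:20002,ren-desktop:20003";

    ToyRegistry registry;
    if (populateBeforeDiscovery) {
        // Entry exists first, so the monitor's update is applied.
        registry.populateConfigEntry(seed);
        bool applied = registry.updateConnString("config", discovered);
        assert(applied);
        (void)applied;
    } else {
        // Racy order from the description: update arrives first and is lost.
        bool applied = registry.updateConnString("config", discovered);
        assert(!applied);
        (void)applied;
        registry.populateConfigEntry(seed);
    }
    return registry.connString("config");
}
```

Calling runScenario(false) replays the order in the description and leaves the registry holding only the seed string, while runScenario(true) ends with the full member list. In the real server the two steps run on different threads, which is why the sleepsecs(5) in the reproduction steps makes the bad ordering reliable.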


 Comments   
Comment by Githook User [ 07/Dec/16 ]

Author: Misha Tyulenev (mikety) <misha@mongodb.com>

Message: SERVER-27078 fix race in ShardRegistry initialization

(cherry picked from commit 77147629b714b1e062c1b406e0aef193cfca36a8)
Branch: v3.4
https://github.com/mongodb/mongo/commit/a2f78bd51b6390f8e0f151b7a92f7f34567018cb

Comment by Githook User [ 07/Dec/16 ]

Author: Misha Tyulenev (mikety) <misha@mongodb.com>

Message: SERVER-27078 fix race in ShardRegistry initialization
Branch: master
https://github.com/mongodb/mongo/commit/77147629b714b1e062c1b406e0aef193cfca36a8
