[SERVER-8786] Race condition when setting ShardingConnectionHook on mongod connection pools Created: 28/Feb/13  Updated: 11/Jul/16  Resolved: 04/Mar/13

Status: Closed
Project: Core Server
Component/s: Security, Sharding
Affects Version/s: 2.2.3
Fix Version/s: 2.2.4, 2.4.0-rc2

Type: Bug Priority: Major - P3
Reporter: Spencer Brody (Inactive) Assignee: Spencer Brody (Inactive)
Resolution: Done Votes: 1
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
Operating System: ALL
Participants:

 Description   

We've seen a few cases where customers bring up a sharded cluster with authentication enabled and the shard primaries get errors when querying the config servers, saying that they are unauthenticated. This leaves the system unusable. It appears that the mongods aren't even trying to authenticate to the config servers, even though they successfully authenticate to the other nodes in their replica set. The problem seems to be that the ShardingConnectionHook, which also handles authenticating all connections used by sharding, isn't being set on the pool. Restarting the mongods seems to resolve the issue, which further supports my theory that this is a race condition.

Investigation into the code brings us to the following function in d_state.cpp:

    void ShardedConnectionInfo::addHook() {
        static bool done = false;
        if (!done) {
            LOG(1) << "adding sharding hook" << endl;
            pool.addHook(new ShardingConnectionHook(false));
            shardConnectionPool.addHook(new ShardingConnectionHook(true));
            done = true;
        }
    }

This is the code that sets the connection hook on the pools. It is not thread-safe: the static done flag is read and written without any synchronization, so two connections can end up calling addHook at the same time. Since addHook is basically just an append to a std::list, and standard library containers are not safe for concurrent modification, this could corrupt the linked list of connection hooks. This is my current theory as to how the ShardingConnectionHook can fail to be set.
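To illustrate the kind of guard that would close this race, here is a minimal sketch, not the change that was actually committed. It assumes a mutex-protected "done" flag; ShardingHookInstaller, ensureInstalled, and totalHooksAfterConcurrentInit are hypothetical names, and the two std::list members merely stand in for the hook lists of pool and shardConnectionPool.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for the two pools and their hook lists.
// This is NOT MongoDB's DBConnectionPool; it only models the hook lists.
class ShardingHookInstaller {
public:
    // Installs both hooks at most once. The mutex covers both the check of
    // the flag and the list appends, so concurrent callers cannot interleave.
    void ensureInstalled() {
        std::lock_guard<std::mutex> lock(_mutex);
        if (_done) {
            return;
        }
        _poolHooks.push_back("ShardingConnectionHook(false)");
        _shardPoolHooks.push_back("ShardingConnectionHook(true)");
        _done = true;
    }

    // Total number of hooks registered across both lists.
    std::size_t totalHooks() const {
        std::lock_guard<std::mutex> lock(_mutex);
        return _poolHooks.size() + _shardPoolHooks.size();
    }

private:
    mutable std::mutex _mutex;
    bool _done = false;
    std::list<std::string> _poolHooks;
    std::list<std::string> _shardPoolHooks;
};

// Calls ensureInstalled from several threads at once and reports how many
// hooks ended up registered in total.
inline std::size_t totalHooksAfterConcurrentInit(int threadCount) {
    ShardingHookInstaller installer;
    std::vector<std::thread> threads;
    for (int i = 0; i < threadCount; ++i) {
        threads.emplace_back([&installer] { installer.ensureInstalled(); });
    }
    for (auto& t : threads) {
        t.join();
    }
    return installer.totalHooks();
}
```

With the lock held across both the check and the appends, each list receives exactly one hook regardless of how many threads race into ensureInstalled.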



 Comments   
Comment by auto [ 27/Mar/13 ]

Author: Spencer T Brody <spencer@10gen.com>
Date: 2013-03-26T18:47:30Z

Message: SERVER-8786 Backport stability fixes to sharding/authConnectionHook.js
Branch: v2.2
https://github.com/mongodb/mongo/commit/f1f70fbe0b515e7d868db0815cec822ae5515966

Comment by auto [ 04/Mar/13 ]

Author: Spencer T Brody <spencer@10gen.com>
Date: 2013-03-04T16:44:24Z

Message: SERVER-8786 further stability fixes to sharding/authConnectionHook.js
Branch: master
https://github.com/mongodb/mongo/commit/04e96203721d49eb03777f63a3b085057656e313

Comment by auto [ 04/Mar/13 ]

Author: Spencer T Brody <spencer@10gen.com>
Date: 2013-03-04T16:23:55Z

Message: SERVER-8786 Make sharding/authConnectionHook.js more stable
Branch: master
https://github.com/mongodb/mongo/commit/20359e07cdf15a41f65572f1fccb881d17f33563

Comment by auto [ 01/Mar/13 ]

Author: Spencer T Brody <spencer@10gen.com>
Date: 2013-02-28T22:57:51Z

Message: SERVER-8786 Fix race condition when adding ShardingConnectionHook to connection pools
Branch: v2.2
https://github.com/mongodb/mongo/commit/b83eb6d8aa2d4e8729d4bb22076e19461ad74dc5

Comment by auto [ 01/Mar/13 ]

Author: Spencer T Brody <spencer@10gen.com>
Date: 2013-02-28T22:01:56Z

Message: SERVER-8786 Make sure that the ShardingConnectionHook gets added to the connection pools anytime sharding is initialized
Branch: v2.2
https://github.com/mongodb/mongo/commit/a332782ad86a83021103fbf3d6f542cb93687975

Comment by auto [ 01/Mar/13 ]

Author: Spencer T Brody <spencer@10gen.com>
Date: 2013-02-28T22:57:51Z

Message: SERVER-8786 Fix race condition when adding ShardingConnectionHook to connection pools
Branch: master
https://github.com/mongodb/mongo/commit/b19a031dd8558d96307d021cab84a2fe4bd7ee3a

Comment by auto [ 01/Mar/13 ]

Author: Spencer T Brody <spencer@10gen.com>
Date: 2013-02-28T22:01:56Z

Message: SERVER-8786 Make sure that the ShardingConnectionHook gets added to the connection pools anytime sharding is initialized
Branch: master
https://github.com/mongodb/mongo/commit/e528ff70fcc75984331612d20b0496dbb4fcf365

Comment by Spencer Brody (Inactive) [ 28/Feb/13 ]

I have reproduced this issue locally by starting up a sharded cluster with authentication, connecting and authenticating, and then running a moveChunk as the very first operation. This breaks the donor shard, and after that all queries hitting that shard fail.

The problem is that moveChunk calls shardingState.enable and configServer.init, but doesn't set the ShardingConnectionHook. This prevents future setShardVersion calls from adding the ShardingConnectionHook, because they see sharding as already initialized and skip the hook setup.
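The ordering problem above can be modeled in a few lines. This is only an illustrative sketch under simplifying assumptions: ShardStateModel and every function name below are hypothetical and do not correspond to the real shardingState or setShardVersion code. It shows why a hook that is installed only on the "first initialization" path gets skipped when another path initializes sharding first, and how tying hook installation to every initialization path avoids that.

```cpp
#include <cassert>

// Hypothetical model of the per-process sharding state (illustrative only).
struct ShardStateModel {
    bool shardingEnabled = false;
    bool hookInstalled = false;
};

// Models the problematic moveChunk path: sharding is enabled,
// but the connection hook is never installed.
inline void moveChunkInitWithoutHook(ShardStateModel& s) {
    s.shardingEnabled = true;
}

// Models the setShardVersion path: the hook is installed only when this
// path performs the first-time initialization itself.
inline void setShardVersionInit(ShardStateModel& s) {
    if (s.shardingEnabled) {
        return;  // believes initialization, including the hook, already happened
    }
    s.shardingEnabled = true;
    s.hookInstalled = true;
}

// Models the corrected shape: any path that initializes sharding also
// makes sure the hook is installed (idempotently).
inline void initShardingWithHook(ShardStateModel& s) {
    s.shardingEnabled = true;
    if (!s.hookInstalled) {
        s.hookInstalled = true;
    }
}

// Runs "moveChunk first, then setShardVersion" and reports whether the hook
// ended up installed. useFix selects the corrected initialization path.
inline bool hookInstalledWhenMoveChunkRunsFirst(bool useFix) {
    ShardStateModel s;
    if (useFix) {
        initShardingWithHook(s);
    } else {
        moveChunkInitWithoutHook(s);
    }
    setShardVersionInit(s);
    return s.hookInstalled;
}
```

In this model the hook is missing after the unfixed sequence and present after the fixed one, which matches the intent stated in the commit message "Make sure that the ShardingConnectionHook gets added to the connection pools anytime sharding is initialized".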

Generated at Thu Feb 08 03:18:27 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.