[SERVER-2377] Mongos sharding/failover behaves strangely Created: 19/Jan/11  Updated: 12/Jul/16  Resolved: 24/Jan/11

Status: Closed
Project: Core Server
Component/s: Replication, Sharding
Affects Version/s: 1.7.5
Fix Version/s: 1.7.5

Type: Bug Priority: Major - P3
Reporter: Aristarkh Zagorodnikov Assignee: Kristina Chodorow (Inactive)
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Linux x86/x86_64


Operating System: Linux
Participants:

 Description   

Summary:
mongos fails temporarily when the replica set primary goes down ("dbclient error communicating with server: <host>:<port>"), then fails semi-permanently until all replica set members are up again ("mongos connectionpool: connect failed <replicaSet>/<host>:<port>[,<host>:<port>...]" and "not master and slaveok=false").
I wonder if mongos should:
1. Auto-retry/auto-reconnect, at least for read operations (a rough sketch of this idea appears below).
2. Not fail permanently until the replica set has all servers running again.
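
For reference, the kind of read-side auto-retry being requested can be approximated today on the application side. The sketch below is a minimal C++ illustration; TransientNetworkError, retryRead, and the simulated query are hypothetical placeholders, not part of any MongoDB driver API. The real fix would of course belong inside mongos itself, as the patches in the comments below explore.

// Application-side read retry, a minimal sketch. All names here are
// hypothetical placeholders, not MongoDB driver API.
#include <chrono>
#include <iostream>
#include <stdexcept>
#include <thread>

struct TransientNetworkError : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Retry a read up to maxAttempts times, backing off briefly between attempts.
template <typename Query>
auto retryRead(Query runQuery, int maxAttempts = 3) -> decltype(runQuery()) {
    for (int attempt = 1;; ++attempt) {
        try {
            return runQuery();  // e.g. a find() issued through mongos
        }
        catch (const TransientNetworkError&) {
            if (attempt >= maxAttempts) throw;  // give up; let the caller see the error
            std::this_thread::sleep_for(std::chrono::milliseconds(100 * attempt));
        }
    }
}

int main() {
    int failuresLeft = 2;  // simulate two transient failures, then success
    int value = retryRead([&] {
        if (failuresLeft-- > 0)
            throw TransientNetworkError("dbclient error communicating with server");
        return 42;
    });
    std::cout << "read succeeded: " << value << "\n";
}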

Configuration:
A set of 3 machines hosting a replica set (named testRS), the config servers, and an instance of mongos. Sharding is enabled, but no collections are actually sharded.
celestine-1: config1, rs1, mongos
celestine-2: config2, rs2
celestine-3: config3, rs3
One user database, "test1", with a single collection, "items", containing two documents (see the session below).

Versions:
mongos 1.7.5 nightly (2011-01-18); used because mongos 1.6.5/1.6.6 causes the mongo shell to fail with an assertion (ERROR: MessagingPort::call() wrong id got:XXX expect:YYY)
mongod 1.7.5 nightly (2011-01-18)
mongo shell 1.7.5 nightly (2011-01-18)

Mongos session:
> db.items.find()

{ "_id" : ObjectId("4d35bea8ba15dc15b0d3878e"), "value" : 123456789012345 } { "_id" : ObjectId("4d35c228abfec6a6bac2d04b"), "value" : 98 }

— bring down the primary member of the replica set (currently celestine-2) here —

> db.items.find()
error: {
"$err" : "dbclient error communicating with server: celestine-2:27100",
"code" : 10278
}
> db.items.find()
error: {
"$err" : "dbclient error communicating with server: celestine-2:27100",
"code" : 10278
}
> db.items.find()
error: {
"$err" : "mongos connectionpool: connect failed testRS/celestine-1:27100,celestine-3:27100,celestine-2:27100 : connect failed to set testRS/celestine-1:27100,celestine-3:27100,celestine-2:27100",
"code" : 11002
}
> db.items.find()
error: {
"$err" : "not master and slaveok=false",
"code" : 13435
}

Mongos log:
Wed Jan 19 12:31:46 [Balancer] ~ScopedDbConnection: _conn != null
Wed Jan 19 12:31:46 [Balancer] caught exception while doing balance: DBClientBase::findOne: transport error: celestine-1:27100 query: { features: 1 }

Wed Jan 19 12:32:36 [Balancer] ~ScopedDbConnection: _conn != null
Wed Jan 19 12:32:36 [Balancer] caught exception while doing balance: mongos connectionpool: connect failed testRS/celestine-1:27100,celestine-3:27100,celestine-2:27100 : connect failed to set testRS/celestine-1:27100,celestine-3:27100,celestine-2:27100
Wed Jan 19 12:33:06 [Balancer] ~ScopedDbConnection: _conn != null
Wed Jan 19 12:33:06 [Balancer] caught exception while doing balance: mongos connectionpool: connect failed testRS/celestine-1:27100,celestine-3:27100,celestine-2:27100 : connect failed to set testRS/celestine-1:27100,celestine-3:27100,celestine-2:27100



 Comments   
Comment by Kristina Chodorow (Inactive) [ 24/Jan/11 ]

Great!

Comment by Aristarkh Zagorodnikov [ 24/Jan/11 ]

Checked this today, works as you described.

Comment by Aristarkh Zagorodnikov [ 22/Jan/11 ]

Will check this on Monday, thanks

Comment by Kristina Chodorow (Inactive) [ 21/Jan/11 ]

Can you give it a try? It'll take a few seconds but will fail over:

> db.items.find()

{ "_id" : ObjectId("4d35bea8ba15dc15b0d3878e"), "value" : 123456789012345 } { "_id" : ObjectId("4d35c228abfec6a6bac2d04b"), "value" : 98 }

> // killed servers
> db.items.find()
error: {
"$err" : "ReplicaSetMonitor no master found for set: foo",
"code" : 10009
}

> db.items.find()
error: {
"$err" : "mongos connectionpool: connect failed foo/ubuntu:27017,ubuntu:27019,ubuntu:27018 : connect failed to set foo/ubuntu:27017,ubuntu:27019,ubuntu:27018",
"code" : 11002
}
> db.items.find()
error: {
"$err" : "mongos connectionpool: connect failed foo/ubuntu:27017,ubuntu:27019,ubuntu:27018 : connect failed to set foo/ubuntu:27017,ubuntu:27019,ubuntu:27018",
"code" : 11002
}
> db.items.find()
{ "_id" : ObjectId("4d35bea8ba15dc15b0d3878e"), "value" : 123456789012345 }
{ "_id" : ObjectId("4d35c228abfec6a6bac2d04b"), "value" : 98 }
Comment by auto [ 21/Jan/11 ]

Author: Kristina (kchodorow, kristina@10gen.com)

Message: rs throws out master on connection exception SERVER-2377
https://github.com/mongodb/mongo/commit/f0eb24f2245d438a47b677a3c33d3c02c73f7097

Comment by Kristina Chodorow (Inactive) [ 21/Jan/11 ]

You are correct. Will fix in a sec... (there's a simpler fix)

Comment by Aristarkh Zagorodnikov [ 21/Jan/11 ]

Below is a version of the patch with the proposed loop (note that the slave case is completely untested, and that I removed the "if ( h != _slaveHost )" check in the slave version).
With this patch, after killing the primary I get the following on the mongos client:

> db.items.find()
error: {
"$err" : "dbclient error communicating with server: celestine-3:27100",
"code" : 10278
}
> db.items.find()
{ "_id" : ObjectId("4d384a2061befa37946181af"), "v" : 19959 }


...
has more

--- client/dbclient_rs.cpp.orig	2011-01-21 18:06:33.478345002 +0300
+++ client/dbclient_rs.cpp	2011-01-21 18:40:54.338344995 +0300
@@ -365,9 +365,26 @@
             _monitor->notifyFailure( _masterHost );
         }
 
-        _masterHost = _monitor->getMaster();
-        _master.reset( new DBClientConnection( true ) );
-        _master->connect( _masterHost );
+        size_t attempts = _monitor->getNodesCount();
+        while (true) {
+            _masterHost = _monitor->getMaster();
+            _master.reset( new DBClientConnection( true ) );
+            try
+            {
+                _master->connect( _masterHost );
+            }
+            catch (ConnectException&)
+            {
+                _monitor->notifyFailure( _masterHost );
+                if (attempts > 0)
+                {
+                    --attempts;
+                    continue;
+                }
+                throw;
+            }
+            break;
+        }
         _auth( _master.get() );
         return _master.get();
     }
@@ -379,13 +396,27 @@
             _monitor->notifySlaveFailure( _slaveHost );
         }
 
-        HostAndPort h = _monitor->getSlave();
-        if ( h != _slaveHost ) {
-            _slaveHost = h;
+        size_t attempts = _monitor->getNodesCount();
+        while (true) {
+            _slaveHost = _monitor->getSlave();
             _slave.reset( new DBClientConnection( true ) );
-            _slave->connect( _slaveHost );
-            _auth( _slave.get() );
-        }
+            try
+            {
+                _slave->connect( _slaveHost );
+            }
+            catch (ConnectException&)
+            {
+                _monitor->notifySlaveFailure( _slaveHost );
+                if (attempts > 0)
+                {
+                    --attempts;
+                    continue;
+                }
+                throw;
+            }
+            break;
+        }
+        _auth( _slave.get() );
         return _slave.get();
     }
 
--- client/dbclient_rs.h.orig	2011-01-21 18:27:11.490345008 +0300
+++ client/dbclient_rs.h	2011-01-21 18:27:46.702345006 +0300
@@ -83,6 +83,8 @@
 
         string getServerAddress() const;
 
+        size_t getNodesCount() const { return _nodes.size(); }
+
     private:
         /**
          * This populates a list of hosts from the list of seeds (discarding the
Comment by Aristarkh Zagorodnikov [ 21/Jan/11 ]

I explored client/dbclient_rs.cpp a bit and found that _nodes[..].ok never gets set to false in my scenario (connection errors aren't handled at all, and the ReplicaSetMonitorWatcher job executes only once for each replica set).
I put together a small patch (I'm mostly sure it's the wrong way to solve this problem, but it at least appears to work; I also don't know whether slaves should be handled this way).
With this patch, after killing primary I get the following on the mongos client:
> db.items.find()
error: {
"$err" : "dbclient error communicating with server: celestine-2:27100",
"code" : 10278
}
> db.items.find()
error: {
"$err" : "mongos connectionpool: connect failed testRS/celestine-1:27100,celestine-3:27100,celestine-2:27100 : connect failed to set testRS/celestine-1:27100,celestine-3:27100,celestine-2:27100",
"code" : 11002
}
> db.items.find()
{ "_id" : ObjectId("4d384a2061befa37946181af"), "v" : 19959 }


...
has more

So it looks like the error does not become permanent after a failure. I'm sure this can be improved so that there is at most one error per server failure; adding some kind of loop in DBClientReplicaSet::checkMaster() that gives up after N attempts, where N is ReplicaSetMonitor._nodes.size(), might be feasible (a standalone sketch of that loop follows the patch below).

--- client/dbclient_rs.cpp.orig	2011-01-21 18:06:33.478345002 +0300
+++ client/dbclient_rs.cpp	2011-01-21 17:58:03.874345000 +0300
@@ -367,7 +367,15 @@
 
         _masterHost = _monitor->getMaster();
         _master.reset( new DBClientConnection( true ) );
-        _master->connect( _masterHost );
+        try
+        {
+            _master->connect( _masterHost );
+        }
+        catch (ConnectException&)
+        {
+            _monitor->notifyFailure( _masterHost );
+            throw;
+        }
         _auth( _master.get() );
         return _master.get();
     }
@@ -383,7 +391,15 @@
         if ( h != _slaveHost ) {
             _slaveHost = h;
             _slave.reset( new DBClientConnection( true ) );
-            _slave->connect( _slaveHost );
+            try
+            {
+                _slave->connect( _slaveHost );
+            }
+            catch (ConnectException&)
+            {
+                _monitor->notifySlaveFailure( _slaveHost );
+                throw;
+            }
             _auth( _slave.get() );
         }
         return _slave.get();
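
To make the proposed bounded-retry loop easier to follow in isolation, here is a self-contained sketch of the same logic. MockMonitor and connectOnce are hypothetical stand-ins for ReplicaSetMonitor and DBClientConnection::connect(); the author's actual loop patch appears in the later comment above.

// Self-contained sketch of the bounded-retry idea (hypothetical types; the
// real change is the dbclient_rs.cpp patch shown in the comment above).
#include <cstddef>
#include <iostream>
#include <stdexcept>
#include <string>
#include <vector>

struct ConnectException : std::runtime_error {
    explicit ConnectException(const std::string& host)
        : std::runtime_error("connect failed to " + host) {}
};

// Hypothetical stand-in for ReplicaSetMonitor: tracks set members and which
// one is currently believed to be master.
struct MockMonitor {
    std::vector<std::string> nodes;
    std::size_t current;

    std::size_t getNodesCount() const { return nodes.size(); }
    const std::string& getMaster() const { return nodes[current]; }
    void notifyFailure(const std::string&) { current = (current + 1) % nodes.size(); }
};

// Hypothetical stand-in for DBClientConnection::connect(): here only
// celestine-1:27100 is reachable.
void connectOnce(const std::string& host) {
    if (host != "celestine-1:27100") throw ConnectException(host);
}

// The loop from the patch: try each candidate master once before rethrowing.
std::string connectToMaster(MockMonitor& monitor) {
    std::size_t attempts = monitor.getNodesCount();
    while (true) {
        const std::string host = monitor.getMaster();
        try {
            connectOnce(host);
            return host;
        }
        catch (const ConnectException&) {
            monitor.notifyFailure(host);
            if (attempts > 0) {
                --attempts;
                continue;
            }
            throw;  // every candidate failed; surface the error to the caller
        }
    }
}

int main() {
    MockMonitor monitor;
    monitor.nodes = {"celestine-2:27100", "celestine-3:27100", "celestine-1:27100"};
    monitor.current = 0;
    // Two dead candidates are skipped before the live one is found.
    std::cout << "connected to " << connectToMaster(monitor) << "\n";
}

Bounding the retries at getNodesCount() keeps the worst case at one full pass over the set, which matches the N = _nodes.size() limit suggested above.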

Comment by Aristarkh Zagorodnikov [ 21/Jan/11 ]

Looks like the original problem isn't solved. Built from git master (c31c7f322943034bd294993e832f51383799ee5f), mongos still behaves the same way:

xm@celestine-1:~$ ./db/bin/mongo celestine-1/test1
MongoDB shell version: 1.7.5-pre-
connecting to: celestine-1/test1
> db.items.find()

{ "_id" : ObjectId("4d384a2061befa37946181af"), "v" : 19959 }

...
has more

— kill primary here —

> db.items.find()
error: {
"$err" : "dbclient error communicating with server: celestine-2:27100",
"code" : 10278
}
> db.items.find()
error: {
"$err" : "mongos connectionpool: connect failed testRS/celestine-1:27100,celestine-3:27100,celestine-2:27100 : connect failed to set testRS/celestine-1:27100,celestine-3:27100,celestine-2:27100",
"code" : 11002
}
> db.items.find()
error: {
"$err" : "mongos connectionpool: connect failed testRS/celestine-1:27100,celestine-3:27100,celestine-2:27100 : connect failed to set testRS/celestine-1:27100,celestine-3:27100,celestine-2:27100",
"code" : 11002
}

Mongos log:
Fri Jan 21 14:21:20 /home/xm/db/bin/mongos db version v1.7.5-pre-, pdfile version 4.5 starting (--help for usage)
Fri Jan 21 14:21:20 git version: nogitversion
Fri Jan 21 14:21:20 sys info: Linux celestine-1 2.6.35-22-generic-pae #34~lucid1-Ubuntu SMP Mon Oct 11 14:53:39 UTC 2010 i686 BOOST_LIB_VERSION=1_40
Fri Jan 21 14:21:20 SyncClusterConnection connecting to [celestine-1:27200]
Fri Jan 21 14:21:20 SyncClusterConnection connecting to [celestine-2:27200]
Fri Jan 21 14:21:20 SyncClusterConnection connecting to [celestine-3:27200]
Fri Jan 21 14:21:20 [mongosMain] waiting for connections on port 27017
Fri Jan 21 14:21:20 [Balancer] about to contact config servers and shards
Fri Jan 21 14:21:20 [websvr] web admin interface listening on port 28017
Fri Jan 21 14:21:20 [Balancer] updated set (testRS) to: testRS/celestine-1:27100,celestine-3:27100
Fri Jan 21 14:21:20 [Balancer] updated set (testRS) to: testRS/celestine-1:27100,celestine-3:27100,celestine-2:27100
Fri Jan 21 14:21:20 [Balancer] config servers and shards contacted successfully
Fri Jan 21 14:21:20 [Balancer] balancer id: celestine-1:27017 started at Jan 21 14:21:20
Fri Jan 21 14:21:20 [LockPinger] SyncClusterConnection connecting to [celestine-1:27200]
Fri Jan 21 14:21:20 [LockPinger] SyncClusterConnection connecting to [celestine-2:27200]
Fri Jan 21 14:21:20 [Balancer] SyncClusterConnection connecting to [celestine-1:27200]
Fri Jan 21 14:21:20 [LockPinger] SyncClusterConnection connecting to [celestine-3:27200]
Fri Jan 21 14:21:20 [Balancer] SyncClusterConnection connecting to [celestine-2:27200]
Fri Jan 21 14:21:20 [Balancer] SyncClusterConnection connecting to [celestine-3:27200]
Fri Jan 21 14:21:28 [mongosMain] connection accepted from 192.168.7.241:35334 #1
Fri Jan 21 14:21:32 [conn1] creating WriteBackListener for: testRS/celestine-1:27100,celestine-3:27100,celestine-2:27100,celestine-3:27100,celestine-2:27100
Fri Jan 21 14:21:40 checking replica set: testRS
Fri Jan 21 14:21:50 [Balancer] ~ScopedDbConnection: _conn != null
Fri Jan 21 14:21:50 [Balancer] caught exception while doing balance: mongos connectionpool: connect failed testRS/celestine-1:27100,celestine-3:27100,celestine-2:27100 : connect failed to set testRS/celestine-1:27100,celestine-3:27100,celestine-2:27100
Fri Jan 21 14:22:20 [Balancer] ~ScopedDbConnection: _conn != null
Fri Jan 21 14:22:20 [Balancer] caught exception while doing balance: mongos connectionpool: connect failed testRS/celestine-1:27100,celestine-3:27100,celestine-2:27100 : connect failed to set testRS/celestine-1:27100,celestine-3:27100,celestine-2:27100
Fri Jan 21 14:22:50 [Balancer] ~ScopedDbConnection: _conn != null
Fri Jan 21 14:22:50 [Balancer] caught exception while doing balance: mongos connectionpool: connect failed testRS/celestine-1:27100,celestine-3:27100,celestine-2:27100 : connect failed to set testRS/celestine-1:27100,celestine-3:27100,celestine-2:27100
Fri Jan 21 14:22:51 [LockPinger] SyncClusterConnection connecting to [celestine-1:27200]
Fri Jan 21 14:22:51 [LockPinger] SyncClusterConnection connecting to [celestine-2:27200]
Fri Jan 21 14:22:51 [LockPinger] SyncClusterConnection connecting to [celestine-3:27200]
Fri Jan 21 14:23:00 [conn1] end connection 192.168.7.241:35334

Comment by Kristina Chodorow (Inactive) [ 21/Jan/11 ]

It should now find the new master.

Comment by auto [ 21/Jan/11 ]

Author: Kristina (kchodorow, kristina@10gen.com)

Message: make c++ client check that rs master is ok before returning it SERVER-2377
https://github.com/mongodb/mongo/commit/a756cf2375574732d77479fe3a4ed1c4138feb4e

Comment by Aristarkh Zagorodnikov [ 20/Jan/11 ]

My fault on the setSlaveOk() part, yes. I thought (I don't know why; the docs don't say that) that mongos defaults to reading from slaves.

Comment by Kristina Chodorow (Inactive) [ 20/Jan/11 ]

Only part of the problem, but

{ "$err" : "not master and slaveok=false", "code" : 13435 }

just means you're trying to query a slave without saying db.getMongo().setSlaveOk() first. Try:

> db.getMongo().setSlaveOk()
> db.items.find()

once all of the servers are up again.

Comment by Aristarkh Zagorodnikov [ 20/Jan/11 ]

~$ ./db/bin/mongo celestine-1/test1
MongoDB shell version: 1.7.5-pre-
connecting to: localhost/test1
> db.items.find()
error:

{ "$err" : "not master and slaveok=false", "code" : 13435 }

~$ ./db/bin/mongo celestine-1:27100
MongoDB shell version: 1.7.5-pre-
connecting to: celestine-1:27100/test
testRS:PRIMARY> rs.status()
{
    "set" : "testRS",
    "date" : ISODate("2011-01-20T08:27:40Z"),
    "myState" : 1,
    "members" : [
        {
            "_id" : 0,
            "name" : "celestine-1:27100",
            "health" : 1,
            "state" : 1,
            "stateStr" : "PRIMARY",
            "optime" : { "t" : 1295443467000, "i" : 1 },
            "optimeDate" : ISODate("2011-01-19T13:24:27Z"),
            "self" : true
        },
        {
            "_id" : 1,
            "name" : "celestine-2:27100",
            "health" : 1,
            "state" : 2,
            "stateStr" : "SECONDARY",
            "uptime" : 58665,
            "optime" : { "t" : 1295443467000, "i" : 1 },
            "optimeDate" : ISODate("2011-01-19T13:24:27Z"),
            "lastHeartbeat" : ISODate("2011-01-20T08:27:39Z")
        },
        {
            "_id" : 2,
            "name" : "celestine-3:27100",
            "health" : 1,
            "state" : 2,
            "stateStr" : "SECONDARY",
            "uptime" : 58668,
            "optime" : { "t" : 1295443467000, "i" : 1 },
            "optimeDate" : ISODate("2011-01-19T13:24:27Z"),
            "lastHeartbeat" : ISODate("2011-01-20T08:27:39Z")
        }
    ],
    "ok" : 1
}

Comment by Eliot Horowitz (Inactive) [ 20/Jan/11 ]

I'm a little confused about what state things were in at various error messages.

When you got:
error:

{ "$err" : "not master and slaveok=false", "code" : 13435 }

What state was the replica set in?
