Priority: Major - P3
Affects Version/s: 2.6.3
Fix Version/s: 2.7
UPDATE: Although the presentation of this issue was new to us, it's a known bug in PyMongo before 2.7. I didn't realize I'd fixed it, along with a large class of similar bugs related to replica set reconnection, when I rewrote MongoClient for
PyMongo 2's rapidly obsolescing MongoClient can get stuck trying to authenticate to a recovering member, even if a primary is available. There are a few ways this can happen, all intricate. The particular case in which this was reported was:
1. Replica set with a primary "A" and a resyncing member "B"
2. MongoClient started with connection string "A,B" and no "replicaSet" keyword (note that this is MongoClient, not PyMongo 2's MongoReplicaSetClient)
3. MongoClient.database.authenticate("user", "password") succeeds against the primary
4. An operation ("find_one" or whatever) fails against the primary with network error
5. On the next operation, MongoClient attempts rediscovery by calling "ismaster" on A and B again. Because it has cached the "user" / "password" credentials, it also attempts authentication against each node as it connects.
6. When it tries to reach host "B", B is resyncing and doesn't have the user's record yet, so auth fails.
7. MongoClient throws OperationFailure("auth fails"), and continues to do so even after the primary becomes available again.
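The sequence above can be modelled with a small toy simulation. This is only a sketch, not PyMongo's code: ToyMember, ToyClient, drop_connection and the local exception classes are hypothetical stand-ins, and the toy assumes rediscovery happens to try "B" before "A".

```python
class OperationFailure(Exception):
    """Local stand-in for pymongo.errors.OperationFailure."""


class AutoReconnect(Exception):
    """Local stand-in for pymongo.errors.AutoReconnect."""


class ToyMember(object):
    """Hypothetical replica set member."""

    def __init__(self, name, is_primary, has_user_record):
        self.name = name
        self.is_primary = is_primary
        self.has_user_record = has_user_record

    def authenticate(self, credentials):
        # A resyncing member has no user record yet, so auth fails.
        if not self.has_user_record:
            raise OperationFailure("auth fails")


class ToyClient(object):
    """Hypothetical client that mimics the buggy rediscovery."""

    def __init__(self, members):
        self.members = members
        self.credentials = None
        self.current = None
        self._find_node()  # step 2: initial discovery, nothing cached yet

    def authenticate(self, user, password):
        # Step 3: auth runs against the current primary, then is cached.
        self.current.authenticate((user, password))
        self.credentials = (user, password)

    def drop_connection(self):
        # Step 4: simulate a network error on the primary.
        self.current = None

    def _find_node(self):
        # Step 5: rediscovery authenticates against each node it connects to.
        for member in self.members:
            if self.credentials is not None:
                # Step 6: OperationFailure from "B" propagates immediately.
                member.authenticate(self.credentials)
            if member.is_primary:
                self.current = member
                return member
        raise AutoReconnect("no primary found")

    def find_one(self):
        if self.current is None:
            self._find_node()
        return {"served_by": self.current.name}


if __name__ == "__main__":
    b = ToyMember("B", is_primary=False, has_user_record=False)
    a = ToyMember("A", is_primary=True, has_user_record=True)
    client = ToyClient([b, a])  # assumption: "B" is tried first
    client.authenticate("user", "password")
    client.drop_connection()
    for _ in range(3):
        try:
            client.find_one()
        except OperationFailure as exc:
            print("find_one failed: %s" % exc)  # step 7: fails every time
```

In this toy, the primary "A" is healthy the whole time, yet the client never reaches it after the network error, because the auth failure on "B" aborts every rediscovery attempt.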
This can be reproduced with MockupDB. First "pip install git+git://github.com/ajdavis/mongo-mockup-db.git". MockupDB requires PyMongo 3, so run it in a separate virtualenv. Start a mock replica set with two members:
Then connect a client with PyMongo 2 (this was tested with 2.6.3, but all recent PyMongo 2 versions will act the same):
The client's initial auth succeeds, then each find_one fails with an OperationFailure and characteristic traceback:
PyMongo 3's MongoClient, on the other hand, behaves as designed: it throws AutoReconnect once, then successfully reconnects to the primary.
This shows a couple of bugs in MongoClient. First, it shouldn't attempt auth against a member in an unknown state during reconnection. PyMongo 3's MongoClient does not. Second, if there is any failure while rediscovering the state of a member, MongoClient shouldn't stay pinned to that member's host and port afterward. Again, this is fixed in PyMongo 3.
This bug has existed in PyMongo 2 for as long as PyMongo has supported authentication.
A possible solution is to update this code in PyMongo 2's MongoClient.__find_node:
This code was written assuming that an auth failure is a permanent and global condition that should be raised at once, rather than transient and particular to the RS member being tried, the way a network error might be. MongoClient might instead treat OperationFailure as it does other exceptions: keep trying more nodes.
I've briefly tested this and it fixes this bug, at the cost of backward compatibility: if we make the change, auth errors from MongoClient's constructor will raise AutoReconnect instead of the expected OperationFailure.
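To make the trade-off concrete, here is a minimal sketch of the two discovery strategies. It is not the actual MongoClient.__find_node code: try_node, find_node_strict, find_node_tolerant, the dict-based node records and the local exception classes are all hypothetical names used for illustration.

```python
class OperationFailure(Exception):
    """Local stand-in for pymongo.errors.OperationFailure."""


class AutoReconnect(Exception):
    """Local stand-in for pymongo.errors.AutoReconnect."""


def try_node(node, credentials):
    """Hypothetical probe: return True if the node is a usable primary."""
    if not node["reachable"]:
        raise ConnectionError("%s unreachable" % node["name"])
    if credentials is not None and not node["has_user"]:
        raise OperationFailure("auth fails")
    return node["primary"]


def find_node_strict(nodes, credentials):
    """Current behavior as described: an auth failure is raised at once."""
    errors = []
    for node in nodes:
        try:
            if try_node(node, credentials):
                return node["name"]
        except OperationFailure:
            raise  # treated as permanent and global
        except Exception as exc:
            errors.append(str(exc))
    raise AutoReconnect(", ".join(errors) or "no primary found")


def find_node_tolerant(nodes, credentials):
    """Proposed behavior: treat an auth failure like any other node error."""
    errors = []
    for node in nodes:
        try:
            if try_node(node, credentials):
                return node["name"]
        except Exception as exc:  # includes OperationFailure
            errors.append(str(exc))
    # The compatibility cost: bad credentials now surface as AutoReconnect.
    raise AutoReconnect(", ".join(errors) or "no primary found")


if __name__ == "__main__":
    resyncing_b = {"name": "B", "reachable": True, "has_user": False, "primary": False}
    primary_a = {"name": "A", "reachable": True, "has_user": True, "primary": True}
    creds = ("user", "password")
    print(find_node_tolerant([resyncing_b, primary_a], creds))
    try:
        find_node_strict([resyncing_b, primary_a], creds)
    except OperationFailure as exc:
        print("strict lookup raised OperationFailure: %s" % exc)
```

With the tolerant loop, the resyncing member is skipped and the primary is found. The same loop also swallows a genuine wrong-password error on every node and reports it as AutoReconnect, which is the backward-compatibility break described above.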
Best, in my opinion, to leave PyMongo as-is and encourage users to upgrade.