[SERVER-3763] when one shard goes down, mongos starts returning failure in getlasterror for all the shards, to already connected client. Created: 06/Sep/11  Updated: 11/Jul/16  Resolved: 24/Jan/12

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 1.8.1
Fix Version/s: 2.0.3, 2.1.0

Type: Bug
Priority: Major - P3
Reporter: anurag berdia
Assignee: Eric Milkie
Resolution: Done
Votes: 1
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: Mongo version 1.8.1 on 64-bit Debian machine


Issue Links:
Duplicate: is duplicated by SERVER-4643 "one of the shard is down" (Closed)
Operating System: Linux
Participants:

 Description   

MongoDB is deployed with 2 shards. When both shards are up and running, our Mongo driver,
connected through mongos, inserts objects correctly and gets the expected codes from getLastError.
When one of the shards (shard1) goes down (its mongod process crashes), mongos
starts returning a socket exception (with code *) for both shards, even though
it keeps inserting objects on shard2.

At the same time, if another Mongo driver connects to the same mongos, it
gets the correct codes from getLastError: a socket exception for shard1
and success for shard2. (Stopping and restarting our application also makes it
work correctly again.)



 Comments   
Comment by Brendan W. McAdams [ 07/Feb/12 ]

Sorry, I mistakenly assigned the wrong ticket to myself.

Comment by Eric Milkie [ 24/Jan/12 ]

Backported to 2.0.3

Comment by auto [ 24/Jan/12 ]

Author: Eric Milkie (milkie) <milkie@10gen.com>

Message: SERVER-3763 SERVER-4643 use correct connection

There were two automatic variables named "conn" in scope, with one occluding the other. I mistakenly changed the logic in my last commit, so I have now fixed it, and removed the overlapping scopes.
Branch: v2.0
https://github.com/mongodb/mongo/commit/937077b4c15b75411c661232236f3b2cae70ce81
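For readers unfamiliar with the shadowing problem this commit message describes, here is a minimal sketch of the general pattern. It is not the mongos code: `Conn`, the host names, and both function names are hypothetical and chosen only for illustration.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a connection handle; not the real mongos type.
struct Conn {
    std::string host;
};

// Problem shape: the inner `conn` shadows the outer one, so code after the
// inner block silently goes back to using the outer connection.
std::string hostUsedWithShadowing() {
    Conn conn{"shardA"};
    {
        Conn conn{"shardB"};  // shadows the outer `conn`
        (void)conn;           // the intent was to keep using this one
    }
    return conn.host;         // refers to the outer `conn`
}

// Fixed shape: distinct names and no overlapping scopes.
std::string hostUsedWithDistinctNames() {
    Conn clientConn{"shardA"};
    Conn shardConn{"shardB"};
    (void)clientConn;
    return shardConn.host;    // unambiguous
}
```

Compiling with a shadowing warning enabled (for example `-Wshadow` in GCC and Clang) flags the first function.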

Comment by auto [ 24/Jan/12 ]

Author: Eric Milkie (milkie) <milkie@10gen.com>

Message: SERVER-3763 SERVER-4643 catch exceptions on connect to other shards when handling multiple shard case
Branch: v2.0
https://github.com/mongodb/mongo/commit/c202889e9c3d50f3e7557ceb2b1e505b519a7955

Comment by auto [ 24/Jan/12 ]

Author: Eric Milkie (milkie) <milkie@10gen.com>

Message: SERVER-3763 SERVER-4643 avoid exceptions in getlasterror due to other shards being down

I had to move the creation of ShardConnection inside the try/catch because it is possible that it might throw a SocketException if the shard is down.
This change allows shard_gle_insert.js to pass on Windows, and may alleviate Linux failures with getLastError as well.
Branch: v2.0
https://github.com/mongodb/mongo/commit/078c791acd376d60c770c444b3fcd2f5a006baa1
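The idea in this commit message, constructing the connection inside the try block so that a down shard cannot abort the whole getLastError call, can be sketched as follows. This is an illustration under stated assumptions, not the actual change: the `ShardConnection` and `SocketException` types below are simplified hypothetical stand-ins with invented constructors, and `collectLastErrors` is a made-up helper.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins; the real mongos classes have different interfaces.
struct SocketException : std::runtime_error {
    using std::runtime_error::runtime_error;
};

struct ShardConnection {
    std::string shard;
    // In this sketch, constructing a connection to a down shard throws.
    ShardConnection(const std::string& name, bool up) : shard(name) {
        if (!up) throw SocketException("socket exception: " + name);
    }
    // An empty string stands for "no error" in this sketch.
    std::string getLastError() const { return ""; }
};

// Collects one result per shard. Because the constructor call sits inside
// the try block, a down shard produces an error entry for that shard only
// instead of propagating an exception out of the loop.
std::map<std::string, std::string> collectLastErrors(
        const std::vector<std::pair<std::string, bool>>& shards) {
    std::map<std::string, std::string> results;
    for (const auto& s : shards) {
        try {
            ShardConnection conn(s.first, s.second);  // may throw
            results[s.first] = conn.getLastError();
        } catch (const SocketException& e) {
            results[s.first] = e.what();
        }
    }
    return results;
}
```

Calling `collectLastErrors({{"shard1", false}, {"shard2", true}})` records an error for `shard1` and an empty result for `shard2`. If the constructor call were placed before the `try`, the exception from `shard1` would escape and no result would be reported for `shard2`.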

Comment by auto [ 18/Jan/12 ]

Author: Eric Milkie (milkie) <milkie@10gen.com>

Message: SERVER-3763 SERVER-4643 use correct connection

There were two automatic variables named "conn" in scope, with one occluding the other. I mistakenly changed the logic in my last commit, so I have now fixed it, and removed the overlapping scopes.
Branch: master
https://github.com/mongodb/mongo/commit/826d964c31f7a96c43f0467b85ff6b23f66fad68

Comment by Eric Milkie [ 18/Jan/12 ]

I believe the previous two commits now fix the issue.

Comment by Greg Studer [ 18/Jan/12 ]

@eric - does this actually fix the issue?

Comment by auto [ 17/Jan/12 ]

Author: Eric Milkie (milkie) <milkie@10gen.com>

Message: SERVER-3763 SERVER-4643 catch exceptions on connect to other shards when handling multiple shard case
Branch: master
https://github.com/mongodb/mongo/commit/8a002c30c6d6749f87c618b61d3f58d54696bb7a

Comment by auto [ 16/Jan/12 ]

Author: Eric Milkie (milkie) <milkie@10gen.com>

Message: SERVER-3763 SERVER-4643 avoid exceptions in getlasterror due to other shards being down

I had to move the creation of ShardConnection inside the try/catch because it is possible that it might throw a SocketException if the shard is down.
This change allows shard_gle_insert.js to pass on Windows, and may alleviate Linux failures with getLastError as well.
Branch: master
https://github.com/mongodb/mongo/commit/937e0d8e1afbc75ef475ec99196f1866f1c56679

Comment by anurag berdia [ 06/Jan/12 ]

The issue is also present in 2.0.2. Please confirm on your end.
Thanks!

Comment by anurag berdia [ 05/Jan/12 ]

The issue is still open in 2.0.1. Please check the link below:
http://groups.google.com/group/mongodb-user/browse_thread/thread/e142a2404b768f2b/819e87bbcda0816b?lnk=gst&q=anurag+berdia&pli=1

Please respond,
Thanks,
Anurag Berdia

Comment by Eliot Horowitz (Inactive) [ 30/Dec/11 ]

this was fixed in 2.0.1, so also in 2.0.2

Comment by anurag berdia [ 30/Dec/11 ]

This will be fixed in version 2.1.1, right? When is that version going to be available? Right now, MongoDB version 2.0.2 is available.

Comment by auto [ 25/Oct/11 ]

Author: gregs (gregstuder) <greg@10gen.com>

Message: more error handling and msgs for gle SERVER-3763
Branch: v2.0
https://github.com/mongodb/mongo/commit/0c382de5881ff04a88c3744ef8bb77cf20d94af2

Comment by auto [ 17/Oct/11 ]

Author: gregs (gregstuder) <greg@10gen.com>

Message: test for SERVER-3763
Branch: master
https://github.com/mongodb/mongo/commit/aff5b255dae46acd8aa462564a186c21e83ebd10

Comment by auto [ 17/Oct/11 ]

Author: gregs (gregstuder) <greg@10gen.com>

Message: more error handling and msgs for gle SERVER-3763
Branch: master
https://github.com/mongodb/mongo/commit/42d9b82e816d8de09eee3d3801abb0197c337deb

Comment by Greg Studer [ 17/Oct/11 ]

aff5b255dae46acd8aa462564a186c21e83ebd10 and 42d9b82e816d8de09eee3d3801abb0197c337deb

Comment by Eliot Horowitz (Inactive) [ 06/Sep/11 ]

@tony - can you write a test to reproduce this?

Comment by Tony Hannan [ 06/Sep/11 ]

I am able to reproduce this in version 2.0.0-rc0. To trigger it, attempt an insert into the down shard, then insert into the up shard. The insert into the up shard succeeds, but error 11002 is returned.
