[SERVER-2777] moveChunk failed, socket exception Created: 17/Mar/11  Updated: 30/Mar/12  Resolved: 17/Mar/11

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 1.6.5
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: charso Assignee: Unassigned
Resolution: Cannot Reproduce Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Ubuntu 10.04


Operating System: Linux
Participants:

 Description   

I run a moveChunk operation with these command options:
{:moveChunk=>"localytics_production.events",
 :find=>{"_id"=><BSON::Binary:-610712958>},
 :to=>"localyticsProdShard3"}

And I occasionally get an error like this:

/usr/lib/ruby/gems/1.8/gems/mongo-1.2.4/lib/../lib/mongo/db.rb:501:in `command': Database command 'moveChunk' failed: {"ok"=>0.0, "errmsg"=>"_recvChunkStatus error { cause: { active: false, ns: \"localytics_production.events\", from: \"localyticsProdShard1/ip-10-127-86-88.ec2.internal:27017,ip-10-126-165-...\", min: { _id: BinData }, max: { _id: BinData }, state: \"fail\", errmsg: \"socket exception\", counts: { cloned: 0, catchup: 0, steady: 0 }, ok: 1.0 }, errmsg: \"_recvChunkStatus error\", ok: 0.0 }"} (Mongo::OperationFailure)

Any idea what this is all about and how I can go about diagnosing it?

One piece of information that might be helpful: I had already started another moveChunk command, but it is for a different collection, and its chunk moves between different source and destination shards.



 Comments   
Comment by charso [ 17/Mar/11 ]

Thanks Eliot.

Comment by Eliot Horowitz (Inactive) [ 17/Mar/11 ]

Please let us know if you get any more info.
Given just that, it seems like a pure networking issue.
Mongo should recover just fine though and this shouldn't cause any issue on the cluster.
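
Since the failure looks transient, one option on the client side is to retry the command a few times before giving up. The following is only a minimal sketch under that assumption, not something prescribed in this ticket: with_retries is a hypothetical helper, and the attempt count and delays are arbitrary. It uses plain Ruby only and makes no driver calls.

```ruby
# Hypothetical helper: run the block, retrying on the given error classes
# with exponential backoff (base_delay, 2x base_delay, 4x base_delay, ...).
def with_retries(attempts: 3, base_delay: 1.0, retry_on: [StandardError])
  tries = 0
  begin
    tries += 1
    yield tries                      # pass the attempt number to the block
  rescue *retry_on
    raise if tries >= attempts       # out of attempts: re-raise the last error
    sleep(base_delay * 2**(tries - 1))
    retry                            # re-run the begin block
  end
end
```

To use it, wrap the driver call that issues moveChunk in the block and pass retry_on: [Mongo::OperationFailure], the exception class shown in the stack trace above.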

Comment by charso [ 17/Mar/11 ]

I didn't shut anything down, but I am on EC2 and network connection drops aren't unheard of. I can't nail down exactly what the circumstances are when I see it. At least I know what to look for now.

Comment by Eliot Horowitz (Inactive) [ 17/Mar/11 ]

That means a socket was closed between the 2 shards.
So it could be:

  • mongod shutdown or crash
  • server shutdown or crash
  • network flapping

Can you check for all of those?
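
A rough way to check the first two causes is to see whether each shard member still accepts TCP connections. The sketch below assumes it is run from one of the shard hosts and uses only Ruby's standard socket library; port_open? and unreachable are hypothetical helper names, and the 2 second timeout is arbitrary. A successful connect only shows that something is listening on the port at that moment, so it cannot rule out intermittent network flapping.

```ruby
require "socket"

# True if a TCP connection to host:port can be opened within `timeout` seconds.
def port_open?(host, port, timeout = 2)
  Socket.tcp(host, port, connect_timeout: timeout) { true }
rescue SystemCallError, SocketError, IOError
  # Refused, timed out, unreachable, or unresolvable host.
  false
end

# Given "host:port" strings, return the ones that could not be reached.
def unreachable(members, timeout = 2)
  members.reject do |member|
    host, _, port = member.rpartition(":")
    port_open?(host, Integer(port), timeout)
  end
end
```

For example, unreachable(["ip-10-127-86-88.ec2.internal:27017"]) returns an empty array when that member, taken from the error message above, accepts connections.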

Generated at Thu Feb 08 03:01:09 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.