Loading...

XML

Word

Printable

JSON

Type: Bug
Resolution: Incomplete
Priority: Major - P3
Fix Version/s: None
Affects Version/s: 2.6.4
Component/s: Sharding
Labels:
None

Operating System:
ALL
Confidence Status:
None
Work Order:
3
CAR Domain/s:
None

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name(s):
None
Goal Link:
None

Intro

We are running a sharded cluster of 7 shards. Every shard is a replicaset of 3 replicas.

We do pre-splitting in order to distribute evenly new inserted documents among all shards dependent on their amount of RAM. Our database must fit in RAM else it performance drops down significantly. The balancer is switched off because it thinks all servers have the same amount of RAM which would lead to overload of shards having less RAM.

From time to time, i.e. when adding a new shard, we need to balance manually. Our script executes first sh.moveChunk and when it returns ok, it deletes the moved documents from the source shards because we observed that sh.moveChunk does not always clean-up 100%.

In pseudo code it gives:

for all chunks from donor shard
  print("move " + (chunk++) + " of id: " +  chunk.min._id)
  move = sh.moveChunk(fromCollection, chunk.min._id, toShard)
  if(move.ok){
    del = fromCollection.remove({_id:{$gte:chunk.min._id, $lt:chunk.max._id}}, { writeConcern: { w: "majority"} })
    print("deleted: " + del.nRemoved)
  }

Problem

We observed that mongo may wait forever when we add the writeConcern "majority" to the remove command.

In the currentOp we see something like this:

{
  "opid" : 135910798,
  "active" : true,
  "secs_running" : 2362,
  "microsecs_running" : NumberLong("2362545153"),
  "op" : "query",
  "ns" : "",
  "query" : {
	  "delete" : "offer",
	  "deletes" : [
		  {
			  "q" : {
				  "_id" : {
					  "$gte" : NumberLong("2577460009"),
					  "$lt" : NumberLong("2577494734")
				  }
			  },
			  "limit" : 0
		  }
	  ],
	  "ordered" : true,
	  "writeConcern" : {
		  "w" : "majority"
	  }
  },
  "client" : "172.30.2.130:51426",
  "desc" : "conn4708952",
  "threadId" : "0x7f2035ca5700",
  "connectionId" : 4708952,
  "waitingForLock" : false,
  "msg" : "waiting for write concern",
  "numYields" : 0,
  "lockStats" : {
	  "timeLockedMicros" : {
		  
	  },
	  "timeAcquiringMicros" : {
		  
	  }
  }
}

While mongo was waiting, we verified the number of documents of this chunk on all replicaset members of the donor shard. It was 0! So all documents of this chunk were removed from the donor shard but mongo still waited for it. Why?

Also, still more weird, the output of the above pseudo script was like this:

move 1 of id: 11605025
deleted: 5623
move 2 of id: 183729058
deleted: 7152
move 3 of id: 2577460009
deleted: 3561
move 4 of id: 2977458123

Move 4 hung forever.
As you can see, mongo was endlessly waiting for the query $gte:2577460009. However, the result of the delete returned already since it printed the number of deleted documents (3561). Only the following chunk move #4 got stuck. Why?
Also, as you can see, our script had always to delete documents that should have been deleted by the sh.moveChunk already, shouldn't it?

Assignee:: Unassigned
Reporter:: Kay Agahd
Participants:: Andy Schwerin, Kay Agahd, Sam Kleinman
Votes:: 0 Vote for this issue
Watchers:: 6 Start watching this issue

Created:: Nov 25 2014 02:30:57 PM UTC
Updated:: Apr 08 2015 03:57:33 PM UTC
Resolved:: Apr 08 2015 03:57:28 PM UTC

Details

Description

Intro

Problem

Attachments

Forms

Activity

People

Dates