[SERVER-20105] movechunk failed after adding a new share to cluster Created: 25/Aug/15  Updated: 26/Aug/15  Resolved: 26/Aug/15

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 2.6.8
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: lilele Assignee: Sam Kleinman (Inactive)
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Operating System: ALL
Participants:

 Description   

I have a MongoDB cluster that already has two shards: share1 and share2.

When I add a new shard, share3, to the cluster, chunks cannot migrate to share3.

This is from the mongos log:

2015-08-25T11:50:48.145+0800
[Balancer] balancer move failed:
{ cause: { ok: 0.0, errmsg: "migrate already in progress" }, ok: 0.0,
errmsg: "moveChunk failed to engage TO-shard in the data transfer: 
migrate already in progress" } from: share1 to: share3 
chunk:  min: { colkey: "51f7b17f56240isjduf2716b", colid: "111234322333" }
max: { colkey: "51f7b17f56240isjduf2716b", colid: "111234322333" }

This is the output of the following command:

db.changelog.find({ what: "moveChunk.to"}).sort({_id:-1}).limit(5).pretty()
{
        "_id" : "mongodb9.myhost.xx-2015-08-25T09:36:08-55dc3708c599c005ae1b500f",
        "server" : "mongodb9.myhost.xx",
        "clientAddr" : ":27017",
        "time" : ISODate("2015-08-25T09:36:08.442Z"),
        "what" : "moveChunk.to",
        "ns" : "tbplus.col",
        "details" : {
                "min" : {
                        "colkey" : "3ew7b17skfud7382jd70c533b",
                        "colid" : "13eswf432a2535abe68cfc236sf343sdfw1"
                },
                "max" : {
                        "colkey" : "51f7d2c856240bf5c607a3a8",
                        "colid" : "7283023d4fce5b6e51adfefbc7532ff5"
                },
                "step 1 of 5" : 0,
                "step 2 of 5" : 91631,
                "note" : "aborted",
                "errmsg" : "_migrateClone failed: { ok: 0.0, errmsg: \"not active\" }"
        }
}
{
        "_id" : "mongodb9.myhost.xx-2015-08-25T09:33:08-55dc3654c599c005ae1b500d",
        "server" : "mongodb9.myhost.xx",
        "clientAddr" : ":27017",
        "time" : ISODate("2015-08-25T09:33:08.949Z"),
        "what" : "moveChunk.to",
        "ns" : "tbplus.col",
        "details" : {
                "min" : {
                        "colkey" : "3ew7b17skfud7382jd70c533b",
                        "colid" : "13eswf432a2535abe68cfc236sf343sdfw1"
                },
                "max" : {
                        "colkey" : "51f7d2c856240bf5c607a3a8",
                        "colid" : "7283023d4fce5b6e51adfefbc7532ff5"
                },
                "step 1 of 5" : 0,
                "step 2 of 5" : 93149,
                "note" : "aborted",
                "errmsg" : "_migrateClone failed: { ok: 0.0, errmsg: \"not active\" }"
        }
}
{
        "_id" : "mongodb9.myhost.xx-2015-08-25T09:30:06-55dc359ec599c005ae1b500c",
        "server" : "mongodb9.myhost.xx",
        "clientAddr" : ":27017",
        "time" : ISODate("2015-08-25T09:30:06.317Z"),
        "what" : "moveChunk.to",
        "ns" : "tbplus.share",
        "details" : {
                "min" : {
                        "colkey" : "s2b7fe99562fed23wer317a3b",
                        "userid" : "75ad2ae7addfc1bbd6ea493399f1"
                },
                "max" : {
                        "colkey" : "s2b7fe99562fed23wer317a3b",
                        "userid" : "76e34be841b8b874a3341c0ad5b29a1"
                },
                "step 1 of 5" : 0,
                "step 2 of 5" : 61859,
                "note" : "aborted",
                "errmsg" : "_migrateClone failed: { ok: 0.0, errmsg: \"not active\" }"
        }
}
{
        "_id" : "mongodb9.myhost.xx-2015-08-25T09:27:13-55dc34f1c599c005ae1b500b",
        "server" : "mongodb9.myhost.xx",
        "clientAddr" : ":27017",
        "time" : ISODate("2015-08-25T09:27:13.842Z"),
        "what" : "moveChunk.to",
        "ns" : "tbplus.col",
        "details" : {
                "min" : {
                        "colkey" : "3ew7b17skfud7382jd70c533b",
                        "colid" : "13eswf432a2535abe68cfc236sf343sdfw1"
                },
                "max" : {
                        "colkey" : "51f7d2c856240bf5c607a3a8",
                        "colid" : "7283023d4fce5b6e51adfefbc7532ff5"
                },
                "step 1 of 5" : 1,
                "step 2 of 5" : 92962,
                "note" : "aborted",
                "errmsg" : "_migrateClone failed: { ok: 0.0, errmsg: \"not active\" }"
        }
}
{
        "_id" : "mongodb9.myhost.xx-2015-08-25T09:24:10-55dc343ac599c005ae1b500a",
        "server" : "mongodb9.myhost.xx",
        "clientAddr" : ":27017",
        "time" : ISODate("2015-08-25T09:24:10.936Z"),
        "what" : "moveChunk.to",
        "ns" : "tbplus.col",
        "details" : {
                "min" : {
                        "colkey" : "3ew7b17skfud7382jd70c533b",
                        "colid" : "13eswf432a2535abe68cfc236sf343sdfw1"
                },
                "max" : {
                        "colkey" : "51f7d2c856240bf5c607a3a8",
                        "colid" : "7283023d4fce5b6e51adfefbc7532ff5"
                },
                "step 1 of 5" : 0,
                "step 2 of 5" : 92102,
                "note" : "aborted",
                "errmsg" : "_migrateClone failed: { ok: 0.0, errmsg: \"not active\" }"
        }
}
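
The repeated aborts in the output above can be tallied per namespace. The following is a minimal sketch in plain JavaScript, assuming the changelog documents have already been fetched into an array (for instance with `.toArray()` on the cursor). The helper name `summarizeAborts` is hypothetical, and the sample array is a trimmed-down copy of the five entries shown above.

```javascript
// Hypothetical helper: count aborted "moveChunk.to" entries per namespace
// and record which error messages accompanied them.
function summarizeAborts(entries) {
  var summary = {};
  entries.forEach(function (entry) {
    var details = entry.details || {};
    if (entry.what !== "moveChunk.to" || details.note !== "aborted") {
      return; // skip anything that is not an aborted receive
    }
    var bucket = summary[entry.ns] || (summary[entry.ns] = { count: 0, errors: {} });
    bucket.count += 1;
    var msg = details.errmsg || "(no errmsg)";
    bucket.errors[msg] = (bucket.errors[msg] || 0) + 1;
  });
  return summary;
}

// Trimmed-down copies of the five changelog entries from the report.
var notActive = '_migrateClone failed: { ok: 0.0, errmsg: "not active" }';
var changelogSample = [
  { what: "moveChunk.to", ns: "tbplus.col",   details: { note: "aborted", errmsg: notActive } },
  { what: "moveChunk.to", ns: "tbplus.col",   details: { note: "aborted", errmsg: notActive } },
  { what: "moveChunk.to", ns: "tbplus.share", details: { note: "aborted", errmsg: notActive } },
  { what: "moveChunk.to", ns: "tbplus.col",   details: { note: "aborted", errmsg: notActive } },
  { what: "moveChunk.to", ns: "tbplus.col",   details: { note: "aborted", errmsg: notActive } }
];

var abortSummary = summarizeAborts(changelogSample);
```

With this sample, every abort carries the same "not active" error, which suggests a single underlying cause rather than several unrelated failures.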



 Comments   
Comment by lilele [ 26/Aug/15 ]

I'm sorry, it seems we forgot to set up the NTP process, so the time on the new server was not correct.

Thank you for your help. You can close this issue now.
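
A clock problem like the one described above can be spotted by comparing the time each server reports. The following is a minimal sketch, assuming one `localTime` value per host has already been collected at roughly the same moment (for example from `db.serverStatus().localTime` on each server). The helper name `findSkewedHosts`, the host names, the sample times, and the 5-second threshold are illustrative assumptions and do not come from this ticket.

```javascript
// Hypothetical helper: flag hosts whose reported time differs from the
// (upper) median of all samples by more than maxSkewMs milliseconds.
function findSkewedHosts(samples, maxSkewMs) {
  var times = samples
    .map(function (s) { return s.localTime.getTime(); })
    .sort(function (a, b) { return a - b; });
  var median = times[Math.floor(times.length / 2)];
  return samples
    .filter(function (s) {
      return Math.abs(s.localTime.getTime() - median) > maxSkewMs;
    })
    .map(function (s) {
      return { host: s.host, skewMs: s.localTime.getTime() - median };
    });
}

// Illustrative samples only: hostC is about five minutes ahead of the others.
var clockSamples = [
  { host: "hostA", localTime: new Date("2015-08-25T09:36:08.000Z") },
  { host: "hostB", localTime: new Date("2015-08-25T09:36:08.400Z") },
  { host: "hostC", localTime: new Date("2015-08-25T09:41:08.000Z") }
];

var skewedHosts = findSkewedHosts(clockSamples, 5000);
```

Because the samples are never taken at exactly the same instant, small differences are expected; only a gap far larger than the collection delay points at a missing time synchronization service.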

Comment by Sam Kleinman (Inactive) [ 25/Aug/15 ]

I'm sorry to hear that you've run into this problem. Could you provide a bit more information on your deployment and configuration:

  1. Can you provide the output of sh.status() while connected in the mongo shell to a mongos instance?
  2. How many mongos instances are you running?
  3. Are all of the servers that run mongod and mongos instances also running ntp or some other clock synchronization service?
  4. Do you have records of chunk migrations completing successfully?
  5. Are all shards and mongos instances located on the same network, or do the components of the cluster have disparate locations? I want to better understand the networking configuration to determine whether there was a transient networking failure between the FROM and TO (receiving) shards.
  6. Does this error continue repeatedly or is it intermittent?
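
Some of the questions above can be answered from the config database. The following is a rough sketch, assuming `configDb` is a handle to the `config` database (for example `db.getSiblingDB("config")` in a mongo shell connected to a mongos), that mongos instances are listed in `config.mongos`, and that completed migrations are logged in `config.changelog` with `what: "moveChunk.commit"`. The helper name `collectShardingDiagnostics` is hypothetical. The output of `sh.status()` still has to be gathered separately.

```javascript
// Hypothetical helper: gather a few counts relevant to the questions above.
// configDb is expected to behave like the "config" database handle.
function collectShardingDiagnostics(configDb) {
  return {
    // Question 2: entries in config.mongos. This may include instances
    // that are no longer running, so treat it as an upper bound.
    mongosEntries: configDb.mongos.count(),
    // Question 4: migrations recorded as committed.
    committedMigrations: configDb.changelog.count({ what: "moveChunk.commit" }),
    // Question 6: how often a receiving shard recorded an aborted migration.
    abortedReceives: configDb.changelog.count({
      what: "moveChunk.to",
      "details.note": "aborted"
    })
  };
}
```

Taking the database handle as a parameter keeps the function independent of any particular connection, so it can be run against whichever mongos is at hand.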

Thanks for the information.

Regards,
sam
