[SERVER-23975] migrating chunks from one shard to another which is also set to be removed Created: 28/Apr/16  Updated: 04/May/16  Resolved: 29/Apr/16

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 3.0.6
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Rob Reid Assignee: Unassigned
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
duplicates SERVER-11328 Allow concurrent draining shards Closed
Related
is related to SERVER-24006 Allow multiple shards to be marked as... Closed
Operating System: ALL
Participants:

 Description   

We are migrating data from a legacy set of shards to a new set. Looking at the logs, I see that chunks are being moved from one shard to another shard that is also in the set of shards to be removed. This means the contents of those chunks will be migrated more than once, making the already painfully slow balancing process even worse.

Here's an excerpt from the log showing that it is moving a chunk from Shard_18 to Shard_83:

Apr 28 12:33:48 chron-queryb ip-10-43-48-90:  docker/mongowt[5237]:  2016-04-28T19:33:48.295+0000 I NETWORK  [conn1822] end connection 172.17.0.1:41782 (3 connections now open)
Apr 28 12:34:01 chron-queryb ip-10-43-48-90:  docker/shardstrap[5237]:  [INFO] - Shard(s) to remove: set([u'Shard_26', u'Shard_73', u'Shard_52', u'Shard_77', u'Shard_89', u'Shard_88', u'Shard_87', u'Shard_86', u'Shard_85', u'Shard_84', u'Shard_83', u'Shard_82', u'Shard_81', u'Shard_80', u'Shard_29', u'Shard_28', u'Shard_25', u'Shard_24', u'Shard_27', u'Shard_53', u'Shard_21', u'Shard_20', u'Shard_23', u'Shard_22', u'Shard_61', u'Shard_60', u'Shard_63', u'Shard_62', u'Shard_65', u'Shard_64', u'Shard_67', u'Shard_66', u'Shard_69', u'Shard_68', u'Shard_41', u'Shard_40', u'Shard_47', u'Shard_46', u'Shard_45', u'Shard_44', u'Shard_48', u'Shard_71', u'Shard_54', u'Shard_79', u'Shard_100', u'Shard_50', u'Shard_98', u'Shard_99', u'Shard_49', u'Shard_42', u'Shard_94', u'Shard_95', u'Shard_96', u'Shard_97', u'Shard_90', u'Shard_91', u'Shard_92', u'Shard_93', u'Shard_38', u'Shard_39', u'Shard_43', u'Shard_55', u'Shard_32', u'Shard_72', u'Shard_30', u'Shard_31', u'Shard_36', u'Shard_37', u'Shard_34', u'Shard_35', u'Shard_58', u'Shard_59', u'Shard_70', u'Shard_78', u'Shard_33', u'Shard_76', u'Shard_51', u'Shard_74', u'Shard_75', u'Shard_18', u'Shard_19', u'Shard_56', u'Shard_57'])
Apr 28 12:34:01 chron-queryb ip-10-43-48-90:  docker/shardstrap[5237]:  [ERROR] - Manual shard removal needed/in progress -OR- Critical error is occuring
Apr 28 12:34:01 chron-queryb ip-10-43-48-90:  docker/shardstrap[5237]:  [INFO] - Sleeping for 600
Apr 28 12:34:04 chron-queryb ip-10-43-48-90:  docker/mongowt[5237]:  2016-04-28T19:34:04.005+0000 I SHARDING [Balancer] ChunkManager: time to load chunks for google_us.review: 22ms sequenceNumber: 60 version: 8598|1||557fc21623a54038a6930db5 based on: 8597|1||557fc21623a54038a6930db5
Apr 28 12:34:04 chron-queryb ip-10-43-48-90:  docker/mongowt[5237]:  2016-04-28T19:34:04.025+0000 I SHARDING [Balancer] moving chunk ns: google_us.recommended moving ( ns: google_us.recommended, shard: Shard_18:Shard_18/10.37.28.238:27018,10.37.41.191:27018,10.43.26.234:27018, lastmod: 12|0||000000000000000000000000, min: { app_id: "com.amazesoft.collage.mania" }, max: { app_id: "com.anip.wallpaper.live.moonlight" }) Shard_18:Shard_18/10.37.28.238:27018,10.37.41.191:27018,10.43.26.234:27018 -> Shard_83:Shard_83/10.37.24.79:27018,10.37.44.232:27018,10.43.31.196:27018



 Comments   
Comment by Kevin Pulo [ 04/May/16 ]

As explained on SERVER-11328, a workaround is to use the "maxSize" option to prevent chunks from being migrated onto shards that you know are also going to be removed soon.
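
For example, something along these lines (a rough sketch; the shard name is taken from the log above and the size value is just a placeholder, in MB) caps a shard so the balancer treats it as full and stops selecting it as a migration destination:

// Run against a mongos. Setting maxSize (MB) below the shard's current
// data size makes the balancer treat the shard as full, so it will no
// longer be chosen as the destination for chunk migrations.
db.getSiblingDB("config").shards.update(
    { _id: "Shard_83" },
    { $set: { maxSize: 1 } }
)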

Comment by Rob Reid [ 03/May/16 ]

FYI, I've now come to understand that the source of my confusion was the intermingling of log data from two different sources. The output indicating that a set of shards is set to be removed came from a script written by our OPS team.

Comment by Ramon Fernandez Marina [ 03/May/16 ]

Please see also SERVER-24006, which is open to add support for multiple draining shards when using Replica Set Config Servers.

Comment by Ramon Fernandez Marina [ 29/Apr/16 ]

I've opened DOCS-7777 to improve the documentation about the draining state. If you feel there should be a feature to mark multiple shards for removal, feel free to open a new ticket. However, if I understand your use case correctly, the best way to accomplish what you need is not to remove multiple shards but to upgrade them to the new hardware via the replication subsystem, which will be much faster.

Comment by Rob Reid [ 29/Apr/16 ]

I understand that only one shard can be in draining state at once. I was surprised that multiple shards could be marked for removal. I haven't seen that documented.

Comment by Ramon Fernandez Marina [ 29/Apr/16 ]

robreid, in a sharded cluster only one shard can be in draining state at a time. Here's an example on a local setup:

mongos> db.runCommand({removeShard: "shard01"})
{
        "msg" : "draining started successfully",
        "state" : "started",
        "shard" : "shard01",
        "note" : "you need to drop or movePrimary these databases",
        "dbsToMove" : [
                "test"
        ],
        "ok" : 1
}
mongos> db.runCommand({removeShard: "shard02"})
{
        "ok" : 0,
        "errmsg" : "Can't have more than one draining shard at a time",
        "code" : 117
}

Other than the corner case described in SERVER-18352 (which is not what you're observing) this is expected behavior, so I'm going to resolve this ticket.

Please take a look at the documentation on migrating a sharded cluster to new hardware. Assuming your shards are replica sets, the preferred approach is to replace replica set members rather than complete shards; this accomplishes the upgrade via initial syncs instead of chunk migrations.
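
As a rough illustration (the hostnames here are placeholders), replacing one member of a shard's replica set from a mongo shell connected to that shard would look roughly like this, repeated for each member:

// Add the new-hardware member, wait for its initial sync to finish and
// for it to reach SECONDARY state, then remove the member it replaces.
rs.add("new-host-1.example.net:27018")
rs.status()    // wait until the new member reports SECONDARY
rs.remove("old-host-1.example.net:27018")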

Regards,
Ramón.

Comment by Rob Reid [ 28/Apr/16 ]

These are from a PaperTrail view that includes log entries from many hosts. I don't believe I can access the logs directly.

My understanding is that the OPS team ran a loop that issued a removeShard call for each of those shards. It was supposed to submit only one shard at a time and block on submitting further shards until that shard's removal completed. The config.shards collection shows only one shard in "draining" state, but the "Shard(s) to remove" set in the log suggests that they were all submitted.
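
I don't have the actual script, but my understanding is that it was meant to follow roughly this pattern (a mongo shell sketch; the shard list is illustrative):

// For each old shard: start draining, then keep re-issuing removeShard
// until it reports "completed" before moving on to the next shard.
// (Databases whose primary lives on the shard also need movePrimary.)
["Shard_18", "Shard_19" /* ... */].forEach(function(name) {
    var res = db.adminCommand({ removeShard: name });
    while (res.state !== "completed") {
        sleep(10 * 1000);    // poll every 10 seconds
        res = db.adminCommand({ removeShard: name });
    }
});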

Comment by Ramon Fernandez Marina [ 28/Apr/16 ]

robreid, can you please elaborate on how you are removing the old shards and how the non-mongod log lines above are being generated? Are you running the balancer, or are you moving chunks manually?

It is possible you're running into SERVER-18352, which can happen when the balancer has chosen a receiver shard before that shard has been marked as "draining", or on a manual chunk move. If you could provide full logs for this cluster, we could take a look to make sure you haven't found a novel bug in chunk migration.
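
In the meantime, the draining and balancer state as seen from a mongos can be checked with something like:

// List any shards currently marked as draining, and report whether the
// balancer is enabled.
db.getSiblingDB("config").shards.find({ draining: true })
sh.getBalancerState()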

Thanks,
Ramón.
