Core Server / SERVER-27393

Balancer taking 100% CPU due to large number of dropped sharded collections

    • Type: Bug
    • Resolution: Done
    • Priority: Major - P3
    • Fix Version/s: 3.4.2, 3.5.2
    • Affects Version/s: 3.4.0
    • Component/s: Sharding
    • Labels: None
    • Backwards Compatibility: Fully Compatible
    • Operating System: ALL
    • v3.4
    • Sprint: Sharding 2017-01-02

      The balancer started to take 100% CPU after we upgraded from 3.2.9 to 3.4.0 and re-enabled it. This is a database with 4 shards (rs1, rs2, rs3 and rs4); before upgrading we removed one shard (rs5) and waited until draining completed.
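
      For reference, a check along these lines (a minimal sketch run against a mongos, not actual output from our cluster) shows whether the drained shard is still registered in config.shards:

      var cfg = db.getSiblingDB("config");
      // After removeShard("rs5") finishes draining, rs5 should no longer be listed here.
      cfg.shards.find({}, { _id: 1, host: 1 });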

      In the log I can see warnings like these for several collections:

      2016-12-13T09:52:11.211+0000 W SHARDING [Balancer] Unable to enforce tag range policy for collection eplus.wifiCollection_20161008 :: caused by :: Location10181: not sharded:eplus.wifiCollection_20161008
      2016-12-13T09:52:13.087+0000 W SHARDING [Balancer] Unable to enforce tag range policy for collection eplus.wifiCollection_20161009 :: caused by :: Location10181: not sharded:eplus.wifiCollection_20161009
      2016-12-13T09:53:38.583+0000 W SHARDING [Balancer] Unable to balance collection eplus.wifiCollection_20161008 :: caused by :: Location10181: not sharded:eplus.wifiCollection_20161008
      2016-12-13T09:53:40.360+0000 W SHARDING [Balancer] Unable to balance collection eplus.wifiCollection_20161009 :: caused by :: Location10181: not sharded:eplus.wifiCollection_20161009
      

      Those collections are created and then dropped after a few days, and these ones had indeed been dropped: they no longer appear in "db.getCollectionNames()".

      I investigated a bit and found entries for those collections in the config database (config.collections):

      { "_id" : "eplus.wifiCollection_20161008", "lastmodEpoch" : ObjectId("000000000000000000000000"), "lastmod" : ISODate("2016-10-18T04:00:13.108Z"), "dropped" : true }
      { "_id" : "eplus.wifiCollection_20161009", "lastmodEpoch" : ObjectId("000000000000000000000000"), "lastmod" : ISODate("2016-10-19T04:00:48.158Z"), "dropped" : true }
      

      And there are lock documents (in config.locks) for many collections that reference the removed shard (rs5):

      { "_id" : "eplus.wifiCollection_20160908", "state" : 0, "ts" : ObjectId("5837ee01c839440f1e70d384"), "who" : "wifi-db-05a:27018:1475838481:-1701389523:conn104", "process" : "wifi-db-05a:27018:1475838481:-1701389523", "when" : ISODate("2016-11-25T07:53:37.235Z"), "why" : "migrating chunk [{ lineId: 8915926302292949940 }, { lineId: MaxKey }) in eplus.wifiCollection_20160908" }
      { "_id" : "eplus.wifiCollection_20160909", "state" : 0, "ts" : ObjectId("5837ee01c839440f1e70d38b"), "who" : "wifi-db-05a:27018:1475838481:-1701389523:conn104", "process" : "wifi-db-05a:27018:1475838481:-1701389523", "when" : ISODate("2016-11-25T07:53:37.296Z"), "why" : "migrating chunk [{ lineId: 8915926302292949940 }, { lineId: MaxKey }) in eplus.wifiCollection_20160909" }
      

      There are locks not only for dropped collections but also for collections that still exist. Our guess is that this causes the balancer to loop continuously over all of these collections, driving CPU to 100%, but we are not sure how to work around it.
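
      To give an idea of the scale (a rough sketch, using the same config collections as above), comparing the total number of collection entries against the dropped ones shows how many namespaces each balancer round appears to walk through, given the warnings above:

      var cfg = db.getSiblingDB("config");
      // Total collection entries versus entries that are already dropped.
      cfg.collections.count();
      cfg.collections.count({ dropped: true });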

            Assignee: Nathan Myers
            Reporter: Isaac Cruz