[SERVER-14126] View information about the migration cleanup tasks Created: 31/May/14  Updated: 12/Mar/20  Resolved: 02/Mar/20

Status: Closed
Project: Core Server
Component/s: Diagnostics, Sharding
Affects Version/s: 2.6.1
Fix Version/s: 4.3.4

Type: Bug Priority: Major - P3
Reporter: Alexander Komyagin Assignee: Randolph Tan
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Documented
is documented by DOCS-13510 Investigate changes in SERVER-14126: ... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: Sharding 2020-03-09
Participants:

 Description   

Now that we don't print any status information about migration cleanups in the logs anymore, we should have a way to track at least the number of active cleanups.

One way is to expose getDeleter()->getStats()->getCurrentDeletes() in the server status.
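For illustration, a minimal sketch (Python, with hypothetical stand-in names mirroring the C++ accessors above) of how the active-cleanups count could be surfaced in a serverStatus-style document:

```python
# Illustrative only: RangeDeleter / DeleterStats are hypothetical stand-ins
# for the server's internal C++ types, not real MongoDB APIs.

class DeleterStats:
    def __init__(self):
        self.current_deletes = 0  # number of cleanups currently running


class RangeDeleter:
    def __init__(self):
        self._stats = DeleterStats()

    def get_stats(self):
        return self._stats


def server_status_section(deleter):
    # Mirrors exposing getDeleter()->getStats()->getCurrentDeletes().
    return {"rangeDeleter": {"currentDeletes": deleter.get_stats().current_deletes}}


deleter = RangeDeleter()
deleter.get_stats().current_deletes = 2
print(server_status_section(deleter))  # {'rangeDeleter': {'currentDeletes': 2}}
```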



 Comments   
Comment by Githook User [ 02/Mar/20 ]

Author:

{'name': 'Randolph Tan', 'username': 'renctan', 'email': 'randolph@10gen.com'}

Message: SERVER-14126 Add range deleter stats for serverStatus under shardingStatistics
Branch: master
https://github.com/mongodb/mongo/commit/fa63db9e59c9b45e448c00d5126c68b394ad7585

Comment by Sheeri Cabral (Inactive) [ 24/Feb/20 ]

Yeah, I think that's great - divide it by namespace so people see the "hot" areas, but only give a count in serverStatus. And, like you say, they can dig deeper by looking at the config data.

Comment by Randolph Tan [ 24/Feb/20 ]

sheeri.cabral, what if we go back to your suggestion of just showing the totals per namespace:

{
  rangeDeleterTasks: [
    { namespace1: <# of tasks> },
    ...
  ]
}

If someone needs to inspect the actual ranges, they can always look at the shard's config collection.

Comment by Sheeri Cabral (Inactive) [ 24/Feb/20 ]

I think serverStatus should show something - a count of the tasks, the first 5 tasks, etc. That way, users will know from serverStatus whether or not there are any tasks. Many MongoDB users, including us internally, use serverStatus to get baseline statistics.

 

Having something in serverStatus will help set expectations properly, so that if there's an issue, and rangeDeleter is running, a user won't automatically assume the issue was caused by rangeDeleter (because maybe it's often running at that time of day).

And then there can be a setting to show all the rangeDeleter tasks for folks that want to dig in.

Comment by Randolph Tan [ 21/Feb/20 ]

sheeri.cabral, garaudy.etienne Due to the nature of the format, rangeDeleterTasks can theoretically contain an unbounded number of tasks. I'm considering making this section hidden by default (meaning the user has to explicitly request it when calling serverStatus). Does this sound good to you?

Comment by Sheeri Cabral (Inactive) [ 12/Feb/20 ]

I think your first format is fine; no need for aggregation. (I think it would get too complicated - I didn't think through what aggregation would look like.)

 

rangeDeleterTasks: [
  { ns: 'foo.sharded', min: { shardKey: 0 }, max: { shardKey: 10 } },
  { ns: 'foo.sharded', min: { shardKey: 20 }, max: { shardKey: 30 } },
  { ns: 'foo.another_sharded', min: { shardKey: 100 }, max: { shardKey: 102 } }
]

 

Comment by Randolph Tan [ 11/Feb/20 ]

Oh, the min and max come in pairs; they are the min and max boundaries of the orphaned chunks to be deleted. So an example would look like this:

{
  ns: 'foo.sharded',
  min: { shardKey: 0 },
  max: { shardKey: 10 }
}

If you think we should aggregate it, then we can probably make it look like this:

{
  'foo.sharded': 2, // 2 orphaned chunks for deletion
}
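A sketch (illustrative Python, not server code) of the aggregation being discussed - collapsing the per-task list into per-namespace counts:

```python
from collections import Counter

# Illustrative only: aggregating the proposed per-task list into
# per-namespace counts, as in the { 'foo.sharded': 2 } example above.
tasks = [
    {"ns": "foo.sharded", "min": {"shardKey": 0}, "max": {"shardKey": 10}},
    {"ns": "foo.sharded", "min": {"shardKey": 20}, "max": {"shardKey": 30}},
    {"ns": "foo.another_sharded", "min": {"shardKey": 100}, "max": {"shardKey": 102}},
]

counts = dict(Counter(task["ns"] for task in tasks))
print(counts)  # {'foo.sharded': 2, 'foo.another_sharded': 1}
```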

Comment by Sheeri Cabral (Inactive) [ 11/Feb/20 ]

OK, so that's:

ns - namespace, e.g. db.collection - or is it db.collection.field?
min - min of the range to delete - is this _id or some other field?
max - similar to min

Would it be hard to aggregate the count? e.g.

{
  rangeDeleterTaskCount: 100,
  rangeDeleterTasks: [
    { ns: test.deleteme._id, min: 12, max: 54 }
  ]
}

Comment by Randolph Tan [ 11/Feb/20 ]

sheeri.cabral, here's my proposed format in server status under the "shardingStatistics" section for the new range deleter project:

FTDC compact format:

{
  rangeDeleterTasks: <# of queued tasks ready for deletion>
}

serverStatus full format:

{
  rangeDeleterTasks: [
    {
      ns: <>,
      min: <>,
      max: <>,
    },
    ...
  ]
}

Note that my plan is to only include range deleter tasks that are active/queued for deletion. This means it will not include tasks that are on pseudo standby - tasks we record on disk but don't keep in memory until we decide to actually do the delete and queue them. Sometimes these pseudo-standby tasks get cancelled, so I don't think they are useful; they could be noisy data that no one cares about. I'm still unsure whether this section should be visible by default or kept as opt-in, just like the old lastDeleteStats.

Thanks!
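As a sketch of how the two proposed shapes relate (illustrative Python, not server code): the FTDC compact value is simply the length of the full task list.

```python
# Illustrative only: deriving the proposed FTDC compact value from the
# full serverStatus format. Field names follow the proposal above.
full_format = {
    "rangeDeleterTasks": [
        {"ns": "foo.sharded", "min": {"shardKey": 0}, "max": {"shardKey": 10}},
        {"ns": "foo.sharded", "min": {"shardKey": 20}, "max": {"shardKey": 30}},
    ]
}


def ftdc_compact(status):
    # The compact form collapses the task list to a single queued-task count.
    return {"rangeDeleterTasks": len(status["rangeDeleterTasks"])}


print(ftdc_compact(full_format))  # {'rangeDeleterTasks': 2}
```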

Comment by Randolph Tan [ 10/Oct/14 ]

Current logs in master during the cleanup phase at default verbosity:

 m30001| 2014-10-10T11:41:12.078-0400 I SHARDING [RangeDeleter] Deleter starting delete for: test.user from { x: MinKey } -> { x: MaxKey }, with opId: 7
 m30001| 2014-10-10T11:41:12.078-0400 I SHARDING [RangeDeleter] Helpers::removeRangeUnlocked time spent waiting for replication: 0ms
 m30001| 2014-10-10T11:41:12.078-0400 I SHARDING [RangeDeleter] rangeDeleter deleted 0 documents for test.user from { x: MinKey } -> { x: MaxKey }

Also, a new server status section called "rangeDeleter" was added by SERVER-13648. It is not visible by default; the user has to explicitly specify it in the serverStatus command to make it appear. It's at the topmost level, and here's how it looks:

	"rangeDeleter" : {
		"lastDeleteStats" : [
			{
				"deletedDocs" : NumberLong(0),
				"queueStart" : ISODate("2014-10-10T15:41:12.078Z"),
				"queueEnd" : ISODate("2014-10-10T15:41:12.078Z"),
				"deleteStart" : ISODate("2014-10-10T15:41:12.078Z"),
				"deleteEnd" : ISODate("2014-10-10T15:41:12.078Z"),
				"waitForReplStart" : ISODate("2014-10-10T15:41:12.078Z"),
				"waitForReplEnd" : ISODate("2014-10-10T15:41:12.078Z")
			}
		]
	}
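For what it's worth, per-phase durations can be derived from those timestamp pairs (illustrative Python; field names taken from the output above, with the ISODates rendered as datetimes):

```python
from datetime import datetime

# Illustrative only: deriving phase durations from the lastDeleteStats
# timestamp pairs above (queueStart/queueEnd, deleteStart/deleteEnd).
entry = {
    "deletedDocs": 0,
    "queueStart": datetime(2014, 10, 10, 15, 41, 12, 78000),
    "queueEnd": datetime(2014, 10, 10, 15, 41, 12, 78000),
    "deleteStart": datetime(2014, 10, 10, 15, 41, 12, 78000),
    "deleteEnd": datetime(2014, 10, 10, 15, 41, 12, 78000),
}

queue_wait_ms = (entry["queueEnd"] - entry["queueStart"]).total_seconds() * 1000
delete_ms = (entry["deleteEnd"] - entry["deleteStart"]).total_seconds() * 1000
print(queue_wait_ms, delete_ms)  # 0.0 0.0
```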

Generated at Thu Feb 08 03:33:56 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.