[SERVER-44888] cleanupOrphaned would take 20 days, reimporting the data would take only 1-2 days Created: 01/Dec/19  Updated: 06/Dec/19  Resolved: 06/Dec/19

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 4.2.1
Fix Version/s: None

Type: Improvement Priority: Minor - P4
Reporter: Tudor Aursulesei Assignee: Eric Sedor
Resolution: Incomplete Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Participants:

 Description   

I have a sharded cluster, and I inserted 1 billion documents into an unsharded collection.

I then sharded that collection, and the balancer distributed all the chunks to the other shards. Running count() on the collection now yields an incorrect total: the first shard still reports ~1 billion documents, and the other two report ~333 million each, for a total of ~1.666 billion. I can see the count going down by 200-300 documents per second, which means the delete would take more than 20 days to complete, whereas dropping the collection and re-inserting the data would take only 2-3 days. Is there any way to make this process faster?

I'm using MongoDB 4.2.1.
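For reference, this is roughly how I'm checking the counts (mycoll is a placeholder for my actual collection); as far as I understand, the metadata-based counts are the ones that include the orphaned documents:

// On mongos: unfiltered count() uses each shard's collection metadata,
// so it also counts orphaned documents not yet removed by the RangeDeleter.
db.mycoll.count()

// Per-shard breakdown (also metadata-based, so the donor shard still shows ~1B).
db.mycoll.getShardDistribution()

// An aggregation-based count is routed through the shard filter and should
// return the true document count (~1 billion) even while orphans remain.
db.mycoll.countDocuments({})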



 Comments   
Comment by Eric Sedor [ 06/Dec/19 ]

Understood, thestick613; to really investigate this we would need the diagnostic data, the specific cleanupOrphaned arguments you passed, and the results of sh.status(). I am going to close this for now, but I encourage you to comment here or open a new issue if this happens again and you can provide that full set of information.

But to be clear, we do think a rate of 200-300 documents/s is suspiciously low.

Thank you very much!

Comment by Tudor Aursulesei [ 02/Dec/19 ]

Hello,

I let the cluster rebalance itself for a while (1-2 days), and because progress was slow, I found the cleanupOrphaned command and ran it for another 1-2 days. I then created this ticket. I have since removed all the data and restarted the reimport process from scratch, so there is no diagnostic.data to share.
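I no longer have the exact arguments I passed, but the invocation looked roughly like the usual loop below, run against the primary mongod of the shard holding the orphans (mydb.mycoll stands in for my real namespace):

// Run on the shard's primary mongod directly, not through mongos.
var nextKey = {};
var result;
while (nextKey != null) {
    result = db.adminCommand({
        cleanupOrphaned: "mydb.mycoll",   // placeholder namespace
        startingFromKey: nextKey
    });
    if (result.ok != 1) {
        print("cleanupOrphaned stopped: " + tojson(result));
        break;
    }
    nextKey = result.stoppedAtKey;        // unset once the whole range has been scanned
}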

Comment by Eric Sedor [ 02/Dec/19 ]

Hi thestick613, a couple of things:

  • Can you confirm whether you have manually run a cleanupOrphaned command? I ask because it sounds like you are describing the activity of the RangeDeleter that removes documents after chunk migration, not the cleanupOrphaned command (a rough way to tell the two apart is sketched after this list).
  • For the Primary shard that is deleting documents, would you please archive (tar or zip) the $dbpath/diagnostic.data directory (the contents are described here) and attach it to this ticket?
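For context, a rough way to distinguish the two (mydb.mycoll is a placeholder): the RangeDeleter kicks in automatically on the donor shard after each committed migration, so if the drop in counts lines up with completed moveChunk entries in the config changelog, that points at the RangeDeleter rather than a manual cleanupOrphaned:

// On mongos: recent committed migrations for the collection; each one queues
// an automatic range deletion on the donor shard (the RangeDeleter).
db.getSiblingDB("config").changelog
  .find({ what: "moveChunk.commit", ns: "mydb.mycoll" })
  .sort({ time: -1 }).limit(5).pretty()

// cleanupOrphaned, by contrast, only runs when issued explicitly as an admin
// command against a shard's primary mongod.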

Gratefully,
Eric
