[SERVER-50182] Open cursors can block cleanupOrphaned Created: 07/Aug/20  Updated: 27/Oct/23  Resolved: 13/Aug/20

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 4.0.2
Fix Version/s: None

Type: Question Priority: Major - P3
Reporter: Leo Be Assignee: Eric Sedor
Resolution: Works as Designed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-45367 When the Range Deleter is waiting for... Closed

 Description   

We run a sharded cluster. When we run cleanupOrphaned on some of its shards, the shard's primary logs only the following:

2020-08-06T08:19:03.481+0000 I SHARDING [conn25449020] Deletion of XXX range [{ _id: MinKey }, { _id: -XXXXXXXX }) will be scheduled after all possibly dependent queries finish

 

We used the following query to verify that no other operations were running on this shard:

db.currentOp().inprog.map(function (o) {
  // Keep only operations in the admin.$cmd namespace
  if (o.ns === "admin.$cmd") {
    return { opid: o.opid, secs: o.secs_running, ns: o.ns, command: o.command };
  }
}).filter(Boolean)

Even after waiting for over an hour (the normal timeout for cleanupOrphaned), there is no feedback whatsoever.
Is there anything we can do to successfully clean up orphaned documents on the affected shards?

Thank you.



 Comments   
Comment by Eric Sedor [ 13/Aug/20 ]

Thanks, Leo, glad to hear it. I'm going to close this ticket as Works as Designed. There are tickets out there for improving cursor behavior, but it's intentional that cleanupOrphaned and the RangeDeleter (mentioned in SERVER-45367) block in this way.
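To make this blocking behavior concrete, here is a toy model in JavaScript. It is a deliberate simplification and not MongoDB's range deleter code: it assumes that a scheduled deletion waits only for the cursors that were already open when it was scheduled, and ToyRangeDeleter and its methods are hypothetical names.

```javascript
// Toy model (NOT MongoDB's implementation): a range deletion is deferred
// until every cursor that was open when it was scheduled has been closed.
class ToyRangeDeleter {
  constructor() {
    this.openCursors = new Set(); // ids of currently open cursors
    this.pending = [];            // jobs: { range, waitingOn: Set of cursor ids }
    this.deleted = [];            // ranges whose deletion has run
  }

  openCursor(id) {
    this.openCursors.add(id);
  }

  closeCursor(id) {
    this.openCursors.delete(id);
    // The closed cursor no longer blocks any pending deletion.
    for (const job of this.pending) job.waitingOn.delete(id);
    this._runReady();
  }

  scheduleDeletion(range) {
    // Snapshot the cursors that might still read documents in this range.
    this.pending.push({ range, waitingOn: new Set(this.openCursors) });
    this._runReady();
  }

  _runReady() {
    const stillWaiting = [];
    for (const job of this.pending) {
      if (job.waitingOn.size === 0) this.deleted.push(job.range);
      else stillWaiting.push(job);
    }
    this.pending = stillWaiting;
  }
}

const rd = new ToyRangeDeleter();
rd.openCursor("c1");                      // e.g. a cursor opened with noCursorTimeout
rd.scheduleDeletion("[MinKey, someKey)"); // deletion is deferred, not run
console.log(rd.deleted.length);           // prints 0: blocked by c1
rd.closeCursor("c1");                     // full consumption or a restart closes it
console.log(rd.deleted.length);           // prints 1
```

In this model a cursor that is never closed (for example, one opened with noCursorTimeout and never fully read) keeps the deletion pending indefinitely, which matches the symptom in the description; restarting the node corresponds to closing every cursor at once.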

Comment by Leo Be [ 13/Aug/20 ]

Hello,

Restarting the primary did indeed help. The process is now progressing. Thank you!

Leo

Comment by Eric Sedor [ 07/Aug/20 ]

Hi leo@jodel.com,

This issue is related to open cursors, not necessarily to operations running on those cursors. Briefly, the range deleter blocks on open cursors to avoid conflicting with them. To work around this, we suggest:

1) Check your code for "noCursorTimeout" and either avoid using it or guarantee that the app consumes all results from the cursor (so that the cursor can be closed).
2) Restart the node(s) that show this issue to ensure that cursors are closed.
3) Run cleanupOrphaned again.
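Suggestion 1 comes down to never leaving a cursor half-read. The following is a minimal JavaScript sketch, assuming a cursor object that exposes hasNext(), next() and close() the way a mongo shell cursor does. drainAndClose and makeFakeCursor are hypothetical helpers, and the fake cursor stands in for a real one so the example runs without a server.

```javascript
// Minimal sketch: iterate a cursor to exhaustion and always close it,
// even if the per-document callback throws. Assumes the cursor exposes
// hasNext(), next() and close(), as mongo shell cursors do.
function drainAndClose(cursor, handle) {
  let count = 0;
  try {
    while (cursor.hasNext()) {
      handle(cursor.next());
      count += 1;
    }
  } finally {
    // If iteration stops early (e.g. the handler throws), closing explicitly
    // releases the server-side cursor instead of leaving it open.
    cursor.close();
  }
  return count;
}

// Hypothetical in-memory stand-in for a cursor, used only to exercise the helper.
function makeFakeCursor(docs) {
  let i = 0;
  return {
    closed: false,
    hasNext() { return !this.closed && i < docs.length; },
    next() { return docs[i++]; },
    close() { this.closed = true; },
  };
}

const seen = [];
const cursor = makeFakeCursor([{ _id: 1 }, { _id: 2 }, { _id: 3 }]);
const n = drainAndClose(cursor, (doc) => seen.push(doc._id));
console.log(n, cursor.closed); // prints "3 true"
```

The try/finally is the point of the design: the cursor is closed even when processing a document fails, which is exactly the case where a cursor opened with noCursorTimeout could otherwise linger on the server.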

It's worth noting that we improved this log line in MongoDB 4.0.19 (SERVER-45367), so upgrading to that version is also a good idea. That version also includes many other fixes and improvements for the 4.0 series.

Can you let us know if this addresses your issue?

Eric

Generated at Thu Feb 08 05:22:00 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.