[JAVA-3753] GetMore not closing cursor after exception leading mongos to OOM Created: 02/Jun/20 Updated: 27/Oct/23 Resolved: 29/Jul/20 |
|
| Status: | Closed |
| Project: | Java Driver |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Vinicius Grippa | Assignee: | Ross Lawley |
| Resolution: | Gone away | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||
| Description |
|
If we analyze the code:
The killCursor call is not inside a try/catch/finally block, so when an exception is thrown the connection is closed but the cursor is not. In certain situations, the application can start piling up cursors on mongos, leading to OOM. |
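To illustrate the kind of cleanup described above, here is a minimal sketch of nested try/finally blocks that still attempt to kill the cursor when a fetch fails, and still release the connection and its source when the kill itself throws. The `Connection` and `ConnectionSource` interfaces, the method names, and the cursor id are hypothetical stand-ins written for this example. They are not the driver's actual types or its actual fix.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

class CursorCleanupSketch {

    // Hypothetical stand-ins for illustration; these are NOT the Java driver's real types.
    interface Connection {
        void killCursor(long cursorId);
        void release();
    }

    interface ConnectionSource {
        Connection getConnection();
        void release();
    }

    // Kill the server-side cursor and release both resources, even if a step throws.
    static void killCursorAndRelease(ConnectionSource source, long cursorId) {
        try {
            Connection connection = source.getConnection();
            try {
                connection.killCursor(cursorId);
            } finally {
                connection.release();   // runs even if killCursor throws
            }
        } finally {
            source.release();           // runs even if getConnection or killCursor throws
        }
    }

    // Run a getMore-style fetch; on failure, try to kill the cursor before rethrowing.
    static <T> T fetchOrKill(ConnectionSource source, long cursorId, Supplier<T> fetch) {
        try {
            return fetch.get();
        } catch (RuntimeException fetchFailure) {
            try {
                killCursorAndRelease(source, cursorId);
            } catch (RuntimeException cleanupFailure) {
                fetchFailure.addSuppressed(cleanupFailure); // keep the original error primary
            }
            throw fetchFailure;
        }
    }

    // Demo: a fetch that always fails, with fakes that record which cleanup steps ran.
    static List<String> demo() {
        List<String> events = new ArrayList<>();
        Connection connection = new Connection() {
            public void killCursor(long cursorId) { events.add("killCursor " + cursorId); }
            public void release() { events.add("connection released"); }
        };
        ConnectionSource source = new ConnectionSource() {
            public Connection getConnection() { return connection; }
            public void release() { events.add("source released"); }
        };
        try {
            fetchOrKill(source, 42L, () -> { throw new IllegalStateException("getMore failed"); });
        } catch (IllegalStateException expected) {
            events.add("rethrown: " + expected.getMessage());
        }
        return events;
    }
}
```

Even with this pattern, a kill that fails on the server side would leave the cursor to be reaped by the server's cursor timeout. The sketch only ensures the client attempts the kill and does not leak its own resources.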
| Comments |
| Comment by Ross Lawley [ 01/Jul/20 ] |
|
Hi, I've added If the kill Cursor command fails, then the cursors may still exist on mongos, and they will eventually be killed by the cursor timeout process. I'm not sure there is anything the driver could do here, but I'll wait to see if we can reproduce the issue. Ross |
| Comment by Vinicius Grippa [ 20/Jun/20 ] |
|
Hi Ross,
Sorry for the late response. I'm trying to put together a reproducible case to give you, but at the moment I don't have one. What I have observed is that the abandoned cursors on mongos are being killed by the cursorTimeoutMillis setting (10 minutes). The consequence is that several cursors stay open for minutes while the application keeps creating more.
Let me know if this helps; otherwise I will keep working on a reproducible case. |
| Comment by Ross Lawley [ 19/Jun/20 ] |
|
Just chasing this up to see if you could provide any more information. Kind Regards, Ross |
| Comment by Ross Lawley [ 10/Jun/20 ] |
|
Thanks for the ticket. I'm not sure I fully understand. Are you seeing cases where killCursor itself fails with an error? Do you have any logs? I can see that if killCursor(connection) throws an error then the connectionSource isn't released. Do you think that is the cause of the OOM errors you are seeing? Ross |