[SERVER-21095] Using tailable cursor on capped collection memory leak Created: 23/Oct/15  Updated: 07/Apr/23  Resolved: 05/Jul/16

Status: Closed
Project: Core Server
Component/s: Querying
Affects Version/s: 2.4.7, 3.0.2
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Frank Pfattheicher Assignee: Kelsey Schubert
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File IncreasingOpenCursors.png     Zip Archive TailableCursorLegacy.zip     File metrics.2016-04-04T07-28-52Z-00000     File mongod-21095.log     PNG File screenshot.png    
Operating System: ALL
Steps To Reproduce:

Create capped collection.
Insert one record - save id,
Create tailable cursor with query id > saved id.
tyr to get next record in a loop.
Every loop ther is an amount of memory leaked in the server.

Participants:

 Description   

Using a tailable cursor on a capped collection leads to memory leak in the server if the first cursor MoveNext does not get a record.

BsonValue lastId = collection.FindAll().SetSortOrder(SortBy.Descending("_id")).SetLimit(1).First()["_id"];
 
cursor = collection.Find(Query.GT("_id", lastId))
              .SetFlags(QueryFlags.AwaitData | QueryFlags.TailableCursor | QueryFlags.NoCursorTimeout);
 
using (var enumerator = new MongoCursorEnumerator<BsonDocument>(cursor))
{
	while (triggerThread.IsAlive)
	{
		if (enumerator.MoveNext())
		{
			var document = enumerator.Current;
			if (document != null)
			{
				HandleNextData(document);
			}
		}
		else
		{
			if (enumerator.IsDead)
			{
				break;
			}
			if (!enumerator.IsServerAwaitCapable)
			{
				Thread.Sleep(TimeSpan.FromMilliseconds(100));
			}
		}
	}
}

If the cursor query is changed to "GTE"

cursor = collection.Find(Query.GTE("_id", lastId))

so as the first result the last record is returned all works fine.



 Comments   
Comment by Ramon Fernandez Marina [ 05/Jul/16 ]

Thanks for the update fpf_baden – sorry it took so long to get to the bottom of this but glad to hear your use case is working.

Regards,
Ramón.

Comment by Frank Pfattheicher [ 04/Jul/16 ]

Now works with provided code for new 2.x driver.
No need to fix anything in 1.x driver.

Comment by Robert Stam [ 24/May/16 ]

fpf@ict-baden.de Thank you for providing the sample code to reproduce this. Can you use the 2.x version of the C# driver? The previous comment has sample code.

Comment by Robert Stam [ 24/May/16 ]

There is a bug in the the 1.x C# driver's MongoCursorEnumerator class when used with tailable cursors. The MongoCursorEnumerator class is deprecated and no longer used by the rest of the driver (though we seem to have missed marking it with the [Obsolete] attribute).

When using the 2.x driver tailable cursors are supported differently. At a low level, calling MoveNext blocks until either a new document is available or the cursor dies. When iterating over cursor results in a foreach loop, each iteration of the loop will block until a new document is available or the cursor dies. If you exit a foreach loop it means the cursor died.

Here's some sample code to tail a collection using the 2.x API:

private static void TailCollection(IMongoCollection<BsonDocument> collection)
{
    Console.WriteLine($"Tailing collection: {collection.CollectionNamespace.CollectionName}.");
    var lastId = ObjectId.Empty;
    while (true)
    {
        Console.WriteLine($"Restarting the tailable cursor after: {lastId}.");
        var filter = Builders<BsonDocument>.Filter.Gt("_id", lastId);
        var options = new FindOptions { CursorType = CursorType.TailableAwait };
        foreach (var document in collection.Find(filter, options).ToEnumerable())
        {
            Console.WriteLine(document);
            lastId = document["_id"].AsObjectId;
        }
        // if we reach here the tailable cursor died and needs to be restarted
        Console.WriteLine("Cursor died.");
        Thread.Sleep(TimeSpan.FromSeconds(1));
    }
}

I tested this code and don't see any evidence of cursors being leaked.

Note: an easy way to see how many cursors the server currently has is using this shell command:

> db.serverStatus().metrics.cursor
{
        "timedOut" : NumberLong(1),
        "open" : {
                "noTimeout" : NumberLong(0),
                "pinned" : NumberLong(0),
                "total" : NumberLong(0)
        }
}
>

When I used the sample code provided in TailableCursorLegacy.zip I saw hundreds of thousands of cursors created.

Comment by Frank Pfattheicher [ 19/May/16 ]

Very simple console application showing the problem.
The fact is that event the cursor is created with AwaitData, the enumerator returns immediately if there is initially no record.

I uses driver 1.11 in this version - i did not manage to get the sample working with newer driver versions dur ther is no enumerator any more !?

The version of the database itself doesn't matter.

Comment by Kelsey Schubert [ 04/Apr/16 ]

Hi fpf_baden,

Thank you for uploading the diagnostic.data. The graph below indicates that while your script is running (from A to B) the number of open cursors increases to 250,000. Since these cursors have noTimeout they likely explain the increase in memory usage that you are observing.

Now that we know what we are looking for, we would like to make another attempt to reproduce this issue. Can you please specify which specific version of the MongoDB .NET Driver you are using? Would it be possible for you to provide a complete standalone Program.cs which reproduces this issue?

Thank you again for your help,
Thomas

Comment by Frank Pfattheicher [ 04/Apr/16 ]

memory leak

Comment by Frank Pfattheicher [ 04/Apr/16 ]

diagnostic.data added

Comment by Ramon Fernandez Marina [ 28/Mar/16 ]

fpf_baden, we haven't heard back from you for some time. Do you think you can upload the diagnostic.data directory as requested by Thomas above?

Thanks,
Ramón.

Comment by Kelsey Schubert [ 10/Mar/16 ]

Hi fpf_baden,

MongoDB 3.2 introduced a diagnostic data collection mechanism for logging server statistics to diagnostic files at periodic intervals. Can you please attach the contents the diagnostic.data directory to this ticket?

To aid our investigation, when you upload the diagnostic.data files please note when the operation causing this behavior began and ended.

Thank you,
Thomas

Comment by Frank Pfattheicher [ 07/Mar/16 ]

Hello Ramon,

as described in my last comment i set up a server to show the problem.
Now i updated to V3.2.3, the problem still exists!

RDP: mongodb-21095.cloudapp.net:55449
User: mongo
Pwd: mongodb-21095

I also added a screenshot.

Comment by Ramon Fernandez Marina [ 05/Mar/16 ]

Hi fpf_baden, looks like after our failed attempts to reproduce this ticket fell through the cracks – sorry about that. Are you still seeing this behavior? Have you tried a newer version (e.g.: 3.0.9 or 3.2.3) by any chance?

Thanks,
Ramón.

Comment by Frank Pfattheicher [ 04/Nov/15 ]

For your convienience i set up a virtual machine to show the error.

New installed Windows Server 2008 R2, MongoDB V3.0.7

local TestDb with a single "events" collection with a single record...

RDP: mongodb-21095.cloudapp.net:55449
User: mongo
Pwd: mongodb-21095

On the desktop is a link to a TestApp ...

Hope this helps..

Comment by Daniel Pasette (Inactive) [ 29/Oct/15 ]

Hi Frank,
We tested this code on Windows 10, which is akin to Windows Server 2016. We can't reproduce this at all with either our old driver or our new one using server 2.4, 3.0, and 3.2.0-rc1. Is there any other special considerations we should keep in mind while trying your reproduction steps?

Comment by Frank Pfattheicher [ 27/Oct/15 ]

Log with memory leak

Comment by Frank Pfattheicher [ 27/Oct/15 ]

I missed tu specify the platform: Windows Server 2008 R2 64 bit, same on Win7x64

How are you measuring "memory leaked"?
no need for special tools total system memory wasted in a few minutes... (TaskManager)

Can you also include your measured numbers for the leak, and server logs please?
attached log.

Also, I assume you are not using wired tiger as the storage engine in 3.0.2 for your tests, correct?
I think not, simply updatet the database

Comment by Scott Hernandez (Inactive) [ 23/Oct/15 ]

How are you measuring "memory leaked"?

Can you also include your measured numbers for the leak, and server logs please?

Also, I assume you are not using wired tiger as the storage engine in 3.0.2 for your tests, correct?

Generated at Thu Feb 08 03:56:17 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.