[CSHARP-1061] Mongo InsertBatch and IO read performance degradation Created: 10/Sep/14  Updated: 05/Apr/19  Resolved: 11/Sep/14

Status: Closed
Project: C# Driver
Component/s: Performance
Affects Version/s: 1.9
Fix Version/s: None

Type: Task Priority: Major - P3
Reporter: Nilanjan Dutta Assignee: Unassigned
Resolution: Done Votes: 1
Labels: Performance, question
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Windows 64 bit. 8GB RAM. i5 2540M 2,6 GHz


Attachments: PNG File Capture.PNG     File Logfile.PML     File ReadProfile-latest.app8results    

 Description   

So here is the use case: I have a large log file (>500 MB) saved locally, which I read in predefined chunks in C#; every time I get a chunk I insert it into a localhost MongoDB instance, appending each subsequent chunk to the same collection. I use the "InsertBatch" method from MongoCollection.cs.
Previously I was using SQLite for the same mechanism, and every chunk of approximately 10 MB had the following numbers on my machine:
1. Chunk read time = ~0.5 s
2. Chunk write to SQLite = ~3 s

Now with Mongo I have gained better write performance, but my read performance has degraded drastically. For most chunks the numbers are now:
1. Chunk read time = ~3 s (for a few chunks, although the size is the same, it spikes to beyond 10-15 s)
2. Write time = ~2 s

So if I consider total time, the performance has degraded with MongoDB because of the log reading time. My first question is how these can be related; my second is how to get rid of it.
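
For reference, here is a minimal sketch of the read-and-insert loop described above (the file path, the ParseChunk helper, and the buffer size are illustrative assumptions, not the actual code):

// assumes: using System.IO; using System.Collections.Generic; using MongoDB.Driver;
var client = new MongoClient();                       // connects to mongodb://localhost:27017 by default
var server = client.GetServer();                      // legacy 1.x API
var db = server.GetDatabase("logDB");
var collection = db.GetCollection<LogEvent>("LogEvent");

const int chunkSize = 10 * 1024 * 1024;               // ~10 MB per chunk
var buffer = new char[chunkSize];

using (var reader = new StreamReader(@"D:\logs\large.log"))       // hypothetical path
{
    int read;
    while ((read = reader.ReadBlock(buffer, 0, chunkSize)) > 0)
    {
        List<LogEvent> logEvents = ParseChunk(buffer, read);      // hypothetical parser producing LogEvent objects
        collection.InsertBatch(logEvents);                        // append the chunk to the same collection
    }
}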



 Comments   
Comment by Nilanjan Dutta [ 11/Sep/14 ]

Thanks Craig, that'd help. Yes, I'm running the 2.6.4 server on my machine. Thanks for your input; hoping to get a resolution soon on the other forum.

Comment by Craig Wilson [ 11/Sep/14 ]

Hi Nilanjan,

As this is not a driver problem, having this discussion here isn't going to get the right people looking at it to help solve your issue. So, I'm going to close this ticket. However, I have a few suggestions for you.

1. I'd repost your question here: https://groups.google.com/forum/#!forum/mongodb-user. Include the information we discovered during our conversations, including that this is transient data, the mongo versions you are using, the fact that you are having IO contention between reading the file on disk and mongod, how big the files and your data are, etc...

2. Make sure you are running server 2.6.4. There have been significant improvements related to IO and background flushes on Windows.

Sorry I couldn't be more helpful. I'll look for your post on the groups site and contribute information that may help a more general audience diagnose your problem and provide a solution.

Comment by Nilanjan Dutta [ 11/Sep/14 ]

This is how I created the service:

sc.exe create MongoDB binPath= "\"D:\MongoDB\bin\mongod.exe\" --service --config=\"D:\MongoDB\mongod.conf\"" DisplayName= "MongoDB" start= "auto"

Comment by Nilanjan Dutta [ 11/Sep/14 ]

Yes, it is only going to be stored temporarily in the DB; once the log window is closed from the UI we plan to drop the collection. I created a Windows service for Mongo with the following config file settings:

dbpath = D:\CAT\MongoDB\data\db
logpath = D:\CAT\MongoDB\log\mongo.log
logappend = false

auth = true
quiet = true
nojournal = true

I could still observe the spikes during reads. Could you please advise if something is missing?

Comment by Craig Wilson [ 11/Sep/14 ]

Ok. One more question: is this just transient data that can be reloaded, or does it need to persist for a long time? I'm assuming it's just transient, so there is no need to care about failover or redundancy.

If it is just transient data (essentially throw-away data), then we can do some things. The first thing to try would be to turn off journaling. You can do this by starting mongod with --nojournal.

I'd also still like to see a very verbose log file during one of these runs.

Comment by Nilanjan Dutta [ 11/Sep/14 ]

Yes. This is how it is going to be in production. Mongo is going to be packaged on every client's laptop along with the software, and the client would also have the log files dumped on his/her laptop to analyze.

Comment by Craig Wilson [ 11/Sep/14 ]

Hi Nilanjan, I thought this might be what was happening, but you've confirmed it. You are simply competing for IO resources with the actual mongo database. I can't tell you off the top of my head what this read is, as it could be many things: the 60-second flush to disk, the 100 ms journal write, or simply pre-allocating more space because it sees you are writing a lot of data. Perhaps you could provide a very verbose log file (-vvvvv) from the server; it would give some insight (some of these behaviors can be disabled or tweaked).

But rather than trying to disable things, let's first see if it matters. Is this how you are going to run in production - reading large files off your local disk and putting them in a database also running off the local disk?

Comment by Nilanjan Dutta [ 11/Sep/14 ]

Please take a look from timestamp 2:30; there is evidence of overlapping read and write processes. Wondering if that adds any additional read time.

Comment by Nilanjan Dutta [ 11/Sep/14 ]

This morning (IST) I used ProcMon to map what happens when the file is read. I observed that in between the file reads, for the calls that take longer, there is a read operation by mongod.exe on the DB file. This DB read time adds to the total file read time. Could you please explain why this read is required, and whether we can disable it somehow? I think this may well be the root of the problem. I'm attaching a snapshot from ProcMon for your reference too.

Comment by Nilanjan Dutta [ 10/Sep/14 ]

ANTS performance profiling result. Shows higher log reading time towards the later chunks.

Comment by Craig Wilson [ 10/Sep/14 ]

Certainly interested. Feel free to upload it to the ticket.

Comment by Nilanjan Dutta [ 10/Sep/14 ]

I see. I was speculating about an IO lock. I profiled the flow with the ANTS performance profiler and found that "Waiting for I/O operation" is where reading the log file takes more time than expected, compared to SQLite.

Total write (InsertBatch) of the 500 MB log file in Mongo = 95 s
whereas in SQLite it is about 130 s.

Total read time for the same log file in the Mongo setup = >100 s
whereas in the SQLite setup = ~45 s.

Would you be interested in the profiling result I saved from the Mongo run?

Comment by Craig Wilson [ 10/Sep/14 ]

No. There is no reason that inserting documents would cause the reading of a file to be slow. That being said, the driver doesn't do asynchronous socket IO, and we don't talk to the file system, so there should be no overlap. (We are moving to async socket IO in our 2.0 release.)

Also, the InsertBatch call is blocking. So if it takes a while to insert a lot of documents and your timer for reading the file is still running, it is going to include the time spent writing to mongo.
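
To illustrate the point, here is a sketch (with an illustrative ReadNextChunk helper) that times the read and insert phases separately, so the blocking insert is not attributed to the read:

var readTimer = new System.Diagnostics.Stopwatch();
var insertTimer = new System.Diagnostics.Stopwatch();

readTimer.Start();
List<LogEvent> logEvents = ReadNextChunk();             // hypothetical chunk-reading method
readTimer.Stop();

insertTimer.Start();
logCollection.InsertBatch(logEvents, options);          // blocking; returns only after the batch is sent
insertTimer.Stop();

Console.WriteLine("read: {0} ms, insert: {1} ms",
    readTimer.ElapsedMilliseconds, insertTimer.ElapsedMilliseconds);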

Comment by Nilanjan Dutta [ 10/Sep/14 ]

I would like to know if the C# driver holds any locks until it completes an asynchronous insert. Is there any event we can listen to in order to find out when the async insertion is complete?

Comment by Nilanjan Dutta [ 10/Sep/14 ]

I am bringing in the file-reading part because, with MongoDB, when I do an InsertBatch the subsequent call to read the file takes more time. It looks like InsertBatch is happening asynchronously, hogging the IO and limiting the resources available for reading the file on the same hard drive.
We can accept this if it doesn't take too much time, but sometimes a read that would normally take 2 seconds spikes up to 30 to 40 seconds. I tried adding a lock around the reading, but it didn't help.

Comment by Craig Wilson [ 10/Sep/14 ]

Thanks. I'm confused about why you are referring to this code in the context of the C# Driver. None of these classes (FileStream, BufferedStream, StreamReader) have anything to do with the C# driver. So, what part of the "Chunk Read Time" could be slowed down by the C# driver?

Comment by Nilanjan Dutta [ 10/Sep/14 ]

The LogEvent class is essentially a data structure with properties matching the log file, e.g.:
[BsonDefaultValue(0), BsonIgnoreIfNull, BsonIgnoreIfDefault]
public long Index
{
    get { return this.relatedEventIndex; }
    private set { this.relatedEventIndex = value; }
}

[BsonIgnoreIfNull, BsonDefaultValue(0), BsonIgnoreIfDefault]
public int SecondTimeFraction
{
    get { return this.secondTimeFraction; }
    private set { this.secondTimeFraction = value; }
}

[BsonIgnoreIfNull]
public DateTime SecondTimeStamp
{
    get { return this.secondTimeStamp.AddMilliseconds((double) this.secondTimeFraction); }
    private set { this.secondTimeStamp = value; }
}

And though the file-reading code is complex, here is the main snippet, which should be helpful. Also please note that this remains the same in the SQLite setup:

FileStream reader = null;                                  // in the real code the FileStream is opened on the log file elsewhere
reader.Seek(this.fileIndex -= bufferSize, SeekOrigin.End); // seek backwards from the end of the file
BufferedStream bs = new BufferedStream(reader);
StreamReader sr = new StreamReader(bs);
sr.ReadBlock(bytes, 0, bufferSize + incompleteLogBytes);   // bytes is the char[] buffer ReadBlock fills

Comment by Craig Wilson [ 10/Sep/14 ]

Could you provide the code for the reading, as well as what your LogEvent class looks like?

Comment by Nilanjan Dutta [ 10/Sep/14 ]

Yes Craig, that risk we have taken into consideration, since we need faster insert performance. By "chunk read" I meant file I/O, not a DB read. I read the file through a C# StreamReader.

Comment by Craig Wilson [ 10/Sep/14 ]

I asked about the query because you said "Chunk Read Time"... Perhaps you could clarify what you mean.

Also, by using WriteConcern.Unacknowledged, you are stating that you don't care if the chunks were successfully uploaded.
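
As an aside, a minimal sketch (reusing the MongoInsertOptions instance from the snippet in the earlier comment below) of switching to an acknowledged write concern so insert failures are reported back to the driver:

options.WriteConcern = WriteConcern.Acknowledged;   // failed inserts now surface as exceptions
// WriteConcern.Unacknowledged is fire-and-forget: the driver never learns whether the inserts succeeded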

Comment by Nilanjan Dutta [ 10/Sep/14 ]

Thanks Craig for your response. I fail to understand the "query" part, though. However, I'm posting the code snippet through which I call InsertBatch. I used the C# driver to give me these wrappers:

MongoDatabase db;
MongoServer mongoServer;

var mongo = new MongoClient();
mongoServer = mongo.GetServer();

db = mongoServer.GetDatabase("logDB");
db.Drop();                                           // start from a clean database
db = mongoServer.GetDatabase("logDB");

MongoCollection<LogEvent> logCollection = db.GetCollection<LogEvent>("LogEvent" + repositoryId);

MongoInsertOptions options = new MongoInsertOptions();
options.WriteConcern = WriteConcern.Unacknowledged;  // fire-and-forget inserts
options.CheckElementNames = false;

using (mongoServer.RequestStart(db))
{
    var result = logCollection.InsertBatch(logEvents, options);
}

Here "LogEvents" is a custom type which represents the logs read from the log file.

Comment by Craig Wilson [ 10/Sep/14 ]

Hi Nilanjan,

Sorry you are having issues. Could you run an explain on the query from the shell and put the results here? That would look something like this:

db.chunks.find({x: 1}).explain();

where chunks is your collection name and replace the query with whatever it is you are running.
