[CSHARP-1061] Mongo InsertBatch and IO read performance degradation Created: 10/Sep/14 Updated: 05/Apr/19 Resolved: 11/Sep/14 |
|
| Status: | Closed |
| Project: | C# Driver |
| Component/s: | Performance |
| Affects Version/s: | 1.9 |
| Fix Version/s: | None |
| Type: | Task | Priority: | Major - P3 |
| Reporter: | Nilanjan Dutta | Assignee: | Unassigned |
| Resolution: | Done | Votes: | 1 |
| Labels: | Performance, question | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
Windows 64-bit, 8 GB RAM, i5-2540M @ 2.6 GHz |
||
| Attachments: |
|
| Description |
|
So here is the use case: I have a locally saved large log file (>500 MB) which I read in predefined chunks in C#, and every time I get a chunk I insert it into a localhost Mongo instance; each subsequent chunk is appended to the same collection. I use the "InsertBatch" method from MongoCollection.cs. With Mongo I have gained better write performance, but my reading performance has degraded drastically. For typical chunks the numbers now look like this: So if I consider "Total time", the performance has degraded with MongoDB because of the log reading time. How the two can be related is my first question. Second, how do I get rid of this? |
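For context, here is a minimal sketch of the kind of chunked read-and-insert loop described above, using the 1.x driver API. LogEvent, ParseChunk, the chunk size and the connection string are placeholders rather than the actual code from this ticket:
{code}
// Sketch only: chunked file read + InsertBatch with the 1.x C# driver.
// LogEvent, ParseChunk and all paths/names are placeholders.
using System.Collections.Generic;
using System.IO;
using MongoDB.Driver;

public class LogEvent
{
    public string Message { get; set; }   // placeholder field
}

public static class LogLoader
{
    public static void Load(string logPath)
    {
        var client = new MongoClient("mongodb://localhost");
        var server = client.GetServer();
        var database = server.GetDatabase("logs");
        var collection = database.GetCollection<LogEvent>("logevents");

        using (var reader = new StreamReader(new FileStream(logPath, FileMode.Open, FileAccess.Read)))
        {
            List<LogEvent> chunk;
            while ((chunk = ParseChunk(reader, 10000)).Count > 0)
            {
                // InsertBatch is a blocking call on MongoCollection<T> in the 1.x driver.
                collection.InsertBatch(chunk, WriteConcern.Unacknowledged);
            }
        }
    }

    // Placeholder: reads up to chunkSize lines and maps each to a LogEvent document.
    private static List<LogEvent> ParseChunk(StreamReader reader, int chunkSize)
    {
        var events = new List<LogEvent>();
        string line;
        while (events.Count < chunkSize && (line = reader.ReadLine()) != null)
        {
            events.Add(new LogEvent { Message = line });
        }
        return events;
    }
}
{code}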
| Comments |
| Comment by Nilanjan Dutta [ 11/Sep/14 ] | ||||||||||||||||||||
|
Thanks Craig, that'd help. Yes, I'm running the 2.6.4 server on my machine. Thanks for your input; hoping to get a resolution soon from the other forum. | ||||||||||||||||||||
| Comment by Craig Wilson [ 11/Sep/14 ] | ||||||||||||||||||||
|
Hi Nilanjan, As this is not a driver problem, having this discussion here isn't going to get the right people looking at it to help solve your issue, so I'm going to close this ticket. However, I have a few suggestions for you. 1. I'd repost your question here: https://groups.google.com/forum/#!forum/mongodb-user. Include the information we discovered during our conversations, including that this is transient data, the mongo versions you are using, the fact that you are having IO contention between reading the file on disk and mongod, how big these files and your data are, etc. 2. Make sure you are running server 2.6.4. There have been significant improvements related to IO and background flushes on Windows. Sorry I couldn't be more helpful. I'll look for your post on the groups site and contribute information that may help a more general audience diagnose your problem and provide a solution. | ||||||||||||||||||||
| Comment by Nilanjan Dutta [ 11/Sep/14 ] | ||||||||||||||||||||
|
This is how I created the service: sc.exe create MongoDB binPath= "\"D:\MongoDB \bin\mongod.exe\" --service --config=\"D:\MongoDB\mongod.conf\"" DisplayName= "MongoDB" start= "auto" | ||||||||||||||||||||
| Comment by Nilanjan Dutta [ 11/Sep/14 ] | ||||||||||||||||||||
|
Yes, it is temporarily going to be stored in the DB; once the log window gets closed from the UI we plan to drop the table. I created a Windows service for Mongo, and it reads from the following config file settings:
dbpath = D:\CAT\MongoDB\data\db
auth = true
Still I could observe the spikes coming up during reads. Could you please guide me if something is missing? | ||||||||||||||||||||
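For reference, a sketch of what that old-style config file might look like with journaling disabled (the equivalent of the --nojournal flag suggested further down this thread); whether this matches the user's final setup is an assumption:
{code}
# Sketch only: the conf file quoted above, plus journaling disabled via the
# old-style option (equivalent to starting mongod with --nojournal).
dbpath = D:\CAT\MongoDB\data\db
auth = true
nojournal = true
{code}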
| Comment by Craig Wilson [ 11/Sep/14 ] | ||||||||||||||||||||
|
Ok. One more question: is this just transient data that can be reloaded, or does it need to persist for a long time? I'm assuming it's just transient, so there is no need to care about failover or redundancy. If it is just transient data (essentially throw-away data), then we can do some things. The first thing to try would be to turn off journaling; you can do this by starting mongod with --nojournal. I'd also still like to see a very verbose log file from one of these runs. | ||||||||||||||||||||
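A sketch of what that might look like on the command line, combining the --nojournal and verbose-logging suggestions above; the dbpath and logpath values are placeholders taken from elsewhere in this ticket:
{code}
REM Sketch only: mongod started without journaling and with maximum log verbosity.
REM The dbpath/logpath values are placeholders.
mongod.exe --dbpath D:\CAT\MongoDB\data\db --nojournal -vvvvv --logpath D:\CAT\MongoDB\log\mongod.log
{code}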
| Comment by Nilanjan Dutta [ 11/Sep/14 ] | ||||||||||||||||||||
|
Yes. This is how it is going to be in production: Mongo will be packaged on every client's laptop along with the software, and the client will also have the log files dumped on his/her laptop to analyze. | ||||||||||||||||||||
| Comment by Craig Wilson [ 11/Sep/14 ] | ||||||||||||||||||||
|
Hi Nilanjan, I thought this might be what was happening, and you've confirmed it. You are simply competing for IO resources with the actual mongo database. I can't tell you off the top of my head what this read is, as it could be many things: the 60-second flush to disk, the 100ms journal write, or simply allocating some space because the server sees you are writing a lot of data and wants to pre-allocate some more. Perhaps you could provide a very verbose log file (-vvvvv) from the server and it would give some insight (some of these can be disabled or tweaked). But rather than trying to disable things, let's first see if it matters. Is this how you are going to run in production - reading large files off your local disk and putting them into a database also running off the local disk? | ||||||||||||||||||||
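For readers of this ticket, the knobs alluded to above (the periodic flush, the journal commit interval and data-file preallocation) map roughly to 2.6-era mongod options along these lines; this is a sketch with illustrative values, not a recommendation made in this thread:
{code}
REM Sketch only; values shown are the defaults / illustrative.
REM   --syncdelay N               seconds between background flushes to disk (default 60)
REM   --journalCommitInterval N   milliseconds between journal commits (default ~100)
REM   --noprealloc                disable data-file preallocation
mongod.exe --dbpath D:\CAT\MongoDB\data\db --syncdelay 60 --journalCommitInterval 100 --noprealloc
{code}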
| Comment by Nilanjan Dutta [ 11/Sep/14 ] | ||||||||||||||||||||
|
Please take a look from timestamp 2:30 onwards; there is evidence of overlapping read and write processes. I'm wondering whether that adds any additional read time. | ||||||||||||||||||||
| Comment by Nilanjan Dutta [ 11/Sep/14 ] | ||||||||||||||||||||
|
This morning (IST) I used ProcMon to map what happens while the file is read. I observed that in between the file reads, for the calls which take longer, there is a read operation by mongod.exe on the DB file. This DB read time adds up to the total file read time. Could you please explain why this read is required, and whether we can disable it somehow? I think this may well be the root of the problem. I'm attaching a snapshot from ProcMon for your reference too. | ||||||||||||||||||||
| Comment by Nilanjan Dutta [ 10/Sep/14 ] | ||||||||||||||||||||
|
ANTS performance profiling result. Shows higher log reading time towards the later chunks. | ||||||||||||||||||||
| Comment by Craig Wilson [ 10/Sep/14 ] | ||||||||||||||||||||
|
Certainly interested. Feel free to upload it to the ticket. | ||||||||||||||||||||
| Comment by Nilanjan Dutta [ 10/Sep/14 ] | ||||||||||||||||||||
|
I see. I was speculating about the IO lock. I did performance profiling of the flow with the ANTS performance profiler and found that "Waiting for I/O operation" is where reading the log file takes more time than expected, compared to SQLite.
Total write (InsertBatch) of the 500 MB log file into Mongo = 95s
Total read time for the same log file in the Mongo environment = >100s
Would you be interested in the profiling result I have saved from the Mongo environment? | ||||||||||||||||||||
| Comment by Craig Wilson [ 10/Sep/14 ] | ||||||||||||||||||||
|
No. There is no reason that inserting documents would cause the reading of a file to be slow. That said, the driver doesn't do asynchronous socket IO, and we don't talk to the file system, so there should be no overlap (we are moving to async socket IO in our 2.0 release). Also, the InsertBatch call is blocking, so if it takes a while to insert a lot of documents and your timer is running while reading the files, it is going to include the time spent writing to mongo. | ||||||||||||||||||||
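To illustrate the point about the blocking call, a minimal sketch of timing the file reads and the InsertBatch calls separately, so the insert time is not folded into the measured read time. The LogEvent type and chunk parser are assumed from the placeholder sketch under the description, not taken from the actual code in this ticket:
{code}
// Sketch only: measures file-read time and InsertBatch time with separate stopwatches.
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using MongoDB.Driver;

public static class InsertTimings
{
    public static void MeasureSeparately(
        MongoCollection<LogEvent> collection,
        StreamReader reader,
        Func<StreamReader, List<LogEvent>> parseChunk)
    {
        var readTimer = new Stopwatch();
        var writeTimer = new Stopwatch();

        while (true)
        {
            readTimer.Start();
            var chunk = parseChunk(reader);   // file IO only
            readTimer.Stop();

            if (chunk.Count == 0)
                break;

            writeTimer.Start();
            collection.InsertBatch(chunk);    // blocking driver call
            writeTimer.Stop();
        }

        Console.WriteLine("read: {0} ms, write: {1} ms",
            readTimer.ElapsedMilliseconds, writeTimer.ElapsedMilliseconds);
    }
}
{code}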
| Comment by Nilanjan Dutta [ 10/Sep/14 ] | ||||||||||||||||||||
|
I would like to know if the C# driver holds any locks until it completes the insert asynchronously. Is there any event we can listen to in order to find out when the async insertion is complete? | ||||||||||||||||||||
| Comment by Nilanjan Dutta [ 10/Sep/14 ] | ||||||||||||||||||||
|
I am bringing in the file-reading part because, with MongoDB, when I do InsertBatch the subsequent call to read the file takes more time. It looks as though InsertBatch is happening asynchronously, hogging the IO and limiting the resources available for reading the file on the same hard drive. | ||||||||||||||||||||
| Comment by Craig Wilson [ 10/Sep/14 ] | ||||||||||||||||||||
|
Thanks. I'm confused about why you are referring to this code in the context of the C# Driver. None of these classes (FileStream, BufferedStream, StreamReader) have anything to do with the C# driver. So, what part of the "Chunk Read Time" could be slowed down by the C# driver? | ||||||||||||||||||||
| Comment by Nilanjan Dutta [ 10/Sep/14 ] | ||||||||||||||||||||
|
The LogEvent class is essentially a data structure with properties matching the log file fields, e.g. (fragments):
    private set { this.relatedEventIndex = value; } }
[BsonIgnoreIfNull, BsonDefaultValue(0), BsonIgnoreIfDefault]
    private set { this.secondTimeFraction = value; } }
[BsonIgnoreIfNull]
    private set { this.secondTimeStamp = value; } }
And though we have fairly complex code to read the file, here is the main snippet, which should be helpful. Please also note that this code stays the same in the SQLite environment:
FileStream reader = null; | ||||||||||||||||||||
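Since only the setters and attributes are quoted above, here is a minimal sketch of what one such property might look like in full; the getter, backing field and nullable type are assumptions, not the actual class from this ticket:
{code}
// Sketch only: reconstructed shape of one LogEvent property. The attributes and
// private setter mirror the fragments above; everything else is an assumption.
using MongoDB.Bson.Serialization.Attributes;

public class LogEvent
{
    private int? secondTimeFraction;

    [BsonIgnoreIfNull, BsonDefaultValue(0), BsonIgnoreIfDefault]
    public int? SecondTimeFraction
    {
        get { return this.secondTimeFraction; }
        private set { this.secondTimeFraction = value; }
    }
}
{code}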
| Comment by Craig Wilson [ 10/Sep/14 ] | ||||||||||||||||||||
|
Could you provide the code for the reading, as well as what your LogEvent class looks like? | ||||||||||||||||||||
| Comment by Nilanjan Dutta [ 10/Sep/14 ] | ||||||||||||||||||||
|
Yes Craig, we have that risk under consideration, since we need faster insert performance. By "chunk read" I meant file I/O; it's not a DB read. I read the file through a C# StreamReader. | ||||||||||||||||||||
| Comment by Craig Wilson [ 10/Sep/14 ] | ||||||||||||||||||||
|
I asked about the query because you said "Chunk Read Time"... perhaps you could clarify what you mean. Also, by using WriteConcern.Unacknowledged you are stating that you don't care whether the chunks were successfully uploaded. | ||||||||||||||||||||
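For illustration, a minimal sketch of the acknowledged alternative in the 1.x API; the collection and batch variable names are placeholders, not code from this ticket:
{code}
// Sketch only: with WriteConcern.Acknowledged, InsertBatch returns per-batch
// WriteConcernResults and surfaces server-side errors. Names are placeholders.
IEnumerable<WriteConcernResult> results =
    collection.InsertBatch(logEvents, WriteConcern.Acknowledged);
{code}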
| Comment by Nilanjan Dutta [ 10/Sep/14 ] | ||||||||||||||||||||
|
Thanks Craig for your response. I fail to understand the "query" part though. However, I'm posting the code snippet through which I call InsertBatch; I used the C# driver to give me these wrappers:
Here "LogEvents" is a custom type which represents the logs read from the log file. | ||||||||||||||||||||
| Comment by Craig Wilson [ 10/Sep/14 ] | ||||||||||||||||||||
|
Hi Nilanjan, Sorry you are having issues. Could you run an explain on the query from the shell and put the results here? That would look something like this: {{db.chunks.find( {x: 1}).explain();}} where chunks is your collection name and replace the query with whatever it is you are running. |