[SERVER-15131] Insert performance to a fulltext-enabled collection Created: 03/Sep/14 Updated: 07/Oct/14 Resolved: 25/Sep/14
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Performance |
| Affects Version/s: | 2.6.2 |
| Fix Version/s: | None |
| Type: | Question | Priority: | Major - P3 |
| Reporter: | Jaan Paljasma | Assignee: | Ramon Fernandez Marina |
| Resolution: | Done | Votes: | 0 |
| Labels: | fulltext, text_index |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Environment: | OSX 10.9.4, MongoDB 2.6.0 |
| Participants: |
| Description |
I am experiencing a slowdown when inserting documents into a collection that already contains more than 650,000 documents. The data is a Wikipedia content dump. Inserts started fast as always, at a decent ~1,000 documents/second, but performance degraded after every batch of 3,000 documents; now, at 666,000 documents, the rate is below 10 documents per second. My goal was to insert 1M documents to test full-text search speed, which at this pace would take forever. Is this by design? Here are the indexes:

and stats
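For reference, a minimal sketch of such a setup in the mongo shell, assuming a collection named `content` (named in a later comment) and a hypothetical text field called `body`; the reporter's actual index definitions and stats output are not shown here:

```javascript
// Minimal sketch (2.6-era mongo shell); the field name "body" is an
// assumption about the reporter's schema.
db.content.ensureIndex({ body: "text" });   // the full-text index in question
db.content.getIndexes();                    // list the index definitions
db.content.stats();                         // collection and index sizes
```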
| Comments |
| Comment by Ramon Fernandez Marina [ 07/Oct/14 ] |

After some further investigation, my colleague adinoyi.omuya@10gen.com has found the cause of the behavior you describe: the records in your corpus get progressively bigger, so it is not surprising that they take longer to insert. The JSON string representing the first record in your corpus is 221 bytes long:

while the last one is 57,231 bytes long. I created a corpus of 1M records, each 221 bytes long like the one above, and the insert speed is constant, as shown in the log below:

When I create a corpus of records like the bigger ones in your corpus, I do get a lower insert speed, as expected, but it is also constant:

In summary, the insert speed you're observing is inversely related to the size of the records you're inserting. Regards,
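A minimal sketch of such a constant-size insert test in the mongo shell; the 1M record count and ~221-byte record size come from the comment above, but the collection name, field name, and payload are assumptions, since the actual script is not shown in the ticket:

```javascript
// Hypothetical reconstruction of the fixed-size-record test described
// above (2.6-era mongo shell); collection and field names are assumptions.
db.testcorpus.ensureIndex({ body: "text" });  // same full-text setup as the report
// ~200 bytes of varied words, so each JSON record is roughly 221 bytes:
var payload = "lorem ipsum dolor sit amet consectetur adipiscing elit sed do " +
              "eiusmod tempor incididunt ut labore et dolore magna aliqua ut " +
              "enim ad minim veniam quis nostrud exercitation ullamco laboris";
var start = Date.now();
for (var i = 1; i <= 1000000; i++) {
    db.testcorpus.insert({ body: payload });
    if (i % 10000 === 0) {
        var elapsed = (Date.now() - start) / 1000;
        print(i + " docs, " + Math.round(i / elapsed) + " inserts/sec overall");
    }
}
```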
| Comment by Jaan Paljasma [ 30/Sep/14 ] |

Hi Ramon, I understand that "works as designed" may sound comfortable, but this particular scenario is an excellent test case revealing room for improvement in MongoDB full-text search. The suggestion to drop the index, insert the document(s), and then recreate the index in a production environment is not an option for anyone: I tested it, and it took about 11 minutes to regenerate the full-text index for only 500,000 documents. Questions:
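For reference, a sketch of how that rebuild time can be measured in the mongo shell, reusing the hypothetical `content` collection and `body` field from the earlier sketch:

```javascript
// Hypothetical timing of a full-text index rebuild, mirroring the
// ~11-minute measurement for 500,000 documents described above.
var start = Date.now();
db.content.ensureIndex({ body: "text" });
print("index build took " + ((Date.now() - start) / 1000) + " seconds");
```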
| Comment by Ramon Fernandez Marina [ 25/Sep/14 ] |

jaan@hebsdigital.com, after running some tests on our end, it seems that updating the index is what is killing insert performance. With this many documents, you may be better off inserting the data first and then building the index. Every time you need to insert another large batch of documents, performance may be better if you drop the index before inserting and rebuild it immediately afterwards. If you want to discuss other options, a question like this involving more discussion is best posted on the mongodb-user group, or on Stack Overflow with the mongodb tag, where you'll reach a larger audience. Regards,
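A sketch of the suggested drop-and-rebuild workflow, again using the hypothetical `content` collection and `body` field:

```javascript
// Drop the text index, bulk-insert, then rebuild. "body_text" is the
// default name mongod assigns to a { body: "text" } index.
db.content.dropIndex("body_text");
// ... perform the bulk inserts here (e.g. via mongoimport) ...
db.content.ensureIndex({ body: "text" });
```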
| Comment by Ramon Fernandez Marina [ 08/Sep/14 ] |

Thanks for the additional information and detailed instructions, jaan@hebsdigital.com; we'll take a look and let you know what we find. Regards,
| Comment by Jaan Paljasma [ 08/Sep/14 ] |

Update: I have exported the data from MySQL to a JSON dump file and am using mongoimport to import it into a collection with the full-text index already enabled. Furthermore, I have uploaded a new mediawiki.json.tar.gz for you. You can extract it and replay the scenario as follows (a sketch of these commands appears after the list):

1) First, create the full-text index on collection "content":

2) Run mongoimport:

3) Take the time.
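A sketch of those three steps as shell commands; the database name `test`, the field name `body`, and the mongoimport flags shown are assumptions, since the exact commands were attached to the original ticket rather than reproduced here:

```sh
# Hypothetical reconstruction of the replay steps described above.
# 1) Create the full-text index first (db "test", field "body" assumed):
mongo test --eval 'db.content.ensureIndex({ body: "text" })'

# 2) Extract the dump and run mongoimport against the indexed collection:
tar -xzf mediawiki.json.tar.gz
mongoimport --db test --collection content --file mediawiki.json

# 3) Take the time, e.g. by prefixing the import with `time`:
#    time mongoimport --db test --collection content --file mediawiki.json
```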
| Comment by Jaan Paljasma [ 04/Sep/14 ] |

Here you go,
| Comment by Ramon Fernandez Marina [ 04/Sep/14 ] |

Thanks jaan@hebsdigital.com. The mongostat info you sent shows hundreds of page faults per second. Could you run vm_stat 1 on a terminal and post the output after a few iterations? Something like this (but with some more iterations):

Given the large size of the index, I'm wondering whether your system is busy handling page faults.
| Comment by Jaan Paljasma [ 04/Sep/14 ] |

Ramon, this is a standalone machine with 16 GB RAM running OSX 10.9.4; db.version() reports 2.6.0, and there is no replication. I can give you the dataset I use once the current application run finishes; it will be a MySQL dump (tar.gz) containing 1M rows.
| Comment by Ramon Fernandez Marina [ 04/Sep/14 ] |

jaan@hebsdigital.com, is this on a sharded setup or on a stand-alone machine? Are you using replication? How much memory is installed in the machine(s) where mongod is running? You mention 2.6.0 and 2.6.2; can you clarify which version you're using? Also, it could be useful to get the exact same dataset you're using so we can try to reproduce this behavior on our end. If you would consider sharing it, please let us know and we can send upload instructions. Thanks,
| Comment by Jaan Paljasma [ 04/Sep/14 ] |

Here's a sample of mongostat output revealing that the database is locked for most of the time during inserts. At the moment the process is still at 925,000 documents; insert performance is now ~2 documents/sec.
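For context, a minimal way to capture such per-second statistics while the import runs, assuming mongod is on the default local port:

```sh
# Poll server statistics once per second; the "locked db" and "faults"
# columns are the ones discussed in this thread.
mongostat 1
```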