[SERVER-18512] Performance drop after migrating from 2.4 to 2.6 Created: 18/May/15  Updated: 28/Oct/15  Resolved: 28/Oct/15

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: 2.6.8, 2.6.9
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Marzio Rizzo Assignee: Sam Kleinman (Inactive)
Resolution: Cannot Reproduce Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: 500k_populator.js     500k_updates_benchRun.js     iostats.xlsx     mongo-2.4.5-secondary.tar.gz     mongo-2.6.9-primary.tar.gz     mongo-2.6.9-secondary.tar.gz     mongo24.png     mongo26.png     mongo_replicaset_test_results.xlsx     mongod.conf     mongostat.png     replicaset-configure.js
Operating System: ALL
Participants:

 Description   

Immediately after migrating a MongoDB replica set from 2.4.5 to 2.6.9 we observed an increased lock % on all members, even on secondaries that were only synchronizing without serving queries.
Our replica set is composed of 4 nodes: 1 primary, 2 slaves, and a hidden one. We do not use sharding. We rolled back all nodes except one slave in order to troubleshoot and diagnose the issue.
We attached iostat output and MongoDB MMS graphs (please consider the 09:00-11:00 AM interval) collected at the same time on the two slave nodes running the different mongo versions.
As you can see, the node with mongo 2.6 shows an increased lock % and background flush average. The amount of data written per second to the recovery log (journal) and to the database datafiles is also much larger on mongo 2.6.
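For reference, the counters behind these mongostat/MMS graphs can be sampled directly from the shell; a minimal snippet for a 2.4/2.6 (MMAPv1) node might look like this:

{code:javascript}
// sample lock and background-flush counters via serverStatus (2.4/2.6 MMAPv1)
var s = db.serverStatus();
printjson(s.globalLock);          // totalTime/lockTime and queue counters behind lock %
printjson(s.backgroundFlushing);  // flushes, total_ms, average_ms (background flush avg)
{code}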



 Comments   
Comment by Ramon Fernandez Marina [ 28/Oct/15 ]

Hi mrizzo, apologies for the radio silence. We have been unable to reproduce this behavior after trying in different environments, and we're not sure what may be different about your environment that causes the performance drop you report, so unfortunately there's not much more investigation we can do on our end.

That being said, MongoDB 3.0 includes numerous improvements over 2.6, including collection-level locking for MMAPv1, so I'd encourage you to try out 3.0.7 (the latest stable release at the time of this writing) and see if it performs better than 2.4/2.6 for your use case.

Regards,
Ramón.

Comment by Marzio Rizzo [ 15/Jun/15 ]

Hi ramon.fernandez,
to reproduce this behaviour easily you can proceed as follows (a sketch of the load-generation script appears after the steps):

1. Install mongo-2.4.5 on all 3 nodes
2. Run "500k_populator.js" on the primary in order to populate the db
3. Run "mongo test 500k_updates_benchRun.js" for load generation on the primary
4. Run mongostat on every node and see the results (in particular the number of updates and the locked db %)
5. Upgrade to mongo-2.6.9 all 3 nodes
6. Run "mongo test 500k_updates_benchRun.js" for load generation on the primary
7. Run mongostat on every node and see the results (in particular the number of updates and the locked db %)
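Since the attached script itself isn't inlined in this ticket, here is a hypothetical sketch of what 500k_updates_benchRun.js does (the namespace, update shape, thread count, and duration are illustrative assumptions):

{code:javascript}
// hypothetical sketch of 500k_updates_benchRun.js -- the real attachment may differ
var res = benchRun({
    host: "localhost:27017",
    parallel: 16,   // concurrent client threads (assumption)
    seconds: 60,    // test duration (assumption)
    ops: [{
        ns: "test.bench",
        op: "update",
        // pick a random _id in the 500k range and increment a counter
        query:  { _id: { "#RAND_INT": [0, 500000] } },
        update: { $inc: { counter: 1 } }
    }]
});
printjson(res);   // throughput summary for the run
{code}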

We attached the mongo_replicaset_test_results.xlsx report with our results.
As you can see, the nodes running mongo 2.4.5 achieve more updates/sec with a lower locked db % than the nodes running mongo 2.6.9.
Is this behaviour normal?

Comment by Marzio Rizzo [ 05/Jun/15 ]

Hi ramon.fernandez, could you please provide us with the MongoDB configuration that you're using during your tests?
It would be very useful for us to understand why our results and yours are so different.
We attached our mongodb.conf and our replicaset configuration for comparison.
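A minimal replica-set initiation of the kind replicaset-configure.js performs might look like this (set name and hostnames are placeholders, not our real ones; the actual settings are in the attachment):

{code:javascript}
// minimal 3-member replica set initiation (placeholder hostnames)
rs.initiate({
    _id: "rs0",
    members: [
        { _id: 0, host: "node1.example.net:27017" },
        { _id: 1, host: "node2.example.net:27017" },
        { _id: 2, host: "node3.example.net:27017" }
    ]
});
{code}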

Thanks,
Marzio

Comment by Ramon Fernandez Marina [ 29/May/15 ]

mrizzo, I have not been able to reproduce this behavior on my end, and in fact I'm observing the opposite effect: in 2.4.5 the lock% hovers around 70% and I get around 75K updates/second, while in 2.6.9 the lock% hovers around 50% and I get around 155K updates/second.

I looked at the logs you sent and they only contain startup data, so there's not much information there that can help us understand what your replica set is doing. I also looked at your MMS graphs and I don't see anything strange happening from 9-11am (except for the lock% difference). While I do see the data in the mongostat output you obtained, I haven't been able to observe the same behavior.

Is there other information you may be able to provide that can be of help here? Are you using any other options for your {{mongod}}s like powerOf2Sizes, noprealloc or smallfiles by any chance? Anything else you can think of?
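If it helps, here is one way to check those options on a running mongod (note that 2.6 enables usePowerOf2Sizes by default for newly created collections, while collections created under 2.4 keep their original allocation strategy):

{code:javascript}
// startup options actually in effect (shows noprealloc/smallfiles if set)
printjson(db.adminCommand({ getCmdLineOpts: 1 }));
// per-collection allocation strategy: userFlags 1 => usePowerOf2Sizes enabled
// ("mycoll" is a placeholder collection name)
db.mycoll.stats().userFlags;
{code}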

Thanks,
Ramón.

Comment by Ramon Fernandez Marina [ 28/May/15 ]

mrizzo, we haven't been able to reproduce this behavior as of yet but we continue to investigate. We'll post updates to this ticket when we have more information.

Cheers,
Ramón.

Comment by Marzio Rizzo [ 28/May/15 ]

Hi guys, do you need any further information/material? Did you find the attached material useful? Thanks, best regards

Comment by Marzio Rizzo [ 22/May/15 ]

We forgot to attach the mongostat output, in which you can see the lock % of all mongod processes during the load test (it ended at 16:53:07).

Comment by Marzio Rizzo [ 22/May/15 ]

We attached the requested logs. Let us know if you need more details.
Thanks for your support.

Comment by Ramon Fernandez Marina [ 21/May/15 ]

Can you please upload the log files for the primary and the 2.6.9 secondary from startup until you run your load test? The "from startup" part is to see how your mongod processes are configured, but if the logs are too large you can upload just the part corresponding to the load test plus the configuration files used for each mongod.

Thanks,
Ramón.

Comment by Marzio Rizzo [ 19/May/15 ]

The OS is CentOS 6.5 64-bit with kernel 2.6.32-431.el6.x86_64.
We easily reproduced the issue on a test environment composed of 3 physical nodes (a master and two slaves; one of the slaves runs mongodb 2.4.5, all other nodes run mongodb 2.6.9). Follow these steps (a sketch of the populator script appears after them):

  1. Run "mongo test 500k_populator.js" on the primary in order to populate db
  2. Run "mongo test 500k_updates_benchRun.js" for load generation on the primary
  3. Run mongostat on both slaves and look at the "locked db" column. Values on the 2.6.9 slave are at least double
  4. Further comparisons can be made by running "iostat -xmt 1" or "sar -b 1 100"
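Since the populator attachment isn't inlined either, here is a hypothetical sketch of what 500k_populator.js does (collection name and document shape are assumptions, matching the benchRun sketch above):

{code:javascript}
// hypothetical sketch of 500k_populator.js -- the real attachment may differ
var batch = [];
for (var i = 0; i < 500000; i++) {
    batch.push({ _id: i, counter: 0 });
    if (batch.length === 1000) {   // insert in batches to limit round trips
        db.bench.insert(batch);    // the shell accepts an array for bulk insert
        batch = [];
    }
}
{code}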