[SERVER-3663] Mongod on windows performance degrades over time Created: 22/Aug/11  Updated: 17/Mar/16  Resolved: 17/Mar/16

Status: Closed
Project: Core Server
Component/s: Performance, Stability
Affects Version/s: 1.8.2, 2.0.0-rc0
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Spencer Brody (Inactive) Assignee: Unassigned
Resolution: Done Votes: 2
Labels: Windows
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Windows 64 bit
MongoDB 64 bit 1.8.2 and 2.0
Python 2.6.1
pymongo 2.0.1


Attachments: File manyConnectionsTest.py    
Operating System: Windows
Participants:

 Description   

Started mongod on Windows and ran manyConnectionsTest.py from my Mac in 3 different shells.
Read and write performance would initially fluctuate but eventually level off a couple of minutes after all client threads were up and running. Periodically there would then be stretches where the read and write rates fluctuated wildly before re-stabilizing at a lower level. After a few of those, the bursts of fluctuation stopped, and performance instead degraded gradually.

For one such run, performance initially stabilized at roughly 2000 reads/sec and 180 writes/sec. After a period of fluctuating read and write rates, it re-stabilized at 2000 reads/sec with only 128 writes/sec. After another burst of unstable performance, it re-stabilized at roughly 700 reads/sec and 83 writes/sec. After that, performance no longer varied much; instead the write rate went slowly but steadily down as the read rate slowly went up. I checked in periodically and saw QPSs of 718 reads/sec w/ 61 writes/sec, 733 reads/sec w/ 46 writes/sec, 755 reads/sec w/ 23 writes/sec, and 763 reads/sec w/ 163 writes/sec. At around this point, performance suddenly dropped to zero.

The Python processes running manyConnectionsTest then froze, printing an error message saying the find_one operation had timed out.

Even so, it was still possible to create new connections to the mongod, and it could process queries from those new connections.
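
To put the three stabilized levels reported above on a common scale, the following minimal sketch computes each level's drop relative to the initial one. The helper name `percent_drop` and the list names are illustrative only and are not part of manyConnectionsTest.py; the numbers are the approximate figures quoted in this description.

```python
def percent_drop(baseline, value):
    """Return how far `value` has fallen below `baseline`, in percent."""
    return round((baseline - value) / baseline * 100, 1)

# Stabilized levels quoted in the description (approximate, ops/sec).
reads_per_sec = [2000, 2000, 700]
writes_per_sec = [180, 128, 83]

# Drop of each later level relative to the first stabilized level.
read_drops = [percent_drop(reads_per_sec[0], r) for r in reads_per_sec[1:]]
write_drops = [percent_drop(writes_per_sec[0], w) for w in writes_per_sec[1:]]

print("read drops (%):", read_drops)
print("write drops (%):", write_drops)
```

On these figures the write rate had already lost more than half of its initial level by the third plateau, before the gradual decline began.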



 Comments   
Comment by Dwight Merriman [ 07/Jan/12 ]

for non-SRM, if we wrapped everything with a semaphore allowing, say, 100 maximum concurrent actors, the problem probably goes away.
this might be a good idea anyway: once all cores and disks are saturated, more interleaving is not helpful but rather hurtful for performance
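
A minimal sketch of the semaphore idea, written in Python purely for illustration (mongod itself is C++, and this is not its implementation). The class `ActorGate`, the `demo` helper, and the stand-in operation are all hypothetical names; the default limit of 100 is the figure suggested above.

```python
import threading
import time


class ActorGate:
    """Let at most `max_actors` threads run an operation at the same time."""

    def __init__(self, max_actors=100):
        self._sem = threading.BoundedSemaphore(max_actors)
        self._lock = threading.Lock()  # protects the counters below
        self._active = 0
        self.peak = 0  # highest concurrency observed, for inspection

    def run(self, operation, *args, **kwargs):
        # Threads beyond the limit block here instead of competing
        # for cores, disks, and locks inside the operation.
        with self._sem:
            with self._lock:
                self._active += 1
                self.peak = max(self.peak, self._active)
            try:
                return operation(*args, **kwargs)
            finally:
                with self._lock:
                    self._active -= 1


def demo(num_threads=200, max_actors=10):
    """Start many threads against a small gate and report peak concurrency."""
    gate = ActorGate(max_actors)

    def fake_operation():
        time.sleep(0.001)  # stand-in for a slow write

    threads = [
        threading.Thread(target=gate.run, args=(fake_operation,))
        for _ in range(num_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return gate.peak
```

Calling `demo(200, 10)` starts 200 threads, but the reported peak never exceeds 10; the remaining threads wait at the semaphore rather than adding to the interleaving.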

Comment by Dwight Merriman [ 04/Sep/11 ]

this seems to be because of thread contention with ~1000 threads and slow writes. it then behaves poorly.

using SlimReaderWriterLock solves it on windows. so we need to have a build that way.

also need to test on a couple of other platforms, as this may happen elsewhere. if the rwlock is done by the OS then it is probably ok as the scheduler will be aware – so probably ok on Linux.

Comment by Eliot Horowitz (Inactive) [ 23/Aug/11 ]

@dwight - any ideas? if not - can re-assign for a deeper dive

Comment by Spencer Brody (Inactive) [ 22/Aug/11 ]

Tried to reproduce on linux and failed.
