[SERVER-6004] Intensive reading/writing causes reader/writer starvation Created: 05/Jun/12 Updated: 10/Dec/14 Resolved: 24/Jan/14 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Performance, Stability |
| Affects Version/s: | 2.0.5 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Pierre Ynard | Assignee: | Ben Becker |
| Resolution: | Cannot Reproduce | Votes: | 1 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: | Linux x64, boost 1.41 |
| Attachments: | |
| Operating System: | ALL |
| Participants: | |
| Description |
|
While trying some new things on our sharded and replicated production cluster, involving heavy bursts of writes (e.g. 6000 writes/s), we saw severe performance degradation: large replication lag, lots of timeouts on reads, and so on. We tried to reproduce the issue in tests and got similar results. We think one possible cause is that the read-write locking is not fair. I've seen that the lock classes are made of layers of encapsulation around one of several read-write lock back ends: is there any special fairness logic implemented in these mongodb layers that I missed? In our case, the back end used is boost's shared_mutex. We've been experimenting with some changes to it to improve its fairness, and they significantly improve the mongodb behavior described above. I've also read other tickets related to this kind of issue. |
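For illustration, here is a rough mongo shell sketch of the kind of mixed workload involved (the database/collection names, document shape, and thread counts are made up for the example, not our exact production setup):

    // Illustrative only: parallel insert bursts while this shell issues reads.
    var writers = [];
    for (var w = 0; w < 8; w++) {
        writers.push(startParallelShell(
            "for (var i = 0; i < 100000; i++) {" +
            "  db.stress.insert({ w: " + w + ", i: i, v: Math.random() });" +
            "}"));
    }
    // Concurrent reads on the same collection; under an unfair reader/writer
    // lock these are the operations that end up starved and timing out.
    for (var j = 0; j < 100000; j++) {
        db.stress.find({ v: { $gt: Math.random() } }).limit(10).itcount();
    }
    // Wait for the writer shells to finish.
    for (var k = 0; k < writers.length; k++) {
        writers[k]();
    }

The write rate and read concurrency would of course need tuning to match the 6000 writes/s burst pattern seen in production.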
| Comments |
| Comment by Ben Becker [ 24/Jan/14 ] |
|
Thanks for the feedback. I ran a similar workload in JavaScript using 16 concurrent shells, but I'm not able to reproduce the issue against v2.0.9. |
| Comment by Pierre Ynard [ 20/Jan/14 ] |
|
Hello, I don't really remember, but I suppose so; why else would I have attached a test sample? Anyway, I believe the problem is no longer relevant since we migrated to 2.2 with the new locking logic. |
| Comment by Ben Becker [ 14/Jan/14 ] |
|
Hi Pierre, It seems that the attached script generates 100,000 documents, each with a random 'ii' value, and then runs 100,000,000 queries, each with a random 'ii' value between 0 and 100,000. This means that some queries should not find any documents, but exactly how many depends on the implementation of Random.next(). Does the attached script produce the reported behavior for you? |
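For reference, here is a rough reconstruction of that description (not the attached script itself; Math.random() stands in for Random.next(), so the exact distribution of matching and non-matching queries may differ):

    // Rough sketch of the described workload, not the attached test sample.
    db.c.drop();
    for (var i = 0; i < 100000; i++) {
        // each document gets a random 'ii' value in [0, 100000)
        db.c.insert({ ii: Math.floor(Math.random() * 100000) });
    }
    for (var q = 0; q < 100000000; q++) {
        // some of these lookups match nothing, since not every value in
        // [0, 100000) is necessarily generated during the insert phase
        db.c.find({ ii: Math.floor(Math.random() * 100000) }).itcount();
    }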
| Comment by Eliot Horowitz (Inactive) [ 05/Jun/12 ] |
|
That doesn't totally make sense to me. |