[SERVER-1966] Don't lock reads on background flush (MemoryMappedFile::flushAll) Created: 18/Oct/10  Updated: 30/Mar/12  Resolved: 10/Sep/11

Status: Closed
Project: Core Server
Component/s: Concurrency
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Scott Hernandez (Inactive) Assignee: Eliot Horowitz (Inactive)
Resolution: Cannot Reproduce Votes: 5
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Participants:

 Description   

It seems that when the periodic flush (every syncdelay seconds) happens, reads are slowed down because of the locking for the flush.
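
A minimal sketch of the suspected interaction (illustrative only, not the server's actual code): if the background flush takes the same lock exclusively that writers take, any reader arriving during the flush queues behind it for the full duration of the sync.

{code:cpp}
#include <chrono>
#include <iostream>
#include <shared_mutex>
#include <thread>

std::shared_mutex dbLock;  // readers take it shared, writers/flush exclusive

void flushAll() {
    std::unique_lock lock(dbLock);                         // exclusive, like a write
    std::this_thread::sleep_for(std::chrono::seconds(1));  // stand-in for msync()
}

void readOp(int id) {
    auto start = std::chrono::steady_clock::now();
    std::shared_lock lock(dbLock);                         // blocks while flushAll holds it
    auto waitedMs = std::chrono::duration_cast<std::chrono::milliseconds>(
        std::chrono::steady_clock::now() - start).count();
    std::cout << "reader " << id << " waited " << waitedMs << " ms\n";
}

int main() {
    std::thread flusher(flushAll);
    std::this_thread::sleep_for(std::chrono::milliseconds(50));  // let the flush start
    std::thread r1(readOp, 1), r2(readOp, 2);  // both stall ~1 s behind the flush
    flusher.join();
    r1.join();
    r2.join();
}
{code}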



 Comments   
Comment by Eliot Horowitz (Inactive) [ 10/Sep/11 ]

can't reproduce - a test case would help

Comment by Eliot Horowitz (Inactive) [ 27/Oct/10 ]

Yes - once a write is blocking, reads pile up behind it.

Comment by Benedikt Waldvogel [ 27/Oct/10 ]

Okay, but isn't it true that reads are blocked once the writes are blocked?

Comment by Eliot Horowitz (Inactive) [ 27/Oct/10 ]

I'm not entirely sure that's what happened - but one thing to try is decreasing --syncdelay.
It could increase overall disk load, but it may make things smoother.

If the sync takes longer than syncdelay, it will just keep syncing (still one thread at a time).
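
A sketch of that scheduling, assuming a single flusher thread and a stand-in flushAllFiles() in place of the real MemoryMappedFile::flushAll: the thread only sleeps for whatever is left of the interval, so a flush that overruns syncdelay is simply followed by the next pass immediately.

{code:cpp}
#include <algorithm>
#include <chrono>
#include <thread>

using namespace std::chrono;

void flushAllFiles() {
    // Stand-in for MemoryMappedFile::flushAll(); imagine msync() over all maps.
    std::this_thread::sleep_for(milliseconds(200));
}

void flushLoop(seconds syncdelay) {
    while (true) {
        auto start = steady_clock::now();
        flushAllFiles();  // single flusher thread, one pass at a time
        auto elapsed = duration_cast<milliseconds>(steady_clock::now() - start);
        // Sleep only for the remainder of the interval. If the flush ran
        // longer than syncdelay, the remainder is zero and we sync again.
        auto remaining = duration_cast<milliseconds>(syncdelay) - elapsed;
        std::this_thread::sleep_for(std::max(remaining, milliseconds(0)));
    }
}

int main() { flushLoop(seconds(5)); }
{code}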

Comment by Benedikt Waldvogel [ 26/Oct/10 ]

Actually, I was the one who spoke to Scott and mentioned the locked reads while a flush was happening.
In the meantime I have invested some more time investigating the problem and found that the background flush wasn't really the reason, though it could be part of the problem.

I'm running a benchmark that inserts/updates items in a collection as fast as possible (roughly 5-10 MB/s). At the same time, a couple of threads regularly fetch documents and measure the latency.
I monitor the machines carefully; they have enough RAM to hold the entire collection in memory.

My assumption was that the reads would never block (i.e. would always have low latency) since there's no swapping. That assumption turned out to be wrong: I saw latency spikes of over 1 s. My inserts and updates created so many dirty pages that the OS started writing the blocks to disk. However, the disks are slower than the inserts and updates, so the total amount of dirty pages (see /proc/meminfo) exceeded the dirty_ratio limit. The OS then blocked the writes, and since the writes hold the rw locks, the reads were blocked as well. Or is this implication wrong here?
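
For anyone wanting to watch this happen, the figure in question can be polled from /proc/meminfo while the benchmark runs (a Linux-only sketch, no MongoDB-specific code): when Dirty climbs toward the dirty_ratio threshold, the kernel begins throttling the writers, and readers queued behind a throttled writer stall with them.

{code:cpp}
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>

// Returns the "Dirty" figure from /proc/meminfo in kB, or -1 on failure.
long dirtyKb() {
    std::ifstream meminfo("/proc/meminfo");
    std::string line;
    while (std::getline(meminfo, line)) {
        if (line.rfind("Dirty:", 0) == 0) {  // line looks like "Dirty:   1234 kB"
            std::istringstream fields(line.substr(6));
            long kb;
            if (fields >> kb) return kb;
        }
    }
    return -1;
}

int main() {
    std::cout << "Dirty: " << dirtyKb() << " kB\n";
}
{code}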

Is this a known issue? And what is the best way to avoid the problem: rate-limiting the updates, or setting dirty_ratio to 100%?
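
On the rate-limiting option: one common shape for it is a token bucket in front of the insert/update path, so dirty pages are produced no faster than the disks can drain them. A sketch under assumed numbers (the capacity and rate would need tuning against the measured disk drain rate; the commented-out insertDocument and doc are hypothetical):

{code:cpp}
#include <algorithm>
#include <chrono>
#include <thread>

// Illustrative token bucket; not part of MongoDB.
class TokenBucket {
    double tokens_, capacity_, ratePerSec_;
    std::chrono::steady_clock::time_point last_;
public:
    TokenBucket(double capacity, double ratePerSec)
        : tokens_(capacity), capacity_(capacity), ratePerSec_(ratePerSec),
          last_(std::chrono::steady_clock::now()) {}

    // Blocks until `amount` tokens (e.g. bytes) are available, then spends them.
    void acquire(double amount) {
        for (;;) {
            auto now = std::chrono::steady_clock::now();
            std::chrono::duration<double> dt = now - last_;
            last_ = now;
            tokens_ = std::min(capacity_, tokens_ + dt.count() * ratePerSec_);
            if (tokens_ >= amount) { tokens_ -= amount; return; }
            std::this_thread::sleep_for(std::chrono::milliseconds(5));
        }
    }
};

int main() {
    TokenBucket limiter(4e6, 4e6);  // cap writes at ~4 MB/s (assumed figure)
    limiter.acquire(1024);          // e.g. before writing a 1 KB document
    // limiter.acquire(doc.size()); insertDocument(doc);  // hypothetical write path
}
{code}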

BTW: what happens if the background sync takes, for instance, 6 s but syncdelay is set to 5 s?

Comment by Eliot Horowitz (Inactive) [ 19/Oct/10 ]

This should not happen.
The flushes are entirely in the background.
If you've seen this, please send the corresponding log files.
