[SERVER-28589] Reduce how often threads wait before flushing sizeStorer with WiredTiger Created: 03/Apr/17 Updated: 29/Jan/18 Resolved: 06/Nov/17 |
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Storage, WiredTiger |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Alexander Gorrod | Assignee: | Xiangyu Yao (Inactive) |
| Resolution: | Won't Fix | Votes: | 0 |
| Labels: | bkp |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Attachments: | |
| Issue Links: | |
| Backwards Compatibility: | Fully Compatible |
| Backport Requested: | v3.4 |
| Sprint: | Storage 2017-11-13 |
| Participants: | |
| Description |
With the current implementation of the size storer, there is one thread that holds the WiredTigerSizeStorer cursor lock and is blocked waiting on space in the WiredTiger cache, while 34 other threads are waiting for that same cursor lock.

So we are essentially serializing threads through the size storer sync. The thread holding the lock is waiting for space in the cache, which may block for an extended time. Ideally, in this case the 34 threads waiting for the lock would skip flushing the size storer; they don't need it to be flushed for correctness.
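For illustration, here is a minimal sketch of the blocking pattern described above, using hypothetical names rather than the actual WiredTigerSizeStorer code: every caller serializes on the cursor lock, and the one thread that wins it can stall inside WiredTiger waiting for cache space, so all the waiters stall with it.

```cpp
// Illustrative sketch only -- names and structure are hypothetical,
// not the real WiredTigerSizeStorer implementation.
#include <mutex>

class SizeStorerSketch {
public:
    // Called periodically from many operation threads.
    void syncSizeInfo() {
        // Every caller blocks here until the current flusher is done.
        std::lock_guard<std::mutex> lk(_cursorLock);

        // The flusher itself may wait a long time for space in the
        // WiredTiger cache, so every queued waiter stalls as well, even
        // though none of them needs the flush for its own correctness.
        flushSizesThroughCursor();
    }

private:
    void flushSizesThroughCursor() {
        // Write the buffered collection-size entries via the WT cursor.
    }

    std::mutex _cursorLock;
};
```

The improvement asked for here is to let the waiters notice that a flush is already in progress (for example via a try-lock) and return immediately instead of queueing behind it.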
| Comments |
| Comment by Alexander Gorrod [ 26/Oct/17 ] |
Thanks for the detailed description, xiangyu.yao. I agree that the way the elapsed time tracker works makes it difficult to see a path where all threads could end up waiting. A possibility is that the timer is being checked a lot in some workloads. You say:

I believe what actually needs to happen is that intervalHasElapsed needs to be called 100,000 times. Is it possible that a code path is calling it a surprising number of times? If you can rule that out via code inspection, I'd be happy for this ticket to be closed. It's a long time since 3.2, and queued drops have had a number of changes since this ticket was opened.
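To make the "called 100,000 times" point concrete, below is a simplified, single-threaded sketch of a hit-plus-time tracker of the kind being discussed; the real tracker in the server differs in detail (clock source, thread safety), so treat the names and structure as illustrative.

```cpp
// Simplified sketch of a hit-count + elapsed-time tracker; not the
// server's actual implementation.
#include <chrono>
#include <cstdint>

class ElapsedTrackerSketch {
public:
    ElapsedTrackerSketch(int32_t hitsBetweenMarks, std::chrono::seconds timeBetweenMarks)
        : _hitsBetweenMarks(hitsBetweenMarks),
          _timeBetweenMarks(timeBetweenMarks),
          _lastMark(std::chrono::steady_clock::now()) {}

    // Reports true only after _hitsBetweenMarks calls or _timeBetweenMarks
    // seconds, whichever comes first; callers skip the sync otherwise.
    bool intervalHasElapsed() {
        const auto now = std::chrono::steady_clock::now();
        if (++_pings >= _hitsBetweenMarks || now - _lastMark >= _timeBetweenMarks) {
            _pings = 0;
            _lastMark = now;
            return true;
        }
        return false;
    }

private:
    const int32_t _hitsBetweenMarks;
    const std::chrono::seconds _timeBetweenMarks;
    int32_t _pings = 0;
    std::chrono::steady_clock::time_point _lastMark;
};
```

With hitsBetweenMarks = 100,000 and timeBetweenMarks = 60 seconds, a caller has to ping the tracker 100,000 times (or wait a minute) before a sync is even attempted, which is why it is hard to see how dozens of threads could all end up inside syncSizeInfo() at once.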
| Comment by Xiangyu Yao (Inactive) [ 26/Oct/17 ] |
I have some updates on this ticket. Here is what I discovered today:

Experiment #1: I changed syncCache()'s lock call on cursorLock to be a tryLock instead. If the tryLock fails, it means another thread is already syncing the cache, so the current thread skips the sync (and logs "Someone is doing the job").

I started the test with the newly compiled mongod and used mongostat to monitor the server. It seemed that after a while, the 'cache used' stat in mongostat stayed consistently high.

Result: checking the mongod log, I cannot see any occurrence of the log "Someone is doing the job".

I investigated further... The backtrace in the ticket's description indicates that syncSizeInfo() was called from WiredTigerKVEngine::haveDropsQueued(). We call syncSizeInfo either after it hasn't been called for 60 seconds or after 100,000 "pings", whichever comes earlier (numbers …). So I set up Experiment #2.

Result: "Someone is doing the job" got logged a lot (13,640 occurrences in a 5-minute period). This means that a lot of threads were skipping the sync because another thread already held the lock.

This left me confused as to how the situation in the ticket arose. If _hitsBetweenMarks is 100,000, what would have to happen for 34 threads to be blocked in syncSizeInfo() at the same time? I assume that the JournalFlusher is holding cursorLock. For the first thread to enter, all of the (presumably 34) threads would, in total, have to do 100,000 operations. Then that thread would become blocked, waiting for the cursorLock. The (presumably 33) remaining threads would then have to do 100,000 more operations for another thread to get blocked. This would repeat until there was only one unblocked thread, which would have to do 100,000 operations by itself in order to call syncSizeInfo and also get blocked waiting for the cursorLock. (It's also possible that there were originally many more than 34 threads, and that the stack trace was taken only after the first 34 of them became blocked, but the situation would be similar.) Either way, it seems like this scenario would take far too long to actually happen. Maybe on a much faster machine?

I also did some other experiments and tried to use a benchmark to show the performance improvement, but a stat such as throughput is not suitable here because the full cache bottlenecks it. But since Experiment #2 conceptually proves that it's possible for some threads to wait on this lock, I think a change would be worthwhile.
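Since the experimental patch itself is not attached to this ticket, the following is only a sketch of the kind of change Experiments #1 and #2 describe, with illustrative names and log placement: syncCache() tries the cursor lock and, if another thread already holds it, logs and returns instead of waiting.

```cpp
// Sketch of the experiment described above; the class layout and the
// placement of the log line are illustrative, not the actual patch.
#include <iostream>
#include <mutex>

class WiredTigerSizeStorerSketch {
public:
    void syncCache() {
        std::unique_lock<std::mutex> lk(_cursorLock, std::try_to_lock);
        if (!lk.owns_lock()) {
            // Experiment #2 counted how often this branch was taken:
            // 13,640 occurrences in a 5-minute run under cache pressure.
            std::cout << "Someone is doing the job" << std::endl;
            return;
        }
        // ... flush the buffered collection sizes through the WT cursor;
        // this is the step that can stall waiting for cache space.
    }

private:
    std::mutex _cursorLock;
};
```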
| Comment by Alexander Gorrod [ 23/Oct/17 ] |
The key to reproducing this will be putting enough pressure on the cache that threads are waiting for space to become available. The most effective way to do that is to induce a disk bottleneck, and using a write load is the best way to do that, so I'd recommend:

Once you have set up the workload, look at the server statistics for application thread time waiting for cache; it should be high if your workload is achieving the desired bottleneck.
| Comment by Eric Milkie [ 03/Apr/17 ] |
Interestingly, the threads are blocked at the destruction of the OperationContext that was created to handle the request. At this point all message processing has finished and we are about to send the response back over the socket. The only reason we destroy the OperationContext before sending back the data is that we want to update the CurrentOp data so that the operation no longer appears once the client has the response. I wonder if we could rearchitect this to defer more of the heavy destruction until after the message is returned.
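Purely as a thought experiment on that suggestion, here is a hypothetical sketch of the reordering with invented types (the real request-handling and OperationContext code is different): make the operation invisible to currentOp cheaply, send the reply, and only then run the heavy teardown.

```cpp
// Hypothetical sketch; type and function names are invented for
// illustration and do not come from the server code.
#include <memory>

struct OperationContextLike {
    void unregisterFromCurrentOp() { /* cheap: stop reporting this op in currentOp */ }
    ~OperationContextLike() { /* heavy teardown that can block, e.g. a size storer flush */ }
};

struct SessionLike {
    void sendResponse() { /* write the reply message to the socket */ }
};

void handleRequest(SessionLike& session, std::unique_ptr<OperationContextLike> opCtx) {
    // Today: the OperationContext is destroyed first (so the op disappears
    // from currentOp before the client sees the reply), and only then is the
    // response sent, putting any blocking teardown on the response path.
    //
    // Suggested order: hide the op from currentOp, reply, then tear down.
    opCtx->unregisterFromCurrentOp();
    session.sendResponse();
    opCtx.reset();  // heavy destruction happens after the client has its data
}
```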