[SERVER-17495] Standalone mongod throughput dropped by 40% after 17 hours of insert-only workload Created: 06/Mar/15  Updated: 19/Jan/17  Resolved: 19/Jan/17

Status: Closed
Project: Core Server
Component/s: Storage, WiredTiger
Affects Version/s: 3.0.0-rc11, 3.1.6
Fix Version/s: None

Type: Bug Priority: Critical - P2
Reporter: Eitan Klein Assignee: DO NOT USE - Backlog - Platform Team
Resolution: Incomplete Votes: 1
Labels: 28qa
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File 317profiler.png     PNG File Process_IO_Data_Operations_sec0.png     PNG File Process_IO_Write_Operations_sec0.png     PNG File Processor_Information_Percent_Processor_Time0.png     HTML File Re httpsjira.mongodb.orgbrowseSERVER-17495.htm     Text File ServerStats.txt     Microsoft Word mongodb_08172211.csv     Microsoft Word perf_test_data.csv     JPEG File profile-output.jpg     PNG File repro-317.png     Text File stat.txt
Issue Links:
Related
related to SERVER-18079 Large performance drop with documents... Closed
is related to SERVER-17421 WiredTiger b-tree uses much more memo... Closed
is related to SERVER-17424 WiredTiger uses substantially more me... Closed
Backwards Compatibility: Fully Compatible
Operating System: Windows
Sprint: Platform 4 06/05/15, Platform 5 06/26/16, Platform 6 07/17/15, Platform 7 08/10/15
Participants:

 Description   

Version: mongod RC11 binaries

Environment:

• Single mongod with WiredTiger as the storage engine
• Windows 2012
• WiredTiger configured with a 1GB cache
• EC2 machine c3.large

Workload:

• Used hammer.mongo to run an insert-only workload
• The machine's throughput dropped by 40% after 17 hours of execution
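The setup above can be sketched as a launch command along these lines (a minimal sketch, not the exact invocation used in the test; `--wiredTigerCacheSizeGB` is the documented flag for capping the WiredTiger cache, while the data and log paths are placeholders):

```shell
# Hedged sketch of the standalone mongod setup described above.
# --wiredTigerCacheSizeGB is a real mongod option; paths are placeholders.
mongod --storageEngine wiredTiger \
       --wiredTigerCacheSizeGB 1 \
       --dbpath /data/db \
       --logpath /data/mongod.log
```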

Will continue to investigate this issue.

At this stage:

  • This is the 2nd repro of the same problem. When I analyzed the dump, I found severe internal heap fragmentation (see more details in the comments)
  • No indication that the client stress tool slowed down (I observed a steady rate of incoming TCP connections per second, and the number of established TCP connections remained stable)

Next plan: keep it running and debug it once the throughput is significantly low.



 Comments   
Comment by Ramon Fernandez Marina [ 19/Jan/17 ]

The symptoms in this ticket are very similar to known cache eviction related issues in WiredTiger that we've been addressing in MongoDB 3.2 and 3.4.

Since the testing in this ticket was conducted, MongoDB has released three major versions and numerous patch releases, so I believe the best way forward is to resolve this ticket now. Users running into similar behavior are welcome to create new tickets, and the diagnostic data collection feature introduced in MongoDB should help us determine whether there are still bugs in this area of the product.

Thanks,
Ramón.

Comment by Alessandro Gherardi [ 19/Jan/17 ]

Was this issue fixed in 3.4.1? If not, what's the target release? Thanks.

Comment by Nick Judson [ 21/Sep/16 ]

Last I heard it might be fixed in 3.4, although I haven't checked.

Comment by Alessandro Gherardi [ 21/Sep/16 ]

It looks like all related tickets have been closed or resolved. I guess that means that memory management on Windows has been straightened out.

So can this issue be closed? It's been open for a year and a half.

Comment by Nick Judson [ 18/Dec/15 ]

It's a lot better. I'm guessing it will be fixed once the memory management on Windows is straightened out.

Comment by Alessandro Gherardi [ 18/Dec/15 ]

Is this still an issue in 3.2.0/3.2.1?

Comment by Eitan Klein [ 18/Aug/15 ]

Adding CPU profiler output.

Comment by Eitan Klein [ 18/Aug/15 ]

Added the missing title at the top of the left graph.

Comment by Eitan Klein [ 18/Aug/15 ]

This issue was also observed with:

db version v3.1.7-pre-
git version: afd0f15913e95c5e530f25272e60254770350c89

Environment: same setup, on an EC2 I2 machine (running without the known Windows timer issue, SERVER-18613).

In this case, it appears that an increase in latency over time (see the client view on this issue) is responsible for the performance drop; latency started to rise after 8 hours of execution. I will attach profiler output.

Comment by Michael Cahill (Inactive) [ 26/Jun/15 ]

eitan.klein, fixes are being tracked in SERVER-18875: can you please retest against the code review linked there (https://mongodbcr.appspot.com/6570003/)? We expect some variant of those changes to make 3.1.6, and maybe 3.1.5 (martin.bligh?).

Comment by Eitan Klein [ 25/Jun/15 ]

michael.cahill I initially thought this was not a release blocker, but the most recent results suggest that minimum performance dropped; see the comparison (we look worse than 3.0.0 from this perspective):
https://docs.google.com/document/d/1hPgLeTxnWLNVtQDusfaRXjd_ovnC9WU3g2C2D1ScQm0/edit

Comment by Eitan Klein [ 24/Jun/15 ]

michael.cahill See data point from today, related to this issue.
I don't think it's a 3.1.6 release blocker, but I think it's tied to memory issues I observed with tcmalloc that have not been addressed (we solved only the part related to the WT heap; the other heap still uses the Windows heap).

acm should have the next steps.

https://docs.google.com/a/10gen.com/document/d/1UqJxjCBBEi1EvpjuIY1ze677r8PdnSfy7LfxN_flv_c/edit?usp=sharing

Comment by Eitan Klein [ 21/Apr/15 ]

The Windows heap contention is duplicated across many different tickets; I'm closing this one as a duplicate of SERVER-18079, which has become the primary ticket.

Will re-open if this issue repros after the fix.

Comment by Nick Judson [ 16/Mar/15 ]

Possible dup: SERVER-17386

Generated at Thu Feb 08 03:44:41 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.