[SERVER-21792] Performance regression with directio enabled on Windows with WiredTiger Created: 08/Dec/15  Updated: 08/Jan/24  Resolved: 15/Dec/15

Status: Closed
Project: Core Server
Component/s: WiredTiger
Affects Version/s: 3.2.0
Fix Version/s: 3.2.1, 3.3.0

Type: Bug Priority: Major - P3
Reporter: Bruce Lucas (Inactive) Assignee: Mark Benvenuto
Resolution: Done Votes: 0
Labels: code-only
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File 3.0.7.png     PNG File 3.2.0-rc6-nodirectio.png     PNG File 3.2.0-rc6-perf.png     PNG File 3.2.0-rc6.png    
Issue Links:
Depends
Related
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Completed:
Sprint: Platforms E (01/08/16)
Participants:

 Description   
Issue Status as of Jan 05, 2016

ISSUE SUMMARY
In SERVER-20991, direct_io was enabled by default on Windows. That change reduced the memory consumption of mongod processes but can significantly degrade performance, so the default setting of direct_io was reverted in this ticket.

USER IMPACT
With direct_io enabled, when the WiredTiger cache becomes full, the insertion rate may drop significantly.

WORKAROUNDS
To correct the performance regression in MongoDB 3.2.0, start mongod with

--wiredTigerEngineConfigString="direct_io="

To maintain the MongoDB 3.2.0 behavior (lower memory usage via direct_io) in later versions, start mongod with

--wiredTigerEngineConfigString="direct_io=(data)"

This setting may also be configured in the mongod configuration file:

storage:
    wiredTiger:
        engineConfig:
            configString: direct_io=(data)

AFFECTED VERSIONS
MongoDB 3.2.0

FIX VERSION
The fix is included in the 3.2.1 production release.

Original description
  • AWS Instance - c4.4xlarge (16 CPU, 30GB RAM)
  • journal and data files configured on separate drives
  • 5 threads inserting small records in batches of 1000 (a sketch of this workload appears after this list)
  • default cache settings
  • syncdelay 10 (probably not relevant as checkpoints are very long anyway)
  • test run under rc6, same code as ga
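
The original test driver is not attached to this ticket; a minimal sketch of the described workload, assuming pymongo and placeholder host, database, and collection names, might look like this:

import threading

from pymongo import MongoClient

THREADS = 5        # 5 inserting threads, per the setup above
BATCH_SIZE = 1000  # small records inserted in batches of 1000
BATCHES = 10000    # arbitrary run length, long enough to fill the cache

def insert_worker():
    # one connection per thread; host and namespace are placeholders
    coll = MongoClient("localhost", 27017).test.inserts
    for _ in range(BATCHES):
        coll.insert_many([{"x": i} for i in range(BATCH_SIZE)])

workers = [threading.Thread(target=insert_worker) for _ in range(THREADS)]
for w in workers:
    w.start()
for w in workers:
    w.join()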

3.0.7 (insert-rate chart: attachment 3.0.7.png)

3.2.0 (insert-rate chart: attachment 3.2.0-rc6.png)

  • initially 3.2.0 has somewhat higher insert rate
  • after cache fills 3.0.7 insert rate remains constant but 3.2.0 drops by about 5x, to about 25% of rate under 3.0.7

Average insert rates during the initial and cache-full portions of the run:

            initial   cache full
 
3.0.7       152 k/s    135 k/s
3.2.0       178 k/s     33 k/s



 Comments   
Comment by Mark Benvenuto [ 19/Jan/16 ]

The YAML configuration setting is hidden:

storage:
    wiredTiger:
        engineConfig:
            configString: direct_io=(data)

Comment by Michael Dolinsky [ 18/Jan/16 ]

How can we configure this setting: wiredTigerEngineConfigString="direct_io=(data)" in the cfg file?

Comment by Nick Judson [ 22/Dec/15 ]

@Michael Dolinsky - the original change (which this ticket reverts) was made to fix that exact problem. Running with direct_io gives poor query performance; running without direct_io causes all the system memory to be consumed. In 3.2, this wasn't the case (for my workload): I had good query performance and constant memory usage, and I didn't see any of the regressions described above.

Now it sounds like the caching strategy for Windows may be revisited in the next major release.

Comment by Michael Dolinsky [ 20/Dec/15 ]

So with 3.2.1, if there are a lot of writes (more than the system can handle), the system cache will essentially take up all memory and could cause an OS or MongoDB issue.
Have you run this stress test to see this for yourself?

Comment by Daniel Pasette (Inactive) [ 17/Dec/15 ]

The nightly development releases can be found here:
https://www.mongodb.org/downloads/#development

From there, select your OS version and MongoDB version.

Comment by Michael Dolinsky [ 16/Dec/15 ]

Where can we download the 3.2.1 bits to test this?

Comment by Mark Benvenuto [ 16/Dec/15 ]

In 3.2.1, the old behavior, which reduces memory usage but has a performance regression in some scenarios, can be achieved as follows: --wiredTigerEngineConfigString="direct_io=(data)".

Comment by Mark Benvenuto [ 15/Dec/15 ]

Master: https://github.com/mongodb/mongo/commit/439a56d7af3ce4bad983f5829b3485bb0af7f6c3

v3.2: https://github.com/mongodb/mongo/commit/c7b065227470a27c40c45f07a8c967b7aa7af9db

Comment by Ramon Fernandez Marina [ 10/Dec/15 ]

stephen.jannin@sgcib.com, your statement is incorrect: SERVER-19795 is not a memory leak. As per the documentation:

The storage.wiredTiger.engineConfig.cacheSizeGB only limits the size of the WiredTiger cache, not the total amount of memory used by mongod. The WiredTiger cache is only one component of the RAM used by MongoDB. MongoDB also automatically uses all free memory on the machine via the filesystem cache (data in the filesystem cache is compressed).

In addition, the operating system will use any free RAM to buffer filesystem blocks.

To accommodate the additional consumers of RAM, you may have to decrease WiredTiger cache size. Avoid increasing the WiredTiger cache size above its default value.
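
For illustration, the WiredTiger cache size can be decreased via the configuration file; the 4GB value below is only an example and should be sized to your workload:

storage:
    wiredTiger:
        engineConfig:
            cacheSizeGB: 4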

Comment by Stephen JANNIN [ 10/Dec/15 ]

Warning: if you use direct_io, you get very poor performance on Windows.
If you don't use direct_io, you have a severe memory leak that has never been fixed. See https://jira.mongodb.org/browse/SERVER-19795

Comment by Bruce Lucas (Inactive) [ 09/Dec/15 ]

A run with Windows performance counters, with default settings (direct i/o enabled); see attachment 3.2.0-rc6-perf.png.

  • performance drops at A
  • coinciding with this we see memory starting to be paged out
  • but, as expected, system "cache bytes" is small because mongod is doing direct i/o instead of going through the system file cache, and available memory is still about half - so why the paging activity?
  • no significant paging in, presumably because we don't actually need the stuff that's been paged out or because it hasn't actually been evicted from memory (since there's no memory pressure)

Theory: when memory usage reaches a certain point, the o/s starts spilling memory to disk proactively (not in response to memory pressure as such), and for some reason tbd this in itself has a negative performance impact. Note that the following do not appear to be the reason for the negative performance impact:

  • having to page in the memory that has been paged out - we don't see page read activity
  • i/o contention at the volume level with mongod writes - pagefile, data, and journal are all on separately provisioned volumes.

Why do we not see this effect with direct i/o disabled? We're doing a run now with direct i/o disabled to see how the stats differ. Theory: as long as there is dirty file data in the file cache, Windows preferentially writes that instead of paging process memory out; so by effectively disabling the file cache we've given Windows the green light to page process memory out, and that, for reasons TBD, is having a negative performance impact.
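
For anyone reproducing these measurements, the memory counters discussed above can be sampled with the built-in Windows typeperf tool (the counter paths are standard Windows memory counters; the one-second interval and output file name are arbitrary choices):

typeperf "\Memory\Cache Bytes" "\Memory\Available MBytes" ^
         "\Memory\Pages Output/sec" "\Memory\Pages Input/sec" -si 1 -o counters.csv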

Comment by Githook User [ 09/Dec/15 ]

Author: Ramon Fernandez <ramon@mongodb.com>

Message: DOCS-6758 Add SERVER-21792 to errata

Signed-off-by: kay <kay.kim@10gen.com>
Branch: master
https://github.com/mongodb/docs/commit/a3b4bee303ff56480524fb23bed37e93949e5f66

Comment by Bruce Lucas (Inactive) [ 08/Dec/15 ]

This seems to be a result of using direct i/o, which was enabled for Windows in rc3 under SERVER-20991; here's a comparable run with direct i/o disabled (attachment 3.2.0-rc6-nodirectio.png). Initial throughput is the same as or better than initial throughput with direct i/o enabled, and remains consistent throughout the run.

Adding this result to the table above:

                       initial   cache full
 
3.0.7                  152 k/s    135 k/s
3.2.0, default         178 k/s     33 k/s
3.2.0, no direct i/o   186 k/s    187 k/s

Direct i/o can be disabled by specifying --wiredTigerEngineConfigString "direct_io=()" (or the equivalent config file option, shown below), and this should be a useful workaround for now.
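
The equivalent configuration-file form, following the same configString mechanism shown earlier in this ticket, would be:

storage:
    wiredTiger:
        engineConfig:
            configString: direct_io=()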
