[SERVER-12886] Make non-fastmod updates perform work proportional to the update size Created: 25/Feb/14  Updated: 06/Dec/22  Resolved: 14/Sep/18

Status: Closed
Project: Core Server
Component/s: MMAPv1, Performance, Write Ops
Affects Version/s: 2.6.0-rc0
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: James Blackburn Assignee: Backlog - Storage Execution Team
Resolution: Won't Fix Votes: 2
Labels: mmapv1, pull-request
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File mongostat.py     Text File perf-after.txt     Text File perf-before.txt     File version_store.py    
Issue Links:
Related
Assigned Teams:
Storage Execution
Participants:
Case:

 Description   

Currently non-fastmod updates generate I/O proportional to the size of the document (which is pathological for large documents and multi-updates). See the linked and other related tickets to high write I/O cause by update amplification.

A proposed fix is two-fold:

  1. Only write back dirty pages (i.e. implement SERVER-66)
  2. Move Object / Array types which change size to the end of their parent Document
  1. should ensure that we minimize I/O at little cost to clients
  2. should ensure that we don't constantly re-write unchanged fields in a document.

I believe 2 is OK, because:

  • drivers use HashMaps (unless using an explicit SON type) which doesn't guarantee order
  • Mongo re-orders fields when moving an expanded document

This is related, but different, to:
SERVER-12732
SERVER-12578

I've attempted implementations of 1 + 2 which seem to give 2 orders of magnitude improvement to performance of the benchmarks on the linked ticket:

  1. https://github.com/jamesblackburn/mongo/commit/54b5f6b56374630e4a872021e424e6c476c02cff
  2. https://github.com/jamesblackburn/mongo/commit/54b5f6b56374630e4a872021e424e6c476c02cff


 Comments   
Comment by Benety Goh [ 16/Jun/14 ]

Hi James,

We've taken another look at your pull request. Unfortunately, due to recent changes in the storage layer, your commit no longer rebases cleanly on top of master (commit 82fadf3). Even after resolving the merge conflicts, we encountered a crash in the dbtest test suite:

Command line for building and running test:

scons --dbg=on --opt=on dbtest && ./dbtest

Test output

2014-06-16T10:29:47.828-0400 [testsuite] Fatal Assertion 16199
2014-06-16T10:29:47.836-0400 [testsuite] 
 0x100b8841a 0x100b377bd 0x100b24558 0x1003ff9a3 0x1004034d1 0x10070064a 0x1006fd714 0x10070b567 0x100701e4b 0x10070cf9a 0x10068dc1a 0x10068ad5d 0x100690e7a 0x10041f0e4 0x10041eddf 0x1003a5a1d 0x1003a553a 0x100b1f1c6 0x100b20632 0x10095aed0
----- BEGIN BACKTRACE -----
{"backtrace":[{"b":"100000000","o":"B8841A"},{"b":"100000000","o":"B377BD"},{"b":"100000000","o":"B24558"},{"b":"100000000","o":"3FF9A3"},{"b":"100000000","o":"4034D1"},{"b":"100000000","o":"70064A"},{"b":"100000000","o":"6FD714"},{"b":"100000000","o":"70B567"},{"b":"100000000","o":"701E4B"},{"b":"100000000","o":"70CF9A"},{"b":"100000000","o":"68DC1A"},{"b":"100000000","o":"68AD5D"},{"b":"100000000","o":"690E7A"},{"b":"100000000","o":"41F0E4"},{"b":"100000000","o":"41EDDF"},{"b":"100000000","o":"3A5A1D"},{"b":"100000000","o":"3A553A"},{"b":"100000000","o":"B1F1C6"},{"b":"100000000","o":"B20632"},{"b":"100000000","o":"95AED0"}],"processInfo":{ "mongodbVersion" : "2.7.3-pre-", "gitVersion" : "17e4397b6d1d97c64abe8a5eada47702b4763c5d", "uname" : { "sysname" : "Darwin", "release" : "13.1.0", "version" : "Darwin Kernel Version 13.1.0: Wed Apr  2 23:52:02 PDT 2014; root:xnu-2422.92.1~2/RELEASE_X86_64", "machine" : "x86_64" }, "somap" : [ { "path" : "/Users/benety/src/mongodb/mongo/./dbtest", "machType" : 2, "b" : "100000000", "buildId" : "BCC1F559729533A5859261CCFE34706C" }, { "path" : "/usr/lib/libSystem.B.dylib", "machType" : 6, "b" : "7FFF8AB88000", "buildId" : "BFC0DC9746C63BE0998354A98734897A" }, { "path" : "/usr/lib/libstdc++.6.dylib", "machType" : 6, "b" : "7FFF8554B000", "buildId" : "0241E6A4136833BE950BD0A175C41F54" }, { "path" : "/usr/lib/system/libcache.dylib", "machType" : 6, "b" : "7FFF8DF53000", "buildId" : "BDC1E65B72A13DA3A57CB23159CAAD0B" }, { "path" : "/usr/lib/system/libcommonCrypto.dylib", "machType" : 6, "b" : "7FFF86E91000", "buildId" : "8C4F0CA0389C3EDCB155E62DD2187E1D" }, { "path" : "/usr/lib/system/libcompiler_rt.dylib", "machType" : 6, "b" : "7FFF8D79C000", "buildId" : "4CD916B21B17362AB403EF24A1DAC141" }, { "path" : "/usr/lib/system/libcopyfile.dylib", "machType" : 6, "b" : "7FFF8EBCE000", "buildId" : "5A881779D0D63029B371E3021C2DDA5E" }, { "path" : "/usr/lib/system/libcorecrypto.dylib", "machType" : 6, "b" : "7FFF91E4C000", "buildId" : "F3973C2814B63006BB2B00DD7F09ABC7" }, { "path" : "/usr/lib/system/libdispatch.dylib", "machType" : 6, "b" : "7FFF90618000", "buildId" : "F3CBFE1BFCE83F33A53D9092AB382DBB" }, { "path" : "/usr/lib/system/libdyld.dylib", "machType" : 6, "b" : "7FFF876C3000", "buildId" : "CF03004F58E43BB6B3FDBE4E05F128A0" }, { "path" : "/usr/lib/system/libkeymgr.dylib", "machType" : 6, "b" : "7FFF884DA000", "buildId" : "3AA8D85DCF003BD3A5A0E28E1A32A6D8" }, { "path" : "/usr/lib/system/liblaunch.dylib", "machType" : 6, "b" : "7FFF86196000", "buildId" : "38D1AB2CA476385F8EA87AB604CA1F89" }, { "path" : "/usr/lib/system/libmacho.dylib", "machType" : 6, "b" : "7FFF85AA4000", "buildId" : "1D2910DFC0363A82A3FD44FF73B5FF9B" }, { "path" : "/usr/lib/system/libquarantine.dylib", "machType" : 6, "b" : "7FFF8E8A7000", "buildId" : "7A1A2BCBC03D3A25BFA43E569B2D2C38" }, { "path" : "/usr/lib/system/libremovefile.dylib", "machType" : 6, "b" : "7FFF8AF97000", "buildId" : "3543F917928E3DB2A2F47AB73B4970EF" }, { "path" : "/usr/lib/system/libsystem_asl.dylib", "machType" : 6, "b" : "7FFF8C838000", "buildId" : "655FB34352CF3E2FB14DBEBF5AAEF94D" }, { "path" : "/usr/lib/system/libsystem_blocks.dylib", "machType" : 6, "b" : "7FFF8AB36000", "buildId" : "FB856CD12AEA39078E9B1E54B6827F82" }, { "path" : "/usr/lib/system/libsystem_c.dylib", "machType" : 6, "b" : "7FFF8AE53000", "buildId" : "6FD3A4004BB23B95B90CBE6E9D0D78FA" }, { "path" : "/usr/lib/system/libsystem_configuration.dylib", "machType" : 6, "b" : "7FFF86D4F000", "buildId" : "B51C8C22C45536AC952DA319B6545884" }, { "path" : "/usr/lib/system/libsystem_dnssd.dylib", "machType" : 6, "b" : "7FFF90709000", "buildId" : "A0B7CF19D9F233D48107A62184C9066E" }, { "path" : "/usr/lib/system/libsystem_info.dylib", "machType" : 6, "b" : "7FFF8F246000", "buildId" : "7D41A156D2853849A2C3C04ADE797D98" }, { "path" : "/usr/lib/system/libsystem_kernel.dylib", "machType" : 6, "b" : "7FFF8F461000", "buildId" : "3F6499637FA132018FF68438A52B9973" }, { "path" : "/usr/lib/system/libsystem_m.dylib", "machType" : 6, "b" : "7FFF85DA0000", "buildId" : "B7F0E2E4277733FCA787D6430B630D54" }, { "path" : "/usr/lib/system/libsystem_malloc.dylib", "machType" : 6, "b" : "7FFF8DF9B000", "buildId" : "A695B4E438E9332EA77229D31E3F1385" }, { "path" : "/usr/lib/system/libsystem_network.dylib", "machType" : 6, "b" : "7FFF91849000", "buildId" : "8B1E1F1DA5CC3BAE8B1EABC84337A364" }, { "path" : "/usr/lib/system/libsystem_notify.dylib", "machType" : 6, "b" : "7FFF9060E000", "buildId" : "52571EC3689437E4946E064B021ED44E" }, { "path" : "/usr/lib/system/libsystem_platform.dylib", "machType" : 6, "b" : "7FFF853AA000", "buildId" : "3C3D3DA832B9324398ECD89B9A1670B3" }, { "path" : "/usr/lib/system/libsystem_pthread.dylib", "machType" : 6, "b" : "7FFF8EB85000", "buildId" : "AB498556B555310E9041F67EC9E00E2C" }, { "path" : "/usr/lib/system/libsystem_sandbox.dylib", "machType" : 6, "b" : "7FFF8D79A000", "buildId" : "5E5A6E0933A9391AAB34E57D93BB1551" }, { "path" : "/usr/lib/system/libsystem_stats.dylib", "machType" : 6, "b" : "7FFF884DB000", "buildId" : "1A55AF8AB6C43163B5573AD25DA643A8" }, { "path" : "/usr/lib/system/libunc.dylib", "machType" : 6, "b" : "7FFF8DF58000", "buildId" : "62682455186236FE8A047A6B91256438" }, { "path" : "/usr/lib/system/libunwind.dylib", "machType" : 6, "b" : "7FFF8EB52000", "buildId" : "78DCC3582FC1302EB3950155B47CB547" }, { "path" : "/usr/lib/system/libxpc.dylib", "machType" : 6, "b" : "7FFF91E12000", "buildId" : "AB40CD57F4543FD4B41563B3C0D5C624" }, { "path" : "/usr/lib/libobjc.A.dylib", "machType" : 6, "b" : "7FFF87DF0000", "buildId" : "AD7FD984271E30F4A3616B20319EC73B" }, { "path" : "/usr/lib/libauto.dylib", "machType" : 6, "b" : "7FFF8AB38000", "buildId" : "F45C36E8B6063886B5B1B6745E757CA8" }, { "path" : "/usr/lib/libc++abi.dylib", "machType" : 6, "b" : "7FFF89C74000", "buildId" : "21A807D367323455B77F743E9F916DF0" }, { "path" : "/usr/lib/libc++.1.dylib", "machType" : 6, "b" : "7FFF86C2F000", "buildId" : "4F68DFC5207739A8A449CAC5FDEE7BDE" }, { "path" : "/usr/lib/libDiagnosticMessagesClient.dylib", "machType" : 6, "b" : "7FFF897B4000", "buildId" : "4CDB0F7BC0AF3424BC39495696F0DB1E" } ] }}
 dbtest(_ZN5mongo15printStackTraceERSo+0x3A) [0x100b8841a]
 dbtest(_ZN5mongo10logContextEPKc+0xCD) [0x100b377bd]
 dbtest(_ZN5mongo13fassertFailedEi+0xA8) [0x100b24558]
 dbtest(_ZN5mongo11mutablebson7Element8addChildES1_b+0x433) [0x1003ff9a3]
 dbtest(_ZN5mongo11mutablebson7Element8pushBackES1_+0x11) [0x1004034d1]
 dbtest(_ZN5mongo11pathsupport12createPathAtERKNS_8FieldRefEmNS_11mutablebson7ElementES5_+0x9AA) [0x10070064a]
 dbtest(_ZNK5mongo11ModifierSet5applyEv+0x2A4) [0x1006fd714]
 dbtest(_ZN5mongo12UpdateDriver6updateERKNS_10StringDataEPNS_11mutablebson8DocumentEPNS_7BSONObjEPNS_11FieldRefSetE+0x2A7) [0x10070b567]
 dbtest(_ZN5mongo6updateEPNS_16OperationContextEPNS_8DatabaseERKNS_13UpdateRequestEPNS_7OpDebugEPNS_12UpdateDriverEPNS_14CanonicalQueryE+0x12CB) [0x100701e4b]
 dbtest(_ZN5mongo14UpdateExecutor7executeEPNS_16OperationContextEPNS_8DatabaseE+0x9A) [0x10070cf9a]
 dbtest(_ZN5mongo14receivedUpdateEPNS_16OperationContextERNS_7MessageERNS_5CurOpE+0x61A) [0x10068dc1a]
 dbtest(_ZN5mongo16assembleResponseEPNS_16OperationContextERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortE+0x236D) [0x10068ad5d]
 dbtest(_ZN5mongo14DBDirectClient3sayERNS_7MessageEbPSs+0x9A) [0x100690e7a]
 dbtest(_ZN5mongo12DBClientBase6updateERKSsNS_5QueryENS_7BSONObjEi+0x244) [0x10041f0e4]
 dbtest(_ZN5mongo12DBClientBase6updateERKSsNS_5QueryENS_7BSONObjEbb+0x6F) [0x10041eddf]
 dbtest(_ZN11UpdateTests28TwoModsWithinDuplicatedField3runEv+0x4AD) [0x1003a5a1d]
 dbtest(_ZN5mongo8unittest5Suite13runTestObjectIN11UpdateTests28TwoModsWithinDuplicatedFieldEEEvv+0x1A) [0x1003a553a]
 dbtest(_ZN5mongo8unittest5Suite3runERKSsi+0x696) [0x100b1f1c6]
 dbtest(_ZN5mongo8unittest5Suite3runERKSt6vectorISsSaISsEERKSsi+0x7B2) [0x100b20632]
 dbtest(_ZN5mongo7dbtests10runDbTestsEiPPc+0xA0) [0x10095aed0]
-----  END BACKTRACE  -----

We are still interested in the changes proposed in the pull request and will keep this SERVER ticket open. The existing pull request #646 will be closed. A new pull request rebased on a more recent commit would have to be submitted that does not contain the re-ordering logic.

Regards,
Ben

Comment by James Blackburn [ 03/Mar/14 ]

Thanks for taking the time to review Andrew!

Regarding your points:

  1. I hadn't realised SERVER-2592 had been fixed. It's a laudable goal. That said I think re-ordering a document (or sub-document's) element's when it is resized isn't too big a deal. As the range query, and unique index concerns in SERVER-2592 relate to SON document equality which doesn't make sense to me in the presence of resize-able fields.
    In our schema, and I suspect this is generally true, people use hash maps rather than SON objects in the application so we haven't up till now had control of the field ordering.
    If it's a requirement to not optimize document ordering, we could:
    1. Only apply this optimisation to 'larger' documents size > 2 pages, say
    2. Or add a collection level config option allowFieldReorder (like usePowerOf2Sizes)
  2. The original patch had a TODO for page alignment of the destination. I was assuming that there would be few changed (damaged) sections in the document which would lead to 2*N additional damaged pages, and this would be dwarfed by the total size of the changed range. I guess for lots of small updates this could be large. I'll put a fix in to reduce this overhead.

Would you be willing to re-run your benchmarks without the document reordering behavior, and post your results?

Removing the re-order breaks the optimization and returns the performance to the baseline. This is straightforward to see in the attached version_store.py: by moving the parent field above or below the 4MB data. As the 4MB chunk is made or random data, having the parent array before the data causes a full-rewrite, whereas having it after causes just the last page to be re-written.

I think not optimising in the DB will require people to know / decide in advance which fields are likely to change, and forces them to use SON objects with fixed ordering - arrays, and nested documents at the end. This exposes an implementation detail: changing the size of an early field causes the remainder of the document to be re-written. Adding additional config options would also seem unnecessary if there really isn't a use-case for switching this option off.

Comment by Andrew Morrow (Inactive) [ 28/Feb/14 ]

Hi -

Thank you for the pull requests. We are currently in code freeze for the upcoming 2.6.0 release, so this change (or something like it) cannot be merged until the master branch re-opens for 2.7 changes and the 2.6 branch opens for 2.6.1 changes.

Here are some comments on your submission:

  • For the change to mutable bson, we actually made it an explicit goal of the 2.6 release that fields were not re-ordered. So we would need to consider carefully whether or not we wanted to back away from that goal in a subsequent release. See SERVER-2592 for additional details. That said, your change is quite interesting and looks correct, and the way the move to back bubbles up is interesting.
  • For the memory writing changes, these are interesting as well, and reducing memory traffic for updates can only be a good thing. You should know that records are not necessarily page-aligned, so your page-stepping algorithm might be improved by a preamble in which the data up to the first page boundary is handled separately.

Would you be willing to re-run your benchmarks without the document reordering behavior, and post your results?

Overall, these are good ideas and we appreciate your taking the time to share your implementation of them with us.

Thanks,
Andrew

Comment by James Blackburn [ 28/Feb/14 ]

Running the python mongostat.py --journal script, you see:

MongoDB:

insert  query update delete getmore command flushes mapped  vsize    res faults      locked db idx miss %     qr|qw   ar|aw  netIn netOut  conn       time 
     1      1      1     *0       0     2|0       0  1.03g  2.87g   905m      0 vs_test:147.0%          0       1|0     0|1   838b     6k     3   11:55:20 
    *0      1      1      2       0     3|0       0  1.03g  2.87g   820m      0 vs_test:160.8%          0       1|0     0|1   993b     7k     3   11:55:22 
     1      1      1     *0       0     2|0       0  1.03g  2.87g   855m      0 vs_test:177.3%          0       0|1     0|1   902b     9k     3   11:55:23 
    *0     *0     *0      2       0     2|0       0  1.03g  2.87g   804m      0 vs_test:108.7%          0       0|0     0|1   291b     6k     3   11:55:24 
 
commits   journaledMB   writeToDataFilesMB   compression   commitsInWriteLock   earlyCommits   dt   prepLogBuffer   writeToJournal   writeToDataFiles   remapPrivateView   time    
6         94.208        483.049056           0.19502712262 0                    0              3336 175             2279             171                29                 11:55:20
6         94.208        483.049056           0.19502712262 0                    0              3336 175             2279             171                29                 11:55:22
6         94.208        483.049056           0.19502712262 0                    0              3336 175             2279             171                29                 11:55:23
6         86.286336     442.367874           0.19505497043 0                    0              3027 160             2109             131                29                 11:55:24

SERVER-12886:

insert  query update delete getmore command flushes mapped  vsize    res faults      locked db idx miss %     qr|qw   ar|aw  netIn netOut  conn       time 
     6     13     13     14       0     3|0       0  1.03g  2.85g   890m      0 vs_test:128.0%          0       0|0     0|1    10k    13k     3   11:53:24 
     6     12     12     12       0     3|0       0  1.03g  2.85g   844m      0 vs_test:133.0%          0       0|0     0|1     9k    12k     3   11:53:25 
     6     12     12     12       0     2|0       0  1.03g  2.85g   854m      0 vs_test:143.9%          0       0|0     0|1     9k    12k     3   11:53:26 
     7     14     14     14       0     3|0       0  1.03g  2.85g   896m      0 vs_test:126.6%          0       0|1     0|1    10k    13k     3   11:53:27 
 
commits   journaledMB   writeToDataFilesMB   compression   commitsInWriteLock   earlyCommits   dt   prepLogBuffer   writeToJournal   writeToDataFiles   remapPrivateView   time    
25        1.982464      8.976778             0.2199573569  0                    0              3010 3               68               3                  8                  11:53:24
25        1.982464      8.976778             0.2199573569  0                    0              3010 3               68               3                  8                  11:53:25
26        2.121728      9.71555              0.21751320598 0                    0              3041 3               69               4                  17                 11:53:26
26        2.121728      9.71555              0.21751320598 0                    0              3041 3               69               4                  17                 11:53:27

You see up to 46 operations happening per second vs 2-4.

Comment by James Blackburn [ 28/Feb/14 ]

Benchmark with 4MB documents:

python ~/workspaces/scratch/mongo/CS-8941/version_store.py
Mongo:         TPS: avg: 0.377610404199 curr: 0.673707800341 cum: 25
Toku:          TPS: avg: 0.540124653158 curr: 0.904476402403 cum: 32
SERVER-12886:  TPS: avg: 4.48938135632  curr: 4.86169965405  cum: 32
 
 
python ~/workspaces/scratch/mongo/CS-8941/version_store.py --index-parent
Mongo:         TPS: avg: 0.38500711589 curr: 0.550002688183 cum: 32
Toku:          TPS: avg: 0.33341550285 curr: 0.865195624416 cum: 32
SERVER-12886:  TPS: avg: 5.99320924995 curr: 7.00250575204  cum: 35

In this benchmark this patch makes MongoDB 10-15x faster for updates than the default build and Toku build.

Comment by James Blackburn [ 27/Feb/14 ]

Attached is a python mongostat which can be used for printing journal and DB files write stats

python mongostat.py --journal

These changes also improves the sysbench benchmark by 50% for large documents:
https://github.com/tmcallaghan/sysbench-mongodb/pull/2

(and doesn't seem to penalize small document updates).

Comment by James Blackburn [ 26/Feb/14 ]

Added a performance test demonstrating the difference.

Before the changes altering an indexed field, or an array field causes up to 10MB of I/O.
After the change I/O is 1-30k.

As well as reducing I/O this appears to reduce time in the writeLock by ~30%.

Pull request submitted:
https://github.com/mongodb/mongo/pull/646

Comment by James Blackburn [ 26/Feb/14 ]

Patches related to this change are in the following branch:
https://github.com/jamesblackburn/mongo/tree/SERVER-12886

Difference to master:
https://github.com/jamesblackburn/mongo/compare/SERVER-12886

Generated at Thu Feb 08 03:29:56 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.