[SERVER-7973] Mongo should be able to reply to client requests for data in RAM while flushing to disk Created: 18/Dec/12  Updated: 13/Jul/18  Resolved: 22/Nov/13

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 2.5.5

Type: Bug Priority: Major - P3
Reporter: Mark porter Assignee: Eliot Horowitz (Inactive)
Resolution: Done Votes: 6
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File Screen Shot 2014-01-21 at 9.56.38 AM.png     File heatmap_with_7973.svg     File heatmap_without_7973.svg    
Issue Links:
Depends
Duplicate
is duplicated by SERVER-12216 Database freezes during flushing mmap... Closed
is duplicated by SERVER-6789 Improve performance of flushing data ... Closed
Related
related to SERVER-12880 Server pauses on requests every 60 se... Closed
is related to SERVER-5831 server stops processing requests duri... Closed
Operating System: ALL
Participants:
Case:

 Description   

When MongoDB flushes to disk, the rate of inserts fall considerably with at least a 30% reduction.

All the data is in memory and there are no page faults.

The issues persists despite the o/s being highly modified with recommended MongoDB production settings - noatime, ext4, high ulimit, journal on separate disk,ssd and low readahead.

mongostat snippet below showing the drop in query rates when a flush occurs:

insert  query update delete getmore command flushes mapped  vsize    res non-mapped faults  locked db idx miss %     qr|qw   ar|aw  netIn netOut  conn    set repl       time 
    73   6713   6715      0    4859    6806       0  6.47g  13.6g  4.41g      7.12g      0 test:56.7%          0       1|0     0|1    31m    93m    50 rs1  PRI   15:00:00 
    54   4392   4381      0    3320    4431       0  6.47g  13.6g  4.42g      7.12g      0 test:58.6%          0      39|1     0|1    20m    60m    50 rs1  PRI   15:00:01 
    64   6078   6113      0    4569    6183       1  6.47g  13.6g   4.4g      7.12g      0 test:67.4%          0       0|0     0|0    29m    84m    50 rs1  PRI   15:00:02 
    65   5967   5954      0    4549    6020       0  6.47g  13.6g  4.38g      7.12g      0 test:52.3%          0       0|0     0|0    28m    82m    50 rs1  PRI   15:00:03 
    58   6192   6177      0    4748    6227       0  6.47g  13.6g   4.4g      7.12g      0 test:53.2%          0      39|0     0|1    29m    85m    50 rs1  PRI   15:00:04 
    65   6212   6246      0    4704    6321       0  6.47g  13.6g  4.38g      7.12g      0 test:53.9%          0       0|0     0|0    29m    86m    50 rs1  PRI   15:00:05 
    66   6355   6348      0    4957    6412       0  6.47g  13.6g  4.41g      7.12g      0 test:54.8%          0       1|0     0|1    30m    88m    50 rs1  PRI   15:00:06 
    61   6504   6487      0    4920    6539       0  6.47g  13.6g  4.39g      7.12g      0 test:55.3%          0      39|0     0|1    30m    90m    50 rs1  PRI   15:00:07 
    63   6654   6683      0    5006    6756       0  6.47g  13.6g  4.37g      7.12g      0 test:57.6%          0       0|0     0|0    31m    92m    50 rs1  PRI   15:00:08 
    65   6754   6746      0    4928    6811       0  6.47g  13.6g   4.4g      7.12g      0 test:58.0%          0       0|0     0|1    31m    93m    50 rs1  PRI   15:00:09 
insert  query update delete getmore command flushes mapped  vsize    res non-mapped faults  locked db idx miss %     qr|qw   ar|aw  netIn netOut  conn    set repl       time 
    46   6166   6154      0    4524    6187       0  6.47g  13.6g  4.38g      7.12g      0 test:51.5%          0      39|0     0|1    29m    85m    50 rs1  PRI   15:00:10 
    40   3658   3679      0    2882    3734       1  6.47g  13.6g  4.38g      7.12g      0 test:43.8%          0       0|0     0|1    17m    51m    50 rs1  PRI   15:00:11 
    53   5821   5821      0    4196    5872       0  6.47g  13.6g  4.39g      7.12g      0 test:50.7%          0       4|0     1|1    27m    80m    50 rs1  PRI   15:00:12 
    64   6370   6366      0    4694    6432       0  6.47g  13.6g  4.38g      7.12g      0 test:55.0%          0       0|0     0|0    30m    88m    50 rs1  PRI   15:00:13 
    70   6332   6332      0    4794    6403       0  6.47g  13.6g  4.41g      7.12g      0 test:55.0%          0       0|0     0|1    30m    87m    50 rs1  PRI   15:00:14 
    66   6219   6209      0    4576    6265       0  6.47g  13.6g  4.39g      7.12g      0 test:53.7%          0      39|0     0|1    29m    86m    50 rs1  PRI   15:00:15 
    75   6151   6172      0    4823    6257       0  6.47g  13.6g  4.37g      7.12g      0 test:53.6%          0       0|0     0|1    29m    85m    50 rs1  PRI   15:00:16 
    65   6189   6189      0    4697    6255       0  6.47g  13.6g  4.39g      7.12g      0 test:53.3%          0       0|0     0|1    29m    85m    50 rs1  PRI   15:00:17 
    68   5998   5982      0    4509    6040       0  6.47g  13.6g  4.38g      7.12g      0 test:52.2%          0      39|0     0|1    28m    83m    50 rs1  PRI   15:00:18 
    62   5975   6001      0    4395    6074       0  6.47g  13.6g   4.4g      7.12g      0 test:54.9%          0       0|0     0|1    28m    83m    50 rs1  PRI   15:00:19 
insert  query update delete getmore command flushes mapped  vsize    res non-mapped faults  locked db idx miss %     qr|qw   ar|aw  netIn netOut  conn    set repl       time 
    60   6213   6208      0    4744    6268       0  6.47g  13.6g  4.39g      7.12g      0 test:54.0%          0       1|0     0|1    29m    86m    50 rs1  PRI   15:00:20 
    43   3738   3732      0    2931    3776       1  6.47g  13.6g  4.38g      7.12g      0 test:37.9%          0      10|0     0|1    17m    51m    50 rs1  PRI   15:00:21 
    71   6220   6231      0    4783    6302       0  6.47g  13.6g  4.41g      7.12g      0 test:53.5%          0       1|0     1|0    29m    86m    50 rs1  PRI   15:00:22 
    70   6073   6073      0    4737    6144       0  6.47g  13.6g  4.39g      7.12g      0 test:59.9%          0       8|0     0|1    28m    84m    50 rs1  PRI   15:00:23 
    57   6608   6619      0    4928    6675       0  6.47g  13.6g  4.37g      7.12g      0 test:58.2%          0       0|0     0|1    31m    91m    50 rs1  PRI   15:00:24 
    70   6749   6742      0    5221    6814       0  6.47g  13.6g  4.38g      7.12g      0 test:58.2%          0       1|0     0|1    31m    93m    50 rs1  PRI   15:00:25 
    59   6423   6428      0    4897    6487       0  6.47g  13.6g  4.41g      7.12g      0 test:55.5%          0       6|1     1|1    30m    89m    50 rs1  PRI   15:00:26 
    72   6294   6306      0    4729    6379       0  6.47g  13.6g  4.38g      7.12g      0 test:57.5%          0       0|0     0|0    29m    87m    50 rs1  PRI   15:00:27 
    59   6706   6699      0    5002    6756       0  6.47g  13.6g   4.4g      7.12g      0 test:56.6%          0       0|0     0|0    31m    92m    50 rs1  PRI   15:00:28 
    61   6678   6682      0    4900    6743       0  6.47g  13.6g  4.39g      7.12g      0 test:57.6%          0       1|0     1|0    31m    92m    50 rs1  PRI   15:00:29 



 Comments   
Comment by Ben Becker [ 21/Jan/14 ]

For future reference, the attached SVG files show query response times for a mixed read/write workload over the span of ~300 seconds. Each column of the heatmap represents one second. Mouse over a point for count and time in nanoseconds.

The supplied patch decreases the max query response time during flush, but the impact of flushing is still noticeable. The attached screenshot compares the two runs; flushes should be aligned.

Comment by Githook User [ 22/Nov/13 ]

Author:

{u'username': u'erh', u'name': u'Eliot Horowitz', u'email': u'eliot@10gen.com'}

Message: SERVER-7973: when fsyncing the data files to disk, do not hold any locks
Branch: master
https://github.com/mongodb/mongo/commit/34f76adc0fb9d771c044466a7e6dabfa16db4047

Generated at Thu Feb 08 03:16:12 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.