[DOCS-13207] Mongod Journaling docs are incomplete and/or incorrect Created: 07/Nov/19  Updated: 30/Oct/23  Resolved: 12/Nov/19

Status: Closed
Project: Documentation
Component/s: manual, Server
Affects Version/s: None
Fix Version/s: Server_Docs_20231030

Type: Bug Priority: Critical - P2
Reporter: Paul Done Assignee: Kay Kim (Inactive)
Resolution: Fixed Votes: 0
Labels: docs-administration, docs-query
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
is related to DOCS-12096 Primaries journal much more frequently Closed
Participants:
Days since reply: 4 years, 13 weeks, 1 day
Epic Link: DOCSP-1769

 Description   


The journaling docs at https://docs.mongodb.com/manual/core/journaling/ say: "MongoDB syncs the buffered journal data to disk every 50 milliseconds (Starting in MongoDB 3.2)"

However, this is incorrect (or at least not the complete answer) and may lead users to believe that MongoDB cannot simultaneously provide a highly available data solution (i.e. write concern majority) and response times well under 50 milliseconds.

For example, in my own tests with write concern = majority against an Atlas-hosted replica set spanning 3 availability zones in one region, I am seeing an average response time of around 5 ms and a maximum response time of around 10 ms.

I've just been informed that, for write concern majority at least, the actual journal behaviour is:

  • The write is journaled and flushed to disk immediately on the primary, with no waiting for the next journal batch write. mongod does this on a separate thread so that multiple writes can be part of the same flush; it is not one flush per write.
  • Once journaled to the primary's disk, the change is available to be replicated.
  • Each secondary listening to the primary's oplog does the same, flushing its journal to disk as soon as it receives the change and before acknowledging back.

Therefore the latency of a client performing a write to a 3-node replica set using a write concern of majority is: 2 × journal flush + 1 × network round trip, which is on the order of 5-10 milliseconds for SSD disks and a fairly local network of 3 replicas.
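The arithmetic above can be sketched as a back-of-the-envelope model. The per-flush and round-trip figures below are illustrative assumptions for SSDs and a fairly local network, not measured constants:

```python
# Back-of-the-envelope model of w:"majority" write latency on a 3-node
# replica set, per the description above: one journal flush on the primary,
# one on the acknowledging secondary, plus one network round trip.
# The timing figures used below are illustrative assumptions, not
# measurements from any particular deployment.

def majority_write_latency_ms(journal_flush_ms: float,
                              network_roundtrip_ms: float) -> float:
    """Estimated latency: 2 x journal flush + 1 x network round trip."""
    return 2 * journal_flush_ms + network_roundtrip_ms

# Example: ~2 ms per SSD journal flush, ~2 ms round trip within a region.
print(majority_write_latency_ms(2.0, 2.0))  # → 6.0, inside the 5-10 ms range
```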

Scope of changes

Impact to Other Docs

MVP (Work and Date)

Resources (Scope or Design Docs, Invision, etc.)



 Comments   
Comment by Githook User [ 12/Nov/19 ]

Author: Kay Kim (kay-kim, kay.kim@10gen.com)

Message: DOCS-12096,DOCS-13207: clarify journaling frequency (secondary)
Branch: v3.6
https://github.com/mongodb/docs/commit/f6f579e0f7fc7fa0998e738c4f9d4c3f945c8eeb

Comment by Githook User [ 12/Nov/19 ]

Author: Kay Kim (kay-kim, kay.kim@10gen.com)

Message: DOCS-12096,DOCS-13207: clarify journaling frequency (secondary)
Branch: v4.0
https://github.com/mongodb/docs/commit/7b3f3af433fed06f1de7f2c8e546c135ba85bf41

Comment by Githook User [ 12/Nov/19 ]

Author: Kay Kim (kay-kim, kay.kim@10gen.com)

Message: DOCS-12096,DOCS-13207: clarify journaling frequency (secondary)
Branch: master
https://github.com/mongodb/docs/commit/bb86dbe5e5b1dbade1108dc288a3ddaa94bc9905

Comment by Githook User [ 12/Nov/19 ]

Author: Kay Kim (kay-kim, kay.kim@10gen.com)

Message: DOCS-12096,DOCS-13207: clarify journaling frequency
Branch: v3.6
https://github.com/mongodb/docs/commit/c609c2cddd1bd221c3b0560ba5895a90a69c1c25

Comment by Githook User [ 12/Nov/19 ]

Author: Kay Kim (kay-kim, kay.kim@10gen.com)

Message: DOCS-12096,DOCS-13207: clarify journaling frequency
Branch: v4.0
https://github.com/mongodb/docs/commit/8fece02cd4b75d532a4f9385f3c4a9654d5c3d9a

Comment by Githook User [ 12/Nov/19 ]

Author: Kay Kim (kay-kim, kay.kim@10gen.com)

Message: DOCS-12096,DOCS-13207: clarify journaling frequency
Branch: master
https://github.com/mongodb/docs/commit/a08ebfe5b32f636b0ba60fb7a261c4c5d8b9b3c7

Comment by Ravind Kumar (Inactive) [ 07/Nov/19 ]

We may also need to clarify the following to really close up this hole:

  • What is the relationship between journalCommitInterval and commitIntervalMs? Can you set a higher journalCommitInterval than commitIntervalMs?
  • Does setting j: true definitely result in an immediate sync to disk?
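For reference, the journal commit interval is exposed in the YAML config file as storage.journal.commitIntervalMs; journalCommitInterval is the older command-line/setParameter spelling of what appears to be the same knob, which is part of what the question above is asking the docs to confirm. An illustrative fragment:

```yaml
# mongod.conf fragment (illustrative, not a recommendation):
storage:
  journal:
    enabled: true
    commitIntervalMs: 100   # max milliseconds between periodic journal flushes
```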
Comment by Ravind Kumar (Inactive) [ 07/Nov/19 ]

This is related to (and probably can be absorbed into) DOCS-12096.

However, based on the conversation in that ticket, we only covered secondary oplog getMores resulting in an immediate flush. Based on Paul's comments, it looks like majority write concern also causes an immediate flush?

This raises a few follow-ups:

  • Is it strictly w: "majority"? Would setting w: n, where n is a majority (or more), also trigger this behavior?
  • Do any other read/write operations trigger an immediate flush?

cc boschg@mac.com; it's been a while, but our last conversation on this did not cover client-triggered journal flushing, only flushing due to replica set members.

Taking the findings here and in that ticket together, it seems like the behavior is:

  • With no pending getMore against the oplog, flushes occur on average every 50 ms.
  • With any pending getMore against the oplog, flushes occur immediately.
  • With any w: "majority" write, flushes occur immediately.

So something like:

  • Clusters flush every 50 ms on average.
  • If writes occur more frequently than every 50 ms, journal flush frequency also increases, because oplog readers cause journal flushing.
    • There's a note in DOCS-12096 about replication lag reducing oplog flushes that I could not quite parse.
  • w: "majority" or j: true always triggers an immediate flush.
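Taken together, the decision logic being described might be modeled as follows. This is a simplified illustration of the summarized behavior, not MongoDB's actual implementation, and the flag names are hypothetical:

```python
# Simplified, illustrative model of when a journal flush happens immediately
# versus waiting for the periodic (~50 ms, WiredTiger) timer, per the
# summary above. This is a sketch, not MongoDB source code.

def flush_is_immediate(pending_oplog_getmore: bool,
                       majority_write: bool,
                       journaled_write: bool) -> bool:
    """True if the flush happens immediately; False if it waits for the timer."""
    return pending_oplog_getmore or majority_write or journaled_write

# A plain w:1 write with no oplog readers waits for the periodic flush:
print(flush_is_immediate(False, False, False))  # → False
# A w:"majority" or j:true write flushes immediately:
print(flush_is_immediate(False, True, False))   # → True
```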

 

This is all specific to WiredTiger. It is unclear how much of this behavior applies to MMAPv1, which has a default 30 ms commitIntervalMs.

Generated at Thu Feb 08 08:07:11 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.