[SERVER-38377] Additional logging for storage and replication initialization phases Created: 03/Dec/18  Updated: 27/Oct/23  Resolved: 20/Mar/20

Status: Closed
Project: Core Server
Component/s: Logging, Replication, Storage
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Emilio Scalise Assignee: Eric Milkie
Resolution: Works as Designed Votes: 1
Labels: SWDI
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-31149 Enable recovery progress messages Closed
Sprint: Storage NYC 2018-12-31, Repl 2019-01-14, Execution Team 2020-04-06
Participants:

 Description   

There are certain cases where the MongoDB process doesn't produce enough logs:

  1. If a MongoDB process crashes while an index build is in progress, the index build is retried on restart (on a replica set member, or on a standalone not started with noIndexBuildRetry). The retried build runs before the MongoDB process initializes replication.
  2. No explicit log entries are written about the retried index build.
  3. If a MongoDB process recovers from the last clean checkpoint and there are many time-consuming operations to roll back, this too happens before the MongoDB process initializes networking and replication.

Please add more explicit logging for these kinds of recovery operations performed at mongod startup.
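
While these recovery operations run, the node has not yet opened its listening port, so from a driver's point of view the server is simply unreachable. As a minimal sketch of what an operator sees (assuming pymongo and a hypothetical node on localhost:27017):

from pymongo import MongoClient
from pymongo.errors import ServerSelectionTimeoutError

# Hypothetical address; adjust for your deployment.
client = MongoClient("localhost", 27017, serverSelectionTimeoutMS=2000)
try:
    client.admin.command("ping")
    print("node is up and accepting connections")
except ServerSelectionTimeoutError:
    # The process may still be retrying an index build or replaying the
    # oplog from the last clean checkpoint; the only evidence is in its
    # log file, which is why this ticket asks for richer logging.
    print("node not listening yet; check mongod.log for recovery progress")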



 Comments   
Comment by Eric Milkie [ 20/Mar/20 ]

With the conclusion of the Simultaneous Index Builds on All Nodes project, version 4.4 will no longer block on startup to build indexes (except single-phase indexes started under FCV 4.2). Thus, there are no index builds at startup to add logging for.

Comment by Eric Milkie [ 15/Jan/19 ]

I will edit the description to make this ticket only involve 1 and 2, eliminating number 3, and we'll just resolve it after the project is done.

Comment by A. Jesse Jiryu Davis [ 15/Jan/19 ]

milkie if numbers 1 and 2 will be handled during the Simultaneous Index Build project, is there a ticket I can make this one depend on?

Comment by A. Jesse Jiryu Davis [ 08/Jan/19 ]

Hi Emilio, I talked with Tess about this ticket and we're still not sure what your number 3 in the original description is about. When the node restarts, WiredTiger itself doesn't need to do significant work to assemble valid data files. The replication system, when it's replaying oplog entries, logs messages like these at the default verbosity:

2019-01-08T15:47:07.675-0500 I REPL     [initandlisten] Recovering from stable timestamp: Timestamp(1546980394, 6) (top of oplog: { ts: Timestamp(1546980425, 3000), t: 2 }, appliedThrough: { ts: Timestamp(0, 0), t: -1 }, TruncateAfter: Timestamp(0, 0))
2019-01-08T15:47:07.675-0500 I REPL     [initandlisten] Starting recovery oplog application at the stable timestamp: Timestamp(1546980394, 6)
2019-01-08T15:47:07.675-0500 I REPL     [initandlisten] Replaying stored operations from { : Timestamp(1546980394, 6) } (exclusive) to { : Timestamp(1546980425, 3000) } (inclusive).

If someone is waiting for a node to start listening, the logs do describe why the node isn't listening yet. Is there anything missing from these logs that you'd expect to see?
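
For anyone watching a restart, a rough sketch of keying off these messages (assuming a Python watcher, a hypothetical log path, and that the message text matches the samples above, which is not a stable interface):

import re

# Patterns taken from the sample log lines above; the message text may
# change between server versions.
RECOVERY_PATTERNS = [
    re.compile(r"Recovering from stable timestamp"),
    re.compile(r"Starting recovery oplog application"),
    re.compile(r"Replaying stored operations"),
]

# Hypothetical log path; adjust for your deployment.
with open("/var/log/mongodb/mongod.log") as log:
    for line in log:
        if any(p.search(line) for p in RECOVERY_PATTERNS):
            print("startup recovery:", line.strip())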

Comment by Eric Milkie [ 27/Dec/18 ]

I'm moving this to the replication team. Numbers 1 and 2 will already be taken care of by the Simultaneous Index Build project, while number 3 is about how replication replays the oplog on startup.

Comment by Eric Milkie [ 18/Dec/18 ]

How are you starting this mongod process? Is it a standalone process, or is it part of a replica set?

Comment by Eric Milkie [ 18/Dec/18 ]

emilio.scalise can you give an example for number 3 in the description?

Comment by Eric Milkie [ 18/Dec/18 ]

I'm going to edit some of the details of this ticket. Index rebuilding at startup time is not specific to WiredTiger.
