[SERVER-28193] Startup delay when running up a cloned secondary Created: 04/Mar/17  Updated: 12/Jul/17  Resolved: 09/Jun/17

Status: Closed
Project: Core Server
Component/s: MMAPv1
Affects Version/s: 3.4.2
Fix Version/s: None

Type: Question Priority: Major - P3
Reporter: Paul Reed Assignee: Mark Agarunov
Resolution: Cannot Reproduce Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

 Description   

When starting up the mongod service on a standalone machine utilising a drive cloned from a stopped secondary, there is a startup delay of 2-3 minutes during which nothing is logged.

Is there a way to prevent this delay?

Process
1) Clone the data drive of a secondary in a three-machine replica set.
2) Attach the cloned drive to a standalone machine.
3) Start the mongod service.
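(For concreteness, step 3 amounts to something like the following on the standalone machine; the service name matches the net stop/net start commands used later in this ticket, and the mongod.exe path is an assumption based on the config path in the log:)

net start mongodb

REM or run mongod directly against the cloned data drive:
c:\MongoDBBinaries\bin\mongod.exe --config c:\MongoDBBinaries\conf\mongo.conf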

Log extract:

2017-03-04T10:13:05.132+0000 I CONTROL  [initandlisten] MongoDB starting : pid=4420 port=37017 dbpath=c:/MongoDBDrive/data 64-bit host=mach-4
2017-03-04T10:13:05.132+0000 I CONTROL  [initandlisten] targetMinOS: Windows 7/Windows Server 2008 R2
2017-03-04T10:13:05.133+0000 I CONTROL  [initandlisten] db version v3.4.2
2017-03-04T10:13:05.133+0000 I CONTROL  [initandlisten] git version: 3f76e40c105fc223b3e5aac3e20dcd026b83b38b
2017-03-04T10:13:05.133+0000 I CONTROL  [initandlisten] allocator: tcmalloc
2017-03-04T10:13:05.133+0000 I CONTROL  [initandlisten] modules: none
2017-03-04T10:13:05.133+0000 I CONTROL  [initandlisten] build environment:
2017-03-04T10:13:05.133+0000 I CONTROL  [initandlisten]     distmod: 2008plus
2017-03-04T10:13:05.133+0000 I CONTROL  [initandlisten]     distarch: x86_64
2017-03-04T10:13:05.133+0000 I CONTROL  [initandlisten]     target_arch: x86_64
2017-03-04T10:13:05.133+0000 I CONTROL  [initandlisten] options: { config: "c:\MongoDBBinaries\conf\mongo.conf", net: { http: { JSONPEnabled: false, RESTInterfaceEnabled: false, enabled: false }, port: 37017 }, operationProfiling: { mode: "off", slowOpThresholdMs: 2000 }, service: true, storage: { dbPath: "c:/MongoDBDrive/data", directoryPerDB: true, engine: "mmapv1" }, systemLog: { destination: "file", logAppend: false, path: "c:/MongoDBDrive/logs/logfile.log" } }
2017-03-04T10:13:05.201+0000 I JOURNAL  [initandlisten] journal dir=c:/MongoDBDrive/data\journal
2017-03-04T10:13:05.201+0000 I JOURNAL  [initandlisten] recover : no journal files present, no recovery needed
2017-03-04T10:13:05.647+0000 I JOURNAL  [durability] Durability thread started
2017-03-04T10:13:05.647+0000 I JOURNAL  [journal writer] Journal writer thread started
2017-03-04T10:13:05.663+0000 I CONTROL  [initandlisten] 
2017-03-04T10:13:05.663+0000 I CONTROL  [initandlisten] ** WARNING: Access control is not enabled for the database.
2017-03-04T10:13:05.663+0000 I CONTROL  [initandlisten] **          Read and write access to data and configuration is unrestricted.
2017-03-04T10:13:05.663+0000 I CONTROL  [initandlisten] 
*--- nothing logged here for 3 min 23 sec ---*
2017-03-04T10:16:28.776+0000 I FTDC     [initandlisten] Initializing full-time diagnostic data capture with directory 'c:/MongoDBDrive/data/diagnostic.data'
2017-03-04T10:16:28.933+0000 I STORAGE  [initandlisten] 
2017-03-04T10:16:28.933+0000 I STORAGE  [initandlisten] ** WARNING: mongod started without --replSet yet 1 documents are present in local.system.replset
2017-03-04T10:16:28.933+0000 I STORAGE  [initandlisten] **          Restart with --replSet unless you are doing maintenance and  no other clients are connected.
2017-03-04T10:16:28.933+0000 I STORAGE  [initandlisten] **          The TTL collection monitor will not start because of this.
2017-03-04T10:16:28.933+0000 I STORAGE  [initandlisten] **         
2017-03-04T10:16:28.934+0000 I STORAGE  [initandlisten]  For more info see http://dochub.mongodb.org/core/ttlcollections
2017-03-04T10:16:28.934+0000 I STORAGE  [initandlisten] 
2017-03-04T10:16:28.950+0000 I STORAGE  [initandlisten] Service running
2017-03-04T10:16:28.963+0000 I NETWORK  [thread1] waiting for connections on port 37017
 



 Comments   
Comment by Mark Agarunov [ 09/Jun/17 ]

Hello paul.reed,

Unfortunately I have still not been able to reproduce the behavior you've described. I've closed this ticket as "cannot reproduce", but if any new information comes to light, please let us know and we will further investigate the issue.

Thanks,
Mark

Comment by Mark Agarunov [ 01/May/17 ]

Hello paul.reed,

My apologies for the delay in response. Unfortunately we have not yet established a definitive cause for the behavior you're seeing and are still investigating the issue. We will post updates on this as they are available.

Thanks,
Mark

Comment by Paul Reed [ 25/Apr/17 ]

Any progress on this?

Comment by Paul Reed [ 04/Apr/17 ]

Ok - I have done that.

Same machine as before.

open shell
use local
db.system.replset.renameCollection('systemreplset')
close shell
net stop mongodb
net start mongodb
start shell
net stop mongodb

  • logs taken here

As you can see, this is now super speedy.
So my guess is that you might be continually looking for replica set state even though the config does not specify a replica set configuration.

If I subsequently rename the collection back to 'system.replset', startup speed is still fast.
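(For reference, the document being hidden here can be inspected from the shell; the count matches the "1 documents" startup warning in the description:)

use local
db.system.replset.find().pretty()   // the cloned secondary's replica set config
db.system.replset.count()           // should be 1, per the startup warning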

The machine running against this cloned drive is not one of the members of the replset.

Comment by Mark Agarunov [ 03/Apr/17 ]

Hello paul.reed,

Thank you for providing these files. Looking over the logs, it looks like the startup time may be disk I/O limited. However, I would like to compare these logs to the logs from a startup where you don't experience the issue, also with verbose logging enabled. This would allow us to verify whether the cause of the longer startup is a bug or a symptom of disk I/O limiting the speed of MMAPv1 recovery. To restate: if possible, please provide the complete logs, with verbose logging enabled as before, from a startup where this issue is not present.

Thanks,
Mark

Comment by Paul Reed [ 28/Mar/17 ]

Files now uploaded - was not in the office last week.

Comment by Paul Reed [ 21/Mar/17 ]

Sorry - will upload tomorrow. Missed that you were waiting on me.

Comment by Mark Agarunov [ 21/Mar/17 ]

Hello paul.reed,

We haven’t heard back from you for some time, so I’m going to mark this ticket as resolved. If this is still an issue for you, please provide additional information and we will reopen the ticket.

Thanks,
Mark

Comment by Mark Agarunov [ 10/Mar/17 ]

Hello paul.reed,

Thank you for providing this. We are still investigating the issue and may need some more information. If possible, please provide:

  • The complete log files from mongod with verbose logging enabled as before
  • An archive (tar or zip) of the $dbpath/diagnostic.data directory (one way to create this is sketched below)
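(For illustration, one way to create the archive on Windows, assuming PowerShell 5 or later is available; the paths are taken from the startup log:)

Compress-Archive -Path c:\MongoDBDrive\data\diagnostic.data -DestinationPath c:\MongoDBDrive\diagnostic.data.zip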

I've created an upload portal so that you can securely send us these files.

Thanks,
Mark

Comment by Paul Reed [ 08/Mar/17 ]

I won't post the full log here, but it does these operations over and over - I guess once for every collection/database.

mmf open c:/MongoDBDrive/data\z-Resources\z-Resources.ns

  • 1372 times

2017-03-08T20:12:57.197+0000 D STORAGE [initandlisten] test().Got99: clearing plan cache - collection info cache reset

  • 3848 times

2017-03-08T20:12:57.470+0000 D INDEX [initandlisten] "IndexRebuilder"::checkNS: admin.system.indexes
2017-03-08T20:12:57.470+0000 D STORAGE [initandlisten] ***** ROLLING BACK 0 disk writes
2017-03-08T20:12:57.470+0000 D STORAGE [initandlisten] ***** ROLLING BACK 0 custom changes

  • 3848 times

Taking 3 minutes in total.
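(To sanity-check those counts against the data set, one could tally namespaces from the shell; this sketch is illustrative and not part of the original report:)

// print the number of collections in each database
db.adminCommand('listDatabases').databases.forEach(function(d) {
    var names = db.getSiblingDB(d.name).getCollectionNames();
    print(d.name + ': ' + names.length + ' collections');
});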
Also, I think it does occur on the live replica set as well; in that case I'm not sure why my system.replset rename prevented it here.

  • As another aside, I always get a "system error has occurred" 1067 on net stop mongodb, which I thought no longer occurred. Could this be related?

The log shows nothing unusual, I think:

2017-03-08T20:35:55.839+0000 I JOURNAL  [serviceShutdown] journalCleanup...
2017-03-08T20:35:55.839+0000 I JOURNAL  [serviceShutdown] removeJournalFiles
2017-03-08T20:35:55.839+0000 I JOURNAL  [serviceShutdown] old journal file will be removed: c:/MongoDBDrive/data\journal\j._0
2017-03-08T20:35:55.840+0000 D JOURNAL  [serviceShutdown] removeJournalFiles end
2017-03-08T20:35:55.840+0000 I JOURNAL  [serviceShutdown] Terminating durability thread ...
2017-03-08T20:35:55.846+0000 D JOURNAL  [durability] groupCommit begin
2017-03-08T20:35:55.847+0000 D JOURNAL  [durability] Processing commit number 16457
2017-03-08T20:35:55.847+0000 D JOURNAL  [durability] groupCommit end
2017-03-08T20:35:55.847+0000 I JOURNAL  [journal writer] Journal writer thread stopped
2017-03-08T20:35:55.847+0000 I JOURNAL  [durability] Durability thread stopped
2017-03-08T20:35:55.847+0000 I STORAGE  [serviceShutdown] shutdown: closing all files...
2017-03-08T20:35:56.303+0000 I STORAGE  [serviceShutdown] closeAllFiles() finished
2017-03-08T20:35:56.303+0000 I STORAGE  [serviceShutdown] shutdown: removing fs lock...
2017-03-08T20:35:56.303+0000 I CONTROL  [serviceShutdown] now exiting
2017-03-08T20:35:56.303+0000 I CONTROL  [serviceStopWorker] shutting down with code:49

Comment by Mark Agarunov [ 08/Mar/17 ]

Hello paul.reed,

Thank you for the report. To get a better idea of what may be causing this behavior, please set the logging verbosity for mongod to 5, either by passing -vvvvv to mongod on the command line or by setting the verbosity parameter under systemLog in your configuration file to 5, and provide the complete log file from mongod. This may shed some light on what mongod is doing during this delay.
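(For reference, a minimal sketch of the config-file form, based on the systemLog block visible in the startup options above; level 5 is extremely verbose, so revert it once the logs are captured:)

systemLog:
  destination: file
  path: c:/MongoDBDrive/logs/logfile.log
  verbosity: 5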

Thanks,
Mark

Comment by Paul Reed [ 07/Mar/17 ]

I am actually seeing the delay on the replica set servers as well.

...
2017-03-07T13:17:00.569+0000 I JOURNAL  [initandlisten] recover : no journal files present, no recovery needed
2017-03-07T13:17:00.960+0000 I JOURNAL  [durability] Durability thread started
2017-03-07T13:17:00.961+0000 I JOURNAL  [journal writer] Journal writer thread started
2017-03-07T13:17:00.982+0000 I CONTROL  [initandlisten] 
2017-03-07T13:17:00.982+0000 I CONTROL  [initandlisten] ** WARNING: Access control is not enabled for the database.
2017-03-07T13:17:00.982+0000 I CONTROL  [initandlisten] **          Read and write access to data and configuration is unrestricted.
2017-03-07T13:17:00.982+0000 I CONTROL  [initandlisten] 
*--- 1 min 22 sec gap here in this case ---*
2017-03-07T13:18:22.529+0000 I FTDC     [initandlisten] Initializing full-time diagnostic data capture with directory 'c:/MongoDBDrive/data/diagnostic.data'
2017-03-07T13:18:22.697+0000 I REPL     [initandlisten] Did not find local voted for document at startup;  NoMatchingDocument: Did not find replica set lastVote document in local.replset.election
2017-03-07T13:18:22.702+0000 I STORAGE  [initandlisten] Service running
...

Comment by Paul Reed [ 04/Mar/17 ]

I meant 2-3 minutes - of course 2-3 seconds would be fine!

We are running about 1TB of data.
The drives are SSD.
If I stop and start the mongod service, I get the delay every time.
If I rename "system.replset", stop, then start, the delay is gone.
If I start, reinstate "system.replset", stop, then start again, the delay is still gone.
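(Put together, the rename test looks roughly like this in the shell; as noted above, renaming the collection back afterwards is reported to work too:)

// hide the replica set config, then net stop / net start the service
use local
db.system.replset.renameCollection('systemreplset')

// after timing the fast startup, reinstate it
db.systemreplset.renameCollection('system.replset')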
