[SERVER-33416] Mongod service keeps restarting and now it crashes without logging why Created: 21/Feb/18  Updated: 23/Apr/18  Resolved: 18/Mar/18

Status: Closed
Project: Core Server
Component/s: Diagnostics, Logging
Affects Version/s: 3.6.2
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Robin [X] Assignee: Dmitry Agranat
Resolution: Incomplete Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File metrics.2018-01-26T16-43-36Z-00000     File metrics.2018-01-28T01-44-33Z-00000     File metrics.2018-01-29T10-26-16Z-00000     File metrics.2018-01-29T10-53-51Z-00000     File metrics.2018-01-30T10-32-05Z-00000     File metrics.2018-01-30T15-33-51Z-00000     File metrics.2018-01-31T19-53-21Z-00000     File metrics.2018-02-01T13-30-47Z-00000     File metrics.2018-02-02T07-38-15Z-00000     File metrics.2018-02-02T15-49-43Z-00000     File metrics.2018-02-03T21-25-58Z-00000     File metrics.2018-02-05T03-01-44Z-00000     File metrics.2018-02-05T07-50-20Z-00000     File metrics.2018-02-06T13-03-29Z-00000     File metrics.2018-02-06T15-51-49Z-00000     File metrics.2018-02-07T08-57-08Z-00000     File metrics.2018-02-07T09-36-33Z-00000     File metrics.2018-02-07T12-34-44Z-00000     File metrics.2018-02-07T16-08-07Z-00000     File metrics.2018-02-08T13-13-22Z-00000     File metrics.2018-02-08T13-16-20Z-00000     File metrics.2018-02-08T15-01-29Z-00000     File metrics.2018-02-09T16-20-35Z-00000     File metrics.2018-02-11T02-43-11Z-00000     File metrics.2018-02-12T13-09-31Z-00000     File metrics.2018-02-13T07-15-17Z-00000     File metrics.2018-02-13T07-45-14Z-00000     File metrics.2018-02-13T12-43-08Z-00000     File metrics.2018-02-14T16-48-13Z-00000     File metrics.2018-02-15T07-36-04Z-00000     File metrics.2018-02-15T15-13-28Z-00000     File metrics.2018-02-16T07-31-54Z-00000     File metrics.2018-02-16T16-28-49Z-00000     File metrics.2018-02-17T23-06-04Z-00000     File metrics.2018-02-19T10-10-05Z-00000     File metrics.2018-02-20T08-44-45Z-00000     File metrics.2018-02-20T08-47-47Z-00000     File metrics.2018-02-20T09-02-27Z-00000     File metrics.2018-02-20T09-18-08Z-00000     File metrics.2018-02-20T09-32-26Z-00000     File metrics.2018-02-20T09-47-26Z-00000     File metrics.2018-02-20T09-48-08Z-00000     File metrics.2018-02-20T10-02-35Z-00000     File metrics.2018-02-20T10-48-07Z-00000     File metrics.2018-02-20T11-32-34Z-00000     File metrics.2018-02-20T11-41-47Z-00000     File metrics.2018-02-20T11-47-36Z-00000     File metrics.2018-02-20T12-17-29Z-00000     File metrics.2018-02-20T12-18-12Z-00000     File metrics.2018-02-20T12-32-28Z-00000     File metrics.2018-02-20T13-17-27Z-00000     File metrics.2018-02-20T13-18-09Z-00000     File metrics.2018-02-20T13-32-29Z-00000     File metrics.2018-02-20T13-47-28Z-00000     File metrics.2018-02-20T13-48-11Z-00000     File metrics.2018-02-20T14-02-28Z-00000     File metrics.2018-02-20T14-17-28Z-00000     File metrics.2018-02-20T14-48-10Z-00000     File metrics.2018-02-20T15-18-10Z-00000     File metrics.2018-02-20T15-47-28Z-00000     File metrics.2018-02-20T16-02-28Z-00000     File metrics.2018-02-20T16-32-29Z-00000     File metrics.2018-02-20T16-47-30Z-00000     File metrics.2018-02-20T16-48-15Z-00000     File metrics.2018-02-20T17-18-13Z-00000     File metrics.2018-02-20T17-47-29Z-00000     File metrics.2018-02-20T17-48-13Z-00000     File metrics.2018-02-20T18-18-15Z-00000     File metrics.2018-02-20T18-31-24Z-00000     File metrics.2018-02-20T18-48-12Z-00000     File metrics.2018-02-20T19-02-14Z-00000     File metrics.2018-02-20T19-06-50Z-00000     File metrics.2018-02-20T19-24-41Z-00000     File metrics.2018-02-20T19-40-29Z-00000     File metrics.2018-02-20T19-54-45Z-00000     File metrics.2018-02-20T19-57-24Z-00000     File metrics.2018-02-20T20-10-32Z-00000     File metrics.2018-02-20T20-21-11Z-00000     File metrics.2018-02-20T20-40-15Z-00000     File metrics.2018-02-20T20-54-45Z-00000     File metrics.2018-02-20T20-54-53Z-00000     File metrics.2018-02-20T20-56-53Z-00000     File metrics.2018-02-20T21-09-56Z-00000     File metrics.2018-02-20T21-10-48Z-00000     File metrics.2018-02-20T21-25-29Z-00000     File metrics.2018-02-20T21-39-38Z-00000     File metrics.2018-02-20T21-40-21Z-00000     File metrics.2018-02-20T22-10-17Z-00000     File metrics.2018-02-20T22-40-18Z-00000     File metrics.2018-02-20T22-54-34Z-00000     File metrics.2018-02-20T23-10-20Z-00000     File metrics.2018-02-20T23-24-35Z-00000     File metrics.2018-02-20T23-40-19Z-00000     File metrics.2018-02-20T23-54-33Z-00000     File metrics.2018-02-21T00-10-16Z-00000     File metrics.2018-02-21T00-24-35Z-00000     File metrics.2018-02-21T00-40-17Z-00000     File metrics.2018-02-21T00-54-33Z-00000     File metrics.2018-02-21T01-10-21Z-00000     File metrics.2018-02-21T01-24-37Z-00000     File metrics.2018-02-21T01-39-34Z-00000     File metrics.2018-02-21T01-40-17Z-00000     File metrics.2018-02-21T02-10-18Z-00000     File metrics.2018-02-21T02-24-35Z-00000     File metrics.2018-02-21T02-40-16Z-00000     File metrics.2018-02-21T03-09-34Z-00000     File metrics.2018-02-21T03-10-22Z-00000     File metrics.2018-02-21T03-24-36Z-00000     File metrics.2018-02-21T03-39-33Z-00000     File metrics.2018-02-21T03-40-16Z-00000     File metrics.2018-02-21T04-09-33Z-00000     File metrics.2018-02-21T04-10-17Z-00000     File metrics.2018-02-21T04-24-34Z-00000     File metrics.2018-02-21T04-39-35Z-00000     File metrics.2018-02-21T04-40-20Z-00000     File metrics.2018-02-21T04-54-34Z-00000     File metrics.2018-02-21T05-09-33Z-00000     File metrics.2018-02-21T05-10-16Z-00000     File metrics.2018-02-21T05-24-38Z-00000     File metrics.2018-02-21T05-40-20Z-00000     File metrics.2018-02-21T06-09-32Z-00000     File metrics.2018-02-21T06-40-18Z-00000     File metrics.2018-02-21T06-54-33Z-00000     File metrics.2018-02-21T07-09-33Z-00000     File metrics.2018-02-21T07-10-22Z-00000     File metrics.2018-02-21T07-13-11Z-00000     File metrics.2018-02-21T07-19-07Z-00000     File metrics.2018-02-21T07-24-51Z-00000     File metrics.2018-02-21T07-40-22Z-00000     File metrics.2018-02-21T07-41-22Z-00000     File metrics.2018-02-21T07-54-42Z-00000     File metrics.2018-02-21T08-09-33Z-00000     File metrics.2018-02-21T08-10-17Z-00000     File metrics.2018-02-21T08-22-48Z-00000     File metrics.2018-02-21T08-25-50Z-00000     File metrics.2018-02-21T08-40-23Z-00000     File metrics.2018-02-21T09-09-34Z-00000     File metrics.2018-02-21T09-10-20Z-00000     File metrics.2018-02-21T09-24-37Z-00000     File metrics.2018-02-21T09-26-53Z-00000     File metrics.2018-02-21T09-28-41Z-00000     File metrics.2018-02-21T09-31-49Z-00000     File metrics.2018-02-21T10-06-24Z-00000     File metrics.2018-02-21T10-10-24Z-00000     File metrics.2018-02-21T10-10-42Z-00000     File metrics.2018-02-21T10-24-42Z-00000     File metrics.2018-02-21T10-39-37Z-00000     File metrics.2018-02-21T10-40-21Z-00000     File metrics.2018-02-21T10-49-18Z-00000     File metrics.2018-02-21T10-54-44Z-00000     File metrics.2018-02-21T11-00-14Z-00000     File metrics.2018-02-21T11-00-43Z-00000     File metrics.2018-02-21T11-09-48Z-00000     File metrics.2018-02-21T11-10-35Z-00000     File metrics.2018-02-21T12-50-48Z-00000     File metrics.2018-02-21T14-06-12Z-00000     File metrics.2018-02-21T15-00-04Z-00000     File metrics.2018-02-21T15-24-50Z-00000     File metrics.interim     Zip Archive mongod.2017-07-12T11-09-09.zip    
Operating System: ALL
Steps To Reproduce:

After 10 minutes the Windows Service crashes.

I use the following MongoDB.conf:

systemLog:
  destination: file
  path: C:/ProgramData/Stuiter/Slurf/Logging/MongoDB.log
  logAppend: true
  logRotate: rename
  timeStampFormat: iso8601-local
 
storage:
  dbPath: C:/ProgramData/Stuiter/Slurf/Data/MongoDB
  journal:
    enabled: true
  wiredTiger:
    engineConfig:
      cacheSizeGB: 1
 
processManagement:
  windowsService:
    serviceName: MongoDbService
    displayName: MongoDbService
    description: MongoDB service
 
net:
   port: 27017
   bindIp: 127.0.0.1

Participants:

 Description   

I'm running Mongod as a Windows service. But it logs that it keeps restarting.

2018-02-21T12:09:48.053+0100 I NETWORK  [listener] connection accepted from 127.0.0.1:64865 #9 (8 connections now open)
2018-02-21T12:09:48.068+0100 I NETWORK  [listener] connection accepted from 127.0.0.1:64866 #10 (9 connections now open)
2018-02-21T12:09:48.069+0100 I NETWORK  [listener] connection accepted from 127.0.0.1:64867 #11 (10 connections now open)
2018-02-21T12:09:48.070+0100 I NETWORK  [listener] connection accepted from 127.0.0.1:64868 #12 (11 connections now open)
2018-02-21T12:09:48.071+0100 I NETWORK  [listener] connection accepted from 127.0.0.1:64869 #13 (12 connections now open)
2018-02-21T12:09:48.072+0100 I NETWORK  [listener] connection accepted from 127.0.0.1:64870 #14 (13 connections now open)
2018-02-21T12:09:48.072+0100 I NETWORK  [listener] connection accepted from 127.0.0.1:64872 #15 (14 connections now open)
2018-02-21T12:09:48.073+0100 I NETWORK  [listener] connection accepted from 127.0.0.1:64871 #16 (15 connections now open)
2018-02-21T12:09:48.074+0100 I NETWORK  [listener] connection accepted from 127.0.0.1:64873 #17 (16 connections now open)
2018-02-21T12:09:48.074+0100 I NETWORK  [listener] connection accepted from 127.0.0.1:64874 #18 (17 connections now open)
2018-02-21T12:09:48.075+0100 I NETWORK  [listener] connection accepted from 127.0.0.1:64875 #19 (18 connections now open)
2018-02-21T12:09:48.075+0100 I NETWORK  [listener] connection accepted from 127.0.0.1:64876 #20 (19 connections now open)
2018-02-21T12:10:32.591+0100 I CONTROL  [main] ***** SERVER RESTARTED *****
2018-02-21T12:10:32.760+0100 I CONTROL  [initandlisten] MongoDB starting : pid=31012 port=27017 dbpath=C:/ProgramData/Stuiter/Slurf/Data/MongoDB 64-bit host=Pc-StuiterSlurf
2018-02-21T12:10:32.760+0100 I CONTROL  [initandlisten] targetMinOS: Windows 7/Windows Server 2008 R2
2018-02-21T12:10:32.760+0100 I CONTROL  [initandlisten] db version v3.6.2
2018-02-21T12:10:32.760+0100 I CONTROL  [initandlisten] git version: 489d177dbd0f0420a8ca04d39fd78d0a2c539420
2018-02-21T12:10:32.760+0100 I CONTROL  [initandlisten] OpenSSL version: OpenSSL 1.0.1u-fips  22 Sep 2016
2018-02-21T12:10:32.761+0100 I CONTROL  [initandlisten] allocator: tcmalloc
2018-02-21T12:10:32.761+0100 I CONTROL  [initandlisten] modules: none
2018-02-21T12:10:32.761+0100 I CONTROL  [initandlisten] build environment:
2018-02-21T12:10:32.761+0100 I CONTROL  [initandlisten]     distmod: 2008plus-ssl
2018-02-21T12:10:32.761+0100 I CONTROL  [initandlisten]     distarch: x86_64
2018-02-21T12:10:32.761+0100 I CONTROL  [initandlisten]     target_arch: x86_64
2018-02-21T12:10:32.761+0100 I CONTROL  [initandlisten] options: { config: "C:\ProgramData\Stuiter\Slurf\Configuration\MongoDB.conf", net: { bindIp: "127.0.0.1", port: 27017 }, processManagement: { windowsService: { description: "MongoDB service", displayName: "MongoDbService", serviceName: "MongoDbService" } }, storage: { dbPath: "C:/ProgramData/Stuiter/Slurf/Data/MongoDB", journal: { enabled: true }, wiredTiger: { engineConfig: { cacheSizeGB: 1.0 } } }, systemLog: { destination: "file", logAppend: true, logRotate: "rename", path: "C:/ProgramData/Stuiter/Slurf/Logging/MongoDB.log", timeStampFormat: "iso8601-local" } }
2018-02-21T12:10:32.762+0100 W -        [initandlisten] Detected unclean shutdown - C:/ProgramData/Stuiter/Slurf/Data/MongoDB\mongod.lock is not empty.
2018-02-21T12:10:32.763+0100 I -        [initandlisten] Detected data files in C:/ProgramData/Stuiter/Slurf/Data/MongoDB created by the 'wiredTiger' storage engine, so setting the active storage engine to 'wiredTiger'.
2018-02-21T12:10:32.768+0100 W STORAGE  [initandlisten] Recovering data from the last clean checkpoint.
2018-02-21T12:10:32.768+0100 I STORAGE  [initandlisten] wiredtiger_open config: create,cache_size=1024M,session_max=20000,eviction=(threads_min=4,threads_max=4),config_base=false,statistics=(fast),log=(enabled=true,archive=true,path=journal,compressor=snappy),file_manager=(close_idle_time=100000),statistics_log=(wait=0),verbose=(recovery_progress),
2018-02-21T12:10:33.106+0100 I STORAGE  [initandlisten] WiredTiger message [1519211433:106619][31012:140708089452240], txn-recover: Main recovery loop: starting at 30747/256
2018-02-21T12:10:33.108+0100 I STORAGE  [initandlisten] WiredTiger message [1519211433:107619][31012:140708089452240], txn-recover: Recovering log 30747 through 30748
2018-02-21T12:10:33.262+0100 I STORAGE  [initandlisten] WiredTiger message [1519211433:261730][31012:140708089452240], file:index-1-3415253110402412901.wt, txn-recover: Recovering log 30748 through 30748
2018-02-21T12:10:33.755+0100 I CONTROL  [initandlisten] 
2018-02-21T12:10:33.755+0100 I CONTROL  [initandlisten] ** WARNING: Access control is not enabled for the database.
2018-02-21T12:10:33.755+0100 I CONTROL  [initandlisten] **          Read and write access to data and configuration is unrestricted.
2018-02-21T12:10:33.756+0100 I CONTROL  [initandlisten] 
2018-02-21T12:10:33.756+0100 I CONTROL  [initandlisten] 
2018-02-21T12:10:33.756+0100 I CONTROL  [initandlisten] ** WARNING: The file system cache of this machine is configured to be greater than 40% of the total memory. This can lead to increased memory pressure and poor performance.
2018-02-21T12:10:33.756+0100 I CONTROL  [initandlisten] See http://dochub.mongodb.org/core/wt-windows-system-file-cache
2018-02-21T12:10:33.756+0100 I CONTROL  [initandlisten] 
2018-02-21T12:10:34.945+0100 I FTDC     [initandlisten] Initializing full-time diagnostic data capture with directory 'C:/ProgramData/Stuiter/Slurf/Data/MongoDB/diagnostic.data'
2018-02-21T12:10:34.948+0100 I NETWORK  [initandlisten] waiting for connections on port 27017



 Comments   
Comment by Dmitry Agranat [ 18/Mar/18 ]

Hi StuiterSlurf,

We haven’t heard back from you for some time, so I’m going to mark this ticket as resolved. If this is still an issue for you, please provide additional information and we will reopen the ticket.

Regards,
Dima

Comment by Kelsey Schubert [ 09/Mar/18 ]

Hi StuiterSlurf,

We still need additional information to diagnose the problem. If this is still an issue for you, would you please upload the files Dima requested to the portal I created?

Thank you,
Kelsey

Comment by Kelsey Schubert [ 01/Mar/18 ]

Hi StuiterSlurf,

I've created an upload portal which has a much higher file size limit for your convenience going forward.

Thanks,
Kelsey

Comment by Robin [X] [ 28/Feb/18 ]

Dear Dima,

There where to many files that I could upload. That's why they are all here one by one.

But I will upload the logs tomorrow.

The system has indeed 4 gb of RAM. But I can limit MongoDB to 1gb according to the specs. If it needs more memory it should not crash? Or am I seeing it wrong. Also the logging just quits at that time it crashes.

But with what program are you reading those diagnostic files?

Comment by Dmitry Agranat [ 28/Feb/18 ]

Hi StuiterSlurf,

Thank you for providing the requesting information.

The diagnostic.data does indicate that there is a memory pressure on your 4GB RAM server.

From the log I've noticed:

2018-02-21T12:10:33.756+0100 I CONTROL  [initandlisten] ** WARNING: The file system cache of this machine is configured to be greater than 40% of the total memory. This can lead to increased memory pressure and poor performance.
2018-02-21T12:10:33.756+0100 I CONTROL  [initandlisten] See http://dochub.mongodb.org/core/wt-windows-system-file-cache

Can you use the SetSystemFileCacheSize to limit the amount of memory that the file system cache can use and report back if this improves the situation? If the issue reoccurs after limiting the file system cache please provide:

  • The archived/zipped content of the diagnostic.data directory available under the dbpath. Please upload this as one zipped file rather than extracting and uploading all the individual files.
  • The compressed copy of the mongod logs covering the time of the crash.
  • A fresh set of .mdmp in case any new memory dump files are generated.

In the meantime:

  • Can you clarify if mongod is the only process running on this server?
  • I will review the provided memory dump files.

Thanks,
Dima

Comment by Robin [X] [ 27/Feb/18 ]

Do you got an estimate when you can get back to me on this Kelsey?

Comment by Ramon Fernandez Marina [ 21/Feb/18 ]

StuiterSlurf, when a ***** SERVER RESTARTED ***** line appears suddenly in the logs it's an indication that the operating system killed the mongod process abruptly, often because it was consuming too much memory.

This by itself is not an indication of a bug in the server, but we'll be able to know for sure if you can upload the contents of the diagnostic.data directory from your dbpath.

Chances are you need a machine with more memory.

Regards,
Ramón.

Comment by Robin [X] [ 21/Feb/18 ]

This is also for the 3.4 versions.

Generated at Thu Feb 08 04:33:20 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.