[SERVER-40368] Clustered Host Fails to Start - After Disabling Replication Created: 28/Mar/19  Updated: 25/Apr/19  Resolved: 25/Apr/19

Status: Closed
Project: Core Server
Component/s: WiredTiger
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Steven B. Assignee: Danny Hatcher (Inactive)
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Operating System: ALL
Steps To Reproduce:

Shutting down host via "service mongodb stop"

systemctl --no-block start mongod

Participants:

 Description   

MongoDB exits each time the application is started.

2019-03-28T00:58:40.027Z D STORAGE  [initandlisten] Setting oplog visibility at startup. Val: Timestamp(1553725030, 4)
2019-03-28T00:58:45.375Z D STORAGE  [initandlisten] Slow WT transaction. Lifetime of SnapshotId 1 was 52171ms
2019-03-28T00:58:45.385Z D STORAGE  [initandlisten] flushing directory /var/lib/mongo
2019-03-28T00:58:45.385Z I CONTROL  [initandlisten] 
2019-03-28T00:58:45.385Z I CONTROL  [initandlisten] ** WARNING: Access control is not enabled for the database.
2019-03-28T00:58:45.385Z I CONTROL  [initandlisten] **          Read and write access to data and configuration is unrestricted.
2019-03-28T00:58:45.385Z I CONTROL  [initandlisten] 
2019-03-28T00:59:21.587Z I CONTROL  [signalProcessingThread] got signal 15 (Terminated), will terminate after current cmd ends
2019-03-28T00:59:21.587Z I NETWORK  [signalProcessingThread] shutdown: going to close listening sockets...
2019-03-28T00:59:21.587Z I NETWORK  [signalProcessingThread] removing socket file: /tmp/mongodb-27017.sock
2019-03-28T00:59:21.588Z I REPL     [signalProcessingThread] shutting down replication subsystems
2019-03-28T00:59:21.588Z W REPL     [signalProcessingThread] ReplicationCoordinatorImpl::shutdown() called before startup() finished.  Shutting down without cleaning up the replication system
2019-03-28T00:59:21.588Z D QUERY    [signalProcessingThread] received interrupt request for unknown op: 2
2019-03-28T00:59:21.588Z D QUERY    [signalProcessingThread] received interrupt request for unknown op: 1
2019-03-28T00:59:21.588Z D -        [WT OplogTruncaterThread: local.oplog.rs] User Assertion: InterruptedAtShutdown: interrupted at shutdown src/mongo/db/operation_context.cpp 255
2019-03-28T00:59:21.588Z W -        [WT OplogTruncaterThread: local.oplog.rs] DBException thrown :: caused by :: InterruptedAtShutdown: interrupted at shutdown
2019-03-28T00:59:21.601Z I -        [WT OplogTruncaterThread: local.oplog.rs]  0x56041a4743e1 0x56041a474264 0x56041a469941 0x56041a4252d5 0x560418a764b9 0x560418a79727 0x56041a3c2a04 0x560419dd4011 0x560419ddaad2 0x560419dc7236 0x560419dc7935 0x560418b33702 0x560418b33e80 0x560419ec72e1 0x56041a584000 0x7ff4a8623dd5 0x7ff4a834cead
----- BEGIN BACKTRACE -----
{"backtrace":[{"b":"56041805C000","o":"24183E1","s":"_ZN5mongo15printStackTraceERSo"},{"b":"56041805C000","o":"2418264","s":"_ZN5mongo15printStackTraceEv"},{"b":"56041805C000","o":"240D941","s":"_ZN5mongo11DBException13traceIfNeededERKS0_"},{"b":"56041805C000","o":"23C92D5","s":"_ZN5mongo13error_details16ExceptionForImplILNS_10ErrorCodes5ErrorE11600EINS_15ExceptionForCatILNS_13ErrorCategoryE1EEENS4_ILS5_6EEEEEC1ERKNS_6StatusE"},{"b":"56041805C000","o":"A1A4B9","s":"_ZN5mongo13error_details23throwExceptionForStatusERKNS_6StatusE"},{"b":"56041805C000","o":"A1D727","s":"_ZN5mongo21uassertedWithLocationERKNS_6StatusEPKcj"},{"b":"56041805C000","o":"2366A04","s":"_ZN5mongo16OperationContext32waitForConditionOrInterruptUntilERSt18condition_variableRSt11unique_lockISt5mutexENS_6Date_tE"},{"b":"56041805C000","o":"1D78011","s":"_ZN5mongo28CondVarLockGrantNotification4waitEPNS_16OperationContextENS_8DurationISt5ratioILl1ELl1000EEEE"},{"b":"56041805C000","o":"1D7EAD2","s":"_ZN5mongo10LockerImplILb0EE12lockCompleteEPNS_16OperationContextENS_10ResourceIdENS_8LockModeENS_6Date_tEb"},{"b":"56041805C000","o":"1D6B236","s":"_ZN5mongo4Lock10GlobalLock16waitForLockUntilENS_6Date_tE"},{"b":"56041805C000","o":"1D6B935","s":"_ZN5mongo4Lock10GlobalLockC1EPNS_16OperationContextENS_8LockModeENS_6Date_tENS0_17InterruptBehaviorE"},{"b":"56041805C000","o":"AD7702"},{"b":"56041805C000","o":"AD7E80"},{"b":"56041805C000","o":"1E6B2E1","s":"_ZN5mongo13BackgroundJob7jobBodyEv"},{"b":"56041805C000","o":"2528000"},{"b":"7FF4A861C000","o":"7DD5"},{"b":"7FF4A824F000","o":"FDEAD","s":"clone"}],"processInfo":{ "mongodbVersion" : "4.0.7", "gitVersion" : "1b82c812a9c0bbf6dc79d5400de9ea99e6ffa025", "compiledModules" : [], "uname" : { "sysname" : "Linux", "release" : "3.10.0-957.5.1.el7.x86_64", "version" : "#1 SMP Wed Dec 19 10:46:58 EST 2018", "machine" : "x86_64" }, "somap" : [ { "b" : "56041805C000", "elfType" : 3, "buildId" : "FF5AE68E9F991D3E57D8D5C0270F9F794EFFF669" }, { "b" : "7FFE17BEB000", "elfType" : 3, "buildId" : "733D1C681CFBD8BBEB11E8B7F80876A9AED6BB74" }, { "b" : "7FF4A9A49000", "path" : "/lib64/libcurl.so.4", "elfType" : 3, "buildId" : "85F01551F029D8E8DFAAE6739F31FFB96D27610E" }, { "b" : "7FF4A9830000", "path" : "/lib64/libresolv.so.2", "elfType" : 3, "buildId" : "C444AE61E7CBB716FD9C18A0B46A7FE8F4FCF3E5" }, { "b" : "7FF4A93CE000", "path" : "/lib64/libcrypto.so.10", "elfType" : 3, "buildId" : "3593FA778645A59EA272DBBB59D318C60940E792" }, { "b" : "7FF4A915C000", "path" : "/lib64/libssl.so.10", "elfType" : 3, "buildId" : "AEF5E6F2240B55F90E9DF76CFBB8B9D9F5286583" }, { "b" : "7FF4A8F58000", "path" : "/lib64/libdl.so.2", "elfType" : 3, "buildId" : "357693C8F1F49D93010C4E31529C07CDD2BD3D08" }, { "b" : "7FF4A8D50000", "path" : "/lib64/librt.so.1", "elfType" : 3, "buildId" : "EFDE2029C9A4A20BE5B8D8AE7E6551FF9B5755D2" }, { "b" : "7FF4A8A4E000", "path" : "/lib64/libm.so.6", "elfType" : 3, "buildId" : "5B14BE4D749631673523A61074C10959D50F5455" }, { "b" : "7FF4A8838000", "path" : "/lib64/libgcc_s.so.1", "elfType" : 3, "buildId" : "179F202998E429AA1215907F6D4C5C1BB9C90136" }, { "b" : "7FF4A861C000", "path" : "/lib64/libpthread.so.0", "elfType" : 3, "buildId" : "96900CB0FF25B26F2BBDF247DE1408242E4773D8" }, { "b" : "7FF4A824F000", "path" : "/lib64/libc.so.6", "elfType" : 3, "buildId" : "DD9A6BA0D81C91F5CA7DBB4A1AC58319CC26DD5A" }, { "b" : "7FF4A9CB2000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "A527FE72908703C5972AE384E78D1850D1881EE7" }, { "b" : "7FF4A801C000", "path" : "/lib64/libidn.so.11", "elfType" : 3, "buildId" : "F4123103FB2318594448C44E47091DD68D1C78C0" }, { "b" : "7FF4A7DF2000", "path" : "/lib64/libssh2.so.1", "elfType" : 3, "buildId" : "964423B6B0ED1FD8FA32E0B8E825A5D120D872F1" }, { "b" : "7FF4A7BA0000", "path" : "/lib64/libssl3.so", "elfType" : 3, "buildId" : "2E28F6A705F2ECEA8460D4716D5D1C24B5DDA5E4" }, { "b" : "7FF4A7979000", "path" : "/lib64/libsmime3.so", "elfType" : 3, "buildId" : "8D0B4010959C321022DF9CE239277A9D7B34A76A" }, { "b" : "7FF4A764C000", "path" : "/lib64/libnss3.so", "elfType" : 3, "buildId" : "F5A64BB37FA3972E545EF459A51310F0AB56FA56" }, { "b" : "7FF4A741C000", "path" : "/lib64/libnssutil3.so", "elfType" : 3, "buildId" : "E0705772325A52C3372FFFB8BDE5F786E2E200D6" }, { "b" : "7FF4A7218000", "path" : "/lib64/libplds4.so", "elfType" : 3, "buildId" : "24835AAB32F52DF9EFACEBA6DE7E9134B302F877" }, { "b" : "7FF4A7013000", "path" : "/lib64/libplc4.so", "elfType" : 3, "buildId" : "C08A53AF54D3BDB4C2E924CCA4D0754F976FCF65" }, { "b" : "7FF4A6DD5000", "path" : "/lib64/libnspr4.so", "elfType" : 3, "buildId" : "775C8FB1A2D708968742CBFAF1436F5AB9AB3CC9" }, { "b" : "7FF4A6B88000", "path" : "/lib64/libgssapi_krb5.so.2", "elfType" : 3, "buildId" : "BCC30853830CD911E58700591830DF51ABCBD7BA" }, { "b" : "7FF4A689F000", "path" : "/lib64/libkrb5.so.3", "elfType" : 3, "buildId" : "45BAB0BB455BDFA960FDA22E4124CF17B67CC930" }, { "b" : "7FF4A666C000", "path" : "/lib64/libk5crypto.so.3", "elfType" : 3, "buildId" : "A9B3906192687CC45D483AE3C58C8AF745A6726A" }, { "b" : "7FF4A6468000", "path" : "/lib64/libcom_err.so.2", "elfType" : 3, "buildId" : "C77BC26CE4D420861BAEBCC075C418BD9311BB5C" }, { "b" : "7FF4A6259000", "path" : "/lib64/liblber-2.4.so.2", "elfType" : 3, "buildId" : "8832509D0687D79342E29FC6FEC587EA85C04CF4" }, { "b" : "7FF4A6004000", "path" : "/lib64/libldap-2.4.so.2", "elfType" : 3, "buildId" : "FC68D1DA42FB89A81E025368BCA66E5CD1AF82B6" }, { "b" : "7FF4A5DEE000", "path" : "/lib64/libz.so.1", "elfType" : 3, "buildId" : "B9D5F73428BD6AD68C96986B57BEA3B7CEDB9745" }, { "b" : "7FF4A5BDE000", "path" : "/lib64/libkrb5support.so.0", "elfType" : 3, "buildId" : "94B3BCB669126166B77CDCE6092679A6AA2004C8" }, { "b" : "7FF4A59DA000", "path" : "/lib64/libkeyutils.so.1", "elfType" : 3, "buildId" : "8CA73C16CFEB9A8B5660015B9223B09F87041CAD" }, { "b" : "7FF4A57BD000", "path" : "/lib64/libsasl2.so.3", "elfType" : 3, "buildId" : "9AF2AD92DADE046C6260DCCF02846BF78ABC658C" }, { "b" : "7FF4A5596000", "path" : "/lib64/libselinux.so.1", "elfType" : 3, "buildId" : "D2DD4DA3FDE1477D25BFFF80F3A25FDB541A8179" }, { "b" : "7FF4A535F000", "path" : "/lib64/libcrypt.so.1", "elfType" : 3, "buildId" : "740CAD898E29E1F3B73A323CCEC4A7C88911647F" }, { "b" : "7FF4A50FD000", "path" : "/lib64/libpcre.so.1", "elfType" : 3, "buildId" : "F5B144F9F5D9BE451C80211B34DB2CE348E039B6" }, { "b" : "7FF4A4EFA000", "path" : "/lib64/libfreebl3.so", "elfType" : 3, "buildId" : "B758881F4B6AF6C28C07A1A57713CBD2144628D4" } ] }}
 mongod(_ZN5mongo15printStackTraceERSo+0x41) [0x56041a4743e1]
 mongod(_ZN5mongo15printStackTraceEv+0x74) [0x56041a474264]
 mongod(_ZN5mongo11DBException13traceIfNeededERKS0_+0x141) [0x56041a469941]
 mongod(_ZN5mongo13error_details16ExceptionForImplILNS_10ErrorCodes5ErrorE11600EINS_15ExceptionForCatILNS_13ErrorCategoryE1EEENS4_ILS5_6EEEEEC1ERKNS_6StatusE+0x45) [0x56041a4252d5]
 mongod(_ZN5mongo13error_details23throwExceptionForStatusERKNS_6StatusE+0x5EED) [0x560418a764b9]
 mongod(_ZN5mongo21uassertedWithLocationERKNS_6StatusEPKcj+0x194) [0x560418a79727]
 mongod(_ZN5mongo16OperationContext32waitForConditionOrInterruptUntilERSt18condition_variableRSt11unique_lockISt5mutexENS_6Date_tE+0x64) [0x56041a3c2a04]
 mongod(_ZN5mongo28CondVarLockGrantNotification4waitEPNS_16OperationContextENS_8DurationISt5ratioILl1ELl1000EEEE+0x91) [0x560419dd4011]
 mongod(_ZN5mongo10LockerImplILb0EE12lockCompleteEPNS_16OperationContextENS_10ResourceIdENS_8LockModeENS_6Date_tEb+0x442) [0x560419ddaad2]
 mongod(_ZN5mongo4Lock10GlobalLock16waitForLockUntilENS_6Date_tE+0x96) [0x560419dc7236]
 mongod(_ZN5mongo4Lock10GlobalLockC1EPNS_16OperationContextENS_8LockModeENS_6Date_tENS0_17InterruptBehaviorE+0x25) [0x560419dc7935]
 mongod(+0xAD7702) [0x560418b33702]
 mongod(+0xAD7E80) [0x560418b33e80]
 mongod(_ZN5mongo13BackgroundJob7jobBodyEv+0x131) [0x560419ec72e1]
 mongod(+0x2528000) [0x56041a584000]
 libpthread.so.0(+0x7DD5) [0x7ff4a8623dd5]
 libc.so.6(clone+0x6D) [0x7ff4a834cead]
-----  END BACKTRACE  -----
2019-03-28T00:59:31.885Z D STORAGE  [initandlisten] admin.system.keys: clearing plan cache - collection info cache reset
2019-03-28T00:59:31.886Z D STORAGE  [initandlisten] admin.system.version: clearing plan cache - collection info cache reset
2019-03-28T00:59:31.886Z D -        [initandlisten] reloading view catalog for database admin
2019-03-28T00:59:31.887Z D STORAGE  [initandlisten] config.system.sessions: clearing plan cache - collection info cache reset
2019-03-28T00:59:31.887Z D STORAGE  [initandlisten] config.transactions: clearing plan cache - collection info cache reset
2019-03-28T00:59:31.887Z D -        [initandlisten] reloading view catalog for database config
2019-03-28T00:59:31.887Z D STORAGE  [initandlisten] local.oplog.rs: clearing plan cache - collection info cache reset
2019-03-28T00:59:31.888Z D STORAGE  [initandlisten] local.replset.election: clearing plan cache - collection info cache reset
2019-03-28T00:59:31.888Z D STORAGE  [initandlisten] local.replset.minvalid: clearing plan cache - collection info cache reset
2019-03-28T00:59:31.889Z D STORAGE  [initandlisten] local.replset.oplogTruncateAfterPoint: clearing plan cache - collection info cache reset
2019-03-28T00:59:31.889Z D STORAGE  [initandlisten] local.startup_log: clearing plan cache - collection info cache reset
2019-03-28T00:59:31.890Z D STORAGE  [initandlisten] local.system.replset: clearing plan cache - collection info cache reset
2019-03-28T00:59:31.890Z D STORAGE  [initandlisten] local.system.rollback.id: clearing plan cache - collection info cache reset
2019-03-28T00:59:31.890Z D -        [initandlisten] reloading view catalog for database local

 

# mongod.conf
 
# where to write logging data.
systemLog:
  verbosity: 0
  quiet: false
  traceAllExceptions: false
  destination: file
  logAppend: true
  path: /var/log/mongodb/mongod.log
  timeStampFormat: iso8601-utc
 
 
# Where and how to store data.
storage:
  dbPath: /mnt/Mongo/var/lib/mongo
  journal:
    enabled: true
 
# how the process runs
processManagement:
  fork: true  # fork and run in background
  pidFilePath: /var/run/mongodb/mongod.pid  # location of pidfile
  timeZoneInfo: /usr/share/zoneinfo
 
 
# network interfaces
net:
  port: 27017
  bindIp: 0.0.0.0
  serviceExecutor: synchronous
  wireObjectCheck: true
 
operationProfiling:
  mode: off
 
 
#replication:
#  replSetName: rs0
#  enableMajorityReadConcern: 
 
 
setParameter:
  diagnosticDataCollectionEnabled: false
  cursorTimeoutMillis: 1800000  #30 Minutes - Sets the duration of time after which idle query cursors are removed from memory.



 Comments   
Comment by Danny Hatcher (Inactive) [ 03/Apr/19 ]

Hello Steven,

I'm glad to hear that you were able to resolve your startup issue.

The database shouldn't take that long to startup with 200GB of data. Could you provide mongod logs showing that long startup time?

Thanks,

Danny

Comment by Steven B. [ 01/Apr/19 ]

Thank you for replying.

I was able to resolve the issue by making the following change.

/lib/systemd/system/mongod.service
+Type=oneshot
+RemainAfterExit=true
# systemctl --no-block start mongod

However startup times are extremely slow. The database is ~200GB in size and I'm questioning if WiredTiger is the most appropriate storage engine. Could you point me towards some resources which will help end my dilemma?

Comment by Danny Hatcher (Inactive) [ 28/Mar/19 ]

Hello Steven,

Something outside the mongod process itself is calling for the process to shutdown:

2019-03-28T00:59:21.587Z I CONTROL  [signalProcessingThread] got signal 15 (Terminated), will terminate after current cmd ends

Could you please clarify the steps that you are performing to get to this state and where you think the problem is?

Thank you,

Danny

Generated at Thu Feb 08 04:54:46 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.