[SERVER-33362] Fatal Assertion 40652 when starting a replicaset on a fresh install with Kubernetes Created: 16/Feb/18  Updated: 02/Apr/18  Resolved: 09/Mar/18

Status: Closed
Project: Core Server
Component/s: Admin
Affects Version/s: 3.6.2
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: vianney rancurel Assignee: Dmitry Agranat
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
Operating System: ALL
Steps To Reproduce:

helm install --set image.tag=3.6.2 --set configmap.net.bindIp=0.0.0.0 stable/mongodb-replicaset
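Note that a reinstall may not actually be "fresh": PersistentVolumeClaims created from a StatefulSet's volumeClaimTemplates are not removed by a plain helm delete, so new pods can mount a previous release's /data/db. A cleanup sketch before reinstalling (the release name and label selector are illustrative; verify them against your cluster, and wipe any pre-created local volumes manually):

```shell
# Helm 2 syntax (current at the time of this ticket): remove the release record too.
helm delete --purge my-release
# PVCs outlive the release; delete them so the next install starts on empty volumes.
kubectl delete pvc -l release=my-release
```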

Participants:

 Description   

When running a MongoDB 3.6.2 replica set on a Kubernetes cluster, we regularly (though not always) see replica members fail to start, even with empty data directories (fresh install).

2018-02-16T00:28:14.550+0000 I CONTROL  [initandlisten] MongoDB starting : pid=1 port=27017 dbpath=/data/db 64-bit host=zenko-mongodb-replicaset-1
2018-02-16T00:28:14.550+0000 I CONTROL  [initandlisten] db version v3.6.2
2018-02-16T00:28:14.550+0000 I CONTROL  [initandlisten] git version: 489d177dbd0f0420a8ca04d39fd78d0a2c539420
2018-02-16T00:28:14.551+0000 I CONTROL  [initandlisten] OpenSSL version: OpenSSL 1.0.1t  3 May 2016
2018-02-16T00:28:14.551+0000 I CONTROL  [initandlisten] allocator: tcmalloc
2018-02-16T00:28:14.551+0000 I CONTROL  [initandlisten] modules: none
2018-02-16T00:28:14.551+0000 I CONTROL  [initandlisten] build environment:
2018-02-16T00:28:14.551+0000 I CONTROL  [initandlisten]     distmod: debian81
2018-02-16T00:28:14.551+0000 I CONTROL  [initandlisten]     distarch: x86_64
2018-02-16T00:28:14.551+0000 I CONTROL  [initandlisten]     target_arch: x86_64
2018-02-16T00:28:14.551+0000 I CONTROL  [initandlisten] options: { config: "/config/mongod.conf", net: { bindIp: "0.0.0.0", port: 27017 }, replication: { replSetName: "rs0" }, storage: { dbPath: "/data/db" } }
2018-02-16T00:28:14.551+0000 W -        [initandlisten] Detected unclean shutdown - /data/db/mongod.lock is not empty.
2018-02-16T00:28:14.551+0000 I -        [initandlisten] Detected data files in /data/db created by the 'wiredTiger' storage engine, so setting the active storage engine to 'wiredTiger'.
2018-02-16T00:28:14.551+0000 W STORAGE  [initandlisten] Recovering data from the last clean checkpoint.
2018-02-16T00:28:14.551+0000 I STORAGE  [initandlisten] 
2018-02-16T00:28:14.551+0000 I STORAGE  [initandlisten] ** WARNING: Using the XFS filesystem is strongly recommended with the WiredTiger storage engine
2018-02-16T00:28:14.551+0000 I STORAGE  [initandlisten] **          See http://dochub.mongodb.org/core/prodnotes-filesystem
2018-02-16T00:28:14.552+0000 I STORAGE  [initandlisten] wiredtiger_open config: create,cache_size=7511M,session_max=20000,eviction=(threads_min=4,threads_max=4),config_base=false,statistics=(fast),log=(enabled=true,archive=true,path=journal,compressor=snappy),file_manager=(close_idle_time=100000),statistics_log=(wait=0),verbose=(recovery_progress),
2018-02-16T00:28:14.800+0000 I STORAGE  [initandlisten] WiredTiger message [1518740894:800117][1:0x7fa7497aaa00], txn-recover: Main recovery loop: starting at 14/256
2018-02-16T00:28:14.800+0000 I STORAGE  [initandlisten] WiredTiger message [1518740894:800904][1:0x7fa7497aaa00], txn-recover: Recovering log 14 through 15
2018-02-16T00:28:14.868+0000 I STORAGE  [initandlisten] WiredTiger message [1518740894:868477][1:0x7fa7497aaa00], txn-recover: Recovering log 15 through 15
2018-02-16T00:28:14.952+0000 I STORAGE  [initandlisten] Starting WiredTigerRecordStoreThread local.oplog.rs
2018-02-16T00:28:14.952+0000 I STORAGE  [initandlisten] The size storer reports that the oplog contains 0 records totaling to 0 bytes
2018-02-16T00:28:14.952+0000 I STORAGE  [initandlisten] Scanning the oplog to determine where to place markers for truncation
2018-02-16T00:28:14.955+0000 I CONTROL  [initandlisten] 
2018-02-16T00:28:14.955+0000 I CONTROL  [initandlisten] ** WARNING: Access control is not enabled for the database.
2018-02-16T00:28:14.955+0000 I CONTROL  [initandlisten] **          Read and write access to data and configuration is unrestricted.
2018-02-16T00:28:14.955+0000 I CONTROL  [initandlisten] ** WARNING: You are running this process as the root user, which is not recommended.
2018-02-16T00:28:14.955+0000 I CONTROL  [initandlisten] 
2018-02-16T00:28:14.955+0000 I CONTROL  [initandlisten] 
2018-02-16T00:28:14.955+0000 I CONTROL  [initandlisten] ** WARNING: /sys/kernel/mm/transparent_hugepage/enabled is 'always'.
2018-02-16T00:28:14.955+0000 I CONTROL  [initandlisten] **        We suggest setting it to 'never'
2018-02-16T00:28:14.955+0000 I CONTROL  [initandlisten] 
2018-02-16T00:28:14.955+0000 I CONTROL  [initandlisten] ** WARNING: /sys/kernel/mm/transparent_hugepage/defrag is 'always'.
2018-02-16T00:28:14.955+0000 I CONTROL  [initandlisten] **        We suggest setting it to 'never'
2018-02-16T00:28:14.955+0000 I CONTROL  [initandlisten] 
2018-02-16T00:28:14.964+0000 F STORAGE  [initandlisten] Unable to start up mongod due to missing featureCompatibilityVersion document.
2018-02-16T00:28:14.964+0000 F STORAGE  [initandlisten] Please run with --repair to restore the document.
2018-02-16T00:28:14.964+0000 F -        [initandlisten] Fatal Assertion 40652 at src/mongo/db/db.cpp 660
2018-02-16T00:28:14.964+0000 F -        [initandlisten] 
 
***aborting after fassert() failure

We did not observe this when using MongoDB 3.4.



 Comments   
Comment by Dmitry Agranat [ 28/Feb/18 ]

Hi vrancurel,

In SERVER-29452 (https://jira.mongodb.org/browse/SERVER-29452) we added this check to ensure that mongod always starts up with a featureCompatibilityVersion document.

A user can run into this scenario if the document is accidentally deleted, or if they try to start mongod 3.6 on mongod 3.2 data files, since featureCompatibilityVersion was introduced in MongoDB 3.4.

Thanks,
Dima
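On a node that does start, the document Dima describes can be inspected directly in admin.system.version (a diagnostic sketch using the 3.6 mongo shell; the connection string is illustrative). If it is missing, the startup log above already points at the remedy: run mongod with --repair against the same dbpath.

```shell
# A healthy 3.6 node returns something like:
#   { "_id" : "featureCompatibilityVersion", "version" : "3.6" }
mongo --quiet mongodb://localhost:27017/admin \
  --eval 'printjson(db.system.version.findOne({_id: "featureCompatibilityVersion"}))'
```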

Comment by vianney rancurel [ 17/Feb/18 ]

We are sure because it happens even when the VMs are newly provisioned, i.e. with an empty /mnt/disks. The local volumes (/mnt/disks/vol*) used for the replica set members (5 in total) are created during the 'helm install', then bound to the mongodb-replicaset pods and mounted at /data/db individually.

Comment by Ramon Fernandez Marina [ 16/Feb/18 ]

The following lines indicate that the dbpath was not empty when this mongod started:

2018-02-16T00:28:14.551+0000 W -        [initandlisten] Detected unclean shutdown - /data/db/mongod.lock is not empty.
2018-02-16T00:28:14.551+0000 I -        [initandlisten] Detected data files in /data/db created by the 'wiredTiger' storage engine, so setting the active storage engine to 'wiredTiger'.

Can you please retrace your steps and make sure that /data/db is empty if you're attempting to do a fresh install?
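Ramon's check can be scripted before each start attempt (a minimal sketch; DBPATH is illustrative and should point at the replica's mounted /data/db — the mktemp fallback only exists so the snippet runs standalone):

```shell
# Verify the dbpath is empty before attempting a fresh install.
DBPATH="${DBPATH:-$(mktemp -d)}"   # illustrative fallback; use your real /data/db mount
if [ -z "$(ls -A "$DBPATH" 2>/dev/null)" ]; then
  echo "dbpath '$DBPATH' is empty: safe for a fresh install"
else
  echo "dbpath '$DBPATH' is not empty; move or remove its contents first:"
  ls -A "$DBPATH"
fi
```

Any leftover file (even just mongod.lock, as in the log above) makes the directory count as non-empty.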

Comment by vianney rancurel [ 16/Feb/18 ]

Note that we use the official mongodb-replicaset chart for Kubernetes: https://github.com/kubernetes/charts/tree/master/stable/mongodb-replicaset

Generated at Thu Feb 08 04:33:09 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.