[SERVER-19333] fassert when starting MongoD with replication enabled when the port is already in use Created: 08/Jul/15  Updated: 07/Oct/15  Resolved: 22/Sep/15

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: 3.1.5
Fix Version/s: 3.1.9

Type: Bug Priority: Major - P3
Reporter: Adam Midvidy Assignee: Matt Dannenberg
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Backwards Compatibility: Fully Compatible
Operating System: ALL
Steps To Reproduce:

Start 1st MongoD on 27017:

➜  mongo git:(master) ✗ ./mongod --dbpath=/tmp/m0 --replSet=rs0
2015-07-08T14:27:47.973-0400 I -        [initandlisten] Detected data files in /tmp/m0 created by the 'wiredTiger' storage engine, so setting the active storage engine to 'wiredTiger'.
2015-07-08T14:27:47.974-0400 I STORAGE  [initandlisten] wiredtiger_open config: create,cache_size=18G,session_max=20000,eviction=(threads_max=4),statistics=(fast),log=(enabled=true,archive=true,path=journal,compressor=snappy),file_manager=(close_idle_time=100000),checkpoint=(wait=60,log_size=2GB),statistics_log=(wait=0),
2015-07-08T14:27:47.987-0400 I CONTROL  [initandlisten] MongoDB starting : pid=5357 port=27017 dbpath=/tmp/m0 64-bit host=vader
2015-07-08T14:27:47.987-0400 I CONTROL  [initandlisten] DEBUG build (which is slower)
2015-07-08T14:27:47.987-0400 I CONTROL  [initandlisten] 
2015-07-08T14:27:47.987-0400 I CONTROL  [initandlisten] ** NOTE: This is a development version (3.1.6-pre-) of MongoDB.
2015-07-08T14:27:47.987-0400 I CONTROL  [initandlisten] **       Not recommended for production.
2015-07-08T14:27:47.987-0400 I CONTROL  [initandlisten] 
2015-07-08T14:27:47.988-0400 I CONTROL  [initandlisten] 
2015-07-08T14:27:47.988-0400 I CONTROL  [initandlisten] ** WARNING: /sys/kernel/mm/transparent_hugepage/defrag is 'always'.
2015-07-08T14:27:47.988-0400 I CONTROL  [initandlisten] **        We suggest setting it to 'never'
2015-07-08T14:27:47.988-0400 I CONTROL  [initandlisten] 
2015-07-08T14:27:47.988-0400 I CONTROL  [initandlisten] db version v3.1.6-pre-
2015-07-08T14:27:47.988-0400 I CONTROL  [initandlisten] git version: b057c6a2a2b65de10ba1720ec2c9ccd4a6ba40dc
2015-07-08T14:27:47.988-0400 I CONTROL  [initandlisten] OpenSSL version: OpenSSL 1.0.1f 6 Jan 2014
2015-07-08T14:27:47.988-0400 I CONTROL  [initandlisten] allocator: system
2015-07-08T14:27:47.988-0400 I CONTROL  [initandlisten] options: { replication: { replSet: "rs0" }, storage: { dbPath: "/tmp/m0" } }
2015-07-08T14:27:47.989-0400 I REPL     [initandlisten] Did not find local voted for document at startup;  NoMatchingDocument Did not find replica set lastVote document in local.replset.election
2015-07-08T14:27:47.989-0400 I REPL     [initandlisten] Did not find local replica set configuration document at startup;  NoMatchingDocument Did not find replica set configuration document in local.system.replset
2015-07-08T14:27:47.990-0400 I NETWORK  [initandlisten] waiting for connections on port 27017

Start second MongoD and observe fassert:

 
mongo git:(master) ✗ ./mongod --dbpath=/tmp/m1 --replSet=rs1
2015-07-08T14:27:52.256-0400 E NETWORK  [initandlisten] listen(): bind() failed errno:98 Address already in use for socket: 0.0.0.0:27017
2015-07-08T14:27:52.256-0400 E NETWORK  [initandlisten]   addr already in use
2015-07-08T14:27:52.256-0400 I -        [initandlisten] Detected data files in /tmp/m1 created by the 'wiredTiger' storage engine, so setting the active storage engine to 'wiredTiger'.
2015-07-08T14:27:52.256-0400 W -        [initandlisten] Detected unclean shutdown - /tmp/m1/mongod.lock is not empty.
2015-07-08T14:27:52.256-0400 W STORAGE  [initandlisten] Recovering data from the last clean checkpoint.
2015-07-08T14:27:52.257-0400 I STORAGE  [initandlisten] wiredtiger_open config: create,cache_size=18G,session_max=20000,eviction=(threads_max=4),statistics=(fast),log=(enabled=true,archive=true,path=journal,comp
ressor=snappy),file_manager=(close_idle_time=100000),checkpoint=(wait=60,log_size=2GB),statistics_log=(wait=0),
2015-07-08T14:27:52.276-0400 I STORAGE  [initandlisten] Starting WiredTigerRecordStoreThread local.oplog.rs
2015-07-08T14:27:52.280-0400 I CONTROL  [initandlisten] MongoDB starting : pid=5403 port=27017 dbpath=/tmp/m1 64-bit host=vader
2015-07-08T14:27:52.280-0400 I CONTROL  [initandlisten] DEBUG build (which is slower)
2015-07-08T14:27:52.280-0400 I CONTROL  [initandlisten] 
2015-07-08T14:27:52.280-0400 I CONTROL  [initandlisten] ** NOTE: This is a development version (3.1.6-pre-) of MongoDB.
2015-07-08T14:27:52.280-0400 I CONTROL  [initandlisten] **       Not recommended for production.
2015-07-08T14:27:52.280-0400 I CONTROL  [initandlisten] 
2015-07-08T14:27:52.280-0400 I CONTROL  [initandlisten] 
2015-07-08T14:27:52.280-0400 I CONTROL  [initandlisten] ** WARNING: /sys/kernel/mm/transparent_hugepage/defrag is 'always'.
2015-07-08T14:27:52.280-0400 I CONTROL  [initandlisten] **        We suggest setting it to 'never'
2015-07-08T14:27:52.280-0400 I CONTROL  [initandlisten] 
2015-07-08T14:27:52.280-0400 I CONTROL  [initandlisten] db version v3.1.6-pre-
2015-07-08T14:27:52.280-0400 I CONTROL  [initandlisten] git version: b057c6a2a2b65de10ba1720ec2c9ccd4a6ba40dc
2015-07-08T14:27:52.280-0400 I CONTROL  [initandlisten] OpenSSL version: OpenSSL 1.0.1f 6 Jan 2014
2015-07-08T14:27:52.280-0400 I CONTROL  [initandlisten] allocator: system
2015-07-08T14:27:52.280-0400 I CONTROL  [initandlisten] options: { replication: { replSet: "rs1" }, storage: { dbPath: "/tmp/m1" } }
2015-07-08T14:27:52.283-0400 I REPL     [initandlisten] Did not find local voted for document at startup;  NoMatchingDocument Did not find replica set lastVote document in local.replset.election
2015-07-08T14:27:52.284-0400 I -        [initandlisten] Fatal Assertion 18823
2015-07-08T14:27:52.302-0400 I CONTROL  [initandlisten] 
 0x21c6dd0 0x216768b 0x2150e69 0x1833c27 0x1dc26e5 0x1b4efcd 0x17c9375 0x17c8360 0x7fdca9af6ec5 0x17c49c9
----- BEGIN BACKTRACE -----
{"backtrace":[{"b":"400000","o":"1DC6DD0"},{"b":"400000","o":"1D6768B"},{"b":"400000","o":"1D50E69"},{"b":"400000","o":"1433C27"},{"b":"400000","o":"19C26E5"},{"b":"400000","o":"174EFCD"},{"b":"400000","o":"13C9
375"},{"b":"400000","o":"13C8360"},{"b":"7FDCA9AD5000","o":"21EC5"},{"b":"400000","o":"13C49C9"}],"processInfo":{ "mongodbVersion" : "3.1.6-pre-", "gitVersion" : "b057c6a2a2b65de10ba1720ec2c9ccd4a6ba40dc", "comp
iledModules" : [ "subscription" ], "uname" : { "sysname" : "Linux", "release" : "3.13.0-24-generic", "version" : "#47-Ubuntu SMP Fri May 2 23:30:00 UTC 2014", "machine" : "x86_64" }, "somap" : [ { "elfType" : 2,
 "b" : "400000", "buildId" : "6DEB6ECFB8B5865794518B4E155564C232EE109C" }, { "b" : "7FFFDF8E0000", "elfType" : 3, "buildId" : "6755FAD2CADACDF1667E5B57FF1EDFC28DD1C976" }, { "b" : "7FDCAC3A6000", "path" : "/usr/
lib/x86_64-linux-gnu/libsasl2.so.2", "elfType" : 3, "buildId" : "666B276BD134B0E9579B67D4EE333F2D0FB813CD" }, { "b" : "7FDCAC160000", "path" : "/usr/lib/x86_64-linux-gnu/libgssapi_krb5.so.2", "elfType" : 3, "bui
ldId" : "EEF6871CB80C0344DF907DD8B0D8C90A0B57D4F0" }, { "b" : "7FDCABCF3000", "path" : "/usr/lib/x86_64-linux-gnu/libnetsnmpmibs.so.30", "elfType" : 3, "buildId" : "B86A74A738862C45C6E7A4C5E694DD42155710D4" }, {
 "b" : "7FDCABAE4000", "path" : "/usr/lib/x86_64-linux-gnu/libsensors.so.4", "elfType" : 3, "buildId" : "859FDBFDD82F0EFDEB44A433D9D8020A232A35E2" }, { "b" : "7FDCAB8E0000", "path" : "/lib/x86_64-linux-gnu/libdl
.so.2", "elfType" : 3, "buildId" : "C1AE4CB7195D337A77A3C689051DABAA3980CA0C" }, { "b" : "7FDCAB677000", "path" : "/usr/lib/x86_64-linux-gnu/libnetsnmpagent.so.30", "elfType" : 3, "buildId" : "96C16FDBBA28C66356
57AFDBAF0F5A1090072474" }, { "b" : "7FDCAB46D000", "path" : "/lib/x86_64-linux-gnu/libwrap.so.0", "elfType" : 3, "buildId" : "54FCBC5B0F994A13A9B0EAD46F23E7DA7F7FE75B" }, { "b" : "7FDCAB193000", "path" : "/usr/l
ib/x86_64-linux-gnu/libnetsnmp.so.30", "elfType" : 3, "buildId" : "61AE85EF50A072D671D55B4776383F8365A3FAA7" }, { "b" : "7FDCAADB8000", "path" : "/lib/x86_64-linux-gnu/libcrypto.so.1.0.0", "elfType" : 3, "buildI
d" : "F000D29917E9B6E94A35A8F02E5C62846E5916BC" }, { "b" : "7FDCAAAB2000", "path" : "/lib/x86_64-linux-gnu/libm.so.6", "elfType" : 3, "buildId" : "1D76B71E905CB867B27CEF230FCB20F01A3178F5" }, { "b" : "7FDCAA8530
00", "path" : "/lib/x86_64-linux-gnu/libssl.so.1.0.0", "elfType" : 3, "buildId" : "A20EFFEC993A8441FA17F2079F923CBD04079E19" }, { "b" : "7FDCAA64B000", "path" : "/lib/x86_64-linux-gnu/librt.so.1", "elfType" : 3,
 "buildId" : "92FCF41EFE012D6186E31A59AD05BDBB487769AB" }, { "b" : "7FDCAA2CF000", "path" : "/usr/lib/x86_64-linux-gnu/libstdc++.so.6", "elfType" : 3, "buildId" : "8F14AE1EAC77FB39AC09265C72F440E06C368CAE" }, { 
"b" : "7FDCAA0B8000", "path" : "/lib/x86_64-linux-gnu/libgcc_s.so.1", "elfType" : 3, "buildId" : "24E835F580B65BEE8AAF85A223EEA48604E978DD" }, { "b" : "7FDCA9E9A000", "path" : "/lib/x86_64-linux-gnu/libpthread.s
o.0", "elfType" : 3, "buildId" : "9318E8AF0BFBE444731BB0461202EF57F7C39542" }, { "b" : "7FDCA9AD5000", "path" : "/lib/x86_64-linux-gnu/libc.so.6", "elfType" : 3, "buildId" : "30C94DC66A1FE95180C3D68D2B89E576D5AE
213C" }, { "b" : "7FDCAC5C1000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "9F00581AB3C73E3AEA35995A0C50D24D59A01D47" }, { "b" : "7FDCA980A000", "path" : "/usr/lib/x86_64-linux-gnu/libkr
b5.so.3", "elfType" : 3, "buildId" : "00EB4BEB543C549A411AF26E8C2B4BBA806F57AE" }, { "b" : "7FDCA95DB000", "path" : "/usr/lib/x86_64-linux-gnu/libk5crypto.so.3", "elfType" : 3, "buildId" : "3057CF4B96D55B1CD2C36
81B2A1F75279F66F225" }, { "b" : "7FDCA93D7000", "path" : "/lib/x86_64-linux-gnu/libcom_err.so.2", "elfType" : 3, "buildId" : "04BF7D9BE17AC2A5F7121B246488932718870207" }, { "b" : "7FDCA91CC000", "path" : "/usr/l
ib/x86_64-linux-gnu/libkrb5support.so.0", "elfType" : 3, "buildId" : "86E28C0FCF8D7DAEB9AE77A7C7930F4C2B78A64E" }, { "b" : "7FDCA8E43000", "path" : "/usr/lib/libperl.so.5.18", "elfType" : 3, "buildId" : "FA1AD13
C56D51B69C9558106D8DAF14730B4C14F" }, { "b" : "7FDCA8C29000", "path" : "/lib/x86_64-linux-gnu/libnsl.so.1", "elfType" : 3, "buildId" : "497315006FCA1547A16E644FB7FEBA8BD2FAB054" }, { "b" : "7FDCA8A25000", "path"
 : "/lib/x86_64-linux-gnu/libkeyutils.so.1", "elfType" : 3, "buildId" : "0F03635F97B93D3DACD84F0ED363C56BD266044F" }, { "b" : "7FDCA880A000", "path" : "/lib/x86_64-linux-gnu/libresolv.so.2", "elfType" : 3, "buil
dId" : "616683BCFD8626F176EDA99B6A5D4D2C57996590" }, { "b" : "7FDCA85D1000", "path" : "/lib/x86_64-linux-gnu/libcrypt.so.1", "elfType" : 3, "buildId" : "1B0F2710E989E9A581C257DFFDC90085D0E1348A" } ] }}
 mongod(_ZN5mongo15printStackTraceERSo+0x27) [0x21c6dd0]
 mongod(_ZN5mongo10logContextEPKc+0x74) [0x216768b]
 mongod(_ZN5mongo13fassertFailedEi+0x66) [0x2150e69]
 mongod(_ZN5mongo7fassertEib+0x29) [0x1833c27]
 mongod(_ZN5mongo4repl26ReplicationCoordinatorImpl8shutdownEv+0xCB) [0x1dc26e5]
 mongod(_ZN5mongo11exitCleanlyENS_8ExitCodeE+0xCC) [0x1b4efcd]
 mongod(+0x13C9375) [0x17c9375]
 mongod(main+0x28) [0x17c8360]
 libc.so.6(__libc_start_main+0xF5) [0x7fdca9af6ec5]
 mongod(+0x13C49C9) [0x17c49c9]
-----  END BACKTRACE  -----
2015-07-08T14:27:52.302-0400 I -        [initandlisten] 
 
***aborting after fassert() failure

Sprint: RPL A (10/09/15)
Participants:

 Description   

MongoD from master currently appears to fassert when the listening port is already in use (when replication is enabled)



 Comments   
Comment by Githook User [ 22/Sep/15 ]

Author:

{u'username': u'dannenberg', u'name': u'matt dannenberg', u'email': u'matt.dannenberg@10gen.com'}

Message: SERVER-19333 abandon startup as soon as port binding fails
Branch: master
https://github.com/mongodb/mongo/commit/4119b212084ada7f544526f07cbf9352e7d5c910

Comment by Eric Milkie [ 21/Sep/15 ]

Right, there are actually quite a few functions named on the theme of "init and listen" in our codebase.

Comment by Matt Dannenberg [ 21/Sep/15 ]

_initAndListen calls setupSockets() which binds the port and MessageServer::run() which begins listening. These two portions of functionality need to be separated so we can have the former happen prior to starting replication and have the latter happen after the ReplicationCoordinator has been instantiated.

Comment by Eric Milkie [ 21/Sep/15 ]

bind() is already being called in setupSockets(), not in initAndListen()
Do we need to rearrange the calls to setupSockets() and startReplication()?

Comment by Matt Dannenberg [ 21/Sep/15 ]

The solution appears to be split initAndListen so that port binding happens separately from listening. This will allow us to bind the port prior to starting replication, but not begin listening until after replication starts, thus circumventing the problem.

Comment by Andy Schwerin [ 12/Jul/15 ]

It will take a little research to see if it's safe to transition from starting up to shutting down directly. That said, what is the reason why we must enter "starting up" before the listen port is bound? Maybe it's no longer applicable.

Comment by Matt Dannenberg [ 10/Jul/15 ]

When the commit was made, the server would shut down cleanly if the port was already bound:

2015-07-10T09:46:41.709-0400 I -        [initandlisten] MongoDB starting : pid=5510 port=27017 dbpath=/data/rs0/ 64-bit host=dannenstation
2015-07-10T09:46:41.709-0400 I -        [initandlisten] 
2015-07-10T09:46:41.709-0400 I -        [initandlisten] ** NOTE: This is a development version (2.7.7-pre-) of MongoDB.
2015-07-10T09:46:41.709-0400 I -        [initandlisten] **       Not recommended for production.
2015-07-10T09:46:41.709-0400 I -        [initandlisten] 
2015-07-10T09:46:41.710-0400 I -        [initandlisten] 
2015-07-10T09:46:41.710-0400 I -        [initandlisten] ** WARNING: You are running on a NUMA machine.
2015-07-10T09:46:41.710-0400 I -        [initandlisten] **          We suggest launching mongod like this to avoid performance problems:
2015-07-10T09:46:41.710-0400 I -        [initandlisten] **              numactl --interleave=all mongod [other options]
2015-07-10T09:46:41.710-0400 I -        [initandlisten] 
2015-07-10T09:46:41.710-0400 I -        [initandlisten] ** WARNING: /sys/kernel/mm/transparent_hugepage/enabled is 'always'.
2015-07-10T09:46:41.710-0400 I -        [initandlisten] **        We suggest setting it to 'never'
2015-07-10T09:46:41.710-0400 I -        [initandlisten] 
2015-07-10T09:46:41.710-0400 I -        [initandlisten] db version v2.7.7-pre-
2015-07-10T09:46:41.710-0400 I -        [initandlisten] git version: 5da5d98e114e4a024be242b0baea4115bea9e56c
2015-07-10T09:46:41.710-0400 I -        [initandlisten] build info: Linux dannenstation 3.19.3-3-ARCH #1 SMP PREEMPT Wed Apr 8 14:10:00 CEST 2015 x86_64 BOOST_LIB_VERSION=1_49
2015-07-10T09:46:41.710-0400 I -        [initandlisten] allocator: tcmalloc
2015-07-10T09:46:41.710-0400 I -        [initandlisten] options: { net: { port: 27017 }, replication: { replSet: "rs0" }, storage: { dbPath: "/data/rs0/" } }
2015-07-10T09:46:41.714-0400 I JOURNAL  [initandlisten] journal dir=/data/rs0/journal
2015-07-10T09:46:41.714-0400 I JOURNAL  [initandlisten] recover : no journal files present, no recovery needed
2015-07-10T09:46:41.738-0400 E NETWORK  [initandlisten] listen(): bind() failed errno:98 Address already in use for socket: 0.0.0.0:27017
2015-07-10T09:46:41.738-0400 E NETWORK  [initandlisten]   addr already in use
2015-07-10T09:46:41.738-0400 I REPLSETS [initandlisten] Did not find local replica set configuration document at startup;  NoMatchingDocument Did not find replica set configuration document in local.new.replset
2015-07-10T09:46:41.739-0400 I COMMANDS [rsStart] assertion 11600 interrupted at shutdown ns:local.system.replset query:{}
2015-07-10T09:46:41.739-0400 I COMMANDS [initandlisten] now exiting
2015-07-10T09:46:41.739-0400 I NETWORK  [initandlisten] shutdown: going to close listening sockets...
2015-07-10T09:46:41.739-0400 I NETWORK  [initandlisten] shutdown: going to flush diaglog...
2015-07-10T09:46:41.739-0400 I COMMANDS [rsStart]  ntoskip:0 ntoreturn:1
2015-07-10T09:46:41.739-0400 I NETWORK  [initandlisten] shutdown: going to close sockets...
2015-07-10T09:46:41.739-0400 I REPLSETS [rsStart] replSet exception loading our local replset configuration object : 13106 nextSafe(): { $err: "interrupted at shutdown", code: 11600 }
2015-07-10T09:46:41.739-0400 I STORAGE  [initandlisten] shutdown: waiting for fs preallocator...
2015-07-10T09:46:41.739-0400 I STORAGE  [initandlisten] shutdown: final commit...
2015-07-10T09:46:41.739-0400 I REPLSETS [rsStart] replSet can't get local.system.replset config from self or any seed (EMPTYCONFIG)
2015-07-10T09:46:41.739-0400 I REPLSETS [rsStart] replSet info you may need to run replSetInitiate -- rs.initiate() in the shell -- if that is not already done
2015-07-10T09:46:41.744-0400 I STORAGE  [initandlisten] shutdown: closing all files...
2015-07-10T09:46:41.745-0400 I STORAGE  [initandlisten] closeAllFiles() finished
2015-07-10T09:46:41.745-0400 I JOURNAL  [initandlisten] journalCleanup...
2015-07-10T09:46:41.745-0400 I JOURNAL  [initandlisten] removeJournalFiles
2015-07-10T09:46:41.753-0400 I STORAGE  [initandlisten] shutdown: removing fs lock...
2015-07-10T09:46:41.753-0400 I COMMANDS [initandlisten] dbexit:  rc: 48

Comment by Eric Milkie [ 10/Jul/15 ]

I think the idea was that it shouldn't be possible to shut down in that config state – it should always transition to another state before allowing shutdown, as the startingup state should be atomic w.r.t. that.

Comment by Matt Dannenberg [ 10/Jul/15 ]

schwerin and spencer does either of you remember why we added this fassert? I saw that it was part of an overhaul of config states. But, I can't figure out the reasoning behind it.

Comment by Matt Dannenberg [ 10/Jul/15 ]

This has been the behavior since 3.0.0

Comment by Andy Schwerin [ 10/Jul/15 ]

Does this also affect 3.0? Looks like it might.

Comment by Charlie Swanson [ 08/Jul/15 ]

I think this may be replication specific. We see the address already in use a lot (see BF-917), and never see an fassert with a backtrace. Usually something like this happens:
https://logkeeper.mongodb.org/build/55824629ead33c485307fd93/test/55824639fa59d0759c080039

Generated at Thu Feb 08 03:50:36 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.