Type: Bug
Resolution: Cannot Reproduce
Priority: Major - P3
Affects Version/s: 3.2.0-rc1
Component/s: Replication
Fully Compatible
ALL
Repl C (11/20/15), Repl D (12/11/15), Repl E (01/08/16)
3
I'm not sure which of these steps are significant:
1. Start with a 2-shard cluster whose config servers run as a replica set (CSRS). Each shard is a PSA (primary, secondary, arbiter) replica set.
2. Add a third shard.
3. Remove one secondary (S) from the second shard.
4. Shut down the removed secondary.
5. Eventually give up, "assume" the node has shut down, and attempt to restart it.
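At step 5 the node's state was assumed rather than checked. The following is a minimal Python sketch of such a check. It assumes a POSIX host and a lock file that holds just the PID as text, which matches the mongod.lock contents shown later in this report. The function reads the PID and probes it with signal 0, which tests whether the process exists without delivering a signal. The name lock_holder_alive is hypothetical and is not part of any MongoDB tooling.

```python
import os


def lock_holder_alive(lock_path: str) -> bool:
    """Return True if the PID recorded in lock_path belongs to a running process.

    Hypothetical helper. Assumes the lock file contains only a PID as text
    (as mongod.lock did in this report) and a POSIX system.
    """
    try:
        with open(lock_path) as f:
            text = f.read().strip()
    except FileNotFoundError:
        return False
    if not text:
        # Treat an empty file as "no recorded holder".
        return False
    pid = int(text)
    try:
        # Signal 0 only performs existence/permission checks; nothing is sent.
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        # The process exists but is owned by another user.
        return True
    return True
```

PIDs can be reused, so a live PID should still be confirmed with ps before you conclude that it is the old mongod.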
What I observed was:
1. At step 4, the removed node did not shut down.
Here is the process. It has been running for many hours:
[red_red_1_4]$ ps -ef | grep "/tmp/data" | grep 87564
501 87564 1 0 4:43PM ?? 7:00.96 /var/lib/mongodb-mms-automation/mongodb-osx-x86_64-3.2.0-rc1/bin/mongod -f /tmp/data/red_red_1_4/automation-mongod.conf
[red_red_1_4]$ cat mongod.lock
87564
[red_red_1_4]$ date
Thu Oct 29 01:18:31 UTC 2015
2. At step 5, I got an unexpected error message:
2015-10-29T01:08:39.668+0000 E NETWORK [initandlisten] listen(): bind() failed errno:48 Address already in use for socket: 0.0.0.0:28004
2015-10-29T01:08:39.668+0000 E NETWORK [initandlisten] addr already in use
2015-10-29T01:08:39.668+0000 E STORAGE [initandlisten] Failed to set up sockets during startup.
2015-10-29T01:08:39.668+0000 I CONTROL [initandlisten] dbexit: rc: 48
I was expecting the usual message indicating that mongod had detected an existing lock file from the previously started process.
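The bind() failure itself means the OS reported that port 28004 was still in use, which is consistent with the old process still listening on it. Errno 48 is EADDRINUSE on macOS, and Linux uses 98. The log suggests that in this build socket setup ran before any lock-file check, so the port conflict was reported first. The following Python sketch reproduces the same errno by binding a port that another socket is already listening on. The helper name port_in_use is hypothetical, and the helper does not set SO_REUSEADDR.

```python
import errno
import socket


def port_in_use(host: str, port: int) -> bool:
    """Return True if bind() on (host, port) fails with EADDRINUSE.

    Hypothetical helper. errno.EADDRINUSE resolves to the platform's value
    (48 on macOS, 98 on Linux), so the comparison is portable.
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        s.bind((host, port))
    except OSError as e:
        if e.errno == errno.EADDRINUSE:
            return True
        raise  # Any other bind error is unexpected here.
    finally:
        s.close()
    return False
```

For example, a call like port_in_use("0.0.0.0", 28004) on the affected host would be expected to return True while the old mongod kept listening.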
Logs attached:
- mongodb.1.log: steps 1 through 4, including the failed shutdown
- mongodb.2.log: step 5, the attempt to restart while the old process was still running