-
Type: Bug
-
Resolution: Done
-
Priority: Major - P3
-
Affects Version/s: 3.0.0, 3.2.0-rc0
-
Component/s: Replication
-
None
-
Fully Compatible
-
ALL
-
- Start a 3 node replica set with one member delayed
- Issue a write to trigger the delayed member to go into a sleep
- Try to issue db.shutdownServer() (or send a signal) on the delayed node
-
Repl 2016-08-29
When a delayed secondary sees an oplog entry it doesn't need to apply yet, it goes into a sleep.
2015-10-26T12:00:27.413+1100 W REPL [rsSync] replSet slavedelay causing a long sleep of 269 seconds
If the member is attempted to be shut down during this sleep, it gets stuck.
This happens in 3.0.7 and 3.2.0-rc0 (note the timestamps)
2015-10-26T12:00:27.413+1100 W REPL [rsSync] replSet slavedelay causing a long sleep of 269 seconds 2015-10-26T12:00:27.930+1100 I NETWORK [initandlisten] connection accepted from 10.8.1.231:58065 #5 (1 connection now open) 2015-10-26T12:00:27.932+1100 I REPL [ReplicationExecutor] replSetElect voting yea for Joannas-MacBook-Pro.local:25217 (0) 2015-10-26T12:00:28.050+1100 I NETWORK [initandlisten] connection accepted from 10.8.1.231:58066 #6 (2 connections now open) 2015-10-26T12:00:28.209+1100 I REPL [ReplicationExecutor] Member Joannas-MacBook-Pro.local:25217 is now in state PRIMARY 2015-10-26T12:00:57.984+1100 I NETWORK [conn5] end connection 10.8.1.231:58065 (1 connection now open) 2015-10-26T12:00:57.985+1100 I NETWORK [initandlisten] connection accepted from 10.8.1.231:58111 #7 (2 connections now open) 2015-10-26T12:00:58.128+1100 I NETWORK [conn6] end connection 10.8.1.231:58066 (1 connection now open) 2015-10-26T12:00:58.129+1100 I NETWORK [initandlisten] connection accepted from 10.8.1.231:58112 #8 (2 connections now open) 2015-10-26T12:01:07.048+1100 I NETWORK [initandlisten] connection accepted from 127.0.0.1:58129 #9 (3 connections now open) 2015-10-26T12:01:07.065+1100 I COMMAND [conn9] terminating, shutdown command received 2015-10-26T12:01:07.066+1100 I REPL [conn9] Stopping replication applier threads 2015-10-26T12:01:08.019+1100 I STORAGE [conn7] got request after shutdown() 2015-10-26T12:01:08.152+1100 I STORAGE [conn8] got request after shutdown() 2015-10-26T12:04:57.563+1100 I CONTROL [conn9] now exiting 2015-10-26T12:04:57.564+1100 I NETWORK [conn9] shutdown: going to close listening sockets... 2015-10-26T12:04:57.564+1100 I NETWORK [conn9] closing listening socket: 7 2015-10-26T12:04:57.565+1100 I NETWORK [conn9] closing listening socket: 8 2015-10-26T12:04:57.565+1100 I NETWORK [conn9] removing socket file: /tmp/mongodb-25219.sock 2015-10-26T12:04:57.569+1100 I NETWORK [conn9] shutdown: going to flush diaglog... 2015-10-26T12:04:57.569+1100 I NETWORK [conn9] shutdown: going to close sockets... 2015-10-26T12:04:57.569+1100 I STORAGE [conn9] shutdown: waiting for fs preallocator... 2015-10-26T12:04:57.570+1100 I STORAGE [conn9] shutdown: final commit... 2015-10-26T12:04:57.585+1100 I JOURNAL [conn9] journalCleanup... 2015-10-26T12:04:57.585+1100 I JOURNAL [conn9] removeJournalFiles 2015-10-26T12:04:57.586+1100 I JOURNAL [conn9] Terminating durability thread ... 2015-10-26T12:04:57.684+1100 I JOURNAL [journal writer] Journal writer thread stopped 2015-10-26T12:04:57.684+1100 I JOURNAL [durability] Durability thread stopped 2015-10-26T12:04:57.685+1100 I STORAGE [conn9] shutdown: closing all files... 2015-10-26T12:04:57.687+1100 I STORAGE [conn9] closeAllFiles() finished 2015-10-26T12:04:57.687+1100 I STORAGE [conn9] shutdown: removing fs lock... 2015-10-26T12:04:57.688+1100 I CONTROL [conn9] dbexit: rc: 0
This is a regression - running the same test in 2.6 allows the mongod to shut down
2015-10-26T11:47:13.262+1100 [rsSync] replSet slavedelay sleep long time: 3600 2015-10-26T11:47:23.345+1100 [conn125] end connection 10.8.1.231:56443 (1 connection now open) 2015-10-26T11:47:23.345+1100 [initandlisten] connection accepted from 10.8.1.231:56472 #127 (2 connections now open) 2015-10-26T11:47:24.456+1100 [conn126] end connection 10.8.1.231:56445 (1 connection now open) 2015-10-26T11:47:24.457+1100 [initandlisten] connection accepted from 10.8.1.231:56475 #128 (2 connections now open) 2015-10-26T11:47:25.345+1100 [initandlisten] connection accepted from 127.0.0.1:56476 #129 (3 connections now open) 2015-10-26T11:47:50.514+1100 [conn129] assertion 13435 not master and slaveOk=false ns:test.system.namespaces query:{} 2015-10-26T11:47:53.417+1100 [conn127] end connection 10.8.1.231:56472 (2 connections now open) 2015-10-26T11:47:53.417+1100 [initandlisten] connection accepted from 10.8.1.231:56523 #130 (3 connections now open) 2015-10-26T11:47:54.518+1100 [conn128] end connection 10.8.1.231:56475 (2 connections now open) 2015-10-26T11:47:54.519+1100 [initandlisten] connection accepted from 10.8.1.231:56524 #131 (3 connections now open) 2015-10-26T11:47:55.487+1100 [conn129] assertion 13435 not master and slaveOk=false ns:admin.system.namespaces query:{} 2015-10-26T11:47:56.207+1100 [conn129] terminating, shutdown command received 2015-10-26T11:47:56.207+1100 [conn129] dbexit: shutdown called 2015-10-26T11:47:56.208+1100 [conn129] shutdown: going to close listening sockets... 2015-10-26T11:47:56.208+1100 [conn129] closing listening socket: 9 2015-10-26T11:47:56.208+1100 [conn129] closing listening socket: 10 2015-10-26T11:47:56.209+1100 [conn129] removing socket file: /tmp/mongodb-25219.sock 2015-10-26T11:47:56.211+1100 [conn129] shutdown: going to flush diaglog... 2015-10-26T11:47:56.211+1100 [conn129] shutdown: going to close sockets... 2015-10-26T11:47:56.212+1100 [conn129] shutdown: waiting for fs preallocator... 2015-10-26T11:47:56.212+1100 [rsBackgroundSync] Socket recv() errno:9 Bad file descriptor 10.8.1.231:25217 2015-10-26T11:47:56.212+1100 [conn129] shutdown: lock for final commit... 2015-10-26T11:47:56.212+1100 [conn131] end connection 10.8.1.231:56524 (2 connections now open) 2015-10-26T11:47:56.212+1100 [rsBackgroundSync] SocketException: remote: 10.8.1.231:25217 error: 9001 socket exception [RECV_ERROR] server [10.8.1.231:25217] 2015-10-26T11:47:56.212+1100 [conn130] end connection 10.8.1.231:56523 (2 connections now open) 2015-10-26T11:47:56.213+1100 [conn129] shutdown: final commit... 2015-10-26T11:47:56.214+1100 [rsBackgroundSync] replSet sync source problem: 10278 dbclient error communicating with server: Joannas-MacBook-Pro.local:25217 2015-10-26T11:47:56.223+1100 [conn129] shutdown: closing all files... 2015-10-26T11:47:56.223+1100 [conn129] closeAllFiles() finished 2015-10-26T11:47:56.224+1100 [conn129] journalCleanup... 2015-10-26T11:47:56.224+1100 [conn129] removeJournalFiles 2015-10-26T11:47:56.225+1100 [conn129] shutdown: removing fs lock... 2015-10-26T11:47:56.225+1100 [conn129] dbexit: really exiting now
- related to
-
SERVER-25546 Handle slaveDelay as part of op batching on secondaries
- Closed
-
SERVER-21004 Implement wait objects that support operation interruption/expiration
- Closed