[SERVER-25501] [WTJournalFlusher] stopping WTJournalFlusher thread Created: 09/Aug/16  Updated: 27/Sep/16  Resolved: 27/Sep/16

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 3.2.8
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: JihoPark Assignee: Kelsey Schubert
Resolution: Done Votes: 0
Labels: crash
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

OS : Ubuntu 14.04.4 LTS
CPU : Intel(R) Xeon(R) CPU E5-2690 v3 @ 2.60GHz 4Core
MemTotal: 8169740 kB
2 Mongos, 3 Config, 1 ReplicaSet (P - S - A)


Operating System: Linux
Participants:

 Description   

While running find & drop operations against a working collection, the secondary goes down.
I am using mongo-java-driver-2.13.2.jar.
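To make the scenario concrete, here is a minimal sketch of such a find & drop workload, assuming the legacy 2.x Java driver API; the mongos address ("mongos-host:27017") and the "testdb.testcoll" namespace are placeholders, not values taken from the actual deployment.

import com.mongodb.BasicDBObject;
import com.mongodb.DBCollection;
import com.mongodb.DBCursor;
import com.mongodb.MongoClient;
import com.mongodb.MongoClientURI;
import com.mongodb.ReadPreference;

public class SecondaryFindExample {
    public static void main(String[] args) throws Exception {
        // Connect to a mongos router (placeholder address).
        MongoClient client = new MongoClient(new MongoClientURI("mongodb://mongos-host:27017"));
        try {
            DBCollection coll = client.getDB("testdb").getCollection("testcoll");

            // Route this query to a secondary member of the shard's replica set.
            DBCursor cursor = coll.find(new BasicDBObject());
            cursor.setReadPreference(ReadPreference.secondary());
            while (cursor.hasNext()) {
                cursor.next();
            }
            cursor.close();

            // The collection is later dropped while reads target the secondary.
            coll.drop();
        } finally {
            client.close();
        }
    }
}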

Secondary mongod log (log level 5):

2016-08-08T16:44:54.133+0900 D STORAGE  [signalProcessingThread] WiredTigerSizeStorer::storeInto table:collection-93-5310035786157052224 -> { numRecords: 8374, dataSize: 2881743 }
2016-08-08T16:44:54.133+0900 D STORAGE  [signalProcessingThread] WiredTigerSizeStorer::storeInto table:collection-94-5310035786157052224 -> { numRecords: 24326, dataSize: 3632474 }
2016-08-08T16:44:54.133+0900 D STORAGE  [signalProcessingThread] WiredTigerSizeStorer::storeInto table:collection-95-5310035786157052224 -> { numRecords: 87828, dataSize: 151202930 }
2016-08-08T16:44:54.133+0900 D STORAGE  [signalProcessingThread] WiredTigerSizeStorer::storeInto table:collection-96-5310035786157052224 -> { numRecords: 18846, dataSize: 9131491 }
2016-08-08T16:44:54.133+0900 D STORAGE  [signalProcessingThread] WiredTigerSizeStorer::storeInto table:collection-97-5310035786157052224 -> { numRecords: 499, dataSize: 3902217 }
2016-08-08T16:44:54.133+0900 D STORAGE  [signalProcessingThread] WiredTigerSizeStorer::storeInto table:collection-98-5310035786157052224 -> { numRecords: 39048, dataSize: 26332094 }
2016-08-08T16:44:54.133+0900 D STORAGE  [signalProcessingThread] WiredTigerSizeStorer::storeInto table:collection-99-5310035786157052224 -> { numRecords: 46, dataSize: 23022 }
2016-08-08T16:44:54.180+0900 D STORAGE  [WTJournalFlusher] stopping WTJournalFlusher thread
2016-08-08T16:44:54.392+0900 I STORAGE  [signalProcessingThread] shutdown: removing fs lock...
2016-08-08T16:44:54.392+0900 I CONTROL  [signalProcessingThread] dbexit:  rc: 0



 Comments   
Comment by Kelsey Schubert [ 27/Sep/16 ]

Hi jiho park,

From your responses, I do not see anything to indicate a bug in the MongoDB server. For MongoDB-related support discussion, please post on the mongodb-users group or Stack Overflow with the mongodb tag; a question like this, which requires more back-and-forth discussion, is best suited to the mongodb-users group.

Thank you,
Thomas

Comment by JihoPark [ 11/Aug/16 ]

Hi Thomas.

1. Would you please upload the complete logs for the primary and secondary? Please be sure to include when this issue occurred, and the subsequent startup logs of the secondary.
The secondary log is included in the description above.
The primary log is:
2016-08-08T16:43:55.606+0900 I NETWORK [initandlisten] connection accepted from 127.0.0.1:40147 #4934 (51 connections now open)
2016-08-08T16:43:55.607+0900 I NETWORK [conn4934] end connection 127.0.0.1:40147 (50 connections now open)
2016-08-08T16:43:57.915+0900 I NETWORK [initandlisten] connection accepted from 127.0.0.1:40148 #4935 (51 connections now open)
2016-08-08T16:43:57.916+0900 I NETWORK [conn4935] end connection 127.0.0.1:40148 (50 connections now open)
2016-08-08T16:43:58.223+0900 I NETWORK [initandlisten] connection accepted from 127.0.0.1:40149 #4936 (51 connections now open)
2016-08-08T16:43:58.224+0900 I NETWORK [conn4936] end connection 127.0.0.1:40149 (50 connections now open)
2016-08-08T16:43:58.225+0900 I NETWORK [initandlisten] connection accepted from 127.0.0.1:40150 #4937 (51 connections now open)
2016-08-08T16:43:58.225+0900 I NETWORK [conn4937] end connection 127.0.0.1:40150 (50 connections now open)
2016-08-08T16:43:58.237+0900 I NETWORK [initandlisten] connection accepted from 127.0.0.1:40151 #4938 (51 connections now open)
2016-08-08T16:43:58.237+0900 I NETWORK [conn4938] end connection 127.0.0.1:40151 (50 connections now open)
2016-08-08T16:43:58.820+0900 I NETWORK [conn4889] end connection 127.0.0.1:40099 (49 connections now open)
2016-08-08T16:43:58.826+0900 I NETWORK [conn4884] end connection 127.0.0.1:40094 (48 connections now open)
2016-08-08T16:43:58.840+0900 I NETWORK [conn4919] end connection 127.0.0.1:40131 (47 connections now open)
2016-08-08T16:43:59.852+0900 I REPL [ReplicationExecutor] Member APSEO-SHPS-01:42002 is now in state SECONDARY
2016-08-08T16:44:02.402+0900 I NETWORK [initandlisten] connection accepted from 127.0.0.1:40152 #4939 (48 connections now open)
2016-08-08T16:44:02.402+0900 I NETWORK [conn4939] end connection 127.0.0.1:40152 (47 connections now open)
2016-08-08T16:44:02.403+0900 I NETWORK [initandlisten] connection accepted from 127.0.0.1:40153 #4940 (48 connections now open)
2016-08-08T16:44:23.424+0900 I SHARDING [LockPinger] cluster APSEO-SHPS-01:41001,APSEO-SHPS-01:41002,APSEO-SHPS-01:41003 pinged successfully at 2016-08-08T16:44:23.260+0900 by distributed lock pinger 'APSEO-SHPS-01:41001,APSEO-SHPS-01:41002,APSEO-SHPS-01:41003/APSEO-SHPS-01:42001:1467691717:561486104', sleeping for 30000ms
2016-08-08T16:44:25.400+0900 I NETWORK [initandlisten] connection accepted from 127.0.0.1:40154 #4941 (49 connections now open)
2016-08-08T16:44:35.514+0900 I NETWORK [initandlisten] connection accepted from 127.0.0.1:40165 #4942 (50 connections now open)
2016-08-08T16:44:53.542+0900 I NETWORK [ReplicaSetMonitorWatcher] Detected bad connection created at 1470641453420515 microSec, clearing pool for APSEO-SHPS-01:42002 of 0 connections
2016-08-08T16:44:53.588+0900 I SHARDING [LockPinger] cluster APSEO-SHPS-01:41001,APSEO-SHPS-01:41002,APSEO-SHPS-01:41003 pinged successfully at 2016-08-08T16:44:53.424+0900 by distributed lock pinger 'APSEO-SHPS-01:41001,APSEO-SHPS-01:41002,APSEO-SHPS-01:41003/APSEO-SHPS-01:42001:1467691717:561486104', sleeping for 30000ms
2016-08-08T16:44:53.878+0900 I REPL [ReplicationExecutor] Error in heartbeat request to APSEO-SHPS-01:42002; HostUnreachable: End of file
2016-08-08T16:44:54.119+0900 I REPL [ReplicationExecutor] Error in heartbeat request to APSEO-SHPS-01:42002; HostUnreachable: Connection reset by peer
2016-08-08T16:44:54.120+0900 I REPL [ReplicationExecutor] Error in heartbeat request to APSEO-SHPS-01:42002; HostUnreachable: Connection refused
2016-08-08T16:44:54.120+0900 I NETWORK [conn4940] end connection 127.0.0.1:40153 (49 connections now open)
2016-08-08T16:44:54.120+0900 I NETWORK [conn4942] end connection 127.0.0.1:40165 (48 connections now open)
2016-08-08T16:44:54.955+0900 I NETWORK [conn4920] end connection 127.0.0.1:40132 (47 connections now open)
2016-08-08T16:44:54.955+0900 I NETWORK [conn4883] end connection 127.0.0.1:40090 (47 connections now open)
2016-08-08T16:44:56.120+0900 I REPL [ReplicationExecutor] Error in heartbeat request to APSEO-SHPS-01:42002; HostUnreachable: Connection refused
2016-08-08T16:44:56.121+0900 I REPL [ReplicationExecutor] Error in heartbeat request to APSEO-SHPS-01:42002; HostUnreachable: Connection refused
2016-08-08T16:44:56.121+0900 I REPL [ReplicationExecutor] Error in heartbeat request to APSEO-SHPS-01:42002; HostUnreachable: Connection refused
2016-08-08T16:44:58.122+0900 I REPL [ReplicationExecutor] Error in heartbeat request to APSEO-SHPS-01:42002; HostUnreachable: Connection refused
2016-08-08T16:44:58.123+0900 I REPL [ReplicationExecutor] Error in heartbeat request to APSEO-SHPS-01:42002; HostUnreachable: Connection refused
2016-08-08T16:44:58.123+0900 I REPL [ReplicationExecutor] Error in heartbeat request to APSEO-SHPS-01:42002; HostUnreachable: Connection refused
2016-08-08T16:44:59.004+0900 I NETWORK [conn4879] end connection 127.0.0.1:39989 (45 connections now open)

2. Are you able to consistently reproduce this issue?
Yes; the issue kept occurring until it was corrected.

Comment by Kelsey Schubert [ 10/Aug/16 ]

Hi jipark@ea.com,

Thanks for providing additional details. To continue the investigation, please answer the following questions.

  • Would you please upload the complete logs for the primary and secondary? Please be sure to include when this issue occurred, and the subsequent startup logs of the secondary.
  • Are you able to consistently reproduce this issue?

Thank you,
Thomas

Comment by JihoPark [ 10/Aug/16 ]

@Ramon Fernandez
I found the following case:
1. mongos read preference: ReadPref("secondary")
2. A collection drop is sent through the mongos (see the sketch below)
3. The secondary node (mongod) ends abnormally
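A self-contained sketch of these steps, again assuming the legacy 2.x Java driver API; the mongos address and the "testdb.testcoll" namespace are placeholders.

import com.mongodb.DBCollection;
import com.mongodb.MongoClient;
import com.mongodb.MongoClientURI;
import com.mongodb.ReadPreference;

public class DropReproSketch {
    public static void main(String[] args) throws Exception {
        // Connect through the mongos router (placeholder address).
        MongoClient client = new MongoClient(new MongoClientURI("mongodb://mongos-host:27017"));
        try {
            DBCollection coll = client.getDB("testdb").getCollection("testcoll");

            // Step 1: route reads for this collection to a secondary.
            coll.setReadPreference(ReadPreference.secondary());

            // Step 2: send the collection drop through the mongos.
            coll.drop();

            // Step 3 is observed on the server side: per the report, the
            // secondary mongod shuts down after the drop.
        } finally {
            client.close();
        }
    }
}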

Comment by Ramon Fernandez Marina [ 09/Aug/16 ]

jiho park, can you please elaborate on what you think is the issue here? What the logs above show is a clean shutdown of MongoDB, and everything seems to be working as expected.
