[SERVER-30239] fsyncunlock initiates but does not complete Created: 20/Jul/17  Updated: 20/Sep/17  Resolved: 18/Aug/17

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Question Priority: Major - P3
Reporter: Dharshan Rangegowda Assignee: Mark Agarunov
Resolution: Cannot Reproduce Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Participants:

 Description   

MongoDB server 3.2.x
Mongo java driver 3.4.2
Hostname used for connection: 127.0.0.1. Server is a secondary of a replica set

I am running the fsyncunlock command directly using (new BasicDBObject("fsyncUnlock", 1) to work around issue https://jira.mongodb.org/browse/JAVA-2501. The fsyncUnlock is also being run on the same connection as shown in the logs below

However sometimes I notice that fsyncunlock initiates but does not complete. Traces are below

2017-07-19T00:12:17.309+0000 I ACCESS [conn3540] Successfully authenticated as principal admin on admin
2017-07-19T00:12:17.310+0000 I NETWORK [initandlisten] connection accepted from 127.0.0.1:35222 #3541 (6 connections now open)
2017-07-19T00:12:17.325+0000 I ACCESS [conn3541] Successfully authenticated as principal admin on admin
2017-07-19T00:12:17.325+0000 I COMMAND [conn3541] CMD fsync: sync:1 lock:1
2017-07-19T00:12:27.777+0000 I COMMAND [conn3541] db is now locked, no writes allowed. db.fsyncUnlock() to unlock
2017-07-19T00:12:27.777+0000 I COMMAND [conn3541] For more info see http://dochub.mongodb.org/core/fsynccommand
2017-07-19T00:12:27.779+0000 I COMMAND [conn3541] command admin.$cmd command: fsync
{ fsync: 1, lock: 1 }
keyUpdates:0 writeConflicts:0 numYields:0 reslen:162 locks:{} protocol:op_
query 10454ms
2017-07-19T00:12:59.792+0000 I COMMAND [conn3541] command: unlock requested
2017-07-19T00:12:59.796+0000 I NETWORK [conn3540] end connection 127.0.0.1:35220 (5 connections now open)
2017-07-19T00:12:59.796+0000 I NETWORK [conn3541] end connection 127.0.0.1:35222 (4 connections now open)
2017-07-19T00:14:02.671+0000 I NETWORK [initandlisten] connection accepted from 10.83.16.163:45556 #3542 (5 connections now open)
2017-07-19T00:14:02.685+0000 I ACCESS [conn3542] Successfully authenticated as principal admin on admin
2017-07-19T00:14:02.686+0000 I NETWORK [initandlisten] connection accepted from 10.83.16.163:45558 #3543 (6 connections now open)
2017-07-19T00:14:02.701+0000 I ACCESS [conn3543] Successfully authenticated as principal admin on admin
2017-07-19T00:14:02.760+0000 I NETWORK [initandlisten] connection accepted from 127.0.0.1:35232 #3544 (7 connections now open)
2017-07-19T00:14:02.774+0000 I ACCESS [conn3544] Successfully authenticated as principal admin on admin
2017-07-19T00:14:02.778+0000 I NETWORK [conn3544] end connection 127.0.0.1:35232 (6 connections now open)
2017-07-19T00:14:02.781+0000 I NETWORK [conn3543] end connection 10.83.16.163:45558 (5 connections now open)
2017-07-19T00:14:02.782+0000 I NETWORK [conn3542] end connection 10.83.16.163:45556 (4 connections now open)
2017-07-19T00:14:26.393+0000 I NETWORK [conn3506] end connection 109.173.132.164:64795 (3 connections now open)
2017-07-19T00:15:52.324+0000 I CONTROL [signalProcessingThread] got signal 15 (Terminated), will terminate after current cmd ends
2017-07-19T00:15:52.324+0000 I FTDC [signalProcessingThread] Shutting down full-time diagnostic data capture
2017-07-19T00:15:52.328+0000 I REPL [signalProcessingThread] Stopping replication applier threads
2017-07-19T00:15:52.378+0000 W EXECUTOR [rsBackgroundSync] killCursors command task failed: CallbackCanceled: Callback canceled
2017-07-19T00:15:52.501+0000 I STORAGE [conn1757] got request after shutdown()
2017-07-19T00:15:53.210+0000 I STORAGE [conn1758] got request after shutdown()
2017-07-19T00:15:54.295+0000 I CONTROL [signalProcessingThread] now exiting
2017-07-19T00:15:54.295+0000 I NETWORK [signalProcessingThread] shutdown: going to close listening sockets...
2017-07-19T00:15:54.295+0000 I NETWORK [signalProcessingThread] closing listening socket: 6
2017-07-19T00:15:54.295+0000 I NETWORK [signalProcessingThread] closing listening socket: 7
2017-07-19T00:15:54.295+0000 I NETWORK [signalProcessingThread] removing socket file: /tmp/mongodb-27017.sock
2017-07-19T00:15:54.295+0000 I NETWORK [signalProcessingThread] shutdown: going to flush diaglog...



 Comments   
Comment by Dharshan Rangegowda [ 18/Aug/17 ]

Hi Mark,

The behaviour does not happen everytime. I see it once every 5-10 snapshots.

I have attached a snapshot where the issue happens - can you take a look at that?. You can see all the files there but wt does not recognize them. I think there is something getting corrupted in whatever catalog wt uses.

Comment by Mark Agarunov [ 18/Aug/17 ]

Hello dharshanr@scalegrid.net,

Unfortunately I have not been able to reproduce the behavior you've described. I've closed this ticket as "cannot reproduce", but if any new information comes to light, please let us know and we will further investigate the issue.

Thanks,
Mark

Generated at Thu Feb 08 04:23:07 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.