[JAVA-2501] fsyncunlock does not work on secondaries with 3.2.x Created: 22/Apr/17  Updated: 29/Oct/23  Resolved: 25/Apr/17

Status: Closed
Project: Java Driver
Component/s: Command Operations
Affects Version/s: 3.4.2
Fix Version/s: 3.5.0

Type: Bug Priority: Minor - P4
Reporter: Dharshan Rangegowda Assignee: Ross Lawley
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
is duplicated by SERVER-28876 fsyncLock results in corrupt disk sna... Closed

 Description   

I am using java driver version 3.4.2

Fsyncunlock does not work on the secondary with 3.2.3 & 3.2.12. It works fine on the primary.
Here is the stack trace of the error.

21 Apr 2017 22:11:42,014 ERROR ~ Application-handleException: Timed out after 10000 ms while waiting for a server that matches WritableServerSelector. Client view of cluster state i
s {type=REPLICA_SET, servers=[{address=127.0.0.1:27017, type=REPLICA_SET_SECONDARY, roundTripTime=0.2 ms, state=CONNECTED}]
21 Apr 2017 22:11:42,014 ERROR ~ Timed out after 10000 ms while waiting for a server that matches WritableServerSelector. Client view of cluster state is {type=REPLICA_SET, servers=
[{address=127.0.0.1:27017, type=REPLICA_SET_SECONDARY, roundTripTime=0.2 ms, state=CONNECTED}]
21 Apr 2017 22:11:42,015 ERROR ~ com.mongodb.MongoTimeoutException: Timed out after 10000 ms while waiting for a server that matches WritableServerSelector. Client view of cluster s
tate is {type=REPLICA_SET, servers=[{address=127.0.0.1:27017, type=REPLICA_SET_SECONDARY, roundTripTime=0.2 ms, state=CONNECTED}]
        at com.mongodb.connection.BaseCluster.createTimeoutException(BaseCluster.java:377)
        at com.mongodb.connection.BaseCluster.selectServer(BaseCluster.java:104)
        at com.mongodb.binding.ClusterBinding$ClusterBindingConnectionSource.<init>(ClusterBinding.java:75)
        at com.mongodb.binding.ClusterBinding$ClusterBindingConnectionSource.<init>(ClusterBinding.java:71)
        at com.mongodb.binding.ClusterBinding.getWriteConnectionSource(ClusterBinding.java:68)
        at com.mongodb.operation.OperationHelper.withConnection(OperationHelper.java:411)
        at com.mongodb.operation.FsyncUnlockOperation.execute(FsyncUnlockOperation.java:41)
        at com.mongodb.operation.FsyncUnlockOperation.execute(FsyncUnlockOperation.java:38)
        at com.mongodb.Mongo.execute(Mongo.java:845)
        at com.mongodb.Mongo.unlock(Mongo.java:644)
        at controllers.MongoDBInstance.fsyncAndLock(MongoDBInstance.java:861)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:497)
        at play.mvc.ActionInvoker.invokeWithContinuation(ActionInvoker.java:548)
        at play.mvc.ActionInvoker.invoke(ActionInvoker.java:502)
        at play.mvc.ActionInvoker.invokeControllerMethod(ActionInvoker.java:478)
        at play.mvc.ActionInvoker.invokeControllerMethod(ActionInvoker.java:473)
        at play.mvc.ActionInvoker.invoke(ActionInvoker.java:161)
        at play.server.PlayHandler$NettyInvocation.execute(PlayHandler.java:257)
        at play.Invoker$Invocation.run(Invoker.java:278)
        at play.server.PlayHandler$NettyInvocation.run(PlayHandler.java:235)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)



 Comments   
Comment by Dharshan Rangegowda [ 20/Jul/17 ]

I am able to repro this fairly consistently on one setup - let me file a new bug given that this is a different issue.

Comment by Dharshan Rangegowda [ 19/Jul/17 ]

Hi folks,

A quick update on this bug. Running the (fsyncUnlock,1) command explicitly to unlock is also failing intermittently. The server I tested against was 3.2.x

The command returns successfully in the client but in the server traces it doesn't confirm the unlock like it usually does (it just says received request). Ended up having to restart the server to move forward. fsyncUnlock is a synchronous command right?

Traces below

2017-07-19T00:12:17.309+0000 I ACCESS [conn3540] Successfully authenticated as principal admin on admin
2017-07-19T00:12:17.310+0000 I NETWORK [initandlisten] connection accepted from 127.0.0.1:35222 #3541 (6 connections now open)
2017-07-19T00:12:17.325+0000 I ACCESS [conn3541] Successfully authenticated as principal admin on admin
2017-07-19T00:12:17.325+0000 I COMMAND [conn3541] CMD fsync: sync:1 lock:1
2017-07-19T00:12:27.777+0000 I COMMAND [conn3541] db is now locked, no writes allowed. db.fsyncUnlock() to unlock
2017-07-19T00:12:27.777+0000 I COMMAND [conn3541] For more info see http://dochub.mongodb.org/core/fsynccommand
2017-07-19T00:12:27.779+0000 I COMMAND [conn3541] command admin.$cmd command: fsync

{ fsync: 1, lock: 1 }

keyUpdates:0 writeConflicts:0 numYields:0 reslen:162 locks:{} protocol:op_
query 10454ms
2017-07-19T00:12:59.792+0000 I COMMAND [conn3541] command: unlock requested
2017-07-19T00:12:59.796+0000 I NETWORK [conn3540] end connection 127.0.0.1:35220 (5 connections now open)
2017-07-19T00:12:59.796+0000 I NETWORK [conn3541] end connection 127.0.0.1:35222 (4 connections now open)
2017-07-19T00:14:02.671+0000 I NETWORK [initandlisten] connection accepted from 10.83.16.163:45556 #3542 (5 connections now open)
2017-07-19T00:14:02.685+0000 I ACCESS [conn3542] Successfully authenticated as principal admin on admin
2017-07-19T00:14:02.686+0000 I NETWORK [initandlisten] connection accepted from 10.83.16.163:45558 #3543 (6 connections now open)
2017-07-19T00:14:02.701+0000 I ACCESS [conn3543] Successfully authenticated as principal admin on admin
2017-07-19T00:14:02.760+0000 I NETWORK [initandlisten] connection accepted from 127.0.0.1:35232 #3544 (7 connections now open)
2017-07-19T00:14:02.774+0000 I ACCESS [conn3544] Successfully authenticated as principal admin on admin
2017-07-19T00:14:02.778+0000 I NETWORK [conn3544] end connection 127.0.0.1:35232 (6 connections now open)
2017-07-19T00:14:02.781+0000 I NETWORK [conn3543] end connection 10.83.16.163:45558 (5 connections now open)
2017-07-19T00:14:02.782+0000 I NETWORK [conn3542] end connection 10.83.16.163:45556 (4 connections now open)
2017-07-19T00:14:26.393+0000 I NETWORK [conn3506] end connection 109.173.132.164:64795 (3 connections now open)
2017-07-19T00:15:52.324+0000 I CONTROL [signalProcessingThread] got signal 15 (Terminated), will terminate after current cmd ends
2017-07-19T00:15:52.324+0000 I FTDC [signalProcessingThread] Shutting down full-time diagnostic data capture
2017-07-19T00:15:52.328+0000 I REPL [signalProcessingThread] Stopping replication applier threads
2017-07-19T00:15:52.378+0000 W EXECUTOR [rsBackgroundSync] killCursors command task failed: CallbackCanceled: Callback canceled
2017-07-19T00:15:52.501+0000 I STORAGE [conn1757] got request after shutdown()
2017-07-19T00:15:53.210+0000 I STORAGE [conn1758] got request after shutdown()
2017-07-19T00:15:54.295+0000 I CONTROL [signalProcessingThread] now exiting
2017-07-19T00:15:54.295+0000 I NETWORK [signalProcessingThread] shutdown: going to close listening sockets...
2017-07-19T00:15:54.295+0000 I NETWORK [signalProcessingThread] closing listening socket: 6
2017-07-19T00:15:54.295+0000 I NETWORK [signalProcessingThread] closing listening socket: 7
2017-07-19T00:15:54.295+0000 I NETWORK [signalProcessingThread] removing socket file: /tmp/mongodb-27017.sock
2017-07-19T00:15:54.295+0000 I NETWORK [signalProcessingThread] shutdown: going to flush diaglog...

Comment by Jeffrey Yemin [ 25/Apr/17 ]

Nothing firm yet, but likely in the next couple of months. Note also that, as Ross indicated above, MongoDB 3.2 added an fsyncUnlock command that can be called directly from your application, so there is a straightforward workaround:

        mongoClient.getDB("admin").command(new BasicDBObject("fsyncUnlock", 1));
        // or
        mongoClient.getDatabase("admin").runCommand(new Document("fsyncUnlock", 1));

Comment by Dharshan Rangegowda [ 25/Apr/17 ]

Thanks for the quick turnaround. Any estimate on when 3.5.0 will be generally released?

Comment by Githook User [ 25/Apr/17 ]

Author:

{u'username': u'rozza', u'name': u'Ross Lawley', u'email': u'ross.lawley@gmail.com'}

Message: Updated FsyncUnlockOperation to be a ReadOperation

The command does not require a primary to run.
JAVA-2501
Branch: master
https://github.com/mongodb/mongo-java-driver/commit/7d3a6637d6e1c1c12ee194f47aefd9455113f767

Comment by Ross Lawley [ 25/Apr/17 ]

PR: https://github.com/rozza/mongo-java-driver/pull/187

Comment by Ross Lawley [ 25/Apr/17 ]

Hi dharshanr@scalegrid.net,

A direct connection is the correct way forward. Currently, mongo#unlock uses an operation underneath that requires a Primary node. This is a bug as fsync lock and unlock can be called on Secondaries. To work around you can issue the command by creating it manually and calling: DB#command or MongoDatabase#runCommand. See the fsync command documentation for more information.

Running a fsync lock and unlock on anything other than a direct connection may not work as expected. This is because the driver will select a server based on the read preference and if there are multiple matching mongod's then it can easily select different machines.

All the best,

Ross

Comment by Dharshan Rangegowda [ 24/Apr/17 ]

Hi Ross,

1) The db.fsyncLock and unlock are being sent on the same connection.
2) The readPreference is ReadPreference.secondaryPreferred.
3) I am connecting to '127.0.0.1' as the host - so this should connect to a particular server instead of replica set right? This is being run on the secondary.

Comment by Ross Lawley [ 24/Apr/17 ]

Hi dharshanr@scalegrid.net,

It looks like the FsyncUnlockOperation requires a writable server and isn't able to find one. Can you confirm there is no Primary node at the time of calling mongo#unlock? Please note from the documentation

When calling db.fsyncLock(), ensure that the connection is kept open to allow a subsequent call to db.fsyncUnlock().

Closing the connection may make it difficult to release the lock.

Given, you have a connection to a replica set, where was the original fsync lock command sent to? You will need to send the unlock to the same server. As replica set topologies can change, it may be prudent to make a direct connection to a replica set member rather than connecting to the set as a whole.

Finally, you can issue an fsync unlock command with a secondary read preferenc by calling DB#command or MongoDatabase#runCommand and passing the relevant read preference.

I hope that helps,

Ross

Generated at Thu Feb 08 08:57:22 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.