[SERVER-39296] 'no progress was made executing batch write op' Error with LockStateChangeFailed Created: 31/Jan/19  Updated: 08/Mar/19  Resolved: 08/Mar/19

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 4.0.4
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Ji Eun Lee Assignee: Eric Sedor
Resolution: Incomplete Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Operating System: ALL
Participants:

 Description   

I got this error from the Java driver (Spring):

no progress was made executing batch write op in testdb.testcol after 5 rounds (0 ops completed in 6 rounds total)

I also found the following errors in the server logs.

  • Every mongos:

    2018-11-26T05:51:05.267+0900 W SHARDING [replSetDistLockPinger] pinging failed for distributed lock pinger :: caused by :: FailedToSatisfyReadPreference: Could not find host matching read preference { mode: "primary" } for set  cstest

    2018-11-26T05:51:05.267+0900 W SHARDING [mongosMain] Error initializing sharding state, sleeping for 2 seconds and trying again :: caused by :: FailedToSatisfyReadPreference: Error loading clusterID :: caused by :: Could not find host matching read preference { mode: "nearest" } for set  cstest

    2018-11-26T05:51:30.117+0900 W SHARDING [mongosMain] Error initializing sharding state, sleeping for 2 seconds and trying again :: caused by :: FailedToSatisfyReadPreference: Error loading clusterID :: caused by :: Could not find host matching read preference { mode: "nearest" } for set  cstest

    2018-11-26T05:59:33.255+0900 W SHARDING [replSetDistLockPinger] pinging failed for distributed lock pinger :: caused by :: LockStateChangeFailed: findAndModify query predicate didn't match any lock document

    2018-11-28T05:25:14.810+0900 W SHARDING [replSetDistLockPinger] pinging failed for distributed lock pinger :: caused by :: LockStateChangeFailed: findAndModify query predicate didn't match any lock document

    2018-11-28T05:26:53.096+0900 W SHARDING [replSetDistLockPinger] pinging failed for distributed lock pinger :: caused by :: LockStateChangeFailed: findAndModify query predicate didn't match any lock document

  • config server 1:

    2018-11-26T05:56:59.294+0900 W SHARDING [signalProcessingThread] cant reload ShardRegistry  :: caused by :: CallbackCanceled: Callback canceled

    2018-11-26T09:13:42.316+0900 W SHARDING [conn80] chunk operation commit failed and metadata will be revalidated :: caused by :: BadValue: preCondition failed

    2018-11-28T02:00:50.865+0900 W SHARDING [Balancer] Error encountered while logging config change with ID [mongoserver01-2018-11-28T02:00:50.863+0900-5c2591e293cf03de3eeb13e1] into collection actionlog: InterruptedDueToReplStateChange: operation was interrupted

    2018-11-28T02:00:51.814+0900 W SHARDING [signalProcessingThread] cant reload ShardRegistry  :: caused by :: CallbackCanceled: Callback canceled

  • config server 2:

    2018-11-28T05:44:37.470+0900 W SHARDING [Balancer] Skipping balancing round :: caused by :: BadValue: Failed to refresh the balancer settings :: caused by :: activeWindow format is  { start: "hh:mm" , stop: "hh:mm" }

    2018-11-28T07:30:34.308+0900 W SHARDING [conn683] chunk operation commit failed and metadata will be revalidated :: caused by :: BadValue: preCondition failed

    2018-11-26T06:27:50.009+0900 W SHARDING [replSetDistLockPinger] Failed to unlock lock with ts: 5c23277b25fd6dc4a037d94d and _id: config :: caused by :: NotMaster: Not primary while running findAndModify command on collection config.locks

    cat config.log | grep "5c23277b25fd6dc4a037d94d" | wc -l
    5277

  • config server 3:

    2018-11-26T05:52:27.515+0900 W SHARDING [signalProcessingThread] cant reload ShardRegistry  :: caused by :: CallbackCanceled: Callback canceled

    2018-11-26T05:56:37.424+0900 E STORAGE  [initandlisten] Failed to set up listener: SocketException: Address already in use

    2018-11-26T05:56:58.419+0900 W SHARDING [signalProcessingThread] cant reload ShardRegistry  :: caused by :: CallbackCanceled: Callback canceled

    2018-11-28T02:00:51.960+0900 W REPL     [rsBackgroundSync] Fetcher stopped querying remote oplog with error: HostUnreachable: error in fetcher batch callback :: caused by :: Connection was closed

    2018-11-26T06:27:20.014+0900 W SHARDING [replSetDistLockPinger] Failed to unlock lock with ts: 5c23277cf85fd40cc7a40389 and _id: config :: caused by :: NotMaster: Not primary while running findAndModify command on collection config.locks

    cat config.log | grep "5c23277cf85fd40cc7a40389" | wc -l
    100901
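    The per-lock-id counts above can be extended to count each error code as well; a small sketch using the same grep technique (the file name config.log follows the counts above and should be adjusted per node):

    ```shell
    # Count distributed-lock-related warnings per error code in one node's log.
    # "config.log" is illustrative; point it at the node's actual log file.
    for code in LockStateChangeFailed FailedToSatisfyReadPreference NotMaster; do
      printf '%s: ' "$code"
      grep -c "$code" config.log || true   # grep -c exits non-zero on 0 matches
    done
    ```

    Running this on each mongos and config server gives a quick per-node breakdown of which failure dominates.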


 Comments   
Comment by Eric Sedor [ 08/Mar/19 ]

Hi,

We haven’t heard back from you for some time, so I’m going to mark this ticket as resolved. If this is still an issue for you, please provide additional information and we will reopen the ticket.

Regards,
Eric

Comment by Eric Sedor [ 19/Feb/19 ]

Hello, we still need additional information to diagnose the problem. If this is still an issue for you, would you please provide the above information?

Comment by Eric Sedor [ 31/Jan/19 ]

Hi wldms1615@gmail.com, thank you for the good summary so far. Based on this we'd like to ask for an archive (tar or zip) of the $dbpath/diagnostic.data directory of each config server and mongos. Can you please attach these archives to this ticket?
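One way to produce such an archive on each node; the default path is an assumption, so substitute the node's actual dbpath (the --dbpath argument or storage.dbPath in its config file):

```shell
# DBPATH is a placeholder; use the real dbpath of this mongod/mongos node.
DBPATH="${DBPATH:-/var/lib/mongodb}"
if [ -d "$DBPATH/diagnostic.data" ]; then
  tar -czf "diagnostic-$(hostname).tar.gz" -C "$DBPATH" diagnostic.data
fi
```

Naming the archive after the host keeps the per-node archives distinguishable when attached together.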

Can you also provide the write operation being executed by the app?

Finally, which of the timestamps in the above logs corresponds to the driver-side "no progress" error, and can you also attach the full log files for the day you saw the problem?

Thank you in advance!

Generated at Thu Feb 08 04:51:37 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.