[SERVER-63887] SnapshotUnavailable error on sharded clusters/replica sets Created: 22/Feb/22  Updated: 29/Oct/23  Resolved: 29/Mar/23

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 7.0.0-rc0

Type: Bug Priority: Major - P3
Reporter: Neil Shweky (Inactive) Assignee: Henrik Edin
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
depends on SERVER-55505 Enable Feature flag for PM-2218 Closed
is depended on by RUBY-2909 Snapshot Query Examples for the Manual Closed
Related
related to SERVER-41532 Mongos can fail with "a non-retryable... Closed
related to SERVER-66974 Server responding with unexpected err... Closed
is related to DRIVERS-2181 Snapshot Query Examples for the Manual Implementing
is related to SERVER-39704 Allow mongos to retry on stale versio... Needs Scheduling
Assigned Teams:
Storage Execution
Backwards Compatibility: Fully Compatible
Operating System: ALL
Steps To Reproduce:

Note that I haven't been able to reproduce this locally, only on CI. Here is the patch that I ran. The code I'm using is as follows:

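require 'mongo'

# uri_string is the connection string of the deployment under test.
client = Mongo::Client.new(uri_string, database: "pets")

# Start from empty collections so the counts below are deterministic.
client['cats'].delete_many({})
client['dogs'].delete_many({})

client['cats'].insert_one({
    name: "Whiskers",
    color: "white",
    age: 10,
    adoptable: true
})

client['dogs'].insert_one({
    name: "Pebbles",
    color: "Brown",
    age: 10,
    adoptable: true
})
client.close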
  
adoptablePetsCount = 0
 
# Start Snapshot Query Example 1
# note that I create the client twice because this part needs to be in the example
 
client = Mongo::Client.new(uri_string, database:"pets")
client.start_session(snapshot:true) do |session|
    adoptablePetsCount = client['cats'].aggregate([
        { "$match": { "adoptable": true } },
        { "$count": "adoptableCatsCount" }
    ], session: session).first["adoptableCatsCount"]
 
    adoptablePetsCount += client['dogs'].aggregate([
        { "$match": { "adoptable": true } },
        { "$count": "adoptableDogsCount" }
    ], session: session).first["adoptableDogsCount"]
    
    puts adoptablePetsCount
end

Sprint: Execution Team 2023-04-03
Participants:

 Description   

Summary

While implementing DRIVERS-2181 (snapshot query example tests) in Ruby, I found that the tests pass locally but occasionally fail on CI. The failure occurs on both replica sets and sharded clusters.

A similar error has been reported before. See SERVER-41532.

The error is as follows:  

 Mongo::Error::OperationFailure: command failed because can not establish a snapshot :: caused by :: Unable to read from a snapshot due to pending collection catalog changes; please retry the operation. Snapshot timestamp is Timestamp(1645560051, 6). Collection minimum is Timestamp(1645560051, 7) (on localhost:27017, modern retry, attempt 1)
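
To make the two timestamps in the message easier to compare, here is a small sketch that pulls them out of the error text and orders them as (seconds, increment) pairs. The helper name extract_timestamps is made up for illustration and is not part of the driver. It only shows that the requested snapshot timestamp is earlier than the collection minimum reported by the server.

```ruby
# Error text copied from the failure above (prefix shortened).
MESSAGE = "Unable to read from a snapshot due to pending collection catalog changes; " \
          "please retry the operation. Snapshot timestamp is Timestamp(1645560051, 6). " \
          "Collection minimum is Timestamp(1645560051, 7)"

# Hypothetical helper: returns every Timestamp(seconds, increment) in the
# message as a [seconds, increment] pair, in order of appearance.
def extract_timestamps(message)
  message.scan(/Timestamp\((\d+),\s*(\d+)\)/).map { |secs, inc| [secs.to_i, inc.to_i] }
end

snapshot_ts, collection_min_ts = extract_timestamps(MESSAGE)

# Array#<=> compares element by element, which matches ordering by
# seconds first and increment second.
snapshot_too_old = (snapshot_ts <=> collection_min_ts) < 0

puts snapshot_too_old # prints "true"
```

Both timestamps share the same seconds value and differ by one increment, so the snapshot is only just behind the collection minimum.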

Note that sending distinct commands seems to fix this for sharded clusters. See the comments under SERVER-39704. This still fails for replica sets.

Motivation

Who is the affected end user?

mongo-ruby-driver spec tests are failing

How does this affect the end user?

I'm not sure that it does, since I'm having a hard time reproducing it.

How likely is it that this problem or use case will occur?

It doesn't seem very likely since I'm having a hard time reproducing it.

If the problem does occur, what are the consequences and how severe are they?

The snapshot fails, but it seems like if you retry it, it will work. See SERVER-41532.
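
A possible application-side workaround, sketched under assumptions: retry the whole snapshot read when the error code is SnapshotUnavailable (246). The helper with_snapshot_retry, its parameters, and the delay values are hypothetical and not part of mongo-ruby-driver. The sketch relies only on the raised error responding to `code`, as Mongo::Error::OperationFailure does. I have not verified that this is enough to make the CI failures go away.

```ruby
SNAPSHOT_UNAVAILABLE = 246 # server error code for SnapshotUnavailable

# Hypothetical helper: runs the block and retries it when the raised error
# carries code 246, up to max_attempts total attempts. Any other error, or
# a 246 on the last attempt, is re-raised to the caller.
def with_snapshot_retry(max_attempts: 3, base_delay: 0.1)
  attempt = 0
  begin
    attempt += 1
    yield attempt
  rescue StandardError => e
    retryable = e.respond_to?(:code) && e.code == SNAPSHOT_UNAVAILABLE
    raise unless retryable && attempt < max_attempts

    sleep(base_delay * attempt) # simple linear backoff (assumed values)
    retry
  end
end

# Intended usage (requires the mongo gem and a running deployment, so it is
# left as a comment here). A new snapshot session is started per attempt:
#
#   count = with_snapshot_retry do
#     client.start_session(snapshot: true) do |session|
#       client['cats'].aggregate([
#         { "$match": { "adoptable": true } },
#         { "$count": "adoptableCatsCount" }
#       ], session: session).first["adoptableCatsCount"]
#     end
#   end
```

Starting a new session inside the block is a deliberate choice in this sketch, so that each attempt is free to read at a newer snapshot time rather than reusing one from a previous attempt.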

 



 Comments   
Comment by Jeremy Mikola [ 29/Mar/22 ]

Thinking about how we can mitigate this possible pain point for applications, I wonder if it'd be reasonable to add SnapshotUnavailable(246) to the list of retryable errors, so retryable reads could kick in.

The current implementation would only afford us one additional retry attempt, which may not be sufficient, but that may change with client-side operation timeout (DRIVERS-555).

Generated at Thu Feb 08 05:58:56 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.