[CSHARP-755] MongoServer.Primary is not thread-safe when using a replicaset Created: 10/Jun/13  Updated: 20/Mar/14  Resolved: 11/Jun/13

Status: Closed
Project: C# Driver
Component/s: None
Affects Version/s: 1.8.1
Fix Version/s: 1.8.2

Type: Bug
Priority: Critical - P2
Reporter: Arnaud TAMAILLON
Assignee: Robert Stam
Resolution: Done
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
duplicates CSHARP-724 MongoServer.IsPrimary Would Error Whe... Closed
Backwards Compatibility: Fully Compatible

 Description   

System.InvalidOperationException: Sequence contains more than one matching element
at System.Linq.Enumerable.SingleOrDefault[TSource](IEnumerable`1 source, Func`2 predicate)
at MongoDB.Driver.MongoServer.get_Primary()

MongoServer.Primary calls _serverProxy.Instances.SingleOrDefault(x => x.IsPrimary);

In our case, the proxy is a ReplicaSetMongoServerProxy.

When the ProcessConnectedInstanceStateChange method is called, the previously disconnected instance can come back with its primary flag set to true.
In this case, the previous primary instance may not yet have had its flag unset by the ProcessConnectedPrimaryStateChange method when MongoServer.Primary is called. Two instances then match the predicate, and SingleOrDefault throws the exception above.
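
The window described above can be illustrated without the driver. The following is a minimal sketch, not the driver's code: FakeInstance and PrimaryRaceSketch are hypothetical stand-ins, and the two flag updates are run sequentially to show what a reader sees if it calls the lookup between them.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a server instance; NOT the driver's MongoServerInstance.
public class FakeInstance
{
    public string Name { get; set; }
    public bool IsPrimary { get; set; }
}

public static class PrimaryRaceSketch
{
    // Same shape as the lookup quoted in the report.
    public static FakeInstance GetPrimary(IEnumerable<FakeInstance> instances)
    {
        return instances.SingleOrDefault(x => x.IsPrimary);
    }

    public static void Main()
    {
        var oldPrimary = new FakeInstance { Name = "a", IsPrimary = true };
        var reconnected = new FakeInstance { Name = "b", IsPrimary = false };
        var instances = new List<FakeInstance> { oldPrimary, reconnected };

        // Step 1: the reconnected instance reports itself as primary.
        reconnected.IsPrimary = true;

        // A reader arriving here sees two primaries, so SingleOrDefault throws.
        try
        {
            GetPrimary(instances);
        }
        catch (InvalidOperationException ex)
        {
            Console.WriteLine("Reader failed: " + ex.Message);
        }

        // Step 2: the old primary's flag is finally cleared.
        oldPrimary.IsPrimary = false;
        Console.WriteLine("Primary is now: " + GetPrimary(instances).Name);
    }
}
```

In the real driver the two steps happen on a different thread from the reader, so the failure is intermittent rather than deterministic as it is here.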



 Comments   
Comment by Robert Stam [ 28/Jun/13 ]

A new fix has been implemented for CSHARP-724 that reliably keeps track of the current primary.

This should resolve the race condition you were still seeing.

Comment by Craig Wilson [ 11/Jun/13 ]

Ok, I'll reopen the issue since you are still seeing this with the fix cherry-picked into 1.8.1. Your comment mentioned 274, which was obviously a typo and is why I didn't completely understand what you were saying. We did some extensive testing against this and our tests all cleared up, but I can see the race condition that is possible, although much less likely.

It comes down to the fact that we are publishing the same information we are using internally for public consumption. Immediately when the server becomes primary, this is public knowledge before we have a chance to unset the previous primary. There isn't a whole lot we can do about this at this particular time. It would require a massive refactoring of how things are done internally. We have already done much of this refactoring for our 2.0 release and will not be backporting to the 1.x cycle. I know this is a crappy answer and I apologize for it.

For now, I have provided a workaround you can use: read the Instances property instead of the Primary property, and filter out the Disconnected instances. We could arguably do that as well, and I'll bring that up, but it means a change in behavior.
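
One possible shape for that workaround is sketched below. It is an illustration under stated assumptions, not the driver's API: StubInstance, StubState and FindPrimary are hypothetical stand-ins for the instance list, its connection state and its primary flag, and the use of FirstOrDefault (which tolerates more than one match) is a choice made for this sketch rather than something prescribed in this ticket.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins; NOT the driver's types. They only model a
// connection state and a primary flag per instance.
public enum StubState { Disconnected, Connecting, Connected }

public class StubInstance
{
    public string Name { get; set; }
    public StubState State { get; set; }
    public bool IsPrimary { get; set; }
}

public static class PrimaryWorkaroundSketch
{
    // Returns the first non-disconnected instance flagged as primary, or null.
    // Unlike SingleOrDefault, FirstOrDefault does not throw on multiple matches.
    public static StubInstance FindPrimary(IEnumerable<StubInstance> instances)
    {
        return instances
            .Where(x => x.State != StubState.Disconnected)
            .FirstOrDefault(x => x.IsPrimary);
    }

    public static void Main()
    {
        var instances = new List<StubInstance>
        {
            // Stale flag on an instance that has dropped out.
            new StubInstance { Name = "a", State = StubState.Disconnected, IsPrimary = true },
            new StubInstance { Name = "b", State = StubState.Connected, IsPrimary = true },
            new StubInstance { Name = "c", State = StubState.Connected, IsPrimary = false }
        };

        var primary = FindPrimary(instances);
        Console.WriteLine(primary == null ? "no primary" : "primary: " + primary.Name);
    }
}
```

This avoids the exception, but if two connected instances are flagged at once it simply returns the first one, which may be the stale primary. Code that needs an authoritative answer should still ask the cluster directly.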

If you have any suggestions on the correction for this, I'm happy to listen and talk through them. For now, I'm going to leave this ticket closed and re-open CSHARP-724 to indicate that it is not completely fixed.

Comment by Arnaud TAMAILLON [ 11/Jun/13 ]

This has NOT been fixed by CSHARP-724; the issue is still there with CSHARP-724 applied to 1.8.1.
My report is based on 1.8.1 with CSHARP-724 cherry-picked, as stated in the comments.

If Primary is not reliable, why provide it? In production code, we cannot rely on a sometimes-it-works-sometimes-it-does-not API...

Comment by Craig Wilson [ 11/Jun/13 ]

Hi Arnaud,
Thanks for the bug report. This was identified previously (CSHARP-724) and has been fixed. Note that it isn't a thread-safety issue, just an issue with how we handle a change in primaries.
As I suggested in the other issue, I'd strongly recommend that you not use our internal state mechanisms for cluster health monitoring, but instead issue replSetGetStatus commands to get the state of the cluster directly from the cluster. It's up to you, but until we release 1.9, this will be an issue for you.
You can get around this by using the Instances property and filtering for what you want that way.

Sorry for the trouble,
Craig

Comment by Arnaud TAMAILLON [ 10/Jun/13 ]

Note that this issue is visible with the CSHARP-274 fix applied.
