[CSHARP-460] User types that implement generic collection interfaces are incorrectly serialized. Created: 02/May/12 Updated: 02/Apr/15 Resolved: 24/May/12 |
|
| Status: | Closed |
| Project: | C# Driver |
| Component/s: | None |
| Affects Version/s: | 1.4.1 |
| Fix Version/s: | 1.5 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Alexander Nagy | Assignee: | Robert Stam |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
All |
||
| Issue Links: |
|
||||
| Description |
|
Any user defined type that implement IEnumerable<T>, ICollection<T>, ICollection<KeyValuePair<,>>, or IDictionary<,> are incorrectly serialized. The bug lies in the sole use of the __genericSerializerDefinitions dictionary for type resolution. The following class hierarchy should serialize using the CollectionGenericSerializer: interface IFooBarList<T> : IList<T> However the logic only calls GetGenericTypeDefinition on DummyCollection, which will not resolve against the static list of __genericSerializerDefinitions. What needs to happen is the code should also call GetInterfaces, and traverse the generic interface definitions. Special care needs to be taken here, because a query against some types like Dictionary<,> will return multiple candidates such as IDictionary<,>, ICollection<KeyValuePair<,>>, etc.. Precedence should always be given the the furthest derived generic interface (I.e. IDictionary<,> if no serializer is registered for Dictionary<,>. Also, KeyValuePair<,> type should be serializable on it's own, but currently only works within the context of the DictionaryGenericSerializer. Case in point is that IDictionary<,> is really just an ICollection<KeyValuePair<,>> and they two could be decoupled. At it's best, right now the serializer attempts to use the class map serializer and crashes. At it's worst, the class map serializer will silently succeed in serializing nothing on a user collection, a case of silent data corruption. |
| Comments |
| Comment by Alexander Nagy [ 24/May/12 ] |
|
Thanks! |
| Comment by Robert Stam [ 24/May/12 ] |
|
Agreed. All interfaces are checked whether they are directly or indirectly inherited. |
| Comment by Alexander Nagy [ 23/May/12 ] |
|
It all looks good to me - one call out: These semantics should apply whenever a collection interface shows up in the derivation chain. Specifically this covers scenarios where people derive from base collection types in order to augment functionality (i.e. both Dictionary<TKey, TValue> and List<T> are not sealed to enable these scenarios). |
| Comment by Robert Stam [ 23/May/12 ] |
|
Here is a summary of the current situation: 1. Any class that subclasses a .NET collection class ends up being serialized using BsonClassMapSerializer, which doesn't work because the .NET collection class isn't designed the way BsonClassMapSerializer expects Item 2 is particularly surprising to people... The plan is to change the serialization of classes that implement collection interfaces to work better in more cases (but unfortunately not all). The new approach is essentially identical to that taken by the XmlSerializer in .NET. The new approach will be: 1. Any class that implements IDictionary<TKey, TValue> will be serialized using DictionarySerializer<TKey, TValue> Classes that implement IEnumerable<T> will also have to implement ICollection<T> if deserialization is desired (the Add method in ICollection<T> is needed for deserialization). Classes that implement IEnumerable will also have to implement IList if deserialization is desired (the Add method in IList is needed for deserialization). NOTE: just like with XmlSerializer, if your custom class implements either IEnumerable or IDictionary (legacy or generic) then ONLY the collection values will be serialized. If your custom class added additional public properties they will NOT be serialized. This is the opposite behavior as before, but is wrong less often (not serializing the collection values is always wrong, but custom collections that don't have additional properties are now serialized correctly). If you want to serialize your additional properties along with the collection values you will have to encapsulate and delegate to the collection class instead of subclassing it. Alex's suggestion to add a KeyValuePairSerializer<TKey, TValue> is a good one also. Work is in progress on this change. |
| Comment by Robert Stam [ 15/May/12 ] |
|
Planning to look at this soon. I will look at your commit as well. |
| Comment by Alexander Nagy [ 15/May/12 ] |
|
Appreciate your attention here, I could submit a pull request for this depending on how we might want to fix this as I've already got a starting point here: https://github.com/optimiz3/mongo-csharp-driver/commit/8223ae41958fc5995e9399fff215b88b5bb1179c |
| Comment by Alexander Nagy [ 08/May/12 ] |
|
What I meant by a serializer implicitly supporting derived classes was whether a serializer is selected to serialize derived classes, rather than whether any particular instance of a serializer can serialize derived instances of a class. I see the design call out you are making now - users register serializer instances rather than serializer types (except in the generic case). I agree with you it would be a risky to use a single registered serializer instance for multiple types. This has been something that has been floating in my mind for a while - perhaps the RegisterGenericSerializer api could be refactored to be a RegisterSerializer(<type>, <serializerType>) API? (serializerType would have to have a paramater-less constructor) It could disambiguate the scenario, a serializerType being registered could mean the serializer is valid for derived types, while an instance being registered could mean what it does today being valid for just a specific type, and produce a warning/exception if derived types are serialized. |
| Comment by Robert Stam [ 08/May/12 ] |
|
It is still an open question how to know when it is safe to use a BsonClassMapSerializer. Your point that if any base class has a custom serializer then it is unlikely that BsonClassMapSerializer is safe is helpful, because it could be one of the sanity checks that we use before using BsonClassMapSerializer for a class. As of a few releases ago an instance BsonClassMapSerializer now longer implicitly supports derived classes. That is because BsonClassMapSerializer now has a _classMap field that references the corresponding class map, so each derived class must have its own instance of BsonClassMapSerializer. |
| Comment by Alexander Nagy [ 08/May/12 ] |
|
Two options I can think of for ambiguous derivation: 1. Throw if such a case is presented |
| Comment by Alexander Nagy [ 08/May/12 ] |
|
For the user-registered base class case, I agree that the derived class case is more ambiguous. However it's almost certain that in the user-registered base class case, the class map serializer would be wrong. It definitely would not be equivalent to the custom semantics the user would have applied higher up in the derivation hierarchy. |
| Comment by Robert Stam [ 08/May/12 ] |
|
That doesn't sound right... if you use a serializer that is registered for a base class then it is likely that that serializer will only serialize the base class's fields/properties because that's all it knows about. I think each class needs to have its own serializer registered. |
| Comment by Alexander Nagy [ 05/May/12 ] |
|
Re: the question of how to detect what serializer is the correct one, I think there is a pretty elegant solution: If we treat the inheritance chain as a tree with System.Object at the root, then evaluating the correct serializer is a matter of reverse traversing an object's inheritance chain back up to the root, picking the first registered serializer. This benefits by giving the user the power to override differing levels of serialization depending on where in the object inheritance graph they register a serializer. So in the following inheritance chain: FooBar : Object If no ICollection<T> serializer is registered, then ClassMap is used DummyCollection3 : DummyCollection2, ICollection<float> If both a FooBar and an ICollection<T> serializer is registered, then the collection serializer is used for DummyCollection3 In effect, we would be applying the same order of evaluation that is used in standard inheritance. This also ensures that ClassMap is always the default serializer, and that whenever there is are multiple choices, the most-specific serializer is chosen. |
| Comment by Alexander Nagy [ 04/May/12 ] |
|
Also this has resulted in user confusion in the past: http://stackoverflow.com/questions/6471114/how-do-i-serialize-a-custom-collection-in-mongodb-bson |
| Comment by Alexander Nagy [ 03/May/12 ] |
|
The scenario here can probably be narrowed to eliminate ambiguity/risk: If a class implements a standard .Net collection (anything with ICollection/IEnumerable somewhere as a parent), serialize it as a collection, otherwise evaluate it as normal. The only people potentially exposed then would be those who derived from a collection type and then augmented it. Re data lost: Consider the opposite - right now ClassMapSerializer drops all data contained in the parent collection, but not the members. Something gets lost no matter what, but the current way also breaks standard design patterns. Microsoft the designers of .Net, have consistently designed the framework in this way |
| Comment by Robert Stam [ 03/May/12 ] |
|
But the "plus more" often includes additional state that should be serialized. Wouldn't picking a serializer based on the base class (or on one of the possibly multiple interfaces implemented) result in that additional state not being serialized? I would think it's dangerous for the default behavior to allow data to be lost in serialization. I will read up on the references you mention. A separate issue is that BsonClassMapSerializer is the default serializer for all classes that don't otherwise have some serializer registered for them, whether that makes sense or not. I have been trying to think of some way of knowing when the BsonClassMapSerializer can or cannot be used. So I would totally agree that applying a BsonClassMapSerializer to your DummyCollection<T> class is wrong, the question is how to know it is wrong. It is convenient that BsonClassMapSerializer automatically gets used for all your classes, but the dark side is that it sometimes ends up getting used when it shouldn't. |
| Comment by Alexander Nagy [ 03/May/12 ] |
|
Additional evidence: http://msdn.microsoft.com/en-us/library/ms950721.aspx Q: Why aren't all properties of collection classes serialized? A: The XmlSerializer only serializes the elements in the collection when it detects either the IEnumerable or the ICollection interface. This behavior is by design. The only work around is to re-factor the custom collection into two classes, one of which exposes the properties including one of the pure collection types. |
| Comment by Alexander Nagy [ 03/May/12 ] |
|
What this really boils down to is what the natural behavior is for classes that implement generic interfaces and in particular, collection interfaces. When someone derives from a class/interface, there are implicitly saying that the class is of the base type. There's a lot of literature on this but basically it's the strongest way of saying "I have the exact same semantics of my parent plus more". The inherrent reason to deriving from a collection is to augment it. Because of this, one would expect the derived class to inherit whatever serialization semantics the parent would have. It's really bizarre right now that the serialization semantics of a List/Dictionary/whatever completely change when a child derives from one. If one wanted to lose the collection semantics, a clearer way would be to compose the object relationship with collection as a private member. This has the effect of saying "this class is not a collection, but uses collection functionality". Here, I'd expect the opposite, if the user really wanted to serialize a derived collection as a ClassMap, then they would specify an override. Finally, there is significant precedence supporting what I'm saying here in the .Net framework: Cursory searching of the internet about .net serializers and collections turns up substatial evidence that as-is, the current semantic in the MongoDb driver is incorrect. (http://blogs.msdn.com/b/karstenj/archive/2005/12/09/502247.aspx, http://stackoverflow.com/questions/5286302/why-cant-i-add-attributes-when-serializing-observablecollectiont, etc.) That the Mongo driver is doing something opposite is highly counter intuitive to folks familiar with .Net, and all of the standard design patterns. |
| Comment by Robert Stam [ 03/May/12 ] |
|
I don't think it's correct for EnumerableSerializer<T> to be automatically selected for the DummyCollection<T> class. It's entirely possible that DummyCollection<T> has additional state that EnumerableSerializer<T> doesn't know about. If you know that EnumerableSerializer<T> is an adequate serializer for DummyCollection<T> you can call BsonSerializer.RegisterGenericSerializerDefinition to say so. |
| Comment by Alexander Nagy [ 03/May/12 ] |
|
Fix here: https://github.com/optimiz3/mongo-csharp-driver/commit/8223ae41958fc5995e9399fff215b88b5bb1179c |
| Comment by Alexander Nagy [ 02/May/12 ] |
|
I've got a full fix for this implemented, will send out a pull request soon. |