[JAVA-3747] Redesign Codec Resolution Created: 27/May/20  Updated: 30/Mar/22

Status: Backlog
Project: Java Driver
Component/s: Codecs, Scala
Affects Version/s: None
Fix Version/s: None

Type: New Feature Priority: Major - P3
Reporter: Aki ks Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: Zip Archive mongo.zip    

 Description   

The MongoDB Scala driver advertises itself as a "modern idiomatic MongoDB Scala Driver". I would mostly agree, if it weren't for one big issue: codec resolution is done at runtime via reflection. This is not idiomatic Scala and has limitations. We can do better.

One of the biggest drawbacks of the current approach is that the JVM does not provide type information for generic types at runtime. We therefore cannot properly deserialize a List[String], since the information that the List contains Strings is not available via the reflection API.
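
A minimal sketch of the erasure problem described above, using only the Scala standard library. The object and method names (`ErasureDemo`, `elementType`) are made up for illustration and are not part of any driver.

```scala
import scala.reflect.ClassTag

object ErasureDemo {
  val strings: List[String] = List("a", "b")
  val ints: List[Int] = List(1, 2)

  // At runtime both values are plain list cells; the element type is erased.
  private val stringsClass: Class[_] = strings.getClass
  private val intsClass: Class[_] = ints.getClass
  val sameRuntimeClass: Boolean = stringsClass == intsClass

  // At compile time the element type is still known, so the compiler can
  // pass evidence about it (here a ClassTag) to the callee.
  def elementType[A](xs: List[A])(implicit tag: ClassTag[A]): String =
    tag.runtimeClass.getName
}
```

Anything keyed on the runtime class can only distinguish "a List" from other classes, while a lookup performed by the compiler can still tell a List[String] from a List[Int].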

Another issue is that a codec must be explicitly registered for every class that may appear anywhere in the class structure of the elements stored in the database. If I have, for instance, a `case class Foo(a: Either[Bar, Baz])`, I must register a codec for the case classes Foo, Bar and Baz, and for the types of any fields of their fields.

The Solution

A more idiomatic Scala way to implement codec resolution would be to use implicits instead. This would, for instance, solve the issue of deserializing Lists, as generic types are available at compile time. There would also no longer be any runtime overhead for looking up codec classes in the CodecRegistries.

To implement that, the driver should provide a generic Codec[A] class. When creating, for instance, a MongoCollection[A], a Codec[A] would be implicitly resolved. There would no longer be a need to register codecs manually.
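
As a rough sketch of what this could look like: everything below is hypothetical. `Value` is a tiny stand-in for BSON values so the example has no driver dependency, `Codec` is an illustrative type class and not the existing `org.bson.codecs.Codec`, and `Collection` is an in-memory stand-in for a MongoCollection.

```scala
object ImplicitCodecSketch {
  // Simplified stand-in for BSON values (assumption, not the driver's model).
  sealed trait Value
  final case class Str(value: String) extends Value
  final case class Num(value: Int) extends Value
  final case class Arr(items: List[Value]) extends Value

  // Hypothetical type class describing how to encode and decode an A.
  trait Codec[A] {
    def encode(a: A): Value
    def decode(v: Value): Option[A]
  }

  object Codec {
    implicit val stringCodec: Codec[String] = new Codec[String] {
      def encode(a: String): Value = Str(a)
      def decode(v: Value): Option[String] = v match {
        case Str(s) => Some(s)
        case _      => None
      }
    }

    implicit val intCodec: Codec[Int] = new Codec[Int] {
      def encode(a: Int): Value = Num(a)
      def decode(v: Value): Option[Int] = v match {
        case Num(n) => Some(n)
        case _      => None
      }
    }

    // The compiler builds a Codec[List[A]] from the Codec[A] it finds,
    // so the element type takes part in the resolution.
    implicit def listCodec[A](implicit element: Codec[A]): Codec[List[A]] =
      new Codec[List[A]] {
        def encode(as: List[A]): Value = Arr(as.map(element.encode))
        def decode(v: Value): Option[List[A]] = v match {
          case Arr(items) =>
            items.foldRight(Option(List.empty[A])) { (item, acc) =>
              for { tail <- acc; head <- element.decode(item) } yield head :: tail
            }
          case _ => None
        }
      }
  }

  // Hypothetical in-memory collection: the codec is resolved once,
  // at the point where the collection is created.
  final class Collection[A](implicit codec: Codec[A]) {
    private var stored: Vector[Value] = Vector.empty
    def insertOne(a: A): Unit = stored = stored :+ codec.encode(a)
    def find(): List[A] = stored.toList.flatMap(v => codec.decode(v).toList)
  }
}
```

With these definitions, `new ImplicitCodecSketch.Collection[List[String]]` compiles without any registration step, whereas asking for a collection of a type that has no codec in scope is rejected by the compiler.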

For me this is one of the most annoying aspects of working with Mongo on the JVM. While I've currently decided to just no longer use Mongo when writing software for the JVM, I've written a small, incomplete wrapper around the Mongo Java driver in the past. This incomplete wrapper implements codec resolution at compile time using implicits and can serialize Lists and Map[String, ?] properly. It can even serialize any "case class" or ADT (sealed trait + case class implementations) data structure using shapeless, without defining or registering codecs anywhere. A custom-written macro, like the ones currently used, could make this even more performant.

This wrapper (see attachment) is far from being an ideal implementation and was never used seriously, but take it as an inspiration for what the future Mongo Scala driver might look like. You may also take a look at the Couchbase Scala driver. Their Scala driver and its serialization mechanism feel far more idiomatic!

Could this be a potential candidate for the next major release of the Mongo Scala driver?



 Comments   
Comment by Ross Lawley [ 10/Jun/20 ]

Thanks again aki-ks.cloud.mongodb.com@m.mader.dev,

Some great points. I'm moving this ticket to "open" for future consideration.

Comment by Aki ks [ 02/Jun/20 ]

The current Scala driver already includes macros that work for case classes and ADTs, but those only summon one `Codec[A]` for the one requested type "A". Instead of using shapeless we should implement such custom macros. They would generate more optimized code than shapeless does. Back then I decided to use shapeless simply because I wanted to get running code quickly and I had often had trouble with Scala macros in the past.

So once you've retrieved macro-generated codecs, you must currently register them in the CodecRegistry one by one. For every type that might be contained in the data structure to be serialized, one codec must be explicitly registered. As you've said, at runtime some kind of `Map[Class[A], Codec[A]]` looks those codecs up. But if you have, for instance, forgotten to register a codec for one type, you will notice this only at runtime, when a value of that type is serialized for the first time.
...And if I remember correctly from what I saw a few years ago: if CodecRegistries and CodecProviders are combined in a bad way, which can happen quite easily, a list of codecs is iterated and each one in turn is asked whether it supports the requested class (linear time), rather than using a single (constant-time) HashMap lookup.
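
A minimal sketch of the failure mode described above. `Registry` and `Codec` here are simplified, hypothetical stand-ins and not the driver's CodecRegistry. The point is only that a class-keyed lookup compiles even when an entry is missing.

```scala
object RegistrySketch {
  // Hypothetical, simplified codec; real codecs would encode and decode.
  trait Codec[A] { def describe: String }

  final case class Bar(n: Int)
  final case class Foo(bar: Bar)

  // Hypothetical registry keyed by runtime class.
  final class Registry(entries: Map[Class[_], Codec[_]]) {
    def get[A](cls: Class[A]): Codec[A] =
      entries.get(cls) match {
        // The cast is unchecked: the map cannot tie key and value types together.
        case Some(codec) => codec.asInstanceOf[Codec[A]]
        case None =>
          throw new NoSuchElementException("No codec registered for " + cls.getName)
      }
  }

  val fooCodec: Codec[Foo] = new Codec[Foo] { def describe = "codec for Foo" }

  // Foo is registered, but the codec for its field type Bar was forgotten.
  val registry = new Registry(Map[Class[_], Codec[_]](classOf[Foo] -> fooCodec))
}
```

Here `registry.get(classOf[RegistrySketch.Bar])` compiles without complaint and only throws once it is actually executed, which mirrors the "noticed only at runtime" problem.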

My main idea is to completely get rid of the usage of CodecRegistry in the Scala driver. Due to limitations in the Java language it was required there, and it has many disadvantages, but with the power of implicits we no longer need to rely on it for codec resolution in the Scala driver.
Whenever a database operation such as `def find[A]` or `def updateOne[A]` is invoked, a type parameter is available that allows the corresponding codec to be found implicitly at compile time.
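
The per-operation variant could be sketched roughly as follows. All names are hypothetical: `Decoder` is an illustrative type class, documents are modelled as a plain `Map[String, String]`, and `Collection` is an in-memory stand-in for the driver's collection type.

```scala
object PerOperationSketch {
  // Hypothetical decoder type class over a simplified document model.
  trait Decoder[A] {
    def decode(doc: Map[String, String]): Option[A]
  }

  final case class User(name: String)

  object User {
    // Placed in the companion so the compiler finds it without imports.
    implicit val decoder: Decoder[User] = new Decoder[User] {
      def decode(doc: Map[String, String]): Option[User] =
        doc.get("name").map(User(_))
    }
  }

  final class Collection(docs: List[Map[String, String]]) {
    // The type argument given at the call site selects the decoder
    // at compile time; no registry is consulted.
    def find[A](implicit decoder: Decoder[A]): List[A] =
      docs.flatMap(doc => decoder.decode(doc).toList)
  }
}
```

A call such as `collection.find[User]` resolves `User.decoder` during compilation. Requesting a type without a `Decoder` in implicit scope fails to compile instead of failing on first use.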

Comment by Ross Lawley [ 02/Jun/20 ]

Hi aki-ks.cloud.mongodb.com@m.mader.dev,

Thanks for the ticket.  As you may know, the Scala driver is a layer written upon the Java driver, and internally the Java driver uses a CodecRegistry for converting type T to its equivalent Bson type.  The registry can be thought of as a Map-like structure where the key is the class of the type to be encoded and the value is the Codec.  So it isn't really reflection-based per se, but as you rightly pointed out, that can cause issues where generics are involved: as the type information is erased, the user has to cast to the expected type.

To combat this the Scala driver does two things:

  1. Implicitly wraps the strongly typed BsonDocument into a Scala Document class.
    Adds support for implicit conversion from Scala native types to Bson types.
    Has the base type of BsonValue, so no class cast exceptions.
  2. The Macro Codec
    Adds support for case classes by looking up type information at compile time.

That said, I really like what you have done: by using shapeless, you've used reflection to implicitly handle case classes and other Scala types and map them into a Codec.  Also, wrapping that into a macro to set the Codec at compile time would improve performance by removing the runtime reflection.

These are two approaches to the same problem: the Java driver uses a single registry throughout, and your approach provides a specialized custom codec for each use case.

The only downside would be taking a hard dependency on shapeless, which can cause issues in users' ecosystems with different versions etc.  The JVM drivers aim to take the absolute minimum of outside dependencies because of the drivers' wide range of use cases and their interactions with many different systems and stacks.

I would love to see an alternative Scala driver implementation written on top of the Reactive Streams Driver that uses shapeless and your ideas.  I think that could be a benefit to Scala users.

Ross

Generated at Thu Feb 08 09:00:20 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.