[JAVA-4954] Refactor PojoCodec to leverage CodecProvider.get(clazz, typeArguments, registry) Created: 29/Apr/23  Updated: 09/Oct/23

Status: Backlog
Project: Java Driver
Component/s: POJO
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Jeffrey Yemin Assignee: Valentin Kavalenka
Resolution: Unresolved Votes: 0
Labels: tech-debt
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Issue split
split to JAVA-4967 Deprecate Parameterizable, introduce ... Closed
Related
related to JAVA-2554 Investigate Optional support for the ... Closed
Epic Link: Investigate our POJO implementation
Quarter: FY24Q2, FY24Q4
Documentation Changes Summary:

1. What would you like to communicate to the user about this feature?
2. Would you like the user to see examples of the syntax and/or executable code and its output?
3. Which versions of the driver/connector does this apply to?


 Description   

Update: the Parameterizable interface was replaced with CodecProvider.get(Class<T> clazz, List<Type> typeArguments, CodecRegistry registry).

The Parameterizable interface was introduced along with RecordCodec to allow it to be deeply immutable even in the face of type parameters. We can make PojoCodec similarly immutable by having it make use of this interface. There are a few aspects to this

  • PojoCodec implements Parameterizable
  • PojoCodec uses CodecRegistry#get(java.lang.Class<T>, java.util.List<java.lang.reflect.Type>) to resolve codecs for parameterized POJO types, including nested POJOs and collections. Doing this for collections may be tricky since a different mechanism was introduced in JAVA-2554 that may conflict with this effort.

The end result should be that the full set of parameterized codecs is resolved before the codec is ever used for encoding or decoding.



 Comments   
Comment by Githook User [ 09/Jun/23 ]

Author:

{'name': 'Valentin Kovalenko', 'email': 'valentin.kovalenko@mongodb.com', 'username': 'stIncMale'}

Message: Remove specialization-related conditional logic from `PojoCodecImpl` by introducing `LazyPropertyModelCodec.NeedsSpecializationCodec` (#1136)

This commit moves some of the conditional logic from `PojoCodecImpl` under the rug,
thus allegedly simplifying `PojoCodecImpl`, but not the POJO codec implementation overall.

JAVA-4954
Branch: master
https://github.com/mongodb/mongo-java-driver/commit/14cfcdca70e3e4e93e6fecefc35af1eb4da061cf

Comment by Valentin Kavalenka [ 06/Jun/23 ]

Use get(Class<T> clazz, List<Type> typeArguments, CodecRegistry registry) if PropertyCodecProvider/PropertyCodecRegistry and PropertyModelBuilder.codec are not used

I don't see how we can tell whether those mechanisms are used or not when deciding whether to use the new (not yet existing) implementation of get, given that the way they are used is lazy, while the new get is eager. But even if this is possible (I am not sure I can do that), it will make things much more complex (I still fail to understand the existing code, and have a voice in my head telling me that trying to understand it is not worth it)

The way out I see that does not introduce a new POJO codec is something you will not like:

  1. Deprecate PropertyCodecProvider/PropertyCodecRegistry and PropertyModelBuilder.codec for removal without providing an alternative.
  2. Remove the aforementioned in a major release and implement a replacement in the form of the new get implementation.
Comment by Jeffrey Yemin [ 06/Jun/23 ]

Is there a longer time horizon that we could take towards simplification, something like:

  1. Use get(Class<T> clazz, List<Type> typeArguments, CodecRegistry registry) if PropertyCodecProvider/PropertyCodecRegistry and PropertyModelBuilder.codec are not used
  2. Deprecate PropertyCodecProvider/PropertyCodecRegistry and PropertyModelBuilder.codec
  3. Remove PropertyCodecProvider/PropertyCodecRegistry and PropertyModelBuilder.codec in a major release
Comment by Valentin Kavalenka [ 06/Jun/23 ]

The thoughts below may be used as a starter for a discussion.

I think it may be impossible to refactor POJO codec such that it benefits from PojoCodecProvider implementing the new get(Class<T> clazz, List<Type> typeArguments, CodecRegistry registry) method introduced in JAVA-4967 (the same goes about Parameterizable, only now its more clear).

  1. Notice that PojoCodecProvider does not currently implement this method, yet POJO codec handles parameterized types. How does it do that? My understanding is that this happens because of the logic in LazyPropertyModelCodec, which deals with type parameters.
  2. If we implement the new get method on PojoCodecProvider, that method must deal with List<Type> typeArguments eagerly, as its done in RecordCodecProvider/RecordCodec, allowing the ProvidersCodecRegistry to "secretly" create LazyCodec (via ChildCodecRegistry) when it deems needed (not that I currently understand any part of that code).
  3. Adding that eager logic in addition to the logic in and around LazyPropertyModelCodec will only further complicate the implementation of the POJO codec. In other words, it seems that the POJO codec can only benefit from the new get method if it can get rid of the logic in LazyPropertyModelCodec and around it, which it can't because it must continue supporting PropertyCodecProvider/PropertyCodecRegistry and PropertyModelBuilder.codec.
  4. Furthermore, if the new get is implemented, the implementation can't even exist separately from the existing code: it must fall back to the existing logic if it fails to produce a codec for a parameterized type. Not only it seems impossible for it to know whether it failed or not (it may produce a codec, but the codec will still fail to either correctly encode or decode), but even if there is a way, the get code will be entangled with the existing code, making things more complex than when having two separate implementations.
Comment by Valentin Kavalenka [ 17/May/23 ]

Yes, that's how the new CodecProvider.get and the existing CodecRegistry.get are supposed to work together. I understand this part, but it says nothing about how to simplify PojoCodecImpl with its ClassModel/PropertyModel/PropertyCodecProvider public API (PropertyModel allows users to specify the optional Codec for it, which is a mechanism that duplicates / conflicts with the CodecProvider, and with PropertyCodecProvider; i.e., there are three conflicting mechanisms solving the same task, all expressed via public API).

In order to refactor a piece of code, one at first must understand it. I do not have understanding of the POJO codec code (it is spread across many classes, PojoCodecImpl is just one of them), and the way I am trying to approach it is by understanding it bit by bit. One of such bits I expressed in https://jira.mongodb.org/browse/JAVA-4954?focusedCommentId=5432833&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-5432833. And it seems like I even found the answer to the question I asked there: the broken PojoCodecImpl (one with specialized being false) is replaced after being created by the code in LazyPropertyModelCodec.createCodec(). It appears that at first LazyPropertyModelCodec.getCodecFromPropertyRegistry() creates a broken PojoCodecImpl, and then LazyPropertyModelCodec.createCodec() creates a working PojoCodecImpl based on the data in the broken PojoCodecImpl.

Comment by Jeffrey Yemin [ 17/May/23 ]

I was hoping that the following could happen:

  1. PojoCodecProvider could override the new CodecProvider#get overload you added in the PR in order to look up codecs for parameterized POJOs
  2. PojocCodec could use the {{CodecRegistry#get(java.lang.Class<T>, java.util.List<java.lang.reflect.Type>)}} method to look up codecs for its fields whose types are parameterized.

Let me know your thoughts.

Comment by Valentin Kavalenka [ 17/May/23 ]

Got it, thank you jeff.yemin@mongodb.com.

Since I fail to develop even a high-level view on how to refactor PojoCodecImpl such that it utilizes Parameterizable / the new CodecProvider.get and becomes simpler, I decided to see if I may have any success trying to do small pieces of refactoring, like simplifying a piece of code that may be written in a convoluted way. PropertyModel.codec/cachedCodec and PojoCodecImpl.specialized/specialize() (they are connected) appeared to be a good start.

From the inception, PojoCodec (the code was later moved to PojoCodecImpl) had the specialized field and the specialization logic around it, as well as the specialized checks in encode/decode methods that render the codec broken if specialized (it is set only in the constructor of PojoCodec) is not true. I fail to understand this logic: why would we want to create a codec that we know can never encode and can only sometimes decode? At first glance this may look somewhat similar to creating an unusable Parameterizable codec only to later call Parameterizable.parameterize and replace it with the working one created by this method, but nothing like that happens with PojoCodec - once created as not specialized, it stays broken and does not get replaced as far as I can see.

Comment by Jeffrey Yemin [ 16/May/23 ]

I believe we did that because those existing codecs are public and the work to make them Parameterizable looked to be a breaking change. PojoCodecImpl, by contrast, is not public, only PojoCodecProvider

Comment by Valentin Kavalenka [ 16/May/23 ]

jeff.yemin@mongodb.com If we currently think that PojoCodecImpl may probably be refactored to use Parameterized (or the new CodecProvider.get(Class<T>, List<Type>, CodecRegistry)), why did we have to add CollectionCodec, MapCodecV2 and did not refactor IterableCodec, MapCodec? These codecs appear to be much simpler than the POJO codec, yet we had to create new versions of them.

Comment by Valentin Kavalenka [ 08/May/23 ]

PojoCodec is implemented by PojoCodecImpl and AutomaticPojoCodec, but the latter is a trivial decorator that changes exception messages, which is why only PojoCodecImpl is relevant in the context of this task.

Mutability of PojoCodecImpl.

There are two conceptual sources of mutability of PojoCodecImpl:

  • The one related to creating and representing the POJO metadata: PojoCodecImpl contains DiscriminatorLookup, which is mutable. At the moment I am not sure if ClassModel is mutable.
  • And the one related to creating and configuring codecs: PojoCodecImpl contains CodecRegistry and PropertyCodecRegistry, which are mutable.

Within the scope of this task, only the mutability related to creating and configuring codecs should be considered. The two sources of mutability are tangled with each other because ClassModel refers to Codec s via PropertyModel s, which is (I am guessing) to avoid looking up codecs in CodecRegistry/PropertyCodecRegistry each time they are needed. jeff.yemin@mongodb.com Is my guess correct here? RecordCodec.ComponentModel also refers a Codec, presumably for the same reason. This appears to hamper straightforward reusing the code from RecordCodec in PojoCodecImpl.

Representing types with their type arguments

The approaches PojoCodecImpl and RecordCodec take on representing types with their type arguments are similar conceptually, but on the other hand, they are coded differently. This difference is an obstacle when trying to reuse the code from RecordCodec in PojoCodecImpl . Moreover, the approach taken by PojoCodecImpl seems at the first glance superior to the one taken by RecordCodec. Below I describe what we have for woking with types in the Java SE API, in PojoCodecImpl, in RecordCodec.

java.lang.reflect.Type in the Java SE API

The only provided implementation of Type is Class. Some Type s are ParameterizedType s, which know about their type arguments (values of the type parameters). Note that because a Type, when parameterized, knows of its type arguments, which are also represented as Type, it appears that a Type is essentially a graph of types that includes all the types involved.

org.bson.codecs.pojo.TypeWithTypeParameters in PojoCodecImpl

Just knowing the Type is not enough when encoding/decoding, as it does not tell us about the fields/properties/components of a class, i.e., we do need Class. This is likely the reason why TypeWithTypeParameters was introduced when writing PojoCodecImpl (jeff.yemin@mongodb.com Was this indeed the reason?):

  • just like Type, it knows about its type arguments
  • it also knows about its fields and the fields of each type argument because it uses Class instead of Type.

The driver then uses TypeWithTypeParameters in PropertyCodecRegistry.get(TypeWithTypeParameters).

Note here that specifically for PojoCodec, PropertyCodecRegistry and PropertyCodecProvider were introduced to the public API (unlike for CodecRegistry, a user has no way to specify his implementation of PropertyCodecRegistry, and can only specify a custom PropertyCodecProvider). My guess is that these two interfaces (of which only PropertyCodecProvider should have been made part of the public API, which is the same issue as we have with CodecRegistry and CodecProvider) were introduced as a worse solution to the same problem that the introduction of CodecRegistry.get(Class<T>, List<Type>) solved. jeff.yemin@mongodb.com Am I guessing correctly here?

A pair of Class and List<Type> in RecordCodec

Instead of using TypeWithTypeParameters, RecordCodec uses two unrelated (from the standpoint of the compiler) objects Class and List<Type> to represent seemingly the same thing as TypeWithTypeParameters, and, consequently, CodecRegistry.get(Class, List<Type>) instead of CodecRegistry.get(TypeWithTypeParameters).

jeff.yemin@mongodb.com What were the reasons for not using TypeWithTypeParameters (at the first glance, I like this type), and using a pair of Class and List<Type> instead? Also, if you note that something I wrote above is clearly wrong, could you please correct me?

Generated at Thu Feb 08 09:03:22 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.