[JAVA-4954] Refactor PojoCodec to leverage CodecProvider.get(clazz, typeArguments, registry) Created: 29/Apr/23 Updated: 09/Oct/23 |
|
| Status: | Backlog |
| Project: | Java Driver |
| Component/s: | POJO |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Jeffrey Yemin | Assignee: | Valentin Kavalenka |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | tech-debt | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||||||||||
| Epic Link: | Investigate our POJO implementation | ||||||||||||||||
| Quarter: | FY24Q2, FY24Q4 | ||||||||||||||||
| Documentation Changes Summary: | 1. What would you like to communicate to the user about this feature? |
||||||||||||||||
| Description |
|
Update: the Parameterizable interface was replaced with CodecProvider.get(Class<T> clazz, List<Type> typeArguments, CodecRegistry registry). The Parameterizable interface was introduced along with RecordCodec to allow it to be deeply immutable even in the face of type parameters. We can make PojoCodec similarly immutable by having it make use of this interface. There are a few aspects to this
The end result should be that the full set of parameterized codecs is resolved before the codec is ever used for encoding or decoding. |
| Comments |
| Comment by Githook User [ 09/Jun/23 ] |
|
Author: {'name': 'Valentin Kovalenko', 'email': 'valentin.kovalenko@mongodb.com', 'username': 'stIncMale'}Message: Remove specialization-related conditional logic from `PojoCodecImpl` by introducing `LazyPropertyModelCodec.NeedsSpecializationCodec` (#1136) This commit moves some of the conditional logic from `PojoCodecImpl` under the rug, JAVA-4954 |
| Comment by Valentin Kavalenka [ 06/Jun/23 ] |
I don't see how we can tell whether those mechanisms are used or not when deciding whether to use the new (not yet existing) implementation of get, given that the way they are used is lazy, while the new get is eager. But even if this is possible (I am not sure I can do that), it will make things much more complex (I still fail to understand the existing code, and have a voice in my head telling me that trying to understand it is not worth it) The way out I see that does not introduce a new POJO codec is something you will not like:
|
| Comment by Jeffrey Yemin [ 06/Jun/23 ] |
|
Is there a longer time horizon that we could take towards simplification, something like:
|
| Comment by Valentin Kavalenka [ 06/Jun/23 ] |
|
The thoughts below may be used as a starter for a discussion. I think it may be impossible to refactor POJO codec such that it benefits from PojoCodecProvider implementing the new get(Class<T> clazz, List<Type> typeArguments, CodecRegistry registry) method introduced in
|
| Comment by Valentin Kavalenka [ 17/May/23 ] |
|
Yes, that's how the new CodecProvider.get and the existing CodecRegistry.get are supposed to work together. I understand this part, but it says nothing about how to simplify PojoCodecImpl with its ClassModel/PropertyModel/PropertyCodecProvider public API (PropertyModel allows users to specify the optional Codec for it, which is a mechanism that duplicates / conflicts with the CodecProvider, and with PropertyCodecProvider; i.e., there are three conflicting mechanisms solving the same task, all expressed via public API). In order to refactor a piece of code, one at first must understand it. I do not have understanding of the POJO codec code (it is spread across many classes, PojoCodecImpl is just one of them), and the way I am trying to approach it is by understanding it bit by bit. One of such bits I expressed in https://jira.mongodb.org/browse/JAVA-4954?focusedCommentId=5432833&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-5432833. And it seems like I even found the answer to the question I asked there: the broken PojoCodecImpl (one with specialized being false) is replaced after being created by the code in LazyPropertyModelCodec.createCodec(). It appears that at first LazyPropertyModelCodec.getCodecFromPropertyRegistry() creates a broken PojoCodecImpl, and then LazyPropertyModelCodec.createCodec() creates a working PojoCodecImpl based on the data in the broken PojoCodecImpl. |
| Comment by Jeffrey Yemin [ 17/May/23 ] |
|
I was hoping that the following could happen:
Let me know your thoughts. |
| Comment by Valentin Kavalenka [ 17/May/23 ] |
|
Got it, thank you jeff.yemin@mongodb.com. Since I fail to develop even a high-level view on how to refactor PojoCodecImpl such that it utilizes Parameterizable / the new CodecProvider.get and becomes simpler, I decided to see if I may have any success trying to do small pieces of refactoring, like simplifying a piece of code that may be written in a convoluted way. PropertyModel.codec/cachedCodec and PojoCodecImpl.specialized/specialize() (they are connected) appeared to be a good start. From the inception, PojoCodec (the code was later moved to PojoCodecImpl) had the specialized field and the specialization logic around it, as well as the specialized checks in encode/decode methods that render the codec broken if specialized (it is set only in the constructor of PojoCodec) is not true. I fail to understand this logic: why would we want to create a codec that we know can never encode and can only sometimes decode? At first glance this may look somewhat similar to creating an unusable Parameterizable codec only to later call Parameterizable.parameterize and replace it with the working one created by this method, but nothing like that happens with PojoCodec - once created as not specialized, it stays broken and does not get replaced as far as I can see. |
| Comment by Jeffrey Yemin [ 16/May/23 ] |
|
I believe we did that because those existing codecs are public and the work to make them Parameterizable looked to be a breaking change. PojoCodecImpl, by contrast, is not public, only PojoCodecProvider |
| Comment by Valentin Kavalenka [ 16/May/23 ] |
|
jeff.yemin@mongodb.com If we currently think that PojoCodecImpl may probably be refactored to use Parameterized (or the new CodecProvider.get(Class<T>, List<Type>, CodecRegistry)), why did we have to add CollectionCodec, MapCodecV2 and did not refactor IterableCodec, MapCodec? These codecs appear to be much simpler than the POJO codec, yet we had to create new versions of them. |
| Comment by Valentin Kavalenka [ 08/May/23 ] |
|
PojoCodec is implemented by PojoCodecImpl and AutomaticPojoCodec, but the latter is a trivial decorator that changes exception messages, which is why only PojoCodecImpl is relevant in the context of this task. Mutability of PojoCodecImpl.There are two conceptual sources of mutability of PojoCodecImpl:
Within the scope of this task, only the mutability related to creating and configuring codecs should be considered. The two sources of mutability are tangled with each other because ClassModel refers to Codec s via PropertyModel s, which is (I am guessing) to avoid looking up codecs in CodecRegistry/PropertyCodecRegistry each time they are needed. jeff.yemin@mongodb.com Is my guess correct here? RecordCodec.ComponentModel also refers a Codec, presumably for the same reason. This appears to hamper straightforward reusing the code from RecordCodec in PojoCodecImpl. Representing types with their type argumentsThe approaches PojoCodecImpl and RecordCodec take on representing types with their type arguments are similar conceptually, but on the other hand, they are coded differently. This difference is an obstacle when trying to reuse the code from RecordCodec in PojoCodecImpl . Moreover, the approach taken by PojoCodecImpl seems at the first glance superior to the one taken by RecordCodec. Below I describe what we have for woking with types in the Java SE API, in PojoCodecImpl, in RecordCodec. java.lang.reflect.Type in the Java SE APIThe only provided implementation of Type is Class. Some Type s are ParameterizedType s, which know about their type arguments (values of the type parameters). Note that because a Type, when parameterized, knows of its type arguments, which are also represented as Type, it appears that a Type is essentially a graph of types that includes all the types involved. org.bson.codecs.pojo.TypeWithTypeParameters in PojoCodecImplJust knowing the Type is not enough when encoding/decoding, as it does not tell us about the fields/properties/components of a class, i.e., we do need Class. This is likely the reason why TypeWithTypeParameters was introduced when writing PojoCodecImpl (jeff.yemin@mongodb.com Was this indeed the reason?):
The driver then uses TypeWithTypeParameters in PropertyCodecRegistry.get(TypeWithTypeParameters). Note here that specifically for PojoCodec, PropertyCodecRegistry and PropertyCodecProvider were introduced to the public API (unlike for CodecRegistry, a user has no way to specify his implementation of PropertyCodecRegistry, and can only specify a custom PropertyCodecProvider). My guess is that these two interfaces (of which only PropertyCodecProvider should have been made part of the public API, which is the same issue as we have with CodecRegistry and CodecProvider) were introduced as a worse solution to the same problem that the introduction of CodecRegistry.get(Class<T>, List<Type>) solved. jeff.yemin@mongodb.com Am I guessing correctly here? A pair of Class and List<Type> in RecordCodecInstead of using TypeWithTypeParameters, RecordCodec uses two unrelated (from the standpoint of the compiler) objects Class and List<Type> to represent seemingly the same thing as TypeWithTypeParameters, and, consequently, CodecRegistry.get(Class, List<Type>) instead of CodecRegistry.get(TypeWithTypeParameters). jeff.yemin@mongodb.com What were the reasons for not using TypeWithTypeParameters (at the first glance, I like this type), and using a pair of Class and List<Type> instead? Also, if you note that something I wrote above is clearly wrong, could you please correct me? |