[GODRIVER-2426] Distinct does not provide unmarshalling api Created: 18/May/22  Updated: 29/Jun/22  Resolved: 13/Jun/22

Status: Closed
Project: Go Driver
Component/s: None
Affects Version/s: 1.9.1
Fix Version/s: None

Type: Improvement Priority: Minor - P4
Reporter: Peter Ivanov Assignee: Benji Rewis (Inactive)
Resolution: Done Votes: 0
Labels: neweng
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File distinct_runCommand.go    
Issue Links:
Related
related to GODRIVER-2444 Consider a DistinctCursor collection ... Backlog
related to GODRIVER-2443 Make Distinct return a decodable stru... Blocked

 Description   

Summary

Consider the difference between these interfaces:

 

func (c *Cursor) All(ctx context.Context, results interface{}) error

 

and

func (coll *Collection) Distinct(ctx context.Context, fieldName string, filter interface{},
    opts ...*options.DistinctOptions) ([]interface{}, error)

 

In the former case, result is unmarshal-ed into user-provided slice, and in the latter, provided as []interface which is populated via a code like this:

 

for i, val := range values {
    raw := bson.RawValue{Type: val.Type, Value: val.Data}
    err = *raw.Unmarshal(&retArray[i])
        if err != nil {
            return nil, err
        }
    }
}

 

This proves problematic for a caller if you want to provide the same api with user-provided slices.

Why would you want this?

For example, if the field in question is a custom bson-marshalled type, and you wish to have the result as a slice of this type. Or if you simply do not wish to spoil your business logic with extra type checks and conversions. 

 

A list of workarounds appears to be:

  1. Write a conversion code with heavy use of reflection (needs PoC)
  2. Write a conversion code that marshals the result back into Raw and then marshals into user-provided slice (poor performance, needs PoC)
  3. Use public operation.NewDistinct() and implement the desired functionality outside of the driver (needs PoC)
  4. Change the code of the driver to allow for return of the original DistinctResult or to provide the desired api

We currently consider option 4 and will be happy to send a patch upstream, but it is best to first discuss if such change would be welcome. 

 

Motivation

Who is the affected end user?

Anyone who would use Distinct() in a general purpose code.

How does this affect the end user?

They need to implement a workaround.

How likely is it that this problem or use case will occur?

Any usage, possibly depending on the usage pattern.

If the problem does occur, what are the consequences and how severe are they?

A need to implement a workaround.

Is this issue urgent?

No.

Is this ticket required by a downstream team?

No.

Is this ticket only for tests?

No.



 Comments   
Comment by Benji Rewis (Inactive) [ 29/Jun/22 ]

Gotcha, I'm not sure I can think of a way to unmarshal from a slice of BSON values returned from the server into an interface{} without the use of either reflection or generics... Let me know if you have any luck there or think of a method that's not DistinctCursor that may be helpful for your use-case.

for someone who knows the concrete type of the result it solves almost all problems.

I agree, and I think that's probably strong enough motivation to add the method.

Comment by Peter Ivanov [ 29/Jun/22 ]

Yes, party. As we discussed before, it is a partial solution and it is still better than nothing. Then again, for someone who knows the concrete type of the result it solves alsmost all problems.

(Come to think of it, I may be able to remove the reflection with the use of generics. I'll give it a try at some point.)

Comment by Benji Rewis (Inactive) [ 28/Jun/22 ]

Ah I understand. If we provided a DistinctCursor method on Collection that returned a Cursor that could iterate over the distinct values, would that help your use-case? I'm guessing not, as a Cursor represents a set of bson.Raw values that would still have to be appended to a user-provided interface{} with reflect logic. What API would be helpful?

Comment by Peter Ivanov [ 24/Jun/22 ]

What I mean is a piece of code like this:

 

func unmarshalSliceRawValue(ctx context.Context, result interface{}, raw []bson.RawValue) error {
    resultv := reflect.ValueOf(result)
    if !isPtrToSlice(result) {
        panic(...)
    }
    sliceValue := resultv.Elem()
    sliceValue = sliceValue.Slice(0, sliceValue.Cap())
    elemType := sliceValue.Type().Elem()
    for i := 0; i < len(raw); i++ {
        if sliceValue.Len() == i {
            elemPtr := reflect.New(elemType)
            err := unmarshal(ctx, raw[i], elemPtr.Interface())
            if err != nil {
                return err
            }
            sliceValue = reflect.Append(sliceValue, elemPtr.Elem())
            sliceValue = sliceValue.Slice(0, sliceValue.Cap())
        } else {
            err := unmarshal(ctx, raw[i], sliceValue.Index(i).Addr().Interface())
            if err != nil {
                return err
            }
        }
    }
    resultv.Elem().Set(sliceValue.Slice(0, len(raw)))
    return nil
}

 

That is, append slice with individual unmarshalled objects.

Comment by Benji Rewis (Inactive) [ 22/Jun/22 ]

Ah I did assume that there was a single concrete type that could represent the value of the field for the distinct key.

And then I use quite a lot of reflect to unmarshal individual values into a result slice that is provided by a caller.

I'm a little confused what you mean here. Do you know the set of types that the fields for the distinct key could be? And, if so, are you using reflection to determine which of those types to unmarshal the individual values to? Would exposing a DistinctCursor method (as proposed in GODRIVER-2444) that returns a Cursor help avoid that reflection? I'm not sure I see how, as I imagine you'd run into the same issue.

 

Comment by Peter Ivanov [ 21/Jun/22 ]

Benji, I'm afraid its use is limited. 

The part of example that does show a way of making your own impl relies on knowing the concrete type of the result. In my case, this is impossible, otherwise I might have settled with the original driver api plus some conversion.

I did manage to make it work, but my code is different. First, I decode the raw result to struct like this:

type distinctResult struct {
  Values []bson.RawValue `bson:"values"`
}

And then I use quite a lot of reflect to unmarshal individual values into a result slice that is provided by a caller. 

Comment by Benji Rewis (Inactive) [ 13/Jun/22 ]

And let me know if that workaround with RunCommand proves unusable for any reason (either through this ticket or another).

Comment by Benji Rewis (Inactive) [ 13/Jun/22 ]

Filed GODRIVER-2444 to track the addition of a method like DistinctCursor. Feel free to track that ticket for the improvement. I agree that the current Distinct method is confusing API and a new method pre-2.0 could be helpful. Thanks again for your improvement suggestion, petr.ivanov.s@gmail.com!

Comment by Peter Ivanov [ 09/Jun/22 ]

Thank you, Benji. 

The example seems all right. Can't say more until I use it in some way.

I fully understand the issue with breaking change, but the current API seems awkward enough  to warrant adding an alternative before v2, in my opinion. This seems better than dozens of users implementing the same thing differently, on their own. 

Comment by Benji Rewis (Inactive) [ 02/Jun/22 ]

Thanks for your feature request, petr.ivanov.s@gmail.com! I agree it's odd that the All method on Cursor allows users to unmarshal to their own custom structs while the Distinct method on Collection unmarshals to []interface{} before returning. However, any changes to the return types of the Distinct API are breaking changes and can't be implemented before a v2.x driver version.

The best way to immediately resolve your problem is to use RunCommand. Here's an example of how you might run "distinct" through RunCommand and Decode to your own custom struct representing the structure of the queried field's values.

distinct_runCommand.go

As far as improvements to the driver, I've created GODRIVER-2443 to track improving the Distinct API in the v2.x driver and GODRIVER-2444 to track a proposal to add something like a DistinctCursor function to the current v1.x driver that would return a Cursor like Find or Aggregate.

Let me know what you think and if that example is helpful.

Comment by Kevin Albertson [ 23/May/22 ]

Thank you for the report petr.ivanov.s@gmail.com. The description has been updated. We will look into this soon.

Comment by Peter Ivanov [ 19/May/22 ]

For some reason code blocks ended up malformed, and I seem to lack the privilege to fix it. 

Generated at Thu Feb 08 08:38:34 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.