[GODRIVER-2326] bson.UnmarshalExtJSON does not support Infinity/-Infinity values Created: 04/Mar/22 Updated: 27/Oct/23 Resolved: 05/Apr/22 |
|
| Status: | Closed |
| Project: | Go Driver |
| Component/s: | BSON |
| Affects Version/s: | 1.8.4 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Critical - P2 |
| Reporter: | Cedric Cordenier | Assignee: | Matt Dale |
| Resolution: | Gone away | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||
| Description |
**Summary**
The bson.UnmarshalExtJSON function and its variants do not support the literals Infinity/-Infinity. While these values are not supported by the JSON spec, they can currently be stored in MongoDB documents.
**How to Reproduce**
https://play.golang.com/p/WNfYw7T_Si8
**Additional Background**
This has created problems for us when attempting to ingest MongoDB change streams, since the messages emitted by our DB cannot be unmarshaled using bson.UnmarshalExtJSON. To address this, I propose extending the JSON scanner to support Infinity and -Infinity as tokens. |
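The linked Playground is not reproduced above; a minimal reproduction sketch (the document shape and field name are assumed for illustration, the failing call is bson.UnmarshalExtJSON) looks like this:

```go
package main

import (
	"fmt"

	"go.mongodb.org/mongo-driver/bson"
)

func main() {
	// Extended JSON containing a bare Infinity literal, as stored upstream.
	data := []byte(`{"value": Infinity}`)

	var doc bson.M
	// The Extended JSON scanner rejects the bare literal and returns a parse error.
	err := bson.UnmarshalExtJSON(data, false, &doc)
	fmt.Println(err)
}
```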
| Comments |
| Comment by PM Bot [ 05/Apr/22 ] | ||
|
There hasn't been any recent activity on this ticket, so we're resolving it. Thanks for reaching out! Please feel free to comment on this if you're able to provide more information. | ||
| Comment by Matt Dale [ 22/Mar/22 ] | ||
|
cedric.cordenier@coinbase.com were you able to determine if KafkaConnect was the source of the literal "Infinity" values? | ||
| Comment by David Golden [ 10/Mar/22 ] | ||
|
Thanks! Is that the MongoDB Kafka Connector or a third-party one? I did see there's an output format configuration key "output.json.formatter". You might check what's being used there. If it's the Extended JSON one, then maybe that's a bug to report to the Kafka Connector bug tracker. | ||
| Comment by Cedric Cordenier [ 10/Mar/22 ] | ||
|
Hi David, yes, no problem. The flow looks schematically a bit like this:

Mongoid -> MongoDB (as BSON) -> KafkaConnect source connector for Mongo (as Ext JSON) -> our service

In our case, it is the source connector which converts the BSON into Extended JSON. However, it is the data in the upstream MongoDB that contains the literal Infinity values; they are not being lossily transformed by the Kafka connector, since this Extended JSON contains other data types encoded losslessly, such as ObjectIDs and Timestamps. To my knowledge, all of the components in this chain are using official MongoDB drivers, which raises the question of how the data is getting into MongoDB in this literal format.

I understand that my proposal is not ideal since it breaks with the spec, but if you would rather not accept this PR (which I totally understand given the implications), do you think that we could either expose an API to override the JSON scanner used, or provide an option to allow literal Infinity values?

In the meantime, I'll explore your proposed workaround. Thanks for your help! | ||
| Comment by David Golden [ 10/Mar/22 ] | ||
|
Cedric, could you please elaborate a bit more on your reconciliation system? All official MongoDB drivers are required to implement the Extended JSON Specification and so there is a way to convert BSON to proper Extended JSON in every supported language. If your upstream is using a supported driver, then they should be marshaling to Extended JSON using the driver, preferably using the "canonical" format for lossless transformation.

If they're marshaling to plain (and non-standard) JSON, then that's already a type-lossy transformation in your upstream. Not only does conversion to plain JSON not support infinite doubles, it converts all your integers to doubles as well and doesn't support any of the MongoDB-specific types like Object ID, Decimal128, etc. I understand you're looking at a solution to this one particular concern, but I encourage you to step back and look at the broader situation around lossy types in your pipeline.

As a workaround for your immediate problem, however, since your reconciliation already isn't using Extended JSON, you can use any JSON parser that can handle non-standard JSON to convert to native Go types and marshal that to BSON. | ||
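This is not the full workaround described above (the lenient, non-standard JSON parsing step depends on whichever parser is chosen and is omitted here), but a small sketch of the back half, showing that native Go values holding infinite doubles marshal to BSON without issue:

```go
package main

import (
	"fmt"
	"math"

	"go.mongodb.org/mongo-driver/bson"
)

func main() {
	// A native Go value as it might come out of a lenient JSON parser (parsing omitted).
	doc := bson.M{"value": math.Inf(1)}

	// BSON doubles are IEEE 754, so +Inf and -Inf marshal cleanly.
	raw, err := bson.Marshal(doc)
	if err != nil {
		panic(err)
	}
	fmt.Printf("marshaled %d bytes of BSON\n", len(raw))
}
```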
| Comment by Cedric Cordenier [ 10/Mar/22 ] | ||
|
Hi Matt, the solution you proposed was the first I explored. Unfortunately, the system this is affecting is a reconciliation system which pulls in data from multiple upstream MongoDB clusters. We don't have control over the data and therefore aren't able to pre-clean it in any way. We've explored other options, like rewriting the BSON on the fly, but these have proven brittle. I see a larger problem here though: if Infinity/-Infinity can be stored in MongoDB without any issues (as seems to be the case), then I think the BSON package in the Go driver should be able to decode the data. That's the reason for wanting to extend the JSON scanner in the package. | ||
| Comment by Matt Dale [ 10/Mar/22 ] | ||
|
cedric.cordenier@coinbase.com thanks for opening the ticket and the associated PR! The Extended JSON parser does support encoding/decoding infinity and negative infinity values using a document syntax like:
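A sketch of the document syntax being referred to, following the Extended JSON specification's $numberDouble wrapper (the field names are illustrative):

```json
{"positive": {"$numberDouble": "Infinity"}, "negative": {"$numberDouble": "-Infinity"}}
```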
See an example on the Go Playground here: https://play.golang.com/p/oYMm5uGjsZ2
Is it possible to update whatever system is generating the documents that contain the literal value Infinity to use the following valid Extended JSON value?
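The Playground contents are not reproduced here; a minimal sketch along the same lines, showing the $numberDouble form parsing successfully with bson.UnmarshalExtJSON (the field name is assumed):

```go
package main

import (
	"fmt"

	"go.mongodb.org/mongo-driver/bson"
)

func main() {
	// Valid Extended JSON: infinity expressed with the $numberDouble wrapper.
	data := []byte(`{"value": {"$numberDouble": "Infinity"}}`)

	var doc bson.M
	if err := bson.UnmarshalExtJSON(data, true, &doc); err != nil {
		panic(err)
	}
	fmt.Println(doc["value"]) // +Inf
}
```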
|