[SERVER-46079] $convert should allow any type to be converted to string
Created: 11/Feb/20  Updated: 06/Dec/22

Status: Backlog
Project: Core Server
Component/s: Aggregation Framework
Affects Version/s: None
Fix Version/s: None

Type: Improvement
Priority: Major - P3
Reporter: Asya Kamsky
Assignee: Backlog - Query Optimization
Resolution: Unresolved
Votes: 1
Labels: expression, product-priority, qopt-team
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
is related to SERVER-43411 add base64 and UUID conversion functi... In Code Review
Assigned Teams:
Query Optimization
Sprint: Query 2020-07-13
Participants:

 Description   

$convert with target type string (aka $toString) should accept all supported BSON types. For example, converting an object or an array currently fails:

> db.foo.aggregate([{$set: {x: {$toString: {x: 2}}}}])
2020-02-11T14:12:49.418-0500 E  QUERY    [js] uncaught exception: Error: command failed: {
 "ok" : 0,
 "errmsg" : "Failed to optimize pipeline :: caused by :: Unsupported conversion from object to string in $convert with no onError value",
 "code" : 241,
 "codeName" : "ConversionFailure"
} : aggregate failed :
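
The error message refers to a missing onError value. As a minimal sketch of what supplying one looks like (assuming the goal is only to keep the pipeline from failing, not to obtain a string form of the object), the same stage can be written with the long-form $convert. The fallback string below is a placeholder invented for this example.

```javascript
// Sketch only: the same conversion as above, written with $convert so that
// an onError fallback can be supplied. "<unconvertible>" is a made-up
// placeholder, not a value the server produces on its own.
const pipeline = [
  {
    $set: {
      x: {
        $convert: {
          input: { x: 2 },            // the object that $toString rejects
          to: "string",
          onError: "<unconvertible>", // used when the conversion is unsupported
        },
      },
    },
  },
];

// In the mongo shell this would be run as: db.foo.aggregate(pipeline)
```

This only sidesteps the failure; it does not give a string representation of the document, which is what this ticket asks for.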



 Comments   
Comment by Kai Orend [ 03/Dec/20 ]

Having the ability to convert a document into a string representation would be very helpful for being able to store the output of stages like $currentOp and $planCacheStats, which would contain fields prefixed with a $ (as they would contain MongoDB queries). Right now it is not possible to store a query representation using an aggregation pipeline.

 

Comment by Asya Kamsky [ 10/Jul/20 ]

I can see extending $convert in the future to have an option for string output mode (optional).

Obviously it wouldn't be available in simple `$to<Type>` shortcuts.

Comment by Asya Kamsky [ 10/Jul/20 ]

I can see that types that are MDB-specific should use a stable format (not sure if extended JSON is best or not). Types that are "understandable" outside of MDB (dates, numbers) should err on the side of 2. I think the main use case for converting to string is debugging and user presentation, which doesn't raise the concern of how to convert the data back to BSON.

Comment by David Storch [ 10/Jul/20 ]

I see a few competing concerns:

  1. Consistency with existing string formatting of $toString.
  2. Optimizing for human readability versus optimizing for machines converting all BSON types to and from string (without losing type information).
  3. Ensuring the format is well-known and stable.

The concern that I don't see addressed if we just call our internal Value::toString() is #3. This is an arbitrary format that isn't documented or stable. We would have to write down our intentions here, and add more testing to ensure that the format does not change. Perhaps we could define an even more relaxed variant of relaxed extended JSON (v2) which is consistent with the behavior we already have for $toString()? At least for things like Timestamp and BinData, I'd rather use something more extended JSON-like than the formats that Karmen showed above.
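
For reference, a small sketch of what an extended JSON (v2) rendering of Timestamp and BinData would look like as a string. The helper names are hypothetical and written in plain JavaScript for illustration; this is not server code and not a proposal for the final format.

```javascript
// Hypothetical helpers: render two BSON types as Extended JSON v2 strings.
// For these two types the canonical and relaxed forms are the same.

function timestampToExtendedJson(t, i) {
  // t: seconds since epoch, i: increment
  return JSON.stringify({ $timestamp: { t: t, i: i } });
}

function binDataToExtendedJson(subType, base64) {
  // The subtype is written as a hex string; this sketch pads it to two digits.
  const subTypeHex = subType.toString(16).padStart(2, "0");
  return JSON.stringify({ $binary: { base64: base64, subType: subTypeHex } });
}

console.log(timestampToExtendedJson(1412180887, 1));
// prints {"$timestamp":{"t":1412180887,"i":1}}
console.log(binDataToExtendedJson(0, "TWFu"));
// prints {"$binary":{"base64":"TWFu","subType":"00"}}
```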

Comment by Bruce Lucas (Inactive) [ 10/Jul/20 ]

I agree with asya that it is more important to be consistent with current $convert behavior.

Comment by Karmen Liang [ 09/Jul/20 ]

The current behavior for $convert does indeed use human-readable strings instead of extended JSON, but there does exist a BSONObj toString function that converts objects to extended JSON-style strings. david.storch the Value toString function formats BinData and Timestamp like this:

 

db.test.aggregate({$set: {x: {$toString: BinData(0,"TWFu")} }})
db.test.aggregate({$set: {x: {$toString: Timestamp(1412180887, 1)} }})
{ "x" : "BinData(0, \"4D616E\")" } // BinData is converted to hex first
{ "x" : "Timestamp(1412180887, 1)" }
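
To make the "converted to hex first" note concrete, here is a rough sketch in plain JavaScript (Node) that reproduces the BinData string shown above from its base64 payload. The function name is made up for this illustration; it only mimics the output format and is not how the server implements it.

```javascript
// Hypothetical helper that mimics the BinData string shown above:
// decode the base64 payload, then print the bytes as uppercase hex.
function binDataToValueString(subType, base64) {
  const hex = Buffer.from(base64, "base64").toString("hex").toUpperCase();
  return `BinData(${subType}, "${hex}")`;
}

console.log(binDataToValueString(0, "TWFu"));
// prints BinData(0, "4D616E")   ("TWFu" is base64 for the bytes 4D 61 6E, i.e. "Man")
```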

 

Since there are differing opinions about how to format the output strings, I'll pick this back up after a decision is made.

Comment by Asya Kamsky [ 08/Jul/20 ]

Btw, we should not be using extended JSON here - we didn't for already existing types: all numbers just become numbers (though Decimal keeps more precision, even when the trailing digits are zeros), and dates don't use the $date format, just a string representation.

Comment by Asya Kamsky [ 08/Jul/20 ]

-I agree we should be consistent (say with log format) but it absolutely will be lossy...-

We should first be consistent with current $convert behavior, is what I meant to say.

Comment by Bruce Lucas (Inactive) [ 08/Jul/20 ]

In my opinion we should be consistent across the product line about how objects are converted to string representations rather than having a hodge-podge of different representations, and uniformly adopting extended JSON would probably be the right way to accomplish that.

Surely we already have code readily available to convert objects to extended JSON?

Comment by Ted Tuckman [ 08/Jul/20 ]

Dave and I had a brief discussion about whether this conversion should stick to extended JSON format. karmen.liang Can you look into that and see if your current approach gives results that are equivalent to canonical or relaxed extended JSON? I would guess it doesn't, so it may be worth investigating how much work that would be to do. asya, Should this be lossless? It seems a little excessive to me to expect that from a toString function. Presumably it should be stable, so we'd need to think about that either way. david.storch, feel free to add on if I missed anything we touched on.

Comment by David Storch [ 08/Jul/20 ]

karmen.liang ted.tuckman do we actually have a defined conversion from value to string for all types that we feel comfortable exposing as a stable API? For example, does converting an object or array to a string use extended JSON? How are we formatting BinData or Timestamp?

Generated at Thu Feb 08 05:10:26 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.