[CDRIVER-1937] bson_init_from_data() missing Created: 25/Nov/16  Updated: 26/Jan/17  Resolved: 26/Jan/17

Status: Closed
Project: C Driver
Component/s: libbson
Affects Version/s: None
Fix Version/s: 1.6.0

Type: Improvement Priority: Minor - P4
Reporter: Arseny Vakhrushev Assignee: A. Jesse Jiryu Davis
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   

There is no bson_init_from_data() counterpart for bson_new_from_data(). Please note, however, that there is bson_init_from_json() for its bson_new_from_json() sibling in case of JSON.

In other words, there is no effective way to initialize a pre-allocated bson_t from a raw data buffer similar to bson_init_from_json(). The only feasible way now is to use a temporary which is sloppy:

bson_t bson;
bson_copy_to(bson_new_from_data(data, length), &bson);



 Comments   
Comment by Hannes Magnusson [ 26/Jan/17 ]

The docs have been updated to improve discoverability of init/new functions

Comment by Arseny Vakhrushev [ 18/Jan/17 ]

Right on! I'll leave the opportunity to create it to someone else if it's worth doing.

Comment by Hannes Magnusson [ 18/Jan/17 ]

that sounds like a conversation for a different ticket

Comment by Arseny Vakhrushev [ 09/Jan/17 ]

Your cooperation is much appreciated, Hannes! Many thanks!

It was however a mistake to add those convenience functions. See for example CDRIVER-1916.

Oh, no, that's a completely different problem in libbson. Wrappers have nothing to do with it as it is a separate API design flaw. For example, how come bson_iter_init() has a return value in the first place? If it didn't, there wouldn't be a problem like CDRIVER-1916 at all.

Consider this, bson_iter_init() returns a boolean when it clearly should't whereas some other calls will yell in stderr on errors instead of returning them. How is that? bson_iter_init() should probably silently initialize an iterator much like bson_init(). If the underlying data is invalid, there are two options:
1) API unambiguously requires that iterator functions be called on valid data, or undefined behaviour ensues; or
2) The subsequent bson_iter_next() call returns false.

If you ask me, I think it's a sloppy road to try to check for data integrity in each API call which does not solve the problem anyway. But that is the primary reason why so many API calls have return values. Leave alone the fact the user's manual doesn't bother to explain what ...

"true if the operation was applied successfully, otherwise false and bson should be discarded"

... even means. In what cases can an operation fail and what does a "failure" exactly mean? What party - caller or callee - is responsible for that? What can be done to prevent it? For a user like me, this looks vague and ambiguous. But the real problem here is that I am left with no other choice but to either propagate these conditions blindly or ignore them altogether thus breaking the contract and potentially multiplying the suffering even more. In both ways, the higher level user is left to pay the bill they will never understand.

Comment by Hannes Magnusson [ 06/Jan/17 ]

I agree that these oneliner helper functions like bson_iter_init_find are very convenient, and I'm sure bson_init_from_data could also be equally convenient.

It was however a mistake to add those convenience functions. See for example CDRIVER-1916.

Rather then adding functions for all mutations of symmetric function names, I think we should rather extend the docs to point out the other ways to achieve the same goal instead.

To that end, I am going to crosslink all the bson_init* and bson_new* functions, and also add link to bson_concat and then close this ticket.

Comment by Arseny Vakhrushev [ 06/Jan/17 ]

Thanks for your reply, Hannes! The purpose of this report is to show that there seems to be a little inconsistency (imbalance) in the current API in regards to the above.

If a user (someone, not me) sees two API calls, namely bson_new_from_json() and bson_new_from_data(), most definitely they will anticipate to see two symmetrical calls, namely bson_init_from_json() and bson_init_from_data(). But the latter is just not there...

One could care less. One could use bson_concat() if you say it's efficient. One could use bson_copy() if they didn't care about performance. The question is not about how to attain the goal. It's surely attainable. The question is if this way is visible and simple enough for the user.

I see other similar inconsistencies throughout the API. For example, there's bson_iter_init_find() which is just a wrapper around (bson_iter_init() && bson_iter_find()) along with its sibling bson_iter_init_find_case() to counter bson_iter_find_case(). These guys are surely convenient although obviously redundant. And I, as a user, appreciate that someone made an effort and included them into the API.

But... for some reason, there's no bson_iter_init_find_descendant() for its bson_iter_find_descendant() sibling. Why?

Someone took care and created a simple one-liner named bson_iter_init_find(). I fail to see why it's impossible to do the same for a hypothetical bson_init_from_data() even if it would be just a one-liner like (bson_init() && bson_concat()) (or whatever your pick is).

There seems to be lack of consistent approach to these things in general. And I'm just pointing this out. That's all. You don't have to defend yourself. Your project - your rules. These are just my five cents...

Comment by Hannes Magnusson [ 05/Jan/17 ]

I think maybe bson_concat does then exactly what you want?

The question isn't how technically difficult it is to add another function that does something, or add another function to complete some homebrewed naming schema for all mutations of its variations, instead, the questions is if we really want to encourage doing something that is better achieved in other ways – or maybe not at all.

If you have heap allocated bson_t you can initialize it with BSON_INITIALIZER and simply concatenate existing bson_t to it which I believe might solve your problem if bson_init_static does not suite your case.

Comment by Arseny Vakhrushev [ 09/Dec/16 ]

I don't think you would want a bson_init_from_data which would copy the data into a bson_t?

This is exactly what I want actually. Please see the workaround above. I have an uninitialized bson_t allocated on the heap, and I want to initialize it with data that already exists in another bson_t object, i.e. effectively copy the document by value.

Have you tried using bson_init_static for your usecase?

According to the manual, this call is not intended for copying purposes:

"... function shall initialize a read-only bson_t on the stack using the data provided. No copies of the data will be made and therefore must remain valid for the lifetime of the bson_t."

The above is more like creating a reference for the lifetime of the source object. No doubt, this can be useful sometimes, but it has nothing to do with my case when a newly created bson_t is not read-only, nor can it be guaranteed that the source will remain valid for its lifetime.

Again, there are ways to initialize an already allocated, but uninitialized bson_t with:
1. Nothing - bson_init()
2. JSON - bson_init_from_json()

There is no way to initialize it with data (i.e. make an effective copy by value).

As for allocate-and-initialize, a bson_t can be initialized with:
1. Nothing - bson_new()
2. JSON - bson_new_from_json()
3. Data - bson_new_from_data()

The difference to bson_init_from_json though is because json is entirely different data it must allocate new space for the conversion.

I really can't see how JSON itself is related to this issue because it should be fairly simple in a hypothetical bson_init_from_data() to leave the allocation of a bson_t structure to the user with the remaining code identical to bson_new_from_data().

Comment by Hannes Magnusson [ 09/Dec/16 ]

bson_init_static is meant to initialize bson_t from pre-allocated data (without taking ownership of the data).

The difference to bson_init_from_json though is because json is entirely different data it must allocate new space for the conversion.
I don't think you would want a bson_init_from_data which would copy the data into a bson_t?

Have you tried using bson_init_static for your usecase?

Comment by Arseny Vakhrushev [ 25/Nov/16 ]

Sorry, the example is also leaky. It should in fact be something like this:

bson_t bson, *tmp = bson_new_from_data(data, length);
bson_copy_to(tmp, &bson);
bson_destroy(tmp);

Generated at Wed Feb 07 21:13:40 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.