[CDRIVER-504] Convenient API to create BSON arrays Created: 12/Jan/15  Updated: 25/Jul/23  Resolved: 25/Jul/23

Status: Closed
Project: C Driver
Component/s: None
Affects Version/s: None
Fix Version/s: 1.25.0

Type: New Feature Priority: Minor - P4
Reporter: Jeffrey Yemin Assignee: Kevin Albertson
Resolution: Done Votes: 1
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to JAVA-1618 BasicBSONList doesn't support zero-le... Closed
related to CDRIVER-2526 Impove the BCON_APPEND_CTX() function... Backlog
is related to SERVER-21946 when sent invalid BSON, server runs o... Backlog
is related to SERVER-16814 Validate array keys in BSON documents Backlog
Epic Link: Improve Developer Experience
Quarter: FY24Q2
Case:

 Description   

Using method like https://api.mongodb.org/libbson/current/bson_append_int32.html, it appears that API consumers can choose the key for values not just in documents, but also in arrays. This will create BSON documents that violate the BSON spec: which says:

Array - The document for an array is a normal BSON document with integer values for the keys, starting with 0 and continuing sequentially. For example, the array ['red', 'blue'] would be encoded as the document {'0': 'red', '1': 'blue'}. The keys must be in ascending numerical order

This has caused users some problems when consuming documents created by the C driver. See JAVA-1618.



 Comments   
Comment by Githook User [ 25/Jul/23 ]

Author:

{'name': 'Kevin Albertson', 'email': 'kevin.albertson@mongodb.com', 'username': 'kevinAlbs'}

Message: CDRIVER-504 add `bson_array_builder_t` (#1344)

  • document and test `bson_uint32_to_string` behavior when input buffer is too small.
  • add `ASSERT_BSON_EQUAL_BSON` test macro
  • add `bson_array_builder_t`
  • add `bson_array_builder_append_*` functions
  • use `bson_array_builder_t`
    Replace existing use of `bson_append_array_begin`.
    bcon-speed.c was deliberately not modified to use `bson_array_builder_append_array_begin` to avoid changing performance.
  • move tutorial `Appending BSON` example code to separate file
  • add doc page for `bson_array_builder_t` and update doc examples
  • link to `bson_array_builder_t` doc page
  • colocate `BSON_APPEND_*` macros with functions
    Intends to make it easier to spot if a macro is missing.
  • add `BSON_APPEND_ITER` and `BSON_APPEND_NOW_UTC`
  • colocate `bson_append_` with `bson_array_builder_append_` functions
    Intends to make it easier to spot if a function is missing.

---------

Co-authored-by: Adrian Dole <donald@dole.tech>
Branch: master
https://github.com/mongodb/mongo-c-driver/commit/9affd70760c66ed9bb85652d014802841ed3378e

Comment by A. Jesse Jiryu Davis [ 21/Aug/17 ]

Might want an API like:

typedef struct _bson_array_t bson_array_t;
 
BSON_EXPORT (bson_array_t *)
bson_array_begin (bson_t *bson,
                  const char *key,
                  int key_length);
 
BSON_EXPORT (bool)
bson_array_end (bson_array_t *array);
 
BSON_EXPORT (bool)
bson_array_append_int32 (bson_array_t *array,
                         int32_t value);
 
/* ... and "append" functions for all BSON types ... */
 
bool f()
{
   bson_t b;
   bson_array_t * a;
   bool r;
 
   bson_init (&b);
   a = bson_array_begin (&b, "myArray", -1);
   if (!a) {
      fprintf (stderr, "error\n");
      return false;
   }
 
   r = bson_array_append_int32 (a, 1);
   if (!r) {
      fprintf (stderr, "error\n");
      return false;
   }
   
   r = bson_array_end (a);
   if (!r) {
      fprintf (stderr, "error\n");
      return false;
   }
 
   return true;
}

The opaque struct bson_array_t stores the current array index and probably a small char buffer as an optimization for calling bson_append_int32.

bson_array_begin seems like a nice function name, but is also confusingly similar to the existing bson_append_array_begin.

Comment by Tim Dorcey [ 21/Apr/16 ]

I first ran into this by accidentally using "0" for all keys in some early test code. Finding that everything seemed to work, I continued the practice to avoid nuisance of building the incrementing key string, that seemed totally redundant. I was eventually surprised to discover that this breaks dot notation queries, e.g., "{a.1:{$exists:true}}". Now I had a bunch of existing code to fix, and found the easiest fix was to replace each of my "0" keys with a function call that would return a sequentially incrementing string. E.g., BSON_APPEND_INT32(bson,myfun(context),value), not realizing that BSON_APPEND_INT32 is a macro that evaluates my function twice. So, all of my arrays ended up being indexed "1","3","5",.... Fortunately, I found that db.collection.copyTo() will reset the keys to "0","1","2",...

The array keys are not visible in bson_as_json or the mongo shell, so this is a tricky and unexpected problem to have. My recommendation would be to validate any user provided keys at run time and also allow a default "" value which would automatically generate correct value. Perhaps you could allow discontiguous keys, as long as they were ordered, and then fill in missing elements with null.

Comment by Asya Kamsky [ 18/Dec/15 ]

Note that we discovered that this can cause a hard-to-debug problem because sending a pipeline to the server which is all "0" labeled array caused the server to run just the last stage even though the server logs printed the full pipeline as if it were well-formed. SERVER-21946 filed.

Comment by Jeffrey Yemin [ 16/Jan/15 ]

That's totally reasonable, and not unexpected. They can always be removed in a major release some day.

Comment by Mira Carey [ 16/Jan/15 ]

You're quite right, libbson offers absolutely no guarantees that the keys you write will be correct for an output array.

We also have no way of identifying if the bson_t you're writing to is going to be an array. We could:

  • encode that information into the type for bson_append_array_begin
    • keeping the check fully runtime, we could runtime check that the key you provided was correct (this seems crazy)
    • ignore the key you pass to the existing functions (maybe also crazy)
    • require that you not pass a key to the existing functions (maybe okay...)
  • provide a new type, maybe bson_array_t
    • expose different methods that don't accept a key
    • provide a conversion back to bson_t for the rest of the api

Honestly, I think I like the idea of adding a new type the most, but I want to mull a bit on the best way to transition out without killing backwards compatibility. Either way, I don't think I'll remove the ability to write invalid keys for any existing bson_t's (including those from bson_append_array_begin), as there are working use cases I'd be breaking.

Which is to say that I'll probably WONTFIX this specific issue, but deprecate those elements of the api that encourage that behavior.

Generated at Wed Feb 07 21:09:42 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.