[CXX-757] document{} << const char* casts argument to bool Created: 03/Dec/15  Updated: 11/Jan/16  Resolved: 08/Dec/15

Status: Closed
Project: C++ Driver
Component/s: BSON
Affects Version/s: 0.2.0
Fix Version/s: 3.0.0-rc0

Type: Bug Priority: Major - P3
Reporter: Scott Deerwester Assignee: Mira Carey
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File test.json     File testConnection.cpp    
Epic Link: C++11 Driver MongoDB 3.2

 Description   

When constructing a query in the C++ driver, the following:

const char *term = "value";
document{} << "attribute" << term;

casts the second argument (term) to bool, creating a query "attribute"=true instead of the expected "attribute" = "value". The example assumes that the attached JSON file has been loaded to collection testCollection in database test.



 Comments   
Comment by Githook User [ 08/Dec/15 ]

Author:

{u'username': u'hanumantmk', u'name': u'Jason Carey', u'email': u'jcarey@argv.me'}

Message: CXX-757 Fix value append on bson builder core

Disables append for non-char pointer types

  • prevents implicit boolean conversion

removes appends for string literals && enables append for char pointers

Comment by Mira Carey [ 04/Dec/15 ]

After some conversation on our side, it seems like most of my team is leaning towards the: "all else being equal, support const char*". I've also had a chance to play around with disabling non-char pointers, which seems to work.

Code review for details:
https://github.com/mongodb/mongo-cxx-driver/pull/389

Comment by Scott Deerwester [ 04/Dec/15 ]

Thanks, Jason. I really appreciate your obvious depth in considering all of the factors behind a potential change. It's too easy to be reactive, rather than responsive, and it's refreshing to see somebody getting it right.

Comment by Mira Carey [ 04/Dec/15 ]

Scott,

Considering your reply:

1. As important as efficiency is, creating a sense of robustness for the developer is at least as important. Having something as basic as using raw C/C++ types break creates an impression of a fragile API. The cost of that goes well beyond saving a few ms here and there.

  • I'm 100% on board with you considering const char*'s a raw C type. I'm less convinced that a modern api should go to lengths for them in C++. null terminated strings are error prone, fairly expensive and easy to use wrong. It seems like a lot of modern C++ (from std::string to the new std::string_view) have been dedicated to building a world without const char*'s
  • The realization that we're getting implicit bool conversion for pointer types has me worried. Adding a const char* overload would help for the one type, but leaves in silent runtime corruption for other user mistakes.

2. As far as I'm aware (which isn't very far at all), the primary, and perhaps sole use case for bsoncxx::stream::document is creating queries. I.e. it has no bearing on performance for retrieval. If that's true, then performance at retrieval time swamps performance in creating the query, since a single disk (or even cache) access is many, many times more expensive than an extra memchr.

  • The other prominent use case is for inserts. We use the C++11 driver internally for some of our performance testing, and there's a noticeable perf difference when generating large numbers of small documents.

I need to do a little more fiddling, but I think I've figured out a rough list of priorities:

  1. It needs to be impossible to get an implicit boolean conversion from a pointer type. It's too easy to accidentally pass a pointer in, that needs to break compilation on any non-char types. I'll remove the raw append(bool) if necessary for this if I can't find another work-around
  2. We need efficient literal key append. This isn't the use case you were talking about, but passing const char*'s for keys doesn't work now, and I don't think we can spare the perf to let it (this is where we do the copy)
  3. Need to make a decision about const char* for values. On further reflection, I think we can make it efficient (I can make the append do the memchr inline, which any reasonable compiler should be able to elide for literals), but I want to consider a bit if it's the right thing to do.
Comment by Scott Deerwester [ 04/Dec/15 ]

Thanks for the reply, Jason. I would lean fairly strongly toward your second option, for two primary reasons:

  1. As important as efficiency is, creating a sense of robustness for the developer is at least as important. Having something as basic as using raw C/C++ types break creates an impression of a fragile API. The cost of that goes well beyond saving a few ms here and there.
  2. As far as I'm aware (which isn't very far at all), the primary, and perhaps sole use case for bsoncxx::stream::document is creating queries. I.e. it has no bearing on performance for retrieval. If that's true, then performance at retrieval time swamps performance in creating the query, since a single disk (or even cache) access is many, many times more expensive than an extra memchr.

In short, the cost-benefit analysis of breakage vs ms while constructing a query weighs heavily in favor of eliminating the breakage.

Comment by Mira Carey [ 03/Dec/15 ]

Scott,

Unfortunately, this basically boils down the age old problem in C++ that bool's and pointers are a bit too close together (you're seeing the built-in pointer conversion to bool, over the provided implicit string_view constructor).

And while there is the obvious solution (a const char* overload), we're actually trying to pick up a string literal template match (which saves us a memchr and a copy (because we have extra lifetime info)) of the form:

template <std::size_t n>
void append(const char (&v)[n]);

Which is technically a worse match than const char* (because array to pointer decay is closer than a template)

So there are a couple of ways forward:

1. Status Quo

Not the best, because users can accidentally pass pointer types and get them bool converted, which can lead to silent data corruption issues.

2. Addition of const char* overload

Better, because we do what the user meant. But worse, because we traded runtime performance. Also, users can still accidentally call the boolean overload with other char types (wide chars on windows lets say).

3. Removal of the boolean overload

Makes a lot more things not compile

  • Your example will fail, and you'll have to wrap a string_view or b_utf8 on top of your const char*
  • Anyone currently passing raw true/false into the library will have to wrap with b_bool

But the perf is exactly where we wanted it and no one sees anything unexpected at runtime


I'm leaning pretty heavily towards 3, despite the breakage, because it offers perf and safety for a little usability. What are your thoughts?
-Jason

Generated at Wed Feb 07 22:00:14 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.