[CXX-246] mongo::fromjson / JParse misparses JSON on locales that use a comma as decimal separator Created: 30/May/14  Updated: 07/Jan/15  Resolved: 07/Jan/15

Status: Closed
Project: C++ Driver
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Martin Hostettler Assignee: Unassigned
Resolution: Won't Fix Votes: 0
Labels: legacy-cxx
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   

mongo::fromjson fails to parse the following simple JSON on a Linux system with the locale setting LC_NUMERIC=de_DE.UTF-8:

{
    "test1": 1,
    "test2": 1
}

Parsing fails with the following error message:

terminate called after throwing an instance of 'mongo::MsgAssertionException'
  what():  code FailedToParse: FailedToParse: Expecting '}' or ',': offset:17 of:{
    "test1": 1,
    "test2": 1
}


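This error is caused by JParse::number's use of strtod, which is locale-specific. Under a locale whose decimal separator is a comma, strtod does not parse according to the JSON syntax: it treats the comma after the number as a decimal separator and consumes it, so the parser no longer finds the ',' or '}' it expects.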
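The locale dependence can be checked outside the driver with a minimal sketch like the one below. The helper name consumedUnderLocale is hypothetical and not part of the driver; it only reports how many characters strtod consumes under a given LC_NUMERIC setting. Whether de_DE.UTF-8 is installed depends on the system, so the helper returns -1 when the locale is unavailable.

```cpp
#include <clocale>
#include <cstdlib>
#include <string>

// Hypothetical helper (not part of the driver): returns the number of
// characters strtod consumes from `text` under the LC_NUMERIC locale
// `localeName`, or -1 if that locale cannot be selected.
// The previously active LC_NUMERIC locale is restored before returning.
long consumedUnderLocale(const char* localeName, const char* text) {
    // Copy the current locale name, because later setlocale calls may
    // overwrite the buffer that the returned pointer refers to.
    const char* current = std::setlocale(LC_NUMERIC, nullptr);
    std::string saved = current ? current : "C";

    if (std::setlocale(LC_NUMERIC, localeName) == nullptr) {
        return -1;  // locale not installed on this system
    }

    char* end = nullptr;
    std::strtod(text, &end);

    std::setlocale(LC_NUMERIC, saved.c_str());
    return static_cast<long>(end - text);
}
```

Under the "C" locale, consumedUnderLocale("C", "1,") returns 1, because parsing stops at the comma. On a system where de_DE.UTF-8 is installed, the same call with that locale is expected to consume the comma as well, which would match the reported offset 17.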

The code already points to SERVER-11920 in comments.

This occurs with the most recent commit on the legacy branch (2e86826).



 Comments   
Comment by Adam Midvidy [ 07/Jan/15 ]

Currently we require the C locale, and we do not plan to change this at this time.
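For applications that otherwise run under a different locale, one possible way to meet this requirement is to switch LC_NUMERIC to "C" only around the parsing call. The following is a sketch, not something the driver provides: the class name ScopedCNumericLocale is hypothetical, and because setlocale changes process-wide state it is only safe when no other thread depends on the locale at the same time.

```cpp
#include <clocale>
#include <string>

// Hypothetical RAII guard (not part of the driver): sets LC_NUMERIC to "C"
// on construction and restores the previous setting on destruction.
// Note: setlocale affects the whole process and is not thread-safe.
class ScopedCNumericLocale {
public:
    ScopedCNumericLocale() {
        // Copy the name: the pointer returned by setlocale may be invalidated
        // by the next setlocale call.
        const char* current = std::setlocale(LC_NUMERIC, nullptr);
        _saved = current ? current : "C";
        std::setlocale(LC_NUMERIC, "C");
    }

    ~ScopedCNumericLocale() {
        std::setlocale(LC_NUMERIC, _saved.c_str());
    }

    ScopedCNumericLocale(const ScopedCNumericLocale&) = delete;
    ScopedCNumericLocale& operator=(const ScopedCNumericLocale&) = delete;

private:
    std::string _saved;
};
```

A caller would create the guard in the scope that calls mongo::fromjson, so that the previous locale is restored when the scope ends.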

Comment by Martin Hostettler [ 30/May/14 ]

I worked around the problem for integers with the following patch:

diff --git a/src/mongo/db/json.cpp b/src/mongo/db/json.cpp
index 9e5531f..0f61960 100644
--- a/src/mongo/db/json.cpp
+++ b/src/mongo/db/json.cpp
@@ -911,29 +911,40 @@ namespace mongo {
     }
 
     Status JParse::number(const StringData& fieldName, BSONObjBuilder& builder) {
-        char* endptrll;
-        char* endptrd;
+        const char *numptr;
+        const char *numend;
         long long retll;
         double retd;
 
-        // reset errno to make sure that we are getting it from strtod
-        errno = 0;
-        // SERVER-11920: We should use parseNumberFromString here, but that function requires that
-        // we know ahead of time where the number ends, which is not currently the case.
-        retd = strtod(_input, &endptrd);
-        // if pointer does not move, we found no digits
-        if (_input == endptrd) {
-            return parseError("Bad characters in value");
+        // find boundary of number.
+
+        // Skip whitespace
+        // 'isspace()' takes an 'int' (signed), so (default signed) 'char's get sign-extended
+        // and therefore 'corrupted' unless we force them to be unsigned ... 0x80 becomes
+        // 0xffffff80 as seen by isspace when sign-extended ... we want it to be 0x00000080
+        numptr = _input;
+        while (numptr < _input_end &&
+               isspace(*reinterpret_cast<const unsigned char*>(numptr))) {
+            ++numptr;
         }
-        if (errno == ERANGE) {
-            return parseError("Value cannot fit in double");
+
+        numend = numptr;
+        // This is not the exact production from the json standard, but should give
+        // the same result for all valid input.
+        while (numend < _input_end) {
+            char ch = *numend;
+            if ((ch >= '0' && ch <= '9')
+                || ch == 'e' || ch == 'E' || ch == '-' || ch == '+' || ch == '.') {
+                ++numend;
+            } else break;
         }
-        // reset errno to make sure that we are getting it from strtoll
-        errno = 0;
-        // SERVER-11920: We should use parseNumberFromString here, but that function requires that
-        // we know ahead of time where the number ends, which is not currently the case.
-        retll = strtoll(_input, &endptrll, 10);
-        if (endptrll < endptrd || errno == ERANGE) {
+
+        StringData numberstr(numptr, numend - numptr);
+
+        if (Status::OK() != parseNumberFromStringWithBase(numberstr, 0, &retd)) {
+            return parseError("Bad characters in value");
+        }
+        if (Status::OK() != parseNumberFromStringWithBase(numberstr, 10, &retll)) {
             // The number either had characters only meaningful for a double or
             // could not fit in a 64 bit int
             MONGO_JSON_DEBUG("Type: double");
@@ -949,7 +960,7 @@ namespace mongo {
             MONGO_JSON_DEBUG("Type: 64 bit int");
             builder.append(fieldName, retll);
         }
-        _input = endptrd;
+        _input = numend;
         if (_input >= _input_end) {
             return parseError("Trailing number at end of input");
         }

This is only a partial fix, because parseNumberFromStringWithBase also calls strtod. Values that contain a decimal point are therefore still misparsed under such locales.
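One way to avoid strtod for the remaining double case is to parse through a stream imbued with the classic "C" locale, which ignores the global locale setting. The sketch below is illustrative only: parseDoubleClassic is a hypothetical helper, it takes a std::string rather than the driver's StringData, and it is not what parseNumberFromStringWithBase does.

```cpp
#include <locale>
#include <sstream>
#include <string>

// Hypothetical helper (not part of the driver): parses `text` as a double
// using the classic "C" locale, independent of the global locale.
// Returns true only if the entire string is a valid number.
bool parseDoubleClassic(const std::string& text, double* out) {
    std::istringstream in(text);
    in.imbue(std::locale::classic());  // '.' is always the decimal separator

    double value = 0.0;
    in >> std::noskipws >> value;      // do not accept leading whitespace
    if (in.fail()) {
        return false;                  // no number found
    }
    // Reject trailing characters such as the ',' in "1,5".
    if (in.peek() != std::char_traits<char>::eof()) {
        return false;
    }
    *out = value;
    return true;
}
```

Because the number boundary is already known in the patched JParse::number (numptr to numend), a helper of this kind could be given exactly that substring.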
