[SERVER-49217] $dateFromString behavior differs from Date()/ISODate() Created: 30/Jun/20 Updated: 06/Dec/22 |
|
| Status: | Backlog |
| Project: | Core Server |
| Component/s: | Aggregation Framework |
| Affects Version/s: | 4.2.8 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor - P4 |
| Reporter: | Marshall Meng | Assignee: | Backlog - Query Execution |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | qexec-team | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
macOS 10.14.6 |
||
| Assigned Teams: |
Query Execution
|
| Operating System: | ALL |
| Steps To Reproduce: | > use test switched to db test > version() 4.2.8 > db.test.find({}) {{ { "_id" : ObjectId("5efbaba4250838017f12c045"), "dateString" : "2006-01-02T15:04:05.123456789Z" }}} > db.test.aggregate([\{$project: {date: {$dateFromString: {dateString: '$dateString'}}}}]) 2020-06-30T14:22:04.013-0700 E QUERY [js] uncaught exception: Error: command failed: { "ok" : 0, "errmsg" : "Error parsing date string '2006-01-02T15:04:05.123456789Z'; 0: passing a time zone identifier as part of the string is not allowed '2'", "code" : 241, "codeName" : "ConversionFailure" } : aggregate failed : _getErrorWithCode@src/mongo/shell/utils.js:25:13 doassert@src/mongo/shell/assert.js:18:14 _assertCommandWorked@src/mongo/shell/assert.js:583:17 assert.commandWorked@src/mongo/shell/assert.js:673:16 DB.prototype._runAggregate@src/mongo/shell/db.js:266:5 DBCollection.prototype.aggregate@src/mongo/shell/collection.js:1012:12 @(shell):1:1 > db.test.aggregate([{$project: {date: {$dateFromString: {dateString: {$substrBytes: ['$dateString', 0, 28]}}}])}} {{ { "_id" : ObjectId("5efbaba4250838017f12c045"), "date" : ISODate("2006-01-02T15:04:05.123Z") }}} > db.test.aggregate([{$project: {date: {$dateFromString: {dateString: {$substrBytes: ['$dateString', 0, 29]}}}])}} 2020-06-30T14:21:22.393-0700 E QUERY [js] uncaught exception: Error: command failed: { "ok" : 0, "errmsg" : "Error parsing date string '2006-01-02T15:04:05.123456789'; 0: passing a time zone identifier as part of the string is not allowed '2'", "code" : 241, "codeName" : "ConversionFailure" } : aggregate failed : _getErrorWithCode@src/mongo/shell/utils.js:25:13 doassert@src/mongo/shell/assert.js:18:14 _assertCommandWorked@src/mongo/shell/assert.js:583:17 assert.commandWorked@src/mongo/shell/assert.js:673:16 DB.prototype._runAggregate@src/mongo/shell/db.js:266:5 DBCollection.prototype.aggregate@src/mongo/shell/collection.js:1012:12 @(shell):1:1 > ISODate('2006-01-02T15:04:05.123456789Z') ISODate("2006-01-02T15:04:05.123Z") > Date('2006-01-02T15:04:05.123456789Z') Tue Jun 30 2020 14:23:41 GMT-0700 (PDT) > db.test.updateOne({_id: ObjectId('5efbaba4250838017f12c045')}, {$set: {'dateString': '2006-01-02T15:04:05.12345678Z')}} }} }} }} >
|
| Sprint: | Query 2020-08-24, Query 2020-09-07, Query 2020-09-21 |
| Participants: |
| Description |
|
I have a document with a nanosecond precision ISO timestamp in a string ("0000-00-00T00:00:00.000000000Z"). I do not control the schema or formatting of this document. When I try to use $dateFromString or $toDate on this field from an aggregation, I get the following error: Error parsing date string '2006-01-02T15:04:05.123456789Z'; 0: passing a time zone identifier as part of the string is not allowed '2' If I trim off the nanosecond digit (to 10 ns precision or 8 decimal places), it seems to handle it okay. Additionally, the ISODate() and Date() constructors are able to parse the string just fine. |
| Comments |
| Comment by David Storch [ 08/Oct/20 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
https://github.com/derickr/timelib/issues/93 has been fixed upstream in timelib by commit https://github.com/derickr/timelib/commit/12fff7ad830f79688e4ab66671dd2c0a143f6ba2. This means that the bug should be resolved when we next upgrade our vendorized copy of timelib to a version that incorporates the fix. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by David Storch [ 05/Oct/20 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
I've filed an issue upstream with timelib to track the issue of not permitting date strings with nanosecond precision: see https://github.com/derickr/timelib/issues/93. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by David Storch [ 23/Sep/20 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
This patch fixes the issue for input strings that have up to 9 digits of precision expressed in fractional seconds:
There are a few more questions which we need to investigate:
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by David Storch [ 23/Sep/20 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
This appears to be a bug in timelib, the third party library that we use for time calculations. The timelib code uses re2c to generate date parsing code, which makes this a bit tricky to debug. It looks like we have both parse_date.re and the corresponding generated file, parse_date.c checked into the third_party directory. We only run re2c when we upgrade the timelib code. The build process relies on the generated file, parse_date.c. I was able to patch the generated file so that the timelib code generates debug output during date string parsing:
In the problematic case, this resulted in the following debug output:
I'm not sure how useful the output from the generated re2c code is, but the following lines are interesting:
I believe this first line consists of various abbreviations for date string formats. The input date string matches the "soap" format, which is YYYY-MM-DDTHH:MM:SS.FFFFFFFFF with an optional timezone:
Since the input string has the "Z" for zulu time, we are falling through to the code for parsing the timezone, and it is here that something is going wrong. I'll have to dig a bit further to determine how to fix it. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Asya Kamsky [ 14/Jul/20 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Here is an interesting observation of where this fails and where it succeeds:
Note that up to eight characters after the decimal converts just fine (even though we don't technically support nanoseconds or even microseconds, so they properly just drop the digits past first three). The 9th character when it's a number causes failure. If it's a letter (any letter) then it's ignored. If there are only eight characters after the dot, it works whether that character is a number or a letter. So it appears that it always looks to strip off the last letter (if there is more than one it fails) and then as long as you're left with at most eight digits, it succeeds. |