[SERVER-74002] Day-of-year (%j) parsing inconsistency Created: 14/Feb/23  Updated: 30/Mar/23  Resolved: 30/Mar/23

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug
Priority: Minor - P4
Reporter: Maxim Katcharov
Assignee: Kyle Suarez
Resolution: Won't Fix
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Related
related to SERVER-50336 $dateFromString support for additiona... Closed
Assigned Teams:
Query Execution
Sprint: QE 2023-03-06, QE 2023-03-20, QE 2023-04-03
Participants:

 Description   

When I specify both %j for day of year and %m for month, as in {'$dateFromString': {'dateString': '2007-01-11', 'format': '%Y-%j-%m'}} (note the order), the result is 2007-11-02T00:00:00.000Z. In other words, %j is treated like %d (but with an initial value of 0).

If I specify %Y-%j-%m-%d (input 2007-01-11-05), %j is ignored, and the result is 2007-11-05T00:00:00.000Z.

The format strings "%Y-%m-%d-%j" and "%Y-%j-%m-%d" give different results for the same (re-ordered) values.

Since it is always an error to specify a day of year together with either a month or day of month, I would expect an error (similar to how we treat potentially incompatible timezones).
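
For illustration only, here is a minimal sketch of the kind of check described above, written as a client-side pre-check in Python. The helper name check_format is hypothetical and is not part of the server or any driver; it only inspects the format string before it would be passed to $dateFromString.

```python
import re


def check_format(fmt):
    """Hypothetical helper: reject formats that mix %j with %m or %d."""
    # "%%" is a literal percent sign, so drop it before collecting specifiers.
    specifiers = set(re.findall(r"%([A-Za-z])", fmt.replace("%%", "")))
    conflicts = sorted(specifiers & {"m", "d"})
    if "j" in specifiers and conflicts:
        names = ", ".join("%" + c for c in conflicts)
        raise ValueError(f"%j cannot be combined with {names} in {fmt!r}")
    return fmt


check_format("%Y-%j")        # accepted: day of year only
check_format("%Y-%m-%d")     # accepted: month and day of month only
try:
    check_format("%Y-%j-%m")
except ValueError as exc:
    print(exc)
```

A check like this only guards the caller; it does not change what the server does with a conflicting format.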



 Comments   
Comment by Ana Meza [ 14/Mar/23 ]

Assigning this to you, kyle.suarez@mongodb.com, to raise a DOCS ticket.

Comment by Kyle Suarez [ 07/Mar/23 ]

After discussion with kateryna.kamenieva@mongodb.com, amr.elhelw@mongodb.com and others at the needs triage meeting, we don't feel that fixing this ambiguous case is worth the engineering effort.

I will file a DOCS ticket to consider documenting the undefined behavior.

Comment by Jennifer Peshansky (Inactive) [ 14/Feb/23 ]

I see a few options for resolving this.

  1. We could resolve it on the mongo side, the way we do when conflicting timezone information is specified. However, conflicting timezones are a result of an argument passed to fromString conflicting with information inside the string we're parsing to a date. This, on the other hand, is a conflict between two format specifiers both setting the same field.
  2. We could try to resolve this on the timelib side by having %m surface an error if the month has previously been set (same for %d, %j, etc.) This seems like the cleaner approach as long as it does not cause any performance issues. However, this could conflict with existing design decisions for timelib, and would potentially be a breaking change.
  3. We could live with this silent bug, citing that we do not make guarantees for the parsing behavior of invalid date formats.

cc arun.banala@mongodb.com kyle.suarez@mongodb.com

Comment by Jennifer Peshansky (Inactive) [ 14/Feb/23 ]

When we call dateFromString, we call a function from timelib and pass along the date it returns.

This behavior occurs because timelib loops over the format specifiers one by one and does not check whether the specifier it is processing conflicts with values that have already been set.

So in the case of {'$dateFromString': {'dateString': '2007-01-11', 'format': '%Y-%j-%m'}}, timelib:

  1. sees %Y and sets the year to 2007.
  2. sees %j and sets m = 1, d = (1+1) = 2.
  3. sees %m and sets m = 11

In other words, %m overrides the month that was previously set by %j, but the day remains unchanged.

%Y-%j-%m-%d behaves as if it ignores %j because %m overwrites the month and %d overwrites the day. But in %Y-%m-%d-%j, %j overwrites what both %m and %d have set.
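
The overwrite order described above can be reproduced with a small toy model in Python. This is a sketch of the behavior as explained in this comment, not timelib's actual code: the function name parse_sequentially is hypothetical, it only handles %Y, %j, %m and %d separated by "-", and the final overflow normalization is an assumption.

```python
from datetime import date, timedelta


def parse_sequentially(date_string, fmt):
    """Toy model: apply each specifier in order, overwriting earlier fields."""
    specifiers = [part.lstrip("%") for part in fmt.split("-")]
    values = [int(v) for v in date_string.split("-")]
    if len(specifiers) != len(values):
        raise ValueError("format and input have different field counts")

    year, month, day = 1970, 1, 1
    for spec, value in zip(specifiers, values):
        if spec == "Y":
            year = value
        elif spec == "m":
            month = value          # overwrites a month set by an earlier %j
        elif spec == "d":
            day = value            # overwrites a day set by an earlier %j
        elif spec == "j":
            # As in the walkthrough: month becomes 1, day becomes value + 1.
            month, day = 1, value + 1
        else:
            raise ValueError(f"unsupported specifier %{spec}")

    # Assumption: a day past the end of the month rolls forward.
    return date(year, month, 1) + timedelta(days=day - 1)


print(parse_sequentially("2007-01-11", "%Y-%j-%m"))        # 2007-11-02
print(parse_sequentially("2007-01-11-05", "%Y-%j-%m-%d"))  # 2007-11-05
print(parse_sequentially("2007-11-05-01", "%Y-%m-%d-%j"))  # 2007-01-02 in this model
```

The first two outputs match the results reported in the description. The third is what this model predicts when %j comes last and has not been checked against the server.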

Generated at Thu Feb 08 06:26:13 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.