[SERVER-405] dates before epoch sorting Created: 03/Nov/09  Updated: 12/Jul/16  Resolved: 08/Aug/11

Status: Closed
Project: Core Server
Component/s: Indexing
Affects Version/s: None
Fix Version/s: 1.9.1

Type: Bug Priority: Critical - P2
Reporter: Eliot Horowitz Assignee: Dwight Merriman
Resolution: Fixed Votes: 49
Labels: None

Issue Links:
Depends
depends on SERVER-1845 add version number to indexes Closed
is depended on by SERVER-960 Dates before epoch not retrieved corr... Closed
Related
is related to SERVER-429 btree change (Sorting and other) mast... Closed
Participants:

 Comments   
Comment by Durran Jordan [ 16/Apr/10 ]

Can we get a better description on this issue since the other I submitted was closed as a duplicate? This is a pretty large bug in my opinion (we cant query times accurately at all) but this one can't get votes if the public doesn't don't know what it is.

Thanks!

Comment by Eliot Horowitz [ 16/Apr/10 ]

the server is using an unsigned number rather than signed for dates, so anything before the epoch rolls over far into the future.
so this will cause all sorts of problems with dates before 1970

the annoying part of this case is that if there is an existing index with an old date, and we make this change without forcing an index-rebuild, it could lead to corruption.

so we've been holding off until we're ready to force a file format change

Comment by Durran Jordan [ 17/Apr/10 ]

Well I am considering handling this in the meantime at the ODM level, and passing the longs to the db to store... When querying I'd have to do this conversion as well... Do you think this a viable workaround for the moment - I'll need to supply a migration script for all that use Mongoid if I do this - just want to get an opionion first... Right now it's affecting us in a large way (not sure how we missed this before) in that we can't query for people who are over age 40.

Comment by Eliot Horowitz [ 17/Apr/10 ]

That would work though doing that migration is even more costly than what we would have to do.
There may be other creative solutions, i'll think about it for a while.
The trick is selectively upgrading indexes - can do a scan, but that at startup would have to scan all indexes, which is expensive.
Though perhaps I have an idea...
Should be easy to find bad index keys actually...

Comment by Durran Jordan [ 29/Apr/10 ]

If this is geared to 1.7.x - is that around Q4 2010?

Comment by Eliot Horowitz [ 14/May/10 ]

end Q3/early Q4

Comment by Jerry Cheung [ 17/Jun/10 ]

mongoimport also seems to have problems parsing dates pre-epoch:

exception:Invalid use of reserved field name
{"dob":

{"$date":-1884384000000}

, ... }

Comment by Eliot Horowitz [ 23/Sep/10 ]

Rather than forcing every index to be re-created, we can correct the behavior for new indexes and leave old ones as they were.

Comment by Nathan Ehresman [ 13/Jan/11 ]

Unfortunately the target date for this keeps getting bumped back. Can someone give us any idea what "2011 - phase 1" means, and when this is scheduled to happen? We've been developing our project under the assumption that this was going to make it into 1.8, but we're going to need to do something else for dates at this point.

Comment by Eliot Horowitz [ 13/Jan/11 ]

There was a change made in 1.8 that had to be released before we could safely do this.
This will be in 2.0, and in an early 1.9

Comment by Vicente Mundim [ 21/Mar/11 ]

Eliot,

Could you explain the complexity of implementing this? I'm a little confused here. How datetimes are store in mongodb, and why sort works this way?

I have an application which contains videos that were exhibited at dates before epoch. When I use sort(

{exhibited_at: -1}

) it brings me videos from 1969, 1965, etc first, then videos from 2011. I'm considering the option of storing longs instead of datetimes, but this seems like a hack...

I have to agree with Durran, this is a pretty large bug, and from the change history, it is being pushed again and again to next releases. Can we be confident that this will be released in 2.0?

Comment by Scott Monsees [ 21/Mar/11 ]

This is a big deal for me as well.

Comment by Eliot Horowitz [ 21/Mar/11 ]

Its trivial to fix going forward, the problem is that if done nievely, every exiting index would have to be reIndexes or things would be corrupt.
In 1.8 we took steps such that we can do this without forcing a full reIndex.
So in 2.0 we'll be able to fix.

Comment by Vicente Mundim [ 22/Mar/11 ]

Thanks for the clarification Eliot. My application is not yet in production, so I could easily reIndex it. Maybe that's the case of other people here. Is there any way to create a patch for 1.8, since it is trivial to fix?

I could make it if I knew where to start.

Comment by Eliot Horowitz [ 22/Mar/11 ]

The key is bson/util/misc.h

// TODO: make signed (and look for related TODO's)
unsigned long long millis;

But there are a few changes and needs a lot of testing.
If you miss a spot for example, you'll get data corruption.

Comment by Vicente Mundim [ 22/Mar/11 ]

I've followed your advice and manage to make it work. Basically, I've changed Date_t declaration in bson/util/misc.h to:

struct Date_t {
long long millis;
Date_t(): millis(0) {}
Date_t(unsigned long long m): millis(static_cast<long long>(m)) {}
operator long long&()

{ return millis; }
operator const long long&() const { return millis; }

string toString() const

{ char buf[64]; time_t_to_String(millis/1000, buf); return buf; }

};

I've maintained the 'unsigned long long' constructor, which typecasts it to 'long long'. I've also changed the return of operators to 'long long' and the millis variable.

I've found two related TODOs, one in scripting/engine.cpp:

case Date:
// TODO: make signed
builder.appendDate( fieldName , Date_t((unsigned long long)getNumber( scopeName )) );
break;

And another one in scripting/sm_db.cpp:

if ( JS_InstanceOf( c->_context , o, &js_DateClass , 0 ) )

{ jsdouble d = js_DateGetMsecSinceEpoch( c->_context , o ); //TODO: make signed b.appendDate( name , Date_t((unsigned long long)d) ); return true; }

I didn't changed those files as the constructor of Date_t will already typecast the values in both of them.

Since I'm not a C++ expert, and I'm not familiar with MongoDB code, I don't know if there are other places where a change is needed.

Any suggestions or advices here?

Comment by Leif Mortenson [ 07/Apr/11 ]

It sounds like much of the delays on resolving this issue revolve around concerns with corruption of data in the indexes. I have not gone into the source so sorry if this a silly question. Would it be possible to add a format version number to the beginning of all indexes and collections? The first format version would be identified by the lack of the version identifier. MongoDB would load in these versions ids on startup so it knew how to handle the data.

This way applications which need to continue using the current broken format could continue to do so, but new collections or indexes would use version 2 (with the fix). If you rebuilt your index or collections then their versions would be incremented to the newest version.

The drawback is that there would be two versions of some data structures and the related logic. These upgrades could probably be removed after a full major version increase. If would let you upgrade from 1.6.5 to 1.8.x without rebuilding indexes and data, but you would need to go all the way up to 2.0.x for example.

I spent much of a day trying to figure out why certain of our queries were not working and it turned out to be this issue. We make heavy use of dates with some of it well before 1970 so we are also looking forward to a fix for this.

Cheers,
Leif

Comment by Dwight Merriman [ 01/Jul/11 ]

this should work now but i'm not resolving until i add a test

Comment by auto [ 06/Jul/11 ]

Author:

{u'login': u'dwight', u'name': u'dwight', u'email': u'dwight@10gen.com'}

Message: SERVER-405 signed dates a test
Branch: master
https://github.com/mongodb/mongo/commit/c0130065860db820488cdfaa55b43805a017e7ed

Comment by Tony Hannan [ 08/Aug/11 ]

An index created on a date field before upgrading to version 1.9.1 still sorts incorrectly after upgrading. Is this by design? I gather yes from Eliot's comment above "Rather than forcing every index to be re-created, we can correct the behavior for new indexes and leave old ones as they were". After dropping and recreating the index, it sorts correctly.

Comment by Tony Hannan [ 08/Aug/11 ]

Old index answers range queries differently if $lt comparison used. To recreate:

In old version (1.8.2):
> db.foo.save(

{x:1, d: new Date(1000)}

)
> db.foo.save(

{x:2, d: new Date(-1000)}

)
> db.foo.ensureIndex(

{d: 1}

)

In new version (1.9.2-pre) on same db files:
> db.foo.find({},{_id:0})

{ "x" : 1, "d" : ISODate("1970-01-01T00:00:01Z") } { "x" : 1, "d" : ISODate("1969-12-31T23:59:59Z") }

> db.foo.find({d:{$lt:new Date(0)}},{_id:0})
> db.foo.find({d:{$gt:new Date(0)}},{_id:0})

{ "x" : 1, "d" : ISODate("1970-01-01T00:00:01Z") }

Notice $lt incorrectly returns empty set.
This may also be by design if we are ignoring old indexes.

Comment by Tony Hannan [ 08/Aug/11 ]

Eliot or Dwight, can you answer above two comments? In a nutshell, is it OK that old indexes don't return right results or sort correctly?

Comment by Eliot Horowitz [ 08/Aug/11 ]

@tony - Yes - old indexes will still not sort correctly.

Comment by Tony Hannan [ 08/Aug/11 ]

They also don't filter correctly. Is that OK too? See my $lt, $gt example above.

Comment by Eliot Horowitz [ 08/Aug/11 ]

Yes - no changes for old indexes.

Comment by auto [ 09/Aug/11 ]

Author:

{u'login': u'TonyGen', u'name': u'Tony Hannan', u'email': u'tony@10gen.com'}

Message: new jstests/date3.js for SERVER-405 & SERVER-960
Branch: master
https://github.com/mongodb/mongo/commit/ad2f56a651b0b25f7a63ca4f533bdbc48b8c6ef0

Comment by auto [ 25/Aug/11 ]

Author:

{u'login': u'stbrody', u'name': u'Spencer T Brody', u'email': u'spencer@10gen.com'}

Message: Add more tests for sorting pre-epoch dates. SERVER-405 & QA-40.
Branch: master
https://github.com/mongodb/mongo/commit/812e49a673293f34a09f091cb5e9124cc5685c6d

Generated at Thu Dec 13 08:24:13 UTC 2018 using Jira 7.12.1#712002-sha1:609a50578ba6bc73dbf8b05dddd7c04a04b6807c.