[SERVER-10285] Seg Fault when parsing Point using 2dsphere Indexes Created: 22/Jul/13  Updated: 11/Jul/16  Resolved: 04/Oct/13

Status: Closed
Project: Core Server
Component/s: Geo
Affects Version/s: 2.4.5
Fix Version/s: 2.4.7, 2.5.3

Type: Bug
Priority: Critical - P2
Reporter: Adrian Grealish
Assignee: hari.khalsa@10gen.com
Resolution: Done
Votes: 1
Labels: crash
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

AWS EC2 machine running latest stable build of mongo 2.4.5


Attachments: Zip Archive messages.mongo2.log.zip    
Operating System: Linux
Steps To Reproduce:

Hard to say: we ran the latest release in production for three full days before we saw any problem. We store points (lat, lon), index them with a 2dsphere index, and query MongoDB using the geoNear command.

We are using the latest MongoDB Java driver, 2.9.3.

Participants:

 Description   
Issue Status as of October 22nd, 2013

ISSUE SUMMARY
In very rare and intermittent cases, queries that use a 2dsphere index with legacy coordinate pairs crashed the server. The crash occurred when the server converted a point from lat / long to radians but floating-point inaccuracy resulted in values that were slightly out of bounds.

USER IMPACT
Rare segfaults.

SOLUTION
After converting a legacy coordinate pair from lat / long to radians and attempting to normalize it (i.e., wrapping its lat and long to valid radian values), the server now checks if the point is valid. If not, it uasserts with "coords invalid after normalization" instead of crashing.
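
As an illustration of the check described above, here is a minimal Python sketch; it is not the server's C++ code. It assumes that normalization clamps latitude to [-π/2, π/2] and wraps longitude into [-π, π]. The names `normalize`, `is_valid`, `legacy_pair_to_unit_vector`, and `InvalidCoordinates` are hypothetical; only the error message is taken from this ticket. The ticket does not record which input produced the out-of-bounds value.

```python
import math

HALF_PI = math.pi / 2


class InvalidCoordinates(ValueError):
    """Hypothetical error type standing in for the server's uassert."""


def normalize(lat_rad, lng_rad):
    # Assumed normalization: clamp latitude, wrap longitude into [-pi, pi].
    lat = min(max(lat_rad, -HALF_PI), HALF_PI)
    lng = math.remainder(lng_rad, 2 * math.pi)
    return lat, lng


def is_valid(lat_rad, lng_rad):
    # Comparisons with NaN are False, so NaN coordinates are rejected here.
    return abs(lat_rad) <= HALF_PI and abs(lng_rad) <= math.pi


def legacy_pair_to_unit_vector(lng_deg, lat_deg):
    """Convert a legacy [longitude, latitude] pair in degrees to a 3D unit vector."""
    lat, lng = normalize(math.radians(lat_deg), math.radians(lng_deg))
    if not is_valid(lat, lng):
        # Report the problem to the caller instead of aborting the process.
        raise InvalidCoordinates("coords invalid after normalization")
    cos_lat = math.cos(lat)
    return (cos_lat * math.cos(lng), cos_lat * math.sin(lng), math.sin(lat))
```

In this sketch a NaN coordinate is one input that passes through normalization and still fails the validity check, so the caller receives an error rather than the process aborting.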

WORKAROUNDS
None.

PATCHES
Production release v2.4.7 contains the fix for this issue, and production release v2.6.0 will contain the fix as well.

Our production system crashed hard yesterday afternoon with what looks like a segmentation fault while parsing a Point. It took out all 6 nodes, including the master.

Unfortunately, we don't have the query (lat, lon) that caused the issue. Here is the stack trace from the master.

There seem to be a lot of geo code changes in this release: https://jira.mongodb.org/browse/SERVER-8349 and https://github.com/mongodb/mongo/commit/ba239918c950c254056bf589a943a5e88fd4144c

We are rolling back to the previous version, from before these changes were made.

Jul 22 00:08:06 ip-10-38-67-89 mongod.27017[8632]: Backtrace:#0120xdd9e31 0x6d0d09 0x7fe2f0f01920 0x7fe2f0f018a5 0x7fe2f0f03085 0xea2752 0x992d5f 0x994d9d 0x9a7769 0x98a933 0x8d4f0a 0x8d7042 0x8d80b2 0xa7e220 0xa82aec 0x9f6919 0x9f7e43 0x6e8b68 0xdc659e 0x7fe2f1c10851 #012 /opt/mongodb/bin/mongod(_ZN5mongo15printStackTraceERSo+0x21) [0xdd9e31]#012 /opt/mongodb/bin/mongod(_ZN5mongo10abruptQuitEi+0x399) [0x6d0d09]#012 /lib64/libc.so.6(+0x32920) [0x7fe2f0f01920]#012 /lib64/libc.so.6(gsignal+0x35) [0x7fe2f0f018a5]#012 /lib64/libc.so.6(abort+0x175) [0x7fe2f0f03085]#012 /opt/mongodb/bin/mongod(_ZNK8S2LatLng7ToPointEv+0xf2) [0xea2752]#012 /opt/mongodb/bin/mongod(_ZN5mongo9GeoParser10parsePointERKNS_7BSONObjEP7Vector3IdE+0x17f) [0x992d5f]#012 /opt/mongodb/bin/mongod(_ZN5mongo9NearQuery16parseFromGeoNearERKNS_7BSONObjEd+0x45d) [0x994d9d]#012 /opt/mongodb/bin/mongod(_ZN5mongo18run2DSphereGeoNearERKNS_12IndexDetailsERNS_7BSONObjERKNS_16GeoNearArgumentsERSsRNS_14BSONObjBuilderE+0x179) [0x9a7769]#012 /opt/mongodb/bin/mongod(_ZN5mongo16Geo2dFindNearCmd3runERKSsRNS_7BSONObjEiRSsRNS_14BSONObjBuilderEb+0x773) [0x98a933]#012 /opt/mongodb/bin/mongod(_ZN5mongo12_execCommandEPNS_7CommandERKSsRNS_7BSONObjEiRSsRNS_14BSONObjBuilderEb+0x3a) [0x8d4f0a]#012 /opt/mongodb/bin/mongod(_ZN5mongo7Command11execCommandEPS0_RNS_6ClientEiPKcRNS_7BSONObjERNS_14BSONObjBuilderEb+0xc02) [0x8d7042]#012 /opt/mongodb/bin/mongod(_ZN5mongo12_runCommandsEPKcRNS_7BSONObjERNS_11_BufBuilderINS_16TrivialAllocatorEEERNS_14BSONObjBuilderEbi+0x5f2) [0x8d80b2]#012 /opt/mongodb/bin/mongod(_ZN5mongo11runCommandsEPKcRNS_7BSONObjERNS_5CurOpERNS_11_BufBuilderINS_16TrivialAllocatorEEERNS_14BSONObjBuilderEbi+0x40) [0xa7e220]#012 /opt/mongodb/bin/mongod(_ZN5mongo8runQueryERNS_7MessageERNS_12QueryMessageERNS_5CurOpES1_+0xd7c) [0xa82aec]#012 /opt/mongodb/bin/mongod() [0x9f6919]#012 /opt/mongodb/bin/mongod(_ZN5mongo16assembleResponseERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortE+0x383) [0x9f7e43]#012 
/opt/mongodb/bin/mongod(_ZN5mongo16MyMessageHandler7processERNS_7MessageEPNS_21Abs
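
The backtrace above is hard to read because the syslog daemon has written each newline as the escape `#012`. A minimal Python sketch for splitting such a line back into frames follows; the function name `split_syslog_backtrace` and the regular expression are illustrative assumptions, and the sample is a shortened excerpt of the trace above.

```python
import re

# One frame looks like: /path/to/binary(symbol+0xoffset) [0xaddress]
# The symbol and the offset may each be missing.
FRAME_RE = re.compile(
    r"^(?P<module>[^\s(]+)"
    r"\((?P<symbol>[^+)]*)(?:\+(?P<offset>0x[0-9a-fA-F]+))?\)"
    r" \[(?P<address>0x[0-9a-fA-F]+)\]$"
)


def split_syslog_backtrace(text):
    """Split a syslog-escaped backtrace on '#012' and parse each frame."""
    frames = []
    for chunk in text.split("#012"):
        match = FRAME_RE.match(chunk.strip())
        if match:  # skips the "Backtrace:" label and the raw address list
            frames.append(match.groupdict())
    return frames


sample = (
    "Backtrace:#0120xdd9e31 0x6d0d09 #012 "
    "/opt/mongodb/bin/mongod(_ZN5mongo15printStackTraceERSo+0x21) [0xdd9e31]#012 "
    "/lib64/libc.so.6(abort+0x175) [0x7fe2f0f03085]#012 "
    "/opt/mongodb/bin/mongod(_ZNK8S2LatLng7ToPointEv+0xf2) [0xea2752]#012 "
)

for frame in split_syslog_backtrace(sample):
    print(frame["module"], frame["symbol"], frame["address"])
```

The mangled symbols can then be passed to a demangler such as `c++filt`; for example, `_ZNK8S2LatLng7ToPointEv` is `S2LatLng::ToPoint() const`, the frame immediately above `abort` in this trace.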



 Comments   
Comment by auto [ 02/Oct/13 ]

Author: Dan Pasette (monkey101) <dan@10gen.com>

Message: SERVER-10285 uassert that s2latlng is valid, print out if it's not

Backport original commit: ce6607fd28df1cc7c57a6ea03dcb0676e7a009ce
Branch: v2.4
https://github.com/mongodb/mongo/commit/33865cba588b666b32b75c42f309b26d21652762

Comment by Daniel Pasette (Inactive) [ 16/Sep/13 ]

The fix has not been backported to the 2.4 branch yet. 2.5.3 will be a development release only, not meant for production use. It would be great if you can test 2.5.3 in a development environment though when it comes out.

Comment by Eyllo [ 11/Sep/13 ]

Hi there. Does this mean the issue is solved, and that we should upgrade to 2.5.3?
Thank you very much!

Comment by auto [ 10/Sep/13 ]

Author: Hari Khalsa (hkhalsa) <hkhalsa@10gen.com>

Message: SERVER-10285 uassert that s2latlng is valid, print out if it's not
Branch: master
https://github.com/mongodb/mongo/commit/ce6607fd28df1cc7c57a6ea03dcb0676e7a009ce

Comment by Eyllo [ 22/Aug/13 ]

Hi Hari,

I don't know whether this makes any difference, but we are using a 2d index together with a 2dsphere index:
location_2d, with key: { "location" : "2d" }
scenarioId_1_loc_2dsphere, with key: { "loc" : "2dsphere", "scenarioId" : 1 }

Comment by Eyllo [ 22/Aug/13 ]

Hi Hari,
We are also running the latest stable build of mongo, 2.4.5, on an AWS EC2 machine, which is why I posted my issue here. It was the only place I found anything related.
I have no request to log, as it crashes during a DBCursor cursor.hasNext() operation. Maybe this is data related?
Thanks in advance!

Comment by hari.khalsa@10gen.com [ 21/Aug/13 ]

eyllo Hello. Is there any chance you can log the request and tell me exactly what is crashing it?

Also, what platform are you running on?

Comment by Eyllo [ 21/Aug/13 ]

Hi,

We are also experiencing a lot of server crashes that appear to be related to this. The crashes are really critical for us: our application stops as well, apparently just because of some data error. Our server log is as follows:

Tue Aug 20 21:34:28.592 Backtrace:
0xdd9e31 0x6d0d09 0x7f86d0cc49c0 0x7f86d0cc4945 0x7f86d0cc625b 0xea2752 0x98b791 0x98d65b 0x994361 0x995cf5 0x995de9 0x9acfef 0xb417a2 0xb58692 0xb5e5f5 0xb5e7ee 0xa8073a 0xa83838 0x9f6919 0x9f7e43
/usr/bin/mongod(_ZN5mongo15printStackTraceERSo+0x21) [0xdd9e31]
/usr/bin/mongod(_ZN5mongo10abruptQuitEi+0x399) [0x6d0d09]
/lib64/libc.so.6(+0x349c0) [0x7f86d0cc49c0]
/lib64/libc.so.6(gsignal+0x35) [0x7f86d0cc4945]
/lib64/libc.so.6(abort+0x17b) [0x7f86d0cc625b]
/usr/bin/mongod(_ZNK8S2LatLng7ToPointEv+0xf2) [0xea2752]
/usr/bin/mongod(_ZN5mongo9GeoParser16parseLegacyPointERKNS_7BSONObjEP7Vector3IdE+0x101) [0x98b791]
/usr/bin/mongod(_ZN5mongo9GeoParser23parseLegacyCenterSphereERKNS_7BSONObjEP5S2Cap+0x28b) [0x98d65b]
/usr/bin/mongod(_ZN5mongo17GeometryContainer9parseFromERKNS_7BSONObjE+0x861) [0x994361]
/usr/bin/mongod(_ZN5mongo8GeoQuery16parseLegacyQueryERKNS_7BSONObjE+0x195) [0x995cf5]
/usr/bin/mongod(_ZN5mongo8GeoQuery9parseFromERKNS_7BSONObjE+0x19) [0x995de9]
/usr/bin/mongod(_ZNK5mongo11S2IndexType9newCursorERKNS_7BSONObjES3_i+0x27f) [0x9acfef]
/usr/bin/mongod(_ZNK5mongo9QueryPlan9newCursorERKNS_7DiskLocEb+0x62) [0xb417a2]
/usr/bin/mongod(_ZN5mongo15CursorGenerator16singlePlanCursorEv+0x202) [0xb58692]
/usr/bin/mongod(_ZN5mongo15CursorGenerator8generateEv+0xa5) [0xb5e5f5]
/usr/bin/mongod(_ZN5mongo25NamespaceDetailsTransient9getCursorERKNS_10StringDataERKNS_7BSONObjES6_RKNS_24QueryPlanSelectionPolicyERKN5boost10shared_ptrIKNS_11ParsedQueryEEEbPNS_16QueryPlanSummaryE+0x3e) [0xb5e7ee]
/usr/bin/mongod(_ZN5mongo23queryWithQueryOptimizerEiRKSsRKNS_7BSONObjERNS_5CurOpES4_S4_RKN5boost10shared_ptrINS_11ParsedQueryEEES4_RKNS_12ChunkVersionERNS7_10scoped_ptrINS_25PageFaultRetryableSectionEEERNSG_INS_19NoPageFaultsAllowedEEERNS_7MessageE+0x12a) [0xa8073a]
/usr/bin/mongod(_ZN5mongo8runQueryERNS_7MessageERNS_12QueryMessageERNS_5CurOpES1_+0x1ac8) [0xa83838]
/usr/bin/mongod() [0x9f6919]
/usr/bin/mongod(_ZN5mongo16assembleResponseERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortE+0x383) [0x9f7e43]

Comment by hari.khalsa@10gen.com [ 24/Jul/13 ]

Yes, if you can get the query of death it would make fixing the problem much easier. Looking forward to fixing this.

Comment by Adrian Grealish [ 24/Jul/13 ]

We are going to continue running the load test and have turned up logging on the remaining nodes.

Comment by Adrian Grealish [ 24/Jul/13 ]

We just saw the same error on a replica node, which crashed while we were running load tests against 2.4.3.

Jul 23 22:56:57 ip-10-60-69-130 mongod.27017[18174]: Tue Jul 23 22:56:57.680 [conn17514] command location.$cmd command: { geoNear: "presence", near: [ -0.7597462961209844, 52.02757513346383 ], spherical: true, query: { age: { $gte: 88, $lte: 93 } }, limit: 400, maxDistance: 0.1 } ntoreturn:1 keyUpdates:0 locks(micros) r:232891 reslen:29813 232ms
Jul 23 22:56:57 ip-10-60-69-130 mongod.27017[18174]: Tue Jul 23 22:56:57.688
Jul 23 22:56:57 ip-10-60-69-130 mongod.27017[18174]: Got signal: 6 (Aborted).
Jul 23 22:56:57 ip-10-60-69-130 mongod.27017[18174]:
Jul 23 22:56:57 ip-10-60-69-130 mongod.27017[18174]: Tue Jul 23 22:56:57.703
Jul 23 22:56:57 ip-10-60-69-130 mongod.27017[18174]: Backtrace:#0120xdcf361 0x6cf729 0x7fc24312d920 0x7fc24312d8a5 0x7fc24312f085 0xe97e32 0x98ff7f 0x9933ed 0x9a4999 0x987a39 0x8d236a 0x8d5535 0x8d6592 0xa7c97b 0xa80360 0x9f44d4 0x9f57e2 0x6e747a 0xdbbb7e 0x7fc243e3c851 #012 /opt/mongodb/bin/mongod(_ZN5mongo15printStackTraceERSo+0x21) [0xdcf361]#012 /opt/mongodb/bin/mongod(_ZN5mongo10abruptQuitEi+0x399) [0x6cf729]#012 /lib64/libc.so.6(+0x32920) [0x7fc24312d920]#012 /lib64/libc.so.6(gsignal+0x35) [0x7fc24312d8a5]#012 /lib64/libc.so.6(abort+0x175) [0x7fc24312f085]#012 /opt/mongodb/bin/mongod(_ZNK8S2LatLng7ToPointEv+0xf2) [0xe97e32]#012 /opt/mongodb/bin/mongod(_ZN5mongo9GeoParser10parsePointERKNS_7BSONObjEP7Vector3IdE+0x17f) [0x98ff7f]#012 /opt/mongodb/bin/mongod(_ZN5mongo9NearQuery16parseFromGeoNearERKNS_7BSONObjEd+0x45d) [0x9933ed]#012 /opt/mongodb/bin/mongod(_ZN5mongo18run2DSphereGeoNearERKNS_12IndexDetailsERNS_7BSONObjERKNS_16GeoNearArgumentsERSsRNS_14BSONObjBuilderE+0x179) [0x9a4999]#012 /opt/mongodb/bin/mongod(_ZN5mongo16Geo2dFindNearCmd3runERKSsRNS_7BSONObjEiRSsRNS_14BSONObjBuilderEb+0x759) [0x987a39]#012 /opt/mongodb/bin/mongod(_ZN5mongo12_execCommandEPNS_7CommandERKSsRNS_7BSONObjEiRSsRNS_14BSONObjBuilderEb+0x3a) [0x8d236a]#012 /opt/mongodb/bin/mongod(_ZN5mongo7Command11execCommandEPS0_RNS_6ClientEiPKcRNS_7BSONObjERNS_14BSONObjBuilderEb+0xbd5) [0x8d5535]#012 /opt/mongodb/bin/mongod(_ZN5mongo12_runCommandsEPKcRNS_7BSONObjERNS_11_BufBuilderINS_16TrivialAllocatorEEERNS_14BSONObjBuilderEbi+0x5e2) [0x8d6592]#012 /opt/mongodb/bin/mongod(_ZN5mongo11runCommandsEPKcRNS_7BSONObjERNS_5CurOpERNS_11_BufBuilderINS_16TrivialAllocatorEEERNS_14BSONObjBuilderEbi+0x3b) [0xa7c97b]#012 /opt/mongodb/bin/mongod(_ZN5mongo8runQueryERNS_7MessageERNS_12QueryMessageERNS_5CurOpES1_+0xd50) [0xa80360]#012 /opt/mongodb/bin/mongod() [0x9f44d4]#012 /opt/mongodb/bin/mongod(_ZN5mongo16assembleResponseERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortE+0x392) [0x9f57e2]#012 
/opt/mongodb/bin/mongod(_ZN5mongo16MyMessageHandler7processERNS_7MessageEPNS_21Ab

Comment by Adrian Grealish [ 23/Jul/13 ]

We are running an off-the-shelf AMI on an m1.xlarge instance in EC2.

We were running 2.4.5 until the crash, then we downgraded.

Comment by hari.khalsa@10gen.com [ 23/Jul/13 ]

Hello! The changes you linked were not put into the 2.4 branch.

Looking at the logs, I'm not clear on whether or not you're running 2.4.4 or 2.4.5. There are lines in the logs that suggest it's 2.4.4:

Jul 22 02:09:34 ip-10-40-123-164 mongod.27017[21395]: Mon Jul 22 02:09:34.330 [initandlisten] db version v2.4.4
Jul 22 02:09:34 ip-10-40-123-164 mongod.27017[21395]: Mon Jul 22 02:09:34.330 [initandlisten] git version: 4ec1fb96702c9d4c57b1e06dd34eb73a16e407d2

Can you tell me about what kind of platform/computer you're running this on? Looks like an off-the-shelf Amazon EC2 instance?

Comment by Adrian Grealish [ 22/Jul/13 ]

Not sure if this is related, but we cannot store a Point object in the document with this version and the latest driver.

With Gson, the Point class generates the JSON shown at http://geojson.org/geojson-spec.html#id29:

{ "type": "Point", "coordinates": [100.0, 0.0] }

Not sure why BSON can't handle it.

Caused by: java.lang.IllegalArgumentException: can't serialize class com.grindr.presence.model.Point
at org.bson.BasicBSONEncoder._putObjectField(BasicBSONEncoder.java:270)
at org.bson.BasicBSONEncoder.putObject(BasicBSONEncoder.java:174)
at org.bson.BasicBSONEncoder._putObjectField(BasicBSONEncoder.java:226)
at org.bson.BasicBSONEncoder.putObject(BasicBSONEncoder.java:174)
at org.bson.BasicBSONEncoder.putObject(BasicBSONEncoder.java:120)
at com.mongodb.DefaultDBEncoder.writeObject(DefaultDBEncoder.java:27)
at com.mongodb.OutMessage.putObject(OutMessage.java:289)
at com.mongodb.OutMessage.writeUpdate(OutMessage.java:175)
at com.mongodb.OutMessage.update(OutMessage.java:62)
at com.mongodb.DBApiLayer$MyCollection.update(DBApiLayer.java:326)
at com.mongodb.DBCollection.update(DBCollection.java:160)
at com.mongodb.DBCollection.update(DBCollection.java:191)
at com.mongodb.DBCollection.update(DBCollection.java:203)
... 33 more
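
The exception above indicates that the driver's BSON encoder was handed a custom class it does not know how to encode. A common way around this is to convert the object into plain maps, lists, and numbers before passing it to the driver. The sketch below shows the idea in Python rather than Java; the `Point` dataclass and its `to_geojson` method are hypothetical stand-ins for the reporter's class, not code from this ticket.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    """Hypothetical stand-in for the application's own Point class."""
    lng: float
    lat: float

    def to_geojson(self):
        # Plain dict/list/float values only, in GeoJSON order: [longitude, latitude].
        return {"type": "Point", "coordinates": [self.lng, self.lat]}


document = {"loc": Point(100.0, 0.0).to_geojson()}
print(document)
```

The resulting document contains only built-in types, which is the kind of structure a BSON encoder can serialize without custom hooks.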

Comment by Adrian Grealish [ 22/Jul/13 ]

crashed at Jul 22 00:08:03

Comment by Daniel Pasette (Inactive) [ 22/Jul/13 ]

Can you attach more of the log (or whole log compressed) to this ticket?

Generated at Thu Feb 08 03:22:46 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.