[SERVER-19388] assertion in sort.cpp Created: 14/Jul/15  Updated: 05/Feb/16  Resolved: 15/Jul/15

Status: Closed
Project: Core Server
Component/s: Querying
Affects Version/s: None
Fix Version/s: 3.1.6

Type: Bug Priority: Major - P3
Reporter: Eric Milkie Assignee: David Storch
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-13732 Predicates in top-level implicit AND ... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: Quint Iteration 6
Participants:

 Description   

We have a nightly build that runs the MMS integration tests against the head of MongoDB master. This morning it failed as follows.

FAILED:  com.xgen.svc.brs.dao.RestoreJobDaoIntTests.testPullRestoreJob
Error Message:
{ "serverUsed" : "127.0.0.1:26000" , "ok" : 0.0 , "errmsg" : "assertion src/mongo/db/exec/sort.cpp:233" , "code" : 8}
 
Stack Trace:
com.mongodb.CommandFailureException: { "serverUsed" : "127.0.0.1:26000" , "ok" : 0.0 , "errmsg" : "assertion src/mongo/db/exec/sort.cpp:233" , "code" : 8}
        at com.mongodb.CommandResult.getException(CommandResult.java:76)
        at com.mongodb.CommandResult.throwOnError(CommandResult.java:140)
        at com.mongodb.DBCollection.findAndModify(DBCollection.java:484)
        at com.mongodb.DBCollection.findAndModify(DBCollection.java:424)
        at com.mongodb.DBCollection.findAndModify(DBCollection.java:532)
        at com.xgen.svc.brs.dao.RestoreJobDao.getRestoreJobDocument(RestoreJobDao.java:684)
        at com.xgen.svc.brs.dao.RestoreJobDao.getNewJob(RestoreJobDao.java:80)
        at com.xgen.svc.brs.dao.RestoreJobDao.getNewJob(RestoreJobDao.java:57)
        at com.xgen.svc.brs.dao.RestoreJobDaoIntTests.testPullRestoreJob(RestoreJobDaoIntTests.java:166)



 Comments   
Comment by Githook User [ 15/Jul/15 ]

Author: David Storch (dstorch) <david.storch@10gen.com>

Message: SERVER-19388 fix assertion in $or planning
Branch: master
https://github.com/mongodb/mongo/commit/3f301ac62ea983d864f9a5ad017876171e2f1104

Comment by J Rassi [ 14/Jul/15 ]

Filed SERVER-19394 (SORT_MERGE doesn't perform sort key generation) and SERVER-19397 (sort key generation doesn't handle complex $or queries). Also filed SERVER-19402 to consider changing the query sort semantics, which would get us out of this business of having special array handling for sort.

Per discussion with Dave, the tentative plan for this ticket is to revert the changes to the three-argument version of CanonicalQuery::canonicalize(). This wouldn't fix the issue of the wrong predicate being passed to the sort key generation process inside subplanned clauses, but I believe this is benign, as it does not affect the tags generated for the subplanned clause. We will need some other fix for viewing the plan cache entries for subplanned clauses, though (which was the original motivation for changing the three-argument version of canonicalize()); Dave will file a ticket for this.

Comment by J Rassi [ 14/Jul/15 ]

The regression was introduced in 15c72c85, which changed the three-argument CanonicalQuery::canonicalize() to initialize the underlying LiteParsedQuery with a filter serialized from the parsed MatchExpression instead of using the original query object as the filter. The problem is that the sort stage key generator re-parses the LiteParsedQuery's filter in order to generate bounds, and query predicates produced by parsing and then serializing aren't necessarily valid for re-parsing.

The issue can be reproduced as follows:

db.foo.ensureIndex({a:1,c:1})
db.foo.ensureIndex({a:1,d:1})
db.foo.find({$or:[{a:{$ne:1}},{a:2}]}).sort({b:1}) // trips statusWithQueryForSort.isOK() assertion in sort.cpp

When the subplanner receives the above query, it invokes the three-argument canonicalize() function in order to create a new CanonicalQuery object for the first OR child. The underlying LiteParsedQuery's re-serialized filter for this child is {$not: {a: {$eq: 1}}}, and the plan enumeration process for this child generates two SORT <= FETCH <= IXSCAN plans. When the plan selection process for this child is invoked, the sort stage attempts to re-parse the LiteParsedQuery's filter as part of sort key generation; this fails with the error "unknown top level operator: $not".
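The round-trip failure described above can be illustrated with a small standalone JavaScript sketch. This is a toy model only: parse(), parseField(), serialize() and TOP_LEVEL_OPERATORS are hypothetical names invented for illustration and are not the server's MatchExpression parser. The sketch assumes, as the re-serialized filter above suggests, that $ne is represented internally as a negated equality.

```javascript
// Toy model only: hypothetical helpers, not MongoDB's actual parser.

// Operators the toy parser accepts at the top level of a filter.
// $not is deliberately absent, mirroring the error quoted above.
const TOP_LEVEL_OPERATORS = new Set(["$or", "$and", "$nor"]);

// Parse a single {path: value} predicate into an expression node.
function parseField(path, value) {
  if (value !== null && typeof value === "object") {
    if ("$ne" in value) {
      // Assumption: $ne is stored as NOT(EQ).
      return { op: "$not", child: { op: "$eq", path, value: value.$ne } };
    }
    if ("$eq" in value) {
      return { op: "$eq", path, value: value.$eq };
    }
  }
  return { op: "$eq", path, value };
}

// Parse a filter object into a small expression tree.
function parse(filter) {
  const children = [];
  for (const [key, value] of Object.entries(filter)) {
    if (key.startsWith("$")) {
      if (!TOP_LEVEL_OPERATORS.has(key)) {
        throw new Error("unknown top level operator: " + key);
      }
      children.push({ op: key, args: value.map(parse) });
    } else {
      children.push(parseField(key, value));
    }
  }
  return children.length === 1 ? children[0] : { op: "$and", args: children };
}

// Serialize an expression tree back into a filter object.
function serialize(node) {
  switch (node.op) {
    case "$eq":
      return { [node.path]: { $eq: node.value } };
    case "$not":
      return { $not: serialize(node.child) };
    default:
      return { [node.op]: node.args.map(serialize) };
  }
}

// First OR child from the repro query.
const firstOrChild = { a: { $ne: 1 } };
const reserialized = serialize(parse(firstOrChild));

// Re-parsing the serialized form fails, even though the original parsed fine.
let reparseError = null;
try {
  parse(reserialized);
} catch (e) {
  reparseError = e.message;
}

console.log(JSON.stringify(reserialized)); // {"$not":{"a":{"$eq":1}}}
console.log(reparseError); // unknown top level operator: $not
```

In this model the original filter parses cleanly, but its serialized form is not in the language the parser accepts, which is the same shape of failure the sort key generator hits when it re-parses the LiteParsedQuery's filter.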

Reverting the changes to canonical_query.cpp from the above commit fixes the immediate issue of tripping the assertion in sort.cpp; however, I believe it's actually incorrect for the full $or expression to be used as the predicate for sort key generation here. I'll file tickets separately for the issue(s) I've uncovered in the sort key generation process as part of investigating this bug report, and link them here.

Comment by Cory Mintz [ 14/Jul/15 ]

MongoDB log:

-07-14T06:11:38.904-0400 I -        [conn266] Assertion failure statusWithQueryForSort.isOK() src/mongo/db/exec/sort.cpp 233
2015-07-14T06:11:38.913-0400 I CONTROL  [conn266]
 0x10dd6b2 0x108bcd4 0x10780d4 0xaa36b3 0xaa3e9d 0xaa4c31 0xa8bcef 0xa8c7c7 0xaacd5c 0xaae89f 0xc6f88e 0xc70c1d 0xc714af 0xc3753f 0x9c3d97 0xa4b782 0xa4c14a 0x99eead 0xb36993 0xb39240 0x839b8d 0x1098b45 0x3ab56079d1 0x3ab4ee8b6d
----- BEGIN BACKTRACE -----
{"backtrace":[{"b":"400000","o":"CDD6B2"},{"b":"400000","o":"C8BCD4"},{"b":"400000","o":"C780D4"},{"b":"400000","o":"6A36B3"},{"b":"400000","o":"6A3E9D"},{"b":"400000","o":"6A4C31"},{"b":"400000","o":"68BCEF"},{"b":"400000","o":"68C7C7"},{"b":"400000","o":"6ACD5C"},{"b":"400000","o":"6AE89F"},{"b":"400000","o":"86F88E"},{"b":"400000","o":"870C1D"},{"b":"400000","o":"8714AF"},{"b":"400000","o":"83753F"},{"b":"400000","o":"5C3D97"},{"b":"400000","o":"64B782"},{"b":"400000","o":"64C14A"},{"b":"400000","o":"59EEAD"},{"b":"400000","o":"736993"},{"b":"400000","o":"739240"},{"b":"400000","o":"439B8D"},{"b":"400000","o":"C98B45"},{"b":"3AB5600000","o":"79D1"},{"b":"3AB4E00000","o":"E8B6D"}],"processInfo":{ "mongodbVersion" : "3.1.6-pre-", "gitVersion" : "63863aefa21b33e6f84b8b466f70581dd921dba7", "compiledModules" : [], "uname" : { "sysname" : "Linux", "release" : "2.6.32-358.14.1.el6.x86_64", "version" : "#1 SMP Mon Jun 17 15:54:20 EDT 2013", "machine" : "x86_64" }, "somap" : [ { "elfType" : 2, "b" : "400000" }, { "b" : "7FFF17AFF000", "elfType" : 3 }, { "path" : "/lib64/librt.so.1", "elfType" : 3 }, { "path" : "/lib64/libdl.so.2", "elfType" : 3 }, { "path" : "/usr/lib64/libstdc++.so.6", "elfType" : 3 }, { "path" : "/lib64/libm.so.6", "elfType" : 3 }, { "path" : "/lib64/libgcc_s.so.1", "elfType" : 3 }, { "path" : "/lib64/libpthread.so.0", "elfType" : 3 }, { "path" : "/lib64/libc.so.6", "elfType" : 3 }, { "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3 } ] }}
 mongod(_ZN5mongo15printStackTraceERSo+0x32) [0x10dd6b2]
 mongod(_ZN5mongo10logContextEPKc+0x134) [0x108bcd4]
 mongod(_ZN5mongo12verifyFailedEPKcS1_j+0xB4) [0x10780d4]
 mongod(_ZN5mongo21SortStageKeyGenerator16getBoundsForSortERKNS_7BSONObjES3_+0x883) [0xaa36b3]
 mongod(_ZN5mongo21SortStageKeyGeneratorC1EPKNS_10CollectionERKNS_7BSONObjES6_+0x7DD) [0xaa3e9d]
 mongod(_ZN5mongo9SortStage4workEPm+0x331) [0xaa4c31]
 mongod(_ZN5mongo14MultiPlanStage12workAllPlansEmPNS_15PlanYieldPolicyE+0xFF) [0xa8bcef]
 mongod(_ZN5mongo14MultiPlanStage12pickBestPlanEPNS_15PlanYieldPolicyE+0x87) [0xa8c7c7]
 mongod(_ZN5mongo12SubplanStage23choosePlanForSubqueriesEPNS_15PlanYieldPolicyE+0x23C) [0xaacd5c]
 mongod(_ZN5mongo12SubplanStage12pickBestPlanEPNS_15PlanYieldPolicyE+0x8F) [0xaae89f]
 mongod(_ZN5mongo12PlanExecutor12pickBestPlanENS0_11YieldPolicyE+0x3E) [0xc6f88e]
 mongod(_ZN5mongo12PlanExecutor4makeEPNS_16OperationContextESt10unique_ptrINS_10WorkingSetESt14default_deleteIS4_EES3_INS_9PlanStageES5_IS8_EES3_INS_13QuerySolutionES5_ISB_EES3_INS_14CanonicalQueryES5_ISE_EEPKNS_10CollectionERKSsNS0_11YieldPolicyE+0x14D) [0xc70c1d]
 mongod(_ZN5mongo12PlanExecutor4makeEPNS_16OperationContextESt10unique_ptrINS_10WorkingSetESt14default_deleteIS4_EES3_INS_9PlanStageES5_IS8_EES3_INS_13QuerySolutionES5_ISB_EES3_INS_14CanonicalQueryES5_ISE_EEPKNS_10CollectionENS0_11YieldPolicyE+0xBF) [0xc714af]
 mongod(_ZN5mongo17getExecutorUpdateEPNS_16OperationContextEPNS_10CollectionEPNS_12ParsedUpdateEPNS_7OpDebugE+0xA2F) [0xc3753f]
 mongod(_ZN5mongo16CmdFindAndModify3runEPNS_16OperationContextERKSsRNS_7BSONObjEiRSsRNS_14BSONObjBuilderE+0x7E7) [0x9c3d97]
 mongod(_ZN5mongo7Command3runEPNS_16OperationContextERKNS_3rpc16RequestInterfaceEPNS3_21ReplyBuilderInterfaceE+0x322) [0xa4b782]
 mongod(_ZN5mongo7Command11execCommandEPNS_16OperationContextEPS0_RKNS_3rpc16RequestInterfaceEPNS4_21ReplyBuilderInterfaceE+0x3CA) [0xa4c14a]
 mongod(_ZN5mongo11runCommandsEPNS_16OperationContextERKNS_3rpc16RequestInterfaceEPNS2_21ReplyBuilderInterfaceE+0x1AD) [0x99eead]
 mongod(+0x736993) [0xb36993]
 mongod(_ZN5mongo16assembleResponseEPNS_16OperationContextERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortE+0x940) [0xb39240]
 mongod(_ZN5mongo16MyMessageHandler7processERNS_7MessageEPNS_21AbstractMessagingPortE+0xDD) [0x839b8d]
 mongod(_ZN5mongo17PortMessageServer17handleIncomingMsgEPv+0x265) [0x1098b45]
 libpthread.so.0(+0x79D1) [0x3ab56079d1]
 libc.so.6(clone+0x6D) [0x3ab4ee8b6d]
-----  END BACKTRACE  -----

Comment by Eric Milkie [ 14/Jul/15 ]

Cory says the query that caused the failure has a mix of $or and $and, so I'm suspecting the recent change to $or query planning may be related.
