[SERVER-18266] Standardize token-length limits between RLP and Snowball
Created: 29/Apr/15  Updated: 05/Feb/16  Resolved: 09/Jun/15

Status: Closed
Project: Core Server
Component/s: Text Search
Affects Version/s: 3.1.2
Fix Version/s: 3.1.5

Type: Bug
Priority: Major - P3
Reporter: Kamran K.
Assignee: Mark Benvenuto
Resolution: Done
Votes: 0
Labels: 32qa
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-18372 QuerySolution leak when exception is ... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Steps To Reproduce:

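(function() {
    'use strict';

    var t = db.fts_rlp;
    t.drop();

    assert.commandWorked(t.ensureIndex({a: 'text'}));

    // new Array(1024 * 16 + 2).join('a') is 16385 'a' characters: one past 16KB.
    // Indexed with language 'en', the document is accepted.
    assert.writeOK(t.insert({a: new Array(1024 * 16 + 2).join('a'), language: 'en'}));
    // Indexed with language 'zht', the same value is rejected on affected versions
    // with code 28632 ("Maximum token size reached"), so this assertion fails.
    assert.writeOK(t.insert({a: new Array(1024 * 16 + 2).join('a'), language: 'zht'}));
}());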

Sprint: Platform 4 06/05/15, Platform 5 06/26/16
Participants:

Description

The 16KB limit on RLP tokens makes write operations behave inconsistently across languages: the same 16,385-character value is accepted with language 'en' (Snowball) but rejected with language 'zht' (RLP). It would be nice to remove the limit, or raise it sufficiently, so that language handling is transparent to clients:

> db.foo.insert({a: new Array(1024 * 16 + 2).join('a'), language: 'en'});
WriteResult({ "nInserted" : 1 })
 
> db.foo.insert({a: new Array(1024 * 16 + 2).join('a'), language: 'zht'});
WriteResult({
	"nInserted" : 0,
	"writeError" : {
		"code" : 28632,
		"errmsg" : "Maximum token size reached"
	}
})
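
For reference, the test value above is exactly one character past 16KB. The following is a minimal sketch in plain JavaScript (it needs no server and should run in the mongo shell or Node.js). The name ASSUMED_RLP_TOKEN_LIMIT is illustrative and not a server identifier; its value is taken from the "16KB" wording in this ticket, and the exact boundary the server enforces is an assumption.

```javascript
// ASSUMPTION: limit value inferred from the ticket's "16KB" wording;
// this is not a constant read from the server source.
var ASSUMED_RLP_TOKEN_LIMIT = 16 * 1024;

// new Array(n).join(s) puts s between n empty slots, giving n - 1 copies of s.
var token = new Array(1024 * 16 + 2).join('a');

// Length of the single token the tokenizer sees (no whitespace in the value).
var tokenLength = token.length;

// How far the token overshoots the assumed limit.
var overBy = tokenLength - ASSUMED_RLP_TOKEN_LIMIT;
```

Here tokenLength is 16385 and overBy is 1, so the reproduction uses the smallest value that crosses a 16KB threshold.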



Comments
Comment by Githook User [ 09/Jun/15 ]

Author: Mark Benvenuto (markbenvenuto) <mark.benvenuto@mongodb.com>

Message: SERVER-18266: Standardize token-length limits between RLP and Snowball
Branch: master
https://github.com/10gen/mongo-enterprise-modules/commit/fb196ac4b4b91eddf5c0667e6f8c47ff81781da3

Comment by Kamran K. [ 07/May/15 ]

The leak is now filed as SERVER-18372.

Comment by J Rassi [ 07/May/15 ]

kamran.khan, thanks for finding this leak. Could you file a separate ticket in the Querying component, please?

Comment by Kamran K. [ 07/May/15 ]

Triggering the "Maximum token size reached" uassert causes a leak when querying:

> db.foo.ensureIndex({a: 'text'})
{
	"createdCollectionAutomatically" : false,
	"numIndexesBefore" : 2,
	"numIndexesAfter" : 2,
	"note" : "all indexes already exist",
	"ok" : 1
}
> db.foo.find({$text: {$search: new Array(1024 * 16 + 2).join('a'), $language: 'zht'}});
Error: error: { "$err" : "Maximum token size reached", "code" : 28632 }

==19613== Thread 1:
==19613== 37,235 (40 direct, 37,195 indirect) bytes in 1 blocks are definitely lost in loss record 5,468 of 5,481
==19613==    at 0x4C2B0E0: operator new(unsigned long) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==19613==    by 0x173FD58: mongo::QueryPlannerAnalysis::analyzeDataAccess(mongo::CanonicalQuery const&, mongo::QueryPlannerParams const&, mongo::QuerySolutionNode*) (planner_analysis.cpp:531)
==19613==    by 0x174B468: mongo::QueryPlanner::plan(mongo::CanonicalQuery const&, mongo::QueryPlannerParams const&, std::vector<mongo::QuerySolution*, std::allocator<mongo::QuerySolution*> >*) (query_planner.cpp:772)
==19613==    by 0x16F1163: mongo::(anonymous namespace)::prepareExecution(mongo::OperationContext*, mongo::Collection*, mongo::WorkingSet*, mongo::CanonicalQuery*, unsigned long, mongo::PlanStage**, mongo::QuerySolution**) (get_executor.cpp:364)
==19613==    by 0x16F1F22: mongo::getExecutor(mongo::OperationContext*, mongo::Collection*, mongo::CanonicalQuery*, mongo::PlanExecutor::YieldPolicy, mongo::PlanExecutor**, unsigned long) (get_executor.cpp:453)
==19613==    by 0x16F31FD: mongo::getExecutorFind(mongo::OperationContext*, mongo::Collection*, mongo::NamespaceString const&, mongo::CanonicalQuery*, mongo::PlanExecutor::YieldPolicy, mongo::PlanExecutor**) (get_executor.cpp:641)
==19613==    by 0x16EDEEB: mongo::runQuery(mongo::OperationContext*, mongo::QueryMessage&, mongo::NamespaceString const&, mongo::CurOp&, mongo::Message&) (find.cpp:560)
==19613==    by 0x15B889C: mongo::receivedQuery(mongo::OperationContext*, mongo::NamespaceString const&, mongo::Client&, mongo::DbResponse&, mongo::Message&) (instance.cpp:368)
==19613==    by 0x15B9205: mongo::assembleResponse(mongo::OperationContext*, mongo::Message&, mongo::DbResponse&, mongo::HostAndPort const&) (instance.cpp:504)
==19613==    by 0x128B8A9: mongo::MyMessageHandler::process(mongo::Message&, mongo::AbstractMessagingPort*) (db.cpp:167)
==19613==    by 0x1B12528: mongo::PortMessageServer::handleIncomingMsg(void*) (message_server_port.cpp:227)
==19613==    by 0x640F181: start_thread (pthread_create.c:312)
