Standardize token-length limits between RLP and Snowball


    • Type: Bug
    • Resolution: Done
    • Priority: Major - P3
    • Fix Version/s: 3.1.5
    • Affects Version/s: 3.1.2
    • Component/s: Text Search
    • Backwards Compatibility: Fully Compatible
    • Operating System: ALL
    • Steps To Reproduce:
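      (function() {
          'use strict';
      
          var t = db.fts_rlp;
          t.drop();
      
          assert.commandWorked(t.ensureIndex({a: 'text'}));
      
          // new Array(n).join('a') yields n - 1 characters, so each document
          // holds a single 16385-character token, just over 16KB.
          // The 'en' insert succeeds; the 'zht' insert fails with "Maximum token size reached".
          assert.writeOK(t.insert({a: new Array(1024 * 16 + 2).join('a'), language: 'en'}));
          assert.writeOK(t.insert({a: new Array(1024 * 16 + 2).join('a'), language: 'zht'}));
      }());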
      
    • Platform 4 06/05/15, Platform 5 06/26/16
    • 3

      The 16KB limit on RLP tokens makes write operations behave inconsistently across languages: a document that inserts successfully with language 'en' (Snowball) is rejected with language 'zht' (RLP). It'd be nice to remove (or sufficiently increase) the limit so that language handling is transparent to clients:

      > db.foo.insert({a: new Array(1024 * 16 + 2).join('a'), language: 'en'});
      WriteResult({ "nInserted" : 1 })
      
      > db.foo.insert({a: new Array(1024 * 16 + 2).join('a'), language: 'zht'});
      WriteResult({
      	"nInserted" : 0,
      	"writeError" : {
      		"code" : 28632,
      		"errmsg" : "Maximum token size reached"
      	}
      })
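
For reference, the size of the test string can be checked outside the server. This is only a sketch in plain JavaScript (Node.js; in the mongo shell, `print` would replace `console.log`). The names `makeToken` and `LIMIT` are hypothetical, and treating "16KB" as exactly 16 * 1024 is an assumption, not something the ticket states.

```javascript
// Build the test string the same way the repro does.
// new Array(n).join(sep) puts sep between n empty slots, giving n - 1 copies of sep.
function makeToken(n) {
    return new Array(n).join('a');
}

// Assumption: "16KB" means 16 * 1024; the ticket does not give the exact constant.
var LIMIT = 16 * 1024;

var token = makeToken(1024 * 16 + 2);

console.log(token.length);          // 16385
console.log(token.length - LIMIT);  // 1
```

Under that assumption, the string is a single token exactly one character over the limit, which would explain why only the RLP-backed insert is rejected.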
      

              Assignee:
              Mark Benvenuto
              Reporter:
              Kamran K. (Inactive)
              Votes:
              0
              Watchers:
              5
