[SERVER-3556] Using Lookaround with Regex Created: 09/Aug/11  Updated: 30/Mar/12  Resolved: 31/Oct/11

Status: Closed
Project: Core Server
Component/s: JavaScript
Affects Version/s: 1.8.2
Fix Version/s: None

Type: Bug Priority: Minor - P4
Reporter: POITTEVIN Raphael Assignee: Antoine Girbal
Resolution: Done Votes: 0
Labels: eval, java, regex
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

uname -a :
Linux 2.6.38-10-generic-pae #46-Ubuntu SMP Tue Jun 28 16:54:49 UTC 2011 i686 i686 i386 GNU/Linux


Operating System: Linux
Participants:

 Description   

Hello,

I'm using MongoDB in a GWT (2.3) project (with Morphia 0.99).
My GWT project calls a server-side function (stored in db.system.js) through DB.eval(). One of the function's arguments is a regex. With a regex like "\bFOO\b" it works fine, but when the regex becomes something like "(?<!<[^>])\bFOO\b(?>![^<]>)", I get this error:

Server error: eval failed:

{ "assertion" : "assertion scripting/engine_spidermonkey.cpp:634" , "errmsg" : "db assertion failure" , "ok" : 0.0}

com.mongodb.MongoException: eval failed:

{ "assertion" : "assertion scripting/engine_spidermonkey.cpp:634" , "errmsg" : "db assertion failure" , "ok" : 0.0}

at com.mongodb.DB.eval(DB.java:223)

This error occurs in my GWT project (in Jetty, since it happens on the server side of the GWT project).
The regex I'm passing as an argument to db.eval() is compiled by Java.

So my java code is something like :
String exec = "return serverSideFunction(args[0]);";
Pattern regex = Pattern.compile("(?<!<[^>]{0,10})foo(?>![^<]{0,10}>)");
Object result = db.eval(exec, regex);

Nb: In my Pattern.compile(), I use {0,10}, as Java doesn't support * inside a look-behind.
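One detail in the pattern above may be worth checking: "(?>" opens an atomic group, while a negative lookahead is written "(?!". Read literally, "(?>![^<]{0,10}>)" requires a "!" character right after the word instead of asserting that something does not follow. The sketch below uses made-up sample patterns and a hypothetical helper named found() to show the difference with java.util.regex only; it does not touch MongoDB.

```java
import java.util.regex.Pattern;

// Illustration only: class and method names are hypothetical.
final class LookaroundSyntax {

    // Returns true if the regex matches anywhere in the input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        // Negative lookahead: "foo" not followed by "bar".
        System.out.println(found("foo(?!bar)", "foobaz"));   // true
        System.out.println(found("foo(?!bar)", "foobar"));   // false

        // Atomic group: "foo" followed by the literal text "!bar".
        System.out.println(found("foo(?>!bar)", "foo!bar")); // true
        System.out.println(found("foo(?>!bar)", "foobaz"));  // false
    }
}
```

If the intent was a negative lookahead, the form used in the later tests on this issue, "(?![^<]{0,50}>)", is the matching syntax.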

Nb2: I ran a small test without the Java driver and db.eval(). It looks like this:
> db.Test.save({'key':'<mark key="value">value</mark>'});
> db.Test.find();

{ "_id" : ObjectId("4e40d9c750d109bc7de98858"), "key" : "<mark key=\"value\">value</mark>" }

> db.Test.find().forEach(function(x){x.key.replace(new RegExp('(?<!<[^>]*)value(?>![^<]*>)',''),'VALUE'); db.Test.save(x);});
Tue Aug 9 09:04:47 SyntaxError: invalid quantifier ?<!<[^>])value(?>![^<]>) (shell):1

The error I get is not the same, but unless I'm doing it wrong, lookaround features aren't really working in Mongo, are they?

This is my first bug report ever, so please tell me if I'm doing it wrong.
I remain at your disposal if you need any further information.



 Comments   
Comment by Antoine Girbal [ 31/Oct/11 ]

Raphael,
for now it seems the issue is tied to a 3rd-party library, so I'm marking this as resolved.
Please reopen if needed.

Comment by Antoine Girbal [ 29/Sep/11 ]

The lookbehind feature may not be well supported by the different libraries.
Even though the pattern compiles in Java on the client side, it gets passed as a string to the server, which interprets it using PCRE in C++.
Additionally, if you try to compare it within JavaScript (say, using a "$where"), it will be interpreted in the JS environment (not sure what SpiderMonkey uses).

What exactly are you trying to match?
It seems your RE would match both a value within markup and a value with no markup, but not a value with partial markup.
That doesn't seem to be the goal.

Comment by POITTEVIN Raphael [ 09/Aug/11 ]

In fact, I'm using regex in a larger javascript function.

I've been running some tests on my computer; here are the results:

Nb: For all my tests, I created a collection 'Test':
> db.Test.save({'key':'<mark key="value"> value </mark>'});
> db.Test.save({'key':'value without mark'});
> db.Test.find();

{ "_id" : ObjectId("4e413d8c18bb189aed0c7b2f"), "key" : "<mark key=\"value\"> value </mark>" }
{ "_id" : ObjectId("4e413ffc18bb189aed0c7b30"), "key" : "value without mark" }

Test #1
I first tried a simple find() (in my GWT project, where db is a com.mongodb.DB object):
Pattern regex = Pattern.compile("(?<!<[^>]{0,50})value(?![^<]{0,50}>)");
DBCursor result = db.getCollection("Test").find(new BasicDBObject("key", regex));
logger.debug(result);

This returns:
> Cursor id=0, ns=gertrude.Test, query={ "key" : { "$regex" : "(?<!<[^>]{0,50})value(?![^<]{0,50}>)" , "$options" : ""}}, numIterated=0

So it runs without error, but I expected some results. This regex is supposed to match the word 'value' when there is no '<' without '>' before it, and no '>' without '<' after it (i.e. it matches 'value' when it's not inside markup). I also tried to launch a find() directly in mongo:
> db.Test.find({ "key" : { "$regex" : "(?<!<[^>]{0,50})value(?![^<]{0,50}>)" , "$options" : ""}});
But this had the same result.

I wondered whether lookaround is allowed in mongo at all, so I launched the following line:
> db.Test.find({ "key" : { "$regex" : "(?<!<)mark(?!>)", "$options" : ""}});

{ "_id" : ObjectId("4e413ffc18bb189aed0c7b30"), "key" : "value without mark" }

This works quite well. The problems occur when I try to add the [^<]*. Any ideas? Is my regex clumsy? Am I doing something wrong?
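Another comment on this issue notes that the server interprets the pattern with PCRE, which has traditionally required each lookbehind to match a fixed length, so a quantified lookbehind such as "(?<!<[^>]{0,50})" may be rejected there even though Java accepts it. One possible way around this, sketched below, is to drop the lookbehind and keep only a lookahead: 'value' counts as outside markup when the next angle bracket after it is not a closing '>'. The class and method names are hypothetical, the check runs purely in java.util.regex, and whether the same pattern behaves identically on the server is an assumption that has not been verified here.

```java
import java.util.regex.Pattern;

// Illustration only: class and method names are hypothetical.
final class OutsideMarkupMatch {

    // "value" that is NOT followed by a '>' before any other angle bracket,
    // i.e. the word does not sit inside an open <...> tag.
    static final Pattern OUTSIDE_MARKUP = Pattern.compile("value(?![^<>]*>)");

    static boolean matchesOutsideMarkup(String text) {
        return OUTSIDE_MARKUP.matcher(text).find();
    }

    public static void main(String[] args) {
        // The two documents from the Test collection above.
        System.out.println(matchesOutsideMarkup("<mark key=\"value\"> value </mark>")); // true
        System.out.println(matchesOutsideMarkup("value without mark"));                 // true
        // 'value' appears only inside the tag here.
        System.out.println(matchesOutsideMarkup("<mark key=\"value\">text</mark>"));    // false
    }
}
```

This assumes the text is well-formed markup in which '<' and '>' appear only as tag delimiters.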

Test #2
As I said earlier, I'm not using regex alone. In fact, I'm trying to create a function that makes replacements in some of the entities' fields, so the regex is used with the String.replace(regex, String) function.

In order to reproduce the error, I simplified my function to the minimum. First of all, I stored that simple function in my mongo database:
db.system.js.save({
_id:'serverSideFunction1',
value:function(regex){ return "<mark key=\"value\">value</mark>".replace(regex,"VALUE"); }
});
Nb: the content of my Test collection is still the same.

Then I just tried to launch a db.eval() on this function. My Java code looks like this:
Pattern regex = Pattern.compile("(?<!<[^>]{0,50})value(?![^<]{0,50}>)");
String exec = "return serverSideFunction1(args[0]);";
logger.debug(regex.matcher("<mark key=\"value\">value</mark>").replaceAll("VALUE"));
CommandResult result = db.doEval(exec, regex);
logger.debug("the call to db.doEval(\"" + exec + "\", " + regex + " ) returned " + result);

When I execute those lines, I get the following debug:
DEBUG - <mark key="value">VALUE</mark>
DEBUG - the call to db.doEval("return serverSideFunction1(args[0]);", (?<!<[^>]{0,50})value(?![^<]{0,50}>) ) returned

{ "assertion" : "assertion scripting/engine_spidermonkey.cpp:634" , "errmsg" : "db assertion failure" , "ok" : 0.0}

If I call db.eval() directly in mongo, it doesn't throw any error:
> db.eval("return serverSideFunction1({ '$regex' : '(?<!<[^>]{0,1})mark(?![^<]{0,1}>)', '$options' : ''});");
<mark key="value">value</mark>

But the replacement isn't done.
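In the shell call above, the argument is a plain object with '$regex' and '$options' keys rather than a RegExp. JavaScript's String.replace converts a non-RegExp pattern to a string and searches for that text literally, which would be consistent with the string coming back unchanged. If the replacement can be done on the client instead, a minimal sketch that avoids lookaround altogether is to match either a whole tag or the word, copy tags through untouched, and replace only the bare word. The class and method names are hypothetical, and this is one possible approach rather than the behaviour of any MongoDB API.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustration only: class and method names are hypothetical.
final class ReplaceOutsideTags {

    // Group 1 captures a complete tag; the second alternative is the bare word.
    private static final Pattern TAG_OR_WORD = Pattern.compile("(<[^>]*>)|value");

    static String replaceOutsideTags(String input, String replacement) {
        Matcher m = TAG_OR_WORD.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // A tag is kept verbatim; a bare word is swapped for the replacement.
            String piece = (m.group(1) != null) ? m.group(1) : replacement;
            m.appendReplacement(out, Matcher.quoteReplacement(piece));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(replaceOutsideTags("<mark key=\"value\">value</mark>", "VALUE"));
        // prints <mark key="value">VALUE</mark>
    }
}
```

Because the tag alternative is tried first at each position, any 'value' inside an attribute is consumed as part of the tag and never reaches the replacement branch.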

I hope this helps you understand what the problem is about. Personally, I have no idea what's wrong with what I'm doing.

POITTEVIN Raphael

Comment by Eliot Horowitz (Inactive) [ 09/Aug/11 ]

If all you're doing is regex, then you shouldn't use javascript at all.

You would just do

Pattern regex = Pattern.compile("(?<!<[^>]{0,10})foo(?>![^<]{0,10}>)");
find( new BasicDBObject( "foo" , regex ) );

Can you try that?
