[JAVA-366] BSON decoding compiles Java Pattern instances: this is error prone and bad for performance Created: 02/Jun/11  Updated: 31/Mar/15  Resolved: 13/Jun/14

Status: Closed
Project: Java Driver
Component/s: API
Affects Version/s: 2.6.1
Fix Version/s: 3.0.0

Type: Bug Priority: Minor - P4
Reporter: Tal Liron Assignee: Unassigned
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Backwards Compatibility: Minor Change

 Description   

In BasicBSONCallback.gotRegEx, Pattern.compile is called on the regular expression that was stored in the BSON structure.

There are two problems with this:

1) JavaScript regular expression are not equivalent to Java regular expressions. This means that it is possible that this Pattern.compile would throw a PatternSyntaxException. (Which is not handled here!) This can be very bad, because it would mean that certain MongoDB documents that contain such regular exceptions would simply be un-retrievable via the Java driver.

2) Pattern.compile is slow, and would currently be done for every single regular expression encountered by the Java driver. However, in many cases the user does not care to use the compiled regular expression as a Pattern instance, and is only interested in the string content of it. This slow regex compilation is thus superfluous.

My recommendation is to have a new dedicated class for BSON regular expressions: org.bson.types.RegExp. This class would not normally compile the pattern into a Java Pattern, however it could offer a compile() method to attempt to do this compilation for users who want to use this pattern, with the documented caveat that they might get a PatternSyntaxException if the regular expression is not compatible with Java. (In which case they may want to use Rhino to handle the regular expression.)

I posted something similar to this to the MongoDB mailing list, and was told that "most users would prefer to use Pattern." Well, OK for them, not great for users like me who do not want breakage or pay performance costs for a feature they do not use. May I suggest that this could be controlled with a flag? The code would look something like this:

public void gotRegex( String name , String pattern , String flags ) {
RegExp bsonRegExp = new RegExp(pattern , BSON.regexFlags(flags)); // This class does not compile the pattern, simply stores it
_put(name, compileRegExpFlag ? bsonRegExp.compile() : bsonRegExp); // The compile() method returns a Java Pattern (though it may throw an exception)
}



 Comments   
Comment by Jeffrey Yemin [ 31/Mar/15 ]

Closing all resolved 3.0.0 issues, as 3.0.0 has been tagged and released.

Comment by Jeffrey Yemin [ 10/Jun/14 ]

So as not to break binary compatibility, the default behavior of decoding to a java.util.Pattern will have to remain.

We will fix this for new users with the introduction of the org.bson.types.RegularExpression, to be used with the new Document class that will serve as a replacement for DBObject.

Comment by Tal Liron [ 02/Jun/11 ]

I agree with you that a RegExp BSON type does not seem necessary. However, it is part of the BSON spec and it is supported by the Java driver. We don't want broken code in it, do we? It seems like it would be easy to solve. I did set the bug priority on "low."

I do not personally have an application that uses this, but I am the developer of the Rhino driver for MongoDB, which sits on top of the Java driver, and it also supports RegExp. In writing the unit tests I discovered this part of the code that did not seem right to me.

Comment by Antoine Girbal [ 02/Jun/11 ]

I see your point, but so far very few people have actually stored regexp natively in documents.
regexp are usually only used in queries to the server.
For storage, you may as well use regular strings that represent a regexp.
Do you have a use case?
thx

Generated at Thu Feb 08 08:52:07 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.