[JAVA-366] BSON decoding compiles Java Pattern instances: this is error prone and bad for performance Created: 02/Jun/11 Updated: 31/Mar/15 Resolved: 13/Jun/14 |
|
| Status: | Closed |
| Project: | Java Driver |
| Component/s: | API |
| Affects Version/s: | 2.6.1 |
| Fix Version/s: | 3.0.0 |
| Type: | Bug | Priority: | Minor - P4 |
| Reporter: | Tal Liron | Assignee: | Unassigned |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Backwards Compatibility: | Minor Change |
| Description |
|
In BasicBSONCallback.gotRegEx, Pattern.compile is called on the regular expression that was stored in the BSON structure. There are two problems with this: 1) JavaScript regular expression are not equivalent to Java regular expressions. This means that it is possible that this Pattern.compile would throw a PatternSyntaxException. (Which is not handled here!) This can be very bad, because it would mean that certain MongoDB documents that contain such regular exceptions would simply be un-retrievable via the Java driver. 2) Pattern.compile is slow, and would currently be done for every single regular expression encountered by the Java driver. However, in many cases the user does not care to use the compiled regular expression as a Pattern instance, and is only interested in the string content of it. This slow regex compilation is thus superfluous. My recommendation is to have a new dedicated class for BSON regular expressions: org.bson.types.RegExp. This class would not normally compile the pattern into a Java Pattern, however it could offer a compile() method to attempt to do this compilation for users who want to use this pattern, with the documented caveat that they might get a PatternSyntaxException if the regular expression is not compatible with Java. (In which case they may want to use Rhino to handle the regular expression.) I posted something similar to this to the MongoDB mailing list, and was told that "most users would prefer to use Pattern." Well, OK for them, not great for users like me who do not want breakage or pay performance costs for a feature they do not use. May I suggest that this could be controlled with a flag? The code would look something like this: public void gotRegex( String name , String pattern , String flags ) { |
| Comments |
| Comment by Jeffrey Yemin [ 31/Mar/15 ] |
|
Closing all resolved 3.0.0 issues, as 3.0.0 has been tagged and released. |
| Comment by Jeffrey Yemin [ 10/Jun/14 ] |
|
So as not to break binary compatibility, the default behavior of decoding to a java.util.Pattern will have to remain. We will fix this for new users with the introduction of the org.bson.types.RegularExpression, to be used with the new Document class that will serve as a replacement for DBObject. |
| Comment by Tal Liron [ 02/Jun/11 ] |
|
I agree with you that a RegExp BSON type does not seem necessary. However, it is part of the BSON spec and it is supported by the Java driver. We don't want broken code in it, do we? It seems like it would be easy to solve. I did set the bug priority on "low." I do not personally have an application that uses this, but I am the developer of the Rhino driver for MongoDB, which sits on top of the Java driver, and it also supports RegExp. In writing the unit tests I discovered this part of the code that did not seem right to me. |
| Comment by Antoine Girbal [ 02/Jun/11 ] |
|
I see your point, but so far very few people have actually stored regexp natively in documents. |