[JAVA-2303] Regression in LazyBSONObject.entrySet() performance in the 3.x driver Created: 09/Sep/16  Updated: 19/Oct/16  Resolved: 22/Sep/16

Status: Closed
Project: Java Driver
Component/s: BSON, Performance
Affects Version/s: 3.0.0
Fix Version/s: 3.4.0

Type: Bug Priority: Major - P3
Reporter: Steve Briskin (Inactive) Assignee: Jeffrey Yemin
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   

There is a regression in the performance of LazyBSONObject.entrySet() in the 3.x driver vs. the 2.x driver.

The program below runs in about 80 ms with the 2.14 driver and about 8000 ms with the 3.3 driver.

package misc;
 
import com.mongodb.*;
 
import java.util.*;
import java.util.Map.Entry;
 
import org.bson.*;
 
public class LazyBsonWalk {
 
    public static void main(String[] args) throws Exception {
        MongoClient mongo = new MongoClient();
        DBCollection coll = mongo.getDB("test").getCollection("doc");
        coll.drop();

        // Decode query results lazily, as LazyBSONObject instances
        coll.setDBDecoderFactory(LazyDBDecoder.FACTORY);

        coll.insert(generateDoc());

        // Time a full recursive walk of the document via entrySet()
        long start = System.currentTimeMillis();
        walkLazyBSONObject((LazyBSONObject) coll.findOne());
        long end = System.currentTimeMillis();

        System.out.println(end - start);
        mongo.close();
    }
 
    // Visit every entry, recursing into embedded documents
    public static void walkLazyBSONObject(LazyBSONObject lazyObj) {
        Set<Entry<String, Object>> set = lazyObj.entrySet();

        for (Entry<String, Object> entry : set) {
            Object value = entry.getValue();

            if (value instanceof LazyBSONObject) {
                walkLazyBSONObject((LazyBSONObject) value);
            }
        }
    }
 
    // Build a three-level nested document: 10 x 10 x 100 = 10,000 small leaf documents
    public static DBObject generateDoc() {
        DBObject obj = new BasicDBObject();
 
        for (int i = 0; i < 10; i++) {
            DBObject obji = new BasicDBObject();
 
            for (int j = 0; j < 10; j++ ) {
                DBObject objj = new BasicDBObject();
 
                for (int k = 0; k < 100; k++) {
                    DBObject data = new BasicDBObject();
                    data.put("clicks", 0);
                    data.put("impressions", 1);
                    data.put("revenue", 100);
 
                    objj.put(String.valueOf(k), data);
                }
                obji.put(String.valueOf(j), objj);
            }
            obj.put(String.valueOf(i), obji);
        }
 
        return obj;
    }
}



 Comments   
Comment by Githook User [ 22/Sep/16 ]

Author:

Jeff Yemin (jyemin) <jeff.yemin@10gen.com>

Message: JAVA-2303: Improve performance of LazyBSONObject.entrySet and hashCode

The root cause of the performance problem was that the entrySet
implementation inserted all the entries into an actual HashSet,
which in turn called hashCode on all embedded LazyBSONObject
instances. The hashCode implementation examined every byte in the
array instead of just the bytes between offset and size.

The fix is two-fold. The first is to use a private Set implementation
for the entrySet that just wraps an ArrayList and therefore doesn't
need to call hashCode anymore at all. While calls to entrySet().contains
and entrySet().containsAll will be slower, that is an acceptable trade-off
as usage of those methods is likely to be rare.
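The first change can be sketched as follows, assuming that "a private Set implementation that wraps an ArrayList" means something like an `AbstractSet` whose `iterator()` and `size()` delegate to the list. `ListBackedSet` and `CountingValue` are hypothetical names for illustration, not the driver's code. Walking the entries never calls `hashCode()`, while `contains` falls back to `AbstractCollection.contains`, a linear scan, which is the trade-off described above.

```java
import java.util.AbstractMap;
import java.util.AbstractSet;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class ListBackedEntrySetDemo {

    // Hypothetical read-only Set view over a list that is assumed to contain
    // no duplicates. iterator() and size() never call hashCode(); contains()
    // is inherited from AbstractCollection and is a linear scan.
    static final class ListBackedSet<E> extends AbstractSet<E> {
        private final List<E> items;

        ListBackedSet(List<E> items) {
            this.items = Collections.unmodifiableList(items);
        }

        @Override
        public Iterator<E> iterator() {
            return items.iterator();
        }

        @Override
        public int size() {
            return items.size();
        }
    }

    // Hypothetical value type that counts how often hashCode() is invoked.
    static final class CountingValue {
        static int hashCodeCalls = 0;

        @Override
        public int hashCode() {
            hashCodeCalls++;
            return super.hashCode();
        }
    }

    // Builds an entry set by collecting entries in a list instead of a HashSet.
    static Set<Map.Entry<String, Object>> entrySet(int n) {
        List<Map.Entry<String, Object>> list = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            list.add(new AbstractMap.SimpleImmutableEntry<>(String.valueOf(i), new CountingValue()));
        }
        return new ListBackedSet<>(list);
    }

    // Walks every entry, as the reporter's program does, and returns how many
    // hashCode() calls the walk triggered.
    static int hashCodeCallsWhileWalking(int n) {
        CountingValue.hashCodeCalls = 0;
        for (Map.Entry<String, Object> entry : entrySet(n)) {
            entry.getValue();
        }
        return CountingValue.hashCodeCalls;
    }

    public static void main(String[] args) {
        System.out.println(hashCodeCallsWhileWalking(1_000)); // 0
    }
}
```

Iteration stays linear and hash-free; `contains`, `containsAll`, and `equals` (which `AbstractSet` implements via `containsAll`) become linear or quadratic in the number of entries.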

The second change is to ensure that LazyBSONObject.hashCode only examines the
bytes between offset and size, just in case clients are sticking instances in
their own hash tables.
Branch: master
https://github.com/mongodb/mongo-java-driver/commit/2ffc7c8f72bb1d0be95ad8b891eee454815096b7
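The second change in the commit message, hashing only the document's own bytes, can be sketched like this. `LazyView` is hypothetical, assumed to occupy `bytes[offset, offset + size)` of a possibly larger shared array. The 31-based polynomial mirrors `java.util.Arrays.hashCode(byte[])` applied to that slice. The matching `equals` is an addition made here to keep the `equals`/`hashCode` contract, not something the commit message describes.

```java
import java.util.Arrays;

public class SliceHashDemo {

    // Hypothetical lazy-document view occupying bytes[offset, offset + size)
    // of a possibly larger, shared array. Not the driver's class.
    static final class LazyView {
        final byte[] bytes;
        final int offset;
        final int size;

        LazyView(byte[] bytes, int offset, int size) {
            this.bytes = bytes;
            this.offset = offset;
            this.size = size;
        }

        // Hash only this view's own bytes, using the same 31-based polynomial
        // as Arrays.hashCode(byte[]). Cost is proportional to size, not bytes.length.
        @Override
        public int hashCode() {
            int result = 1;
            for (int i = offset; i < offset + size; i++) {
                result = 31 * result + bytes[i];
            }
            return result;
        }

        // Added here to keep the equals/hashCode contract: compare the same slices.
        @Override
        public boolean equals(Object o) {
            if (this == o) {
                return true;
            }
            if (!(o instanceof LazyView)) {
                return false;
            }
            LazyView other = (LazyView) o;
            if (size != other.size) {
                return false;
            }
            for (int i = 0; i < size; i++) {
                if (bytes[offset + i] != other.bytes[other.offset + i]) {
                    return false;
                }
            }
            return true;
        }
    }

    public static void main(String[] args) {
        byte[] shared = {9, 9, 1, 2, 3, 9, 9};
        LazyView embedded = new LazyView(shared, 2, 3); // views the bytes {1, 2, 3}
        LazyView standalone = new LazyView(new byte[] {1, 2, 3}, 0, 3);

        System.out.println(embedded.hashCode() == standalone.hashCode());                  // true
        System.out.println(embedded.equals(standalone));                                  // true
        System.out.println(embedded.hashCode() == Arrays.hashCode(new byte[] {1, 2, 3})); // true
    }
}
```

Two views with the same bytes hash equally regardless of where they sit in their backing arrays, and each call costs time proportional to the document, not to the whole buffer.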

Comment by Jeffrey Yemin [ 09/Sep/16 ]

Tracked it down to a badly implemented hash function in LazyBSONObject, which is used indirectly by LazyBSONObject.entrySet when it puts the values into a hash map.
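The mechanism in this comment and in the commit message can be made concrete with a small, self-contained sketch. `LazyView` is a hypothetical stand-in for an embedded LazyBSONObject, not the driver's class. It is assumed to hold a slice of a byte array shared with its parent and to hash the whole array, as the commit message describes. Copying entries into a `java.util.HashSet` calls `Map.Entry.hashCode()`, which calls the value's `hashCode()`, so every embedded document pays for the entire buffer.

```java
import java.util.AbstractMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class WholeArrayHashDemo {

    // Hypothetical stand-in for an embedded lazy document: a view onto a slice
    // of a byte array shared with its parent. This is NOT the driver's class.
    static final class LazyView {
        static long bytesHashed = 0; // total bytes read by all hashCode() calls

        final byte[] bytes;
        final int offset;
        final int size;

        LazyView(byte[] bytes, int offset, int size) {
            this.bytes = bytes;
            this.offset = offset;
            this.size = size;
        }

        // The problematic pattern: hash the whole shared array, ignoring offset and size.
        @Override
        public int hashCode() {
            int result = 1;
            for (byte b : bytes) {
                result = 31 * result + b;
                bytesHashed++;
            }
            return result;
        }
    }

    // Builds an entry set by copying entries into a HashSet and returns how
    // many bytes were hashed along the way.
    static long bytesHashedBuildingEntrySet(int children, int totalBytes) {
        LazyView.bytesHashed = 0;
        byte[] shared = new byte[totalBytes];
        int childSize = totalBytes / children;

        Set<Map.Entry<String, Object>> entries = new HashSet<>();
        for (int i = 0; i < children; i++) {
            LazyView child = new LazyView(shared, i * childSize, childSize);
            // HashSet.add calls Entry.hashCode(), which calls child.hashCode().
            entries.add(new AbstractMap.SimpleImmutableEntry<>(String.valueOf(i), child));
        }
        return LazyView.bytesHashed;
    }

    public static void main(String[] args) {
        // Each of the 1,000 children hashes all 100,000 shared bytes.
        System.out.println(bytesHashedBuildingEntrySet(1_000, 100_000)); // 100000000
    }
}
```

The work is proportional to the number of embedded documents times the size of the entire shared buffer, rather than to the size of the documents themselves; hashing only each child's slice would read 100,000 bytes here instead of 100,000,000.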

Generated at Thu Feb 08 08:56:52 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.