Java Driver / JAVA-622

Looking for performance tips loading results from a find() with the java driver.


Details

    • Type: Task
    • Resolution: Done
    • Priority: Major - P3
    • Environment: 2.8.0 Java driver; Mac OS X (development) & CentOS (production)

    Description

      I had a question about improving the performance of loading data from Mongo.
      I'm doing a query as follows:

      // Anchor the quoted path at the start of _id so the query is a prefix match
      val prefixString = "^" + Pattern.quote(path)
      val prefixPattern: Pattern = Pattern.compile(prefixString)
      val query: BasicDBObject = new BasicDBObject(ID_FIELD_NAME, prefixPattern)
      val cursor = this.collection.find(query).batchSize(10000)
      val arr = cursor.toArray()

      I'm using the 2.8.0 Java driver (even though the code is written in Scala).
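For context, `Pattern.quote` just wraps the path in `\Q...\E` literal markers, so the driver sends an anchored literal-prefix regex to the server. A minimal sketch of what that produces (the sample path here is hypothetical, not from the original report):

```java
import java.util.regex.Pattern;

public class PrefixRegexDemo {
    public static void main(String[] args) {
        String path = "/users/andrew"; // hypothetical sample path
        // Same construction as the Scala snippet above
        String prefixString = "^" + Pattern.quote(path);
        Pattern prefixPattern = Pattern.compile(prefixString);

        // Pattern.quote wraps the literal in \Q...\E, so the server
        // receives a regex like ^\Q/users/andrew\E
        System.out.println(prefixString);
        System.out.println(prefixPattern.matcher("/users/andrew/docs").find()); // true
        System.out.println(prefixPattern.matcher("/other/path").find());        // false
    }
}
```

Note that in the explain output below the regex appears as `"^\\Q\\E"`, which is exactly what this construction yields when `path` is the empty string.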

      When I do an "explain" of this query, I get the following:

      { "cursor" : "BtreeCursor id multi" , "nscanned" : 5020 , "nscannedObjects" : 5020 , "n" : 5020 ,
        "millis" : 23 , "nYields" : 0 , "nChunkSkips" : 0 , "isMultiKey" : false , "indexOnly" : false ,
        "indexBounds" : { "_id" : [ [ "" , { } ] ,
          [ { "$regex" : "^\\Q\\E" , "$options" : "" } , { "$regex" : "^\\Q\\E" , "$options" : "" } ] ] } ,
        "allPlans" : [ { "cursor" : "BtreeCursor id multi" ,
          "indexBounds" : { "_id" : [ [ "" , { } ] ,
            [ { "$regex" : "^\\Q\\E" , "$options" : "" } , { "$regex" : "^\\Q\\E" , "$options" : "" } ] ] } } ] ,
        "oldPlan" : { "cursor" : "BtreeCursor id multi" ,
          "indexBounds" : { "_id" : [ [ "" , { } ] ,
            [ { "$regex" : "^\\Q\\E" , "$options" : "" } , { "$regex" : "^\\Q\\E" , "$options" : "" } ] ] } } }

      The "explain" says it took 23 milliseconds, but the actual time it takes to do the toArray is closer to 600 ms. This surprises me, as I'm testing on localhost, so I would expect the data transfer to be fast. What can I do to speed this operation up? I want to load all query results into memory as quickly as possible. I took a look in Wireshark and the total data is only 180k, so I'd be surprised if data transfer alone were the issue.
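One way to narrow down where the ~600 ms goes is to time the first document separately from the full drain: the first `hasNext()` includes the server round trip, while the remaining iteration is dominated by driver-side decoding. A minimal sketch of the pattern (using a plain `Iterator` as a hypothetical stand-in for `DBCursor`, since no live mongod is assumed here):

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class CursorTimingSketch {
    public static void main(String[] args) {
        // Stand-in for DBCursor: any Iterator fits the timing pattern.
        List<String> fake = new ArrayList<>();
        for (int i = 0; i < 5020; i++) fake.add("doc-" + i);
        Iterator<String> cursor = fake.iterator();

        long t0 = System.nanoTime();
        List<String> arr = new ArrayList<>();
        if (cursor.hasNext()) arr.add(cursor.next());    // first round trip + first decode
        long firstDoc = System.nanoTime();
        while (cursor.hasNext()) arr.add(cursor.next()); // drain and decode the rest
        long done = System.nanoTime();

        System.out.printf("first doc: %.2f ms, full drain: %.2f ms%n",
                (firstDoc - t0) / 1e6, (done - t0) / 1e6);
    }
}
```

With a real `DBCursor` in place of the iterator, a large gap between the two timings would point at client-side materialization (BSON decoding and object allocation) rather than the query itself.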

      Thanks!


          People

            Assignee: Unassigned
            Reporter: startupandrew (Andrew Lee)
