[GODRIVER-1594] slow on Find Created: 28/Apr/20  Updated: 27/Oct/23  Resolved: 29/Apr/20

Status: Closed
Project: Go Driver
Component/s: CRUD
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Hardi N/A Assignee: Unassigned
Resolution: Works as Designed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Ubuntu 18.04



 Description   

I'm trying to fetch 10,000 documents at a time from MongoDB.

Information :

Code :

 

package main
 
import (
    "context"
    "fmt"
    "time"
 
    "go.mongodb.org/mongo-driver/bson"
    "go.mongodb.org/mongo-driver/mongo"
    "go.mongodb.org/mongo-driver/mongo/options"
)
 
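var database *mongo.Database
 
// chat was not defined in the original report; bson.M is a stand-in
// (an assumption) so that the snippet compiles.
type chat = bson.M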
 
func main() {
 
    // Keep the cancel functions so the contexts are released when main returns.
    ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
    defer cancel()
    client, err := mongo.Connect(ctx, options.Client().ApplyURI("mongodb://20.20.20.43:27017"))
    if err != nil {
        panic(err)
    }
 
    database = client.Database("chat_data")
 
    chatText := make([]chat, 0)
    now := time.Now().Unix()
    ctx, cancel = context.WithTimeout(context.Background(), 30*time.Second)
    defer cancel()
 
    // mongodb batch option
    opt := options.Find()
    opt.SetBatchSize(15_000)
    opt.SetAllowPartialResults(false)
 
    // mongodb filter
    filter := bson.M{"timestamp": bson.M{"$gte": now - 108000}}
 
    cur, err := database.Collection("chat").Find(ctx, filter, opt)
    if err != nil {
        // fmt.Fprint(w, err)
        fmt.Println(err)
        return
    }
    defer cur.Close(ctx)
 
    for cur.Next(ctx) {
        var result chat
        err := cur.Decode(&result)
        if err != nil {
            fmt.Println(err)
            continue
        }
        // do something with result....
        // fmt.Println(result)
        chatText = append(chatText, result)
    }
    if err := cur.Err(); err != nil {
        // fmt.Fprint(w, cur.Err())
        fmt.Println(err)
        return
    }
 
    fmt.Println("done")
    fmt.Println(len(chatText))
}

It takes more than 30 seconds just to fetch the full result, while pymongo needs only 0m2.159s for 36k documents (with the same filter).

 

 



 Comments   
Comment by Divjot Arora (Inactive) [ 29/Apr/20 ]

Thanks for confirming suyatno.hardi@gmail.com! I'm going to close out this issue, but feel free to leave another comment or open a new one if you have any other questions.

– Divjot

Comment by Hardi N/A [ 29/Apr/20 ]

Thank you, Divjot. The limitation was my network speed, so there is no issue with GODRIVER.

Comment by Divjot Arora (Inactive) [ 29/Apr/20 ]

suyatno.hardi@gmail.com The batch size can help, but the server is also limited because all of the results come back in a single BSON document, so the max size is 16 MB. If you have large documents, this will limit the number of documents that can be returned at once, even if the batch size is high. Also, how do you know that the data is not coming back in a single batch? Each batch is buffered internally in the driver so it could all come back in one batch and each Next call could be returning the next document from that batch.
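As a rough illustration of the 16 MB cap described above, the sketch below estimates how many documents fit in one batch and how many batches a result set needs. The helper names docsPerBatch and batchesNeeded and the average document sizes are hypothetical, and the arithmetic ignores the reply's own BSON overhead.

```go
package main

import "fmt"

// maxReplyBytes is the 16 MB limit mentioned above (16 * 1024 * 1024 bytes).
const maxReplyBytes = 16 * 1024 * 1024

// docsPerBatch returns how many documents of avgDocBytes can come back in one
// batch: the requested batchSize, or fewer if the 16 MB cap is reached first.
func docsPerBatch(batchSize, avgDocBytes int) int {
	if batchSize <= 0 || avgDocBytes <= 0 {
		return 0
	}
	fit := maxReplyBytes / avgDocBytes
	if fit < batchSize {
		return fit
	}
	return batchSize
}

// batchesNeeded returns the number of batches (the initial find plus any
// getMore calls) required to return totalDocs documents.
func batchesNeeded(totalDocs, perBatch int) int {
	if perBatch <= 0 {
		return 0
	}
	return (totalDocs + perBatch - 1) / perBatch
}

func main() {
	// Assumed average document sizes in bytes; the report does not give them.
	for _, size := range []int{256, 2048} {
		per := docsPerBatch(30000, size)
		fmt.Printf("avg %d bytes: %d docs per batch, %d batches for 36000 docs\n",
			size, per, batchesNeeded(36000, per))
	}
}
```

With small documents the requested batch size is the limit. With larger ones the 16 MB cap takes over, and raising the batch size further changes nothing.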

To actually figure out exactly how many commands are being sent, I recommend the command monitoring approach I described in my last comment. It will show us exactly what commands are being sent to the server and how many requests we are doing. Note that because there are so many documents, it might be helpful to print evt.CommandName rather than the evt.Command and evt.Reply fields.

Let us know if you need further guidance on this.

– Divjot

Comment by Hardi N/A [ 29/Apr/20 ]

How do I close this issue?

Comment by Hardi N/A [ 29/Apr/20 ]

Sorry for my misunderstanding. The Go and Python drivers give the same result. However, what I'm trying to do is return all of the find() data at once. I have already tried `SetAllowPartialResults(false)` and `SetBatchSize(30_000)`, but neither works.

For example, this simple code:

    cur, err := database.Collection("chat").Find(ctx, filter, opt)
    if err != nil {
        // fmt.Fprint(w, err)
        fmt.Println(err)
        return
    }
    defer cur.Close(ctx)
 
    for cur.Next(ctx) {
        var result chat
        err := cur.Decode(&result)
        if err != nil {
            fmt.Println(err)
            continue
        }
        // do something with result....
        chatText = append(chatText, result)
    }

It takes more than 30 seconds just to return 36k documents. I tried setting the batch size to return all 36k documents at once, but it doesn't work.

Comment by Divjot Arora (Inactive) [ 29/Apr/20 ]

suyatno.hardi@gmail.com As more debugging info, can you provide the Python code you're using and pymongo version as well? If none of the Go-specific ideas give any extra info, having the Python script might at least give us a way to create a repro on our end.

– Divjot

Comment by Divjot Arora (Inactive) [ 28/Apr/20 ]

Hi suyatno.hardi@gmail.com,

Thanks for the detailed report. There's very little driver-specific logic for the Find operation, so it's strange that the performance difference is this big. To further debug this, can you try the following:

  1. Run Collection.CountDocuments with the same filter that you're using for the Find to ensure that the filter is matching the correct number of documents.
  2. Enable command monitoring to print out all of the communication between the driver and the server. I've written up a small example that you can integrate into your code for this at https://play.golang.org/p/tnfcCFyC-xH. If you do try this, please put all of the output in a file and upload it in a comment here so we can take a look too.

Can you also provide the driver version, server version, and server topology (e.g. standalone/replica set/sharded) that you're using?

– Divjot
