[GODRIVER-1599] Add a way to determine if a call to Next or TryNext would block Created: 30/Apr/20  Updated: 28/Oct/23  Resolved: 07/May/20

Status: Closed
Project: Go Driver
Component/s: None
Affects Version/s: None
Fix Version/s: 1.4.0

Type: Improvement Priority: Major - P3
Reporter: David Bartley Assignee: Divjot Arora (Inactive)
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   

Cursor has a TryNext method, but it's sort of confusing what it does. At first glance, it sounds like it'd return the next document of the underlying cursor batch if available, and return false if that's empty and a getMore would have to be issued. However, that's not what it does; instead, it will always issue a getMore if needed; the "try" variant simply prevents a subsequent getMore from being issued if no documents were returned.

The simplest thing to do would be to add an IsBatchEmpty/WouldNextBlock/BatchDocsRemaining/etc... method. You could also add a 3rd variant of Next, though that'd probably make things even more confusing.

As a workaround, I'm basically writing my tailer to look something like:

    cursor, err := client.Database("foo").Collection("bar").Find(ctx, bson.D{},
        options.Find().SetCursorType(options.TailableAwait))
    if err != nil {
        return err
    }
    defer cursor.Close(ctx)
    if err == nil {
        // Create a context that will cause any blocking call (i.e. `getMore`)
        // to fail; when this happens we'll sleep for a bit to prevent DoS'ing
        // the node.
        canceledCtx, cancel := context.WithCancel(ctx)
        cancel()
        nextCtx := canceled
 
        for {
            for cursor.TryNext(nextCtx) {
                nextCtx = canceledCtx
                // Process current doc...
            }
 
            if err := cursor.Err(); err == context.Canceled {
                // The driver must have tried to issue a `getMore`; sleep
                // for a bit and then allow `TryNext` to block once.
                time.Sleep(100 * time.Millisecond)
                nextCtx = ctx
                continue
            } else if err != nil {
                break
            }
        }
    }
    // Handle cursor error...

If I had something like IsBatchEmpty, I could write this as:

    cursor, err := client.Database("foo").Collection("bar").Find(ctx, bson.D{},
        options.Find().SetCursorType(options.TailableAwait))
    if err != nil {
        return err
    }
    defer cursor.Close(ctx)
    if err == nil {
        for cursor.TryNext(ctx) {
            // Process current doc...
 
            if cursor.IsBatchEmpty() {
                // Avoid sending too many `getMore`s.
                time.Sleep(100 * time.Millisecond)
            }
        }
        err = cursor.Err()
    }
    // Handle cursor error...



 Comments   
Comment by Githook User [ 07/May/20 ]

Author:

{'name': 'Divjot Arora', 'email': 'divjot.arora@10gen.com', 'username': 'divjotarora'}

Message: GODRIVER-1599 Expose batch length in Cursor (#402)
Branch: master
https://github.com/mongodb/mongo-go-driver/commit/ac7f51c1a3eb9e20458a51b75ca0a64b68b920a8

Comment by Divjot Arora (Inactive) [ 07/May/20 ]

https://github.com/mongodb/mongo-go-driver/pull/402

Comment by David Bartley [ 06/May/20 ]

Comment by Divjot Arora (Inactive) [ 06/May/20 ]

Thought about this a bit more and we'll be implementing a slightly more general Cursor.RemainingBatchLength function (name TBD) to get the number of documents left in the current batch. The bsoncore.DocumentSequence type has a DocumentCount method, so we can record the initial batch length on every batch and manually decrement on successful Next/TryNext calls.

Comment by Divjot Arora (Inactive) [ 06/May/20 ]

I'm leaning towards the single-function solution as well. You should note that the command monitor solution will work, but will force additional copies because we copy the command for CommandStartedEvent and the server reply document for CommandSucceededEvent. We already have the code for checking the state of the current batch in cursor.go, so adding a function like this should be straightforward.

Comment by David Bartley [ 06/May/20 ]

I agree that special-casing cancellation isn't great, and it seems reasonable to say that any timeout/cancellation while a cursor is running should abort the entire cursor as a whole.

Thanks, I didn't know about the 1s default timeout; I agree it'd be great to document that  I'm not sure that matters a whole lot in our case, since we typically see >> 1 writes/s to the collection we're tailing; at the same time, a given getMore is fast enough that we'll usually only get back a single doc per getMore (hence the sleep to try to force those to batch up).  You're right that the TryNext approach originally outlined doesn't work.

I ended up implementing this by adding an event monitor that just sleeps on getMore commands, and that seems to work well enough.  As far as options go, I'd think adding a new method would be more straightforward than introducing a new batching API.  For some additional context, we've been slowly migrating applications from our internal fork of mgo to mongo-driver, and one of things we added was a "NeedsNetwork" call to mgo's Cursor, which is basically what we'd be adding here.

Comment by Divjot Arora (Inactive) [ 06/May/20 ]

bartle Sorry for the delay. There's a lot of pieces here, so I wanted to make sure I can provide a comprehensive response. A few things came to mind when reading over your code samples and comments:

  1. The Next and TryNext functions return immediately if there was a previous error because otherwise, the batch could have been iterated on the server but not returned to the driver, so doing another getMore would result in a batch being missed. This isn't true for the context.Canceled error specifically because the context cancellation is only checked before the driver does the network request for the getMore, but I'm hoping that future versions of the driver will be able to cancel in-flight network operations, so explicitly ignoring context.Canceled in the Cursor code seems risky.
  2. You mention that specifying SetCursorType(TailableAwait) means that the server blocks until at least one document is available. I don't think this is correct. The getMore sent by the driver will not have a maxTimeMS value set, but the server uses a 1-second default maxTimeMS for tailalbe await cursors. The relevant server code is https://github.com/mongodb/mongo/blob/b430e084d39d9315cf272383c17b0a5fb4d83395/src/mongo/db/commands/getmore_cmd.cpp#L200-L207 and there is a ticket for the docs team to fix the getMore command documentation to reflect this (DOCS-13346). Given this, I think your TryNext code would have to handle the case where TryNext returns false but there was no error.

Given all of that, we do understand that there's no way of satisfying your use case with the current Next/TryNext functions. Our options are to either add a batch-oriented cursor API like NextBatch which does a single getMore and returns the entire batch of documents or to add a function like the one you suggested.

I'll be investigating these approaches over the next couple of days and comment here once I have an idea of what we want to do. In the meantime, I'd like to hear your thoughts on my comments about the code from the ticket description and the options I've outlined.

– Divjot

Comment by David Bartley [ 02/May/20 ]

I'm already specifying TailableAwait as the cursor type, and so if no results are available the getMore does block until at least one document is available.  The issue is that the getMore doesn't wait until some minimum number of results are available, so in an extreme case you could you end up with one getMore per document write.

Comment by Divjot Arora (Inactive) [ 30/Apr/20 ]

bartle Seems like you essentially want to rate-limit the getMore requests so the server isn't spammed. Could you use the MaxAwaitTime option for this? That will make it so the request waits on the server rather than in the driver.

Comment by David Bartley [ 30/Apr/20 ]

Another option might be to configure a command monitor that sleeps on any getMore?

Comment by David Bartley [ 30/Apr/20 ]

I think the only way to achieve this is to use "BatchCursorFromCursor", which lets you observe when you'll need to call getMore, but that function is deprecated.

Comment by David Bartley [ 30/Apr/20 ]

Actually, that first code snippet does't work, because Next will immediately return false if "c.err != nil", which it always will.

Generated at Thu Feb 08 08:36:44 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.