[JAVA-1925] Tailable cursor blocks on tryNext Created: 14/Aug/15  Updated: 02/Dec/15  Resolved: 08/Oct/15

Status: Closed
Project: Java Driver
Component/s: Query Operations
Affects Version/s: 2.13.2, 3.0.3
Fix Version/s: None

Type: Bug
Priority: Major - P3
Reporter: Devin Smith
Assignee: Unassigned
Resolution: Cannot Reproduce
Votes: 0
Labels: Bug, driver, query, replicaset
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   

I can fairly reliably cause a tailable cursor on the oplog to block on tryNext by running several cursors in parallel. Sometimes my test program finishes successfully; other times it blocks on tryNext. The database is relatively small:

rslocal:PRIMARY> db.oplog.rs.stats()
{
    "ns" : "local.oplog.rs",
    "count" : 1063,
    "size" : 1166114,
    "avgObjSize" : 1097,
    "storageSize" : 344064,
    "capped" : true,
    "max" : -1,
    "maxSize" : 1038090240,
    "wiredTiger"...

I'm running a 3.0.4 server as a single-node replica set.

I'm able to demonstrate the exact same issue with both the 2.13.2 and 3.0.3 drivers:

https://github.com/devinrsmith/mongocursorexample

https://github.com/devinrsmith/mongocursorexample/tree/v3

Am I using the cursor API incorrectly?



 Comments   
Comment by Arcadius Ahouansou [ 02/Dec/15 ]

Thank you very much drsmith

Comment by Devin Smith [ 30/Nov/15 ]

We are using docker mongo:3.0.6 in prod (NO vagrant) without issue so far.
The boxes are much beefier and in triplicate replication, probably with
some other configuration differences, so it's not an apples to apples
comparison.


Comment by Arcadius Ahouansou [ 30/Nov/15 ]

Hello again drsmith
We are seeing a similar issue here.

Are you using Docker in production, or just for development?
If in production, do you see the issue there as well?

Thanks.

Comment by Devin Smith [ 30/Nov/15 ]

Unfortunately not, but I didn't dig into it very far. I'm running mongo
locally (OS X) as needed for my dev work.


Comment by Arcadius Ahouansou [ 30/Nov/15 ]

Thanks drsmith
Apart from avoiding Docker, have you found any other solution or workaround for this issue?
Thanks.

Comment by Devin Smith [ 30/Nov/15 ]

Yes, I believe it was.


Comment by Arcadius Ahouansou [ 30/Nov/15 ]

Hello drsmith
In your last comment, you stated that the issue is only reproducible on Vagrant.
Is Vagrant running docker by any chance?

Thanks.

Comment by Jeffrey Yemin [ 08/Oct/15 ]

Thanks Devin,

I'm closing this issue now but please add further comments should you have more information at any time in the future, and I'll reopen it.

Comment by Devin Smith [ 17/Aug/15 ]

So, my coworker was able to reproduce this on one of his mongo instances, but not on another. For local dev we use Vagrant (on OS X), and it seems the only times we can repro the issue are in Vagrant. I don't have the time to trace it any further than this at the moment. I might try to put together a minimal reproducible Vagrant image sometime and pass it off to both teams for further investigation. For now, though, we'll just be wary of mongo in Vagrant.

Comment by Devin Smith [ 14/Aug/15 ]

It seems to me that I get mongo into a bad state by running lots of cursors concurrently. After that, any use of cursors (serial or parallel) seems to fail. Maybe you can reproduce it by upping the number of concurrent cursors?

I've updated my code to count the results instead of storing them, and to loop 100 times.

I'm also logging any MongoExceptions I receive, but exceptions are not the case I'm worried about; the blocking is.
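
Since a blocked tryNext throws nothing, logging exceptions cannot surface it. One way a test harness might tell "blocked" apart from "returned" or "threw" is to run the call on a worker thread with a deadline. The following is only a minimal sketch using the JDK's java.util.concurrent classes; TryNextWatchdog, Outcome and probe are hypothetical names and are not part of the driver or of the mongocursorexample repository.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper (an assumption for illustration, not a driver API).
class TryNextWatchdog {
    enum Outcome { RETURNED, THREW, TIMED_OUT }

    // Runs the call on a daemon worker thread and reports whether it
    // returned, threw, or was still running after timeoutMillis.
    static <T> Outcome probe(Callable<T> call, long timeoutMillis) {
        ExecutorService worker = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "tryNext-probe");
            t.setDaemon(true); // a stuck call must not keep the JVM alive
            return t;
        });
        Future<T> future = worker.submit(call);
        try {
            future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return Outcome.RETURNED; // includes a null result (no document yet)
        } catch (ExecutionException e) {
            return Outcome.THREW;    // the call failed but did not block
        } catch (TimeoutException e) {
            return Outcome.TIMED_OUT;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return Outcome.TIMED_OUT; // caller interrupted before a result arrived
        } finally {
            worker.shutdownNow();    // interrupts the worker if it is still running
        }
    }

    public static void main(String[] args) {
        // Stand-in for a tryNext that returns immediately with no document.
        System.out.println(probe(() -> null, 1000));

        // Stand-in for a tryNext that never returns.
        CountDownLatch never = new CountDownLatch(1);
        System.out.println(probe(() -> { never.await(); return null; }, 200));
    }
}
```

In the test program this might be called as probe(cursor::tryNext, 5000), assuming cursor is the tailable cursor under test and that five seconds is a reasonable deadline for it. A thread blocked in a socket read may not respond to the interrupt, which is why the worker is a daemon thread; the sketch only detects the hang, it does not recover from it.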

Comment by Devin Smith [ 14/Aug/15 ]

mongod.conf shown in previous comment, default storage engine.

Comment by Jeffrey Yemin [ 14/Aug/15 ]

What's in mongod.conf? Is this still with wired tiger?

Comment by Devin Smith [ 14/Aug/15 ]

$ cat mongo/config/mongod.conf
replication:
  replSetName: rslocal

Brought up by docker-compose:

mongo:
  image: mongo:3.0.4
  volumes:
   - ./mongo/config:/config:ro
  ports:
   - "27017:27017"
  command: --config /config/mongod.conf

Comment by Jeffrey Yemin [ 14/Aug/15 ]

With what server configuration?

Comment by Devin Smith [ 14/Aug/15 ]

I just modified my test, and I was able to hit the issue using only 1 cursor at a time:

public static void main(String[] args) throws UnknownHostException, InterruptedException {
    final MongoClient client = new MongoClient("localhost", 27017);
    final DB db = client.getDB("local");
    if (!db.collectionExists("oplog.rs")) {
        throw new IllegalStateException("No oplog.rs is present");
    }

    final DBCollection oplog = db.getCollection("oplog.rs");
    for (int i = 0; i < 100; ++i) {
        new MongoCursorExample(oplog, 1).run();
    }
}

Comment by Devin Smith [ 14/Aug/15 ]

I've got 3.0.4 running with the default storage engine. At first, things were looking really good. I did my test about 10 times, each with 10 cursors or so and things finished smoothly. Then it started blocking. Maybe there is some cursor buildup on the server that is causing issues?

Comment by Jeffrey Yemin [ 14/Aug/15 ]

I pegged 8 CPUs for a while, but even with 16 runners the program you supplied eventually completed against 3.0 with the default storage engine. Curious to see your results as well.

The only change I made: just to avoid garbage collection overhead, I changed your program to count the number of results rather than storing them in a list. In my environment, there were about 314K oplog entries.

Comment by Devin Smith [ 14/Aug/15 ]

I'll try to repro with 3.0 without WiredTiger.

Comment by Jeffrey Yemin [ 14/Aug/15 ]

Hi Devin,

Are you able to reproduce this with MongoDB 2.6 or MongoDB 3.0 with the default storage engine?

Comment by Devin Smith [ 14/Aug/15 ]

I should also mention that the DB's data is static while I'm running these tests.
