[SERVER-20182] killCursor command causes fassert failure in mongod with low limit for open files Created: 28/Aug/15  Updated: 20/Mar/16  Resolved: 20/Mar/16

Status: Closed
Project: Core Server
Component/s: Admin
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Anna Herlihy (Inactive) Assignee: Unassigned
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Operating System: ALL
Participants:

 Description   

PyMongo is probably doing something wrong, but after replacing OP_KILL_CURSORS with the killCursors command, we are causing the database (version 3.1.6) to exit with the following log:

2015-08-28T13:21:46.080-0400 I REPL     [ReplicationExecutor] Error in heartbeat request to mongodb.home:27017; HostUnreachable Failed attempt to connect to mongodb.home:27017; couldn't initialize connection to host mongodb.home, address is invalid
2015-08-28T13:21:46.498-0400 E STORAGE  [repl writer worker 6] WiredTiger (24) [1440782506:498605][5975:0x110c89000], WT_SESSION.create: /Users/annaherlihy/data/mongodb/rs0-2/index-1918-8285583706914688281.wt: Too many open files
2015-08-28T13:21:46.499-0400 E REPL     [repl writer worker 6] Failed command { create: "test" } on pymongo_test with status UnknownError 24: Too many open files during oplog application
2015-08-28T13:21:46.499-0400 F REPL     [repl writer worker 6] Error applying operation ({ ts: Timestamp 1440782501000|2, h: 2043797443416079560, v: 2, op: "c", ns: "pymongo_test.$cmd", o: { create: "test" } }): UnknownError 24: Too many open files
2015-08-28T13:21:46.499-0400 I -        [repl writer worker 6] Fatal Assertion 16359
2015-08-28T13:21:46.499-0400 I -        [repl writer worker 6]
 
***aborting after fassert() failure



 Comments   
Comment by Adam Midvidy [ 28/Aug/15 ]

Could you try with 3.1.7 using mmapv1 instead of wiredTiger?

Comment by J Rassi [ 28/Aug/15 ]

A couple of questions:

  • Could you attempt to reproduce with 3.1.7?
  • Is PyMongo opening a new connection for each "kill cursor" operation? If so, I could imagine the driver opening hundreds of connections for these operations. The OP_KILL_CURSORS connections would be short-lived (since they generate no response), whereas the killCursors command connections would be longer-lived (since they generate a response that the driver is presumably waiting to consume). That would make it more likely in the latter case to reach a point where more than 256 connections are open concurrently.
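
For context on the difference described above: the command form is an ordinary request/reply exchange, so the driver has to read a response before the connection is free again. A minimal sketch of the command document's shape follows, in Python since PyMongo is the driver in question. The helper name build_kill_cursors_command is hypothetical and not part of PyMongo; only the killCursors and cursors field names come from the server command.

```python
from collections import OrderedDict


def build_kill_cursors_command(collection_name, cursor_ids):
    """Build a killCursors command document (hypothetical helper).

    The command name must be the first key, so an ordered mapping is used.
    """
    ids = list(cursor_ids)
    if not ids:
        raise ValueError("at least one cursor id is required")
    return OrderedDict([
        ("killCursors", collection_name),  # collection the cursors belong to
        ("cursors", ids),                  # cursor ids to kill
    ])


if __name__ == "__main__":
    # Illustrative values only; real cursor ids come from the server.
    print(dict(build_kill_cursors_command("test", [123, 456])))
```

However the document is sent, reusing one pooled connection for it rather than opening a new connection per call should keep the number of open descriptors flat.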

Comment by Ramon Fernandez Marina [ 28/Aug/15 ]

Hey anna.herlihy@10gen.com, do you have a small repro we can use?

Also, the error message complains about too many open files. Can you please send the output of ulimit -a on your system?

Thanks,
Ramón.
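
The open-files limit requested above can also be read from inside a Python process, which may be handy when the reproduction is driven from PyMongo. This is a minimal sketch using the standard-library resource module (Unix only); the function names are illustrative, and it reports the limit of the Python process itself, which matches mongod's limit only if both were started from the same shell session.

```python
import resource


def open_file_limits():
    """Return the (soft, hard) limits on open file descriptors for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def describe(limit):
    """Render a limit value, mapping RLIM_INFINITY to 'unlimited'."""
    return "unlimited" if limit == resource.RLIM_INFINITY else str(limit)


if __name__ == "__main__":
    soft, hard = open_file_limits()
    # The soft limit is the one that triggers "Too many open files" (errno 24).
    print("open files: soft=%s hard=%s" % (describe(soft), describe(hard)))
```

A low soft limit here would be consistent with the errno 24 failures in the log, though it does not by itself show which process is consuming the descriptors.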
