[CDRIVER-1556] driver aborts after "mongoc_stream_writev(): precondition failed: stream" Created: 22/Sep/16  Updated: 01/Jul/19  Resolved: 23/Sep/16

Status: Closed
Project: C Driver
Component/s: Bulk API, libmongoc
Affects Version/s: 1.4.0
Fix Version/s: 1.5.0

Type: Bug Priority: Major - P3
Reporter: Christoph Schwarz Assignee: Hannes Magnusson
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Debian 8 (Jessie) 64 bit


Issue Links:
Related
related to CDRIVER-3195 Driver aborts during bulk write Closed

 Description   

Driver aborted during execution of a bulk operation, which consists of a large number of upserts.

Console output:

2016/09/22 00:08:06.0861: [22725]:  WARNING:       stream: Failure to buffer 36 bytes: Failed to buffer 36 bytes within 300000 milliseconds.
src/mongoc/mongoc-stream.c:152 mongoc_stream_writev(): precondition failed: stream
Aborted (core dumped)

GDB backtrace:

Program terminated with signal SIGABRT, Aborted.
#0  0x00007f8f3e215067 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
56	../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0  0x00007f8f3e215067 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x00007f8f3e216448 in __GI_abort () at abort.c:89
#2  0x0000000000d74cfe in mongoc_stream_writev (stream=<optimized out>, iov=<optimized out>, iovcnt=<optimized out>, timeout_msec=<optimized out>)
    at src/mongoc/mongoc-stream.c:153
#3  0x0000000000d75326 in _mongoc_stream_writev_full (stream=0x3f2b, stream@entry=0x7f8eb4006a40, iov=0x58c5, iovcnt=6, timeout_msec=300000, 
    error=0x7f8dba711568) at src/mongoc/mongoc-stream.c:497
#4  0x0000000000d5fda0 in mongoc_cluster_run_command_internal (cluster=cluster@entry=0x7f8eb40029c0, stream=0x7f8eb4006a40, server_id=1, 
    flags=flags@entry=MONGOC_QUERY_NONE, db_name=db_name@entry=0x7f8df2ed80d0 "some_db", command=command@entry=0x7f8e6effba80, monitored=true, 
    host=0x7f8dbb193008, reply=0x7f8e6effbb00, error=0x7f8dba711568) at src/mongoc/mongoc-cluster.c:240
#5  0x0000000000d600c9 in mongoc_cluster_run_command_monitored (cluster=cluster@entry=0x7f8eb40029c0, 
    server_stream=server_stream@entry=0x7f8db890c220, flags=flags@entry=MONGOC_QUERY_NONE, db_name=db_name@entry=0x7f8df2ed80d0 "some_db", 
    command=command@entry=0x7f8e6effba80, reply=reply@entry=0x7f8e6effbb00, error=0x7f8dba711568) at src/mongoc/mongoc-cluster.c:377
#6  0x0000000000d7f9cd in _mongoc_write_command (command=command@entry=0x7f8dbbc14b00, client=client@entry=0x7f8eb40029b0, 
    server_stream=server_stream@entry=0x7f8db890c220, database=database@entry=0x7f8df2ed80d0 "some_db", collection=<optimized out>, 
    write_concern=<optimized out>, offset=<optimized out>, result=0x7f8dba7112e0, error=0x7f8dba711568) at src/mongoc/mongoc-write-command.c:1304
#7  0x0000000000d7fb2c in _mongoc_write_command_execute (command=0x7f8dbbc14b00, client=0x7f8eb40029b0, server_stream=0x7f8db890c220, 
    database=0x7f8df2ed80d0 "some_db", collection=0x7f8df367afb0 "some_collection", write_concern=0x7f8dba711870, offset=37000, 
    result=0x7f8dba7112e0) at src/mongoc/mongoc-write-command.c:1367
#8  0x0000000000d5c95a in mongoc_bulk_operation_execute (bulk=0x7f8dba711260, reply=0x58c5, error=0x6) at src/mongoc/mongoc-bulk-operation.c:451
...

Note that we did not encounter any problem with this kind of operation and the driver has executed thousands of such operations successfully before the crash.

We noted a connectivity problem around the time the crash happened. A different thread that is monitoring database connectivity logged:

mongoc_cursor_next(some_collection): Failed to send "find" command with database "some_db": Failed to read 36 bytes from socket within 10000 milliseconds.

However, that thread did not crash the application and it does not perform bulk upserts but rather a simple finds.



 Comments   
Comment by Githook User [ 23/Sep/16 ]

Author:

{u'username': u'bjori', u'name': u'Hannes Magnusson', u'email': u'bjori@php.net'}

Message: CDRIVER-1556: Crash when multi bulk write fails with stream failure
Branch: master
https://github.com/mongodb/mongo-c-driver/commit/5b49ffdfe3f8a04e39cd8bc2240193c5e6834bf7

Comment by Hannes Magnusson [ 22/Sep/16 ]

Could be. From the driver eyes, a database response over our sockettimeout is the same as network failure (black hole in a firewall for example).

I think I have a fix by now, just trying to repro this scenario in unit test

Comment by Christoph Schwarz [ 22/Sep/16 ]

This is the last log message on the database:

2016-09-22T00:08:05.902-0700 I COMMAND  [conn2303] command some_db.$cmd command: update { update: "some_collection", writeConcern: {}, ordered: false, updates: 1000 } keyUpdates:0 writeConflicts:0 numYields:0 reslen:55 locks:{ Global: { acquireCount: { r: 1009, w: 1009 } }, Database: { acquireCount: { w: 1009 } }, Collection: { acquireCount: { w: 1009 } } } protocol:op_query 308520ms

Maybe there was no connectivity issue. Rather than that the database took longer than 5 minutes to process a batch of 1000 upserts.

Comment by Christoph Schwarz [ 22/Sep/16 ]

Yes, the writes are unordered and there are 50,000 upsert operations in this particular batch.

Comment by Hannes Magnusson [ 22/Sep/16 ]

Are you doing unordered writes?

Comment by Hannes Magnusson [ 22/Sep/16 ]

Interesting.
The first warning is a socket read operation that failed, and then the subsequent socket write failed.

When you say "large number of upserts", I suspect that means over a 1000?
We ship 1000 operations at a time, so it appears to have successfully shipped some batches, and then upon reading the server reply from one of the batches, we lost connectivity, and disconnected the stream – but still attempted to ship the next batch operation.

Generated at Wed Feb 07 21:12:52 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.