[CDRIVER-788] Hang in large bulk upsert Created: 10/Aug/15  Updated: 19/Oct/16  Resolved: 31/Aug/15

Status: Closed
Project: C Driver
Component/s: Bulk API, libmongoc
Affects Version/s: None
Fix Version/s: 1.2-beta1

Type: Bug Priority: Major - P3
Reporter: A. Jesse Jiryu Davis Assignee: Hannes Magnusson
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Solaris 11.


Issue Links:
Related
is related to CDRIVER-756 Unchecked errors on failed network wr... Closed
is related to CDRIVER-787 segfault on network err in legacy upd... Closed

 Description   

On Solaris 11 with MongoDB 2.4.14 and C Driver 1.2 (unreleased), "test_upsert_large" hangs and then segfaults:

1. The test constructs an update document that is intended to exactly meet the 16MB max bson size, like update({_id: 1}, {$set: {x: <... 16777179-byte string ...>}}).

2. On legacy servers, it is sent as an OP_UPDATE in _mongoc_write_command_update_legacy, eventually via mongoc_cluster_sendv_to_server.

3. mongoc_cluster_sendv_to_server calls mongoc_stream_writev.

4. mongoc_stream_writev eventually results in a standard sendmsg call, which fails with errno 97, EMSGSIZE, "Message too long". http://docs.oracle.com/cd/E19455-01/806-1075/msgs-1643/index.html

5. mongoc_cluster_sendv_to_server incorrectly checks mongoc_stream_writev's error return: it considers -1 a success. This is part of the CDRIVER-756 class of bugs.

6. mongoc_cluster_sendv_to_server thinks the call succeeded, so it blocks for the standard socketTimeoutMS of 5 minutes awaiting GLE. When it finally decides the GLE has failed, it crashes trying to free the NULL response document (CDRIVER-787).
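
The check that step 5 describes as missing can be sketched as follows. This is only an illustration: write_fully_succeeded is a hypothetical helper, not a libmongoc function, and it assumes a writev-style call that returns the number of bytes written or -1 on error.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h> /* ssize_t */

/* Hypothetical helper (not part of libmongoc): report whether a
 * writev-style call, which returns bytes written or -1 on error,
 * sent the whole message. */
static bool
write_fully_succeeded (ssize_t written, size_t expected)
{
   if (written < 0) {
      /* -1 signals an error such as EMSGSIZE; never treat it as success. */
      return false;
   }

   /* A short write is also not a complete send. */
   return (size_t) written == expected;
}
```

With a check like this, a caller would report the network error immediately instead of waiting for a GLE reply that can never arrive.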

Questions:

1. Does CDRIVER-756 already cover the bug in step 5?

2. What is a reasonable approach to EMSGSIZE? Split the iovec and retry? Are we certain none of the message was sent? Should the driver record that "n bytes was too large" and split all future iovecs up to that size, in an attempt to adapt to its system?
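
One way to picture the "split the iovec" idea from question 2 is a helper that builds a size-limited prefix of an iovec array, which could then be passed to sendmsg in place of the full array. This is a sketch only: iov_prefix and its parameters are hypothetical names, and how a driver would pick max_bytes is left open.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/uio.h> /* struct iovec */

/* Hypothetical helper: fill `out` with a prefix of `iov` whose total
 * length is at most max_bytes. `out` must have room for iovcnt entries.
 * Returns the number of entries written to `out` and stores the total
 * byte count in *out_bytes. The input array is not modified. */
static size_t
iov_prefix (const struct iovec *iov,
            size_t iovcnt,
            size_t max_bytes,
            struct iovec *out,
            size_t *out_bytes)
{
   size_t n = 0;
   size_t total = 0;
   size_t i;

   assert (iov && out && out_bytes);

   for (i = 0; i < iovcnt && total < max_bytes; i++) {
      size_t room = max_bytes - total;
      size_t len = iov[i].iov_len < room ? iov[i].iov_len : room;

      if (len == 0) {
         continue; /* skip empty entries */
      }

      out[n].iov_base = iov[i].iov_base;
      out[n].iov_len = len; /* the last entry may be truncated */
      n++;
      total += len;
   }

   *out_bytes = total;
   return n;
}
```

This does not answer whether any of the original message was sent before EMSGSIZE was returned; it only shows how a smaller retry could be constructed.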



 Comments   
Comment by Githook User [ 31/Aug/15 ]

Author: Hannes Magnusson (bjori) <bjori@php.net>

Message: CDRIVER-788: Return how many bytes we wrote.

If an iovec is too large to send over sendmsg()/WSASend() we get
[WSA]EMSGSIZE and hit the _slow version and try to send() it instead.
Whether or not we finish writing the entire thing (say, hit [WSA]EWOULDBLOCK)
we must return how much we actually wrote so we can resume from the
correct location once _mongoc_socket_wait() finishes (e.g. poll(POLLOUT) succeeds)
and we have enough socketTimeoutMS left to continue writing.
Branch: 1.2.0-dev
https://github.com/mongodb/mongo-c-driver/commit/2ae1e8b80d328a36a31880ed4ffc727d10aa32c6
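
The "resume from the correct location" step in the commit message depends on knowing how many bytes were written. A minimal sketch of that bookkeeping, assuming a caller that keeps its own mutable iovec array, is shown here; iov_advance is a hypothetical name and this is not the code from the commit.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/uio.h> /* struct iovec */

/* Hypothetical helper: after `written` bytes of `iov` have been sent,
 * return the index of the first entry that still has unsent data
 * (or iovcnt if everything was sent). A partially sent entry is
 * adjusted in place so it points at its first unsent byte. */
static size_t
iov_advance (struct iovec *iov, size_t iovcnt, size_t written)
{
   size_t i = 0;

   assert (iov);

   /* Skip entries that were sent completely. */
   while (i < iovcnt && written >= iov[i].iov_len) {
      written -= iov[i].iov_len;
      i++;
   }

   /* Trim the sent part off a partially sent entry. */
   if (i < iovcnt && written > 0) {
      iov[i].iov_base = (char *) iov[i].iov_base + written;
      iov[i].iov_len -= written;
   }

   return i;
}
```

A write loop would call this after each partial write, wait for the socket to become writable again, and continue from the returned index while time remains.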

Comment by A. Jesse Jiryu Davis [ 11/Aug/15 ]

Assigning to you as part of the CDRIVER-756 / CDRIVER-770 constellation of network bugs.

Generated at Wed Feb 07 21:10:38 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.