[CDRIVER-3195] Driver aborts during bulk write Created: 17/Jun/19  Updated: 06/Apr/23  Resolved: 12/Aug/19

Status: Closed
Project: C Driver
Component/s: Bulk API, libmongoc
Affects Version/s: 1.13.0
Fix Version/s: None

Type: Bug
Priority: Major - P3
Reporter: Bh Sr
Assignee: Kevin Albertson
Resolution: Duplicate
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: Windows 10

Issue Links:
Depends
depends on CDRIVER-3239 Driver aborts during OP_MSG bulk writ... Closed
Related
is related to CDRIVER-1556 driver aborts after "mongoc_stream_wr... Closed

 Description   

Driver aborts during bulk write because a precondition for "stream" fails.
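
For context, here is a minimal sketch of how a failed precondition turns into a process abort, and how a guard in the caller avoids reaching it. The names Stream, require, writev_checked, and writev_guarded are hypothetical and are not libmongoc symbols; the actual check inside mongoc_stream_writev may differ in detail.

```cpp
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <string>

// Hypothetical stand-in for a driver stream; not a libmongoc type.
struct Stream {
    std::size_t bytes_written = 0;
};

// Assert-style precondition: prints the failed condition and aborts the process.
inline void require(bool condition, const char* text) {
    if (!condition) {
        std::fprintf(stderr, "precondition failed: %s\n", text);
        std::abort();
    }
}

// Aborts when `stream` is null, similar to the behavior described in this ticket.
std::size_t writev_checked(Stream* stream, const std::string& payload) {
    require(stream != nullptr, "stream");
    stream->bytes_written += payload.size();
    return payload.size();
}

// Guarded variant: reports an error instead of reaching the precondition.
bool writev_guarded(Stream* stream, const std::string& payload, std::string& error) {
    if (stream == nullptr) {
        error = "stream is no longer valid";
        return false;
    }
    writev_checked(stream, payload);
    return true;
}
```

With the guard in place, a caller that holds an invalidated stream gets an error it can report, rather than taking down the whole process.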

 

Here's the callstack.

 

inmation.exe!abort() Line 77

at d:\th\minkernel\crts\ucrt\src\appcrt\startup\abort.cpp(77)

inmation.exe!mongoc_stream_writev(_mongoc_stream_t * stream, mongoc_iovec_t * iov, unsigned __int64 iovcnt, int timeout_msec) Line 154

at c:\agent_work\5\s\buildtrees\mongo-c-driver\src\1.13.0-6ade1fe75c\src\libmongoc\src\mongoc\mongoc-stream.c(154)

[Inline Frame] inmation.exe!mongoc_stream_writev(_mongoc_stream_t * timeout_msec, mongoc_iovec_t *) Line 165

at c:\agent_work\5\s\buildtrees\mongo-c-driver\src\1.13.0-6ade1fe75c\src\libmongoc\src\mongoc\mongoc-stream.c(165)

inmation.exe!_mongoc_stream_writev_full(_mongoc_stream_t * stream, mongoc_iovec_t * iov, unsigned __int64 iovcnt, int timeout_msec, _bson_error_t * error) Line 433

at c:\agent_work\5\s\buildtrees\mongo-c-driver\src\1.13.0-6ade1fe75c\src\libmongoc\src\mongoc\mongoc-stream.c(433)

inmation.exe!mongoc_cluster_run_opmsg(_mongoc_cluster_t * cluster, _mongoc_cmd_t * cmd, _bson_t * reply, _bson_error_t * error) Line 2771

at c:\agent_work\5\s\buildtrees\mongo-c-driver\src\1.13.0-6ade1fe75c\src\libmongoc\src\mongoc\mongoc-cluster.c(2771)

inmation.exe!mongoc_cluster_run_command_monitored(_mongoc_cluster_t * cluster, _mongoc_cmd_t * cmd, _bson_t * reply, _bson_error_t * error) Line 554

at c:\agent_work\5\s\buildtrees\mongo-c-driver\src\1.13.0-6ade1fe75c\src\libmongoc\src\mongoc\mongoc-cluster.c(554)

inmation.exe!_mongoc_write_opmsg(mongoc_write_command_t * command, _mongoc_client_t * client, _mongoc_server_stream_t * server_stream, const char * database, const char * collection, const _mongoc_write_concern_t * write_concern, unsigned int index_offset, _mongoc_client_session_t * cs, mongoc_write_result_t * result, _bson_error_t * error) Line 585

at c:\agent_work\5\s\buildtrees\mongo-c-driver\src\1.13.0-6ade1fe75c\src\libmongoc\src\mongoc\mongoc-write-command.c(585)

inmation.exe!_mongoc_write_command_execute_idl(mongoc_write_command_t * command, _mongoc_client_t * client, _mongoc_server_stream_t * server_stream, const char * database, const char * collection, unsigned int offset, const _mongoc_crud_opts_t * crud, mongoc_write_result_t * result) Line 973

at c:\agent_work\5\s\buildtrees\mongo-c-driver\src\1.13.0-6ade1fe75c\src\libmongoc\src\mongoc\mongoc-write-command.c(973)

inmation.exe!_mongoc_write_command_execute(mongoc_write_command_t * command, _mongoc_client_t * client, _mongoc_server_stream_t * server_stream, const char * database, const char * collection, const _mongoc_write_concern_t * write_concern, unsigned int offset, _mongoc_client_session_t * cs, mongoc_write_result_t * result) Line 874

at c:\agent_work\5\s\buildtrees\mongo-c-driver\src\1.13.0-6ade1fe75c\src\libmongoc\src\mongoc\mongoc-write-command.c(874)

inmation.exe!mongoc_bulk_operation_execute(_mongoc_bulk_operation_t * bulk, _bson_t * reply, _bson_error_t * error) Line 792

at c:\agent_work\5\s\buildtrees\mongo-c-driver\src\1.13.0-6ade1fe75c\src\libmongoc\src\mongoc\mongoc-bulk-operation.c(792)

inmation.exe!mongocxx::v_noabi::bulk_write::execute() Line 171

at c:\agent_work\5\s\buildtrees\mongo-cxx-driver\src\r3.4.0-2d6ad5f494\src\mongocxx\bulk_write.cpp(171)

 

A few things to note:

1) We only do un-ordered writes.

2) The mongodb server was very busy and probably unresponsive when this happened.

3) My debugger shows that the error parameter to "_mongoc_stream_writev_full" has domain 2, code 4, and the message "Failed to send "update" command with database "HistoryTest": Failed to read 4 bytes: socket error or timeout".

4) This crash seems to be the same as CDRIVER-1556 which is "fixed".
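
The numeric values in note 3 can be mapped to symbolic names. The sketch below assumes the enum ordering at the start of libmongoc's mongoc-error.h, as best recalled: domain 2 would be MONGOC_ERROR_STREAM and code 4 would be MONGOC_ERROR_STREAM_SOCKET. This is an assumption, so check it against the header of the driver version in use. The helper names domain_name and code_name are hypothetical.

```cpp
#include <cstdint>
#include <string>

// Assumed values: copied by hand from the start of the enums in
// mongoc-error.h and not verified against the 1.13.0 source here.
std::string domain_name(std::uint32_t domain) {
    switch (domain) {
        case 1: return "MONGOC_ERROR_CLIENT";
        case 2: return "MONGOC_ERROR_STREAM";
        case 3: return "MONGOC_ERROR_PROTOCOL";
        default: return "unknown domain";
    }
}

// Only the first few codes are listed; anything else is reported as unknown.
std::string code_name(std::uint32_t code) {
    switch (code) {
        case 1: return "MONGOC_ERROR_STREAM_INVALID_TYPE";
        case 2: return "MONGOC_ERROR_STREAM_INVALID_STATE";
        case 3: return "MONGOC_ERROR_STREAM_NAME_RESOLUTION";
        case 4: return "MONGOC_ERROR_STREAM_SOCKET";
        default: return "unknown code";
    }
}
```

Under that assumption, domain 2 with code 4 reads as a socket-level stream error, which is consistent with the "socket error or timeout" text in the message.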



 Comments   
Comment by Bh Sr [ 13/Aug/19 ]

Thanks Kevin.Albertson! I will test it once 1.15.0 is released.

Comment by Kevin Albertson [ 12/Aug/19 ]

Hi srinarasi, we have reproduced a crash in bulk writes using OP_MSG that produces a stack trace similar to the one you provided. We fixed it as part of CDRIVER-3239, which will be included in 1.15.0 (expected to be released this week). Upgrading should hopefully resolve your issue. If it does not, please open another ticket.

Thanks,
Kevin

Comment by Kevin Albertson [ 15/Jul/19 ]

Hi srinarasi, I have been able to reproduce this issue, and it still appears to be present in master. CDRIVER-1556 is similar, but it applied to sending write commands over the OP_QUERY wire protocol. Since MongoDB 3.6, we've supported the OP_MSG protocol, and it appears the fix for CDRIVER-1556 wasn't properly ported when OP_MSG was implemented. I've created a new ticket to describe and track this issue: CDRIVER-3239.
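
One way a missing check of this kind can surface, sketched under assumptions: a write is sent in several batches, a network error on one batch invalidates the connection, and the loop then attempts the next batch on the dead stream. This is an illustration only. Connection, send_batch, and run_batches are hypothetical names, and the actual defect and fix are the ones tracked in CDRIVER-3239.

```cpp
#include <cstddef>
#include <string>

// Hypothetical connection: `alive` becomes false after a network error.
struct Connection {
    bool alive = true;
    std::size_t fail_on_batch = 0;  // index of the batch that hits a simulated network error
    std::size_t sent = 0;           // number of batches sent successfully
};

// Sends one batch; simulates a socket error on the configured batch index.
bool send_batch(Connection& conn, std::size_t index, std::string& error) {
    if (index == conn.fail_on_batch) {
        conn.alive = false;  // a driver would tear down the stream at this point
        error = "socket error or timeout";
        return false;
    }
    ++conn.sent;
    return true;
}

// Runs all batches but stops as soon as the connection is invalidated,
// even for unordered writes, so no later batch touches a dead stream.
// Only network errors are simulated in this sketch.
bool run_batches(Connection& conn, std::size_t batch_count, std::string& error) {
    for (std::size_t i = 0; i < batch_count; ++i) {
        if (!send_batch(conn, i, error) && !conn.alive) {
            return false;
        }
    }
    return true;
}
```

The point of the design is that "unordered" means later operations may proceed after a per-document failure, not after the connection itself has been lost.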

Comment by Bh Sr [ 02/Jul/19 ]

I meant to say CDRIVER-2814 in my previous comment.

Comment by Bh Sr [ 02/Jul/19 ]

Hi @Kevin Albertson,

We have made a couple of changes to the driver. One fixes CDRIVER-2701 and the other adds support for Windows XP. Neither of those changes affects the part of the code where the crash occurred (this crash did not occur on Windows XP). We certainly don't change mongoc-stream.c. Could the line number mismatch be the result of Visual Studio not showing the correct line numbers for optimized release builds?

 

We do use a pooled client. It's difficult to share the code that crashes, but in effect what we do is represented by the pseudocode below. The pool is created well before this code runs and is not destroyed until much later.

 

auto bw = <<collection>>.create_bulk_write(mongocxx::options::bulk_write().ordered(false));
for (...)
    bw.append(mongocxx::model::replace_one(<<some document here>>).upsert(true));
bw.execute();

 

 

We have not upgraded to 1.14 yet. We have only seen this crash twice in the last few months, so even if we update in the next few weeks, it may be many months before it crashes again.

 

Is there any more information that I can provide?

Comment by Kevin Albertson [ 01/Jul/19 ]

Hi srinarasi, thank you for the detailed bug report!

Can you reproduce on the latest C driver release, 1.14.0?

at c:\agent_work\5\s\buildtrees\mongo-c-driver\src\1.13.0-6ade1fe75c\src\libmongoc\src\mongoc\mongoc-stream.c(433)

Line 433 in 1.13.0 does not correspond to the call to mongoc_stream_writev_full. And 6ade1fe75c is not a known git hash in the C driver. Are you using a modified C driver, and would those modifications be relevant?

Is the client you are using a pooled client or single threaded client? If possible, can you provide the relevant part of the code that creates the client and performs the bulk write?
