[CXX-1129] Legacy 1.1.x driver calls abort() when query/document's size is less than but close to 16 MB Created: 13/Nov/16  Updated: 28/Feb/18  Resolved: 07/Feb/17

Status: Closed
Project: C++ Driver
Component/s: Implementation
Affects Version/s: legacy-1.1.0, legacy-1.1.1, legacy-1.1.2
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Arseny Vakhrushev Assignee: David Golden
Resolution: Won't Fix Votes: 0
Labels: legacy-cxx
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

CentOS 6.8



 Description   

We are encountering sporadic crashes of the legacy 1.1.x driver. Debugging revealed a subtle discrepancy in the driver's code between its uassert() checks (which raise normal driver exceptions) and its invariant() calls (which end in abort() and crash our system). The discrepancy shows up for BSON objects whose size is less than the default maximum size (16777216) but more than 16777216 - 8200 = 16769016.

The following debugging session illustrates the problem.

{{
[...]
Program received signal SIGABRT, Aborted.
0x00007ffff6f615e5 in raise () from /lib64/libc.so.6

(gdb) bt
#0 0x00007ffff6f615e5 in raise () from /lib64/libc.so.6
#1 0x00007ffff6f62dc5 in abort () from /lib64/libc.so.6
#2 0x00007ffff6868aef in mongo::invariantFailed (expr=0x7ffff6887f38 "_fits(batch.get(), *batch_iter)", file=0x7ffff6887f10 "src/mongo/client/command_writer.cpp", line=51)
at src/mongo/util/assert_util.cpp:91
#3 0x00007ffff67e9281 in mongo::CommandWriter::write (this=0x618a10, ns=..., write_operations=std::vector of length 1, capacity 1 = {...}, ordered=true,
bypassDocumentValidation=false, writeConcern=0x61a560, writeResult=0x7fffffffdd00) at src/mongo/client/command_writer.cpp:51
#4 0x00007ffff67fac97 in mongo::DBClientBase::_write (this=0x61a4e0, ns="test1.test1", writes=std::vector of length 1, capacity 1 = {...}, ordered=true,
bypassDocumentValidation=false, writeConcern=0x0, writeResult=0x7fffffffdd00) at src/mongo/client/dbclient.cpp:2056
#5 0x00007ffff67fb57c in mongo::DBClientBase::update (this=0x61a4e0, ns="test1.test1", query=..., obj=..., flags=1, wc=0x0) at src/mongo/client/dbclient.cpp:2143
#6 0x00007ffff67fb399 in mongo::DBClientBase::update (this=0x61a4e0, ns="test1.test1", query=..., obj=..., upsert=true, multi=false, wc=0x0) at src/mongo/client/dbclient.cpp:2127

(gdb) fr 3
#3 0x00007ffff67e9281 in mongo::CommandWriter::write (this=0x618a10, ns=..., write_operations=std::vector of length 1, capacity 1 = {...}, ordered=true,
bypassDocumentValidation=false, writeConcern=0x61a560, writeResult=0x7fffffffdd00) at src/mongo/client/command_writer.cpp:51
51 invariant(_fits(batch.get(), *batch_iter));
(gdb) l
46 boost::scoped_ptr<BSONArrayBuilder> batch(new BSONArrayBuilder);
47 std::vector<WriteOperation*>::const_iterator batch_iter = batch_begin;
48
49 // We must be able to fit the first item of the batch. Otherwise, the calling code
50 // passed an over size write operation in violation of our contract.
51 invariant(_fits(batch.get(), *batch_iter));
52
53 // Set the current operation type
54 const WriteOpType batchOpType = (*batch_iter)->operationType();
55

(gdb) bre 51
[...]

Breakpoint 1, mongo::CommandWriter::write (this=0x618a10, ns=..., write_operations=std::vector of length 1, capacity 1 = {...}, ordered=true, bypassDocumentValidation=false,
writeConcern=0x61a560, writeResult=0x7fffffffdd00) at src/mongo/client/command_writer.cpp:51
51 invariant(_fits(batch.get(), *batch_iter));

[...]
(gdb) s
mongo::CommandWriter::_fits (this=0x618a10, builder=0x61eef0, operation=0x61dfa0) at src/mongo/client/command_writer.cpp:112
112 int opSize = operation->incrementalSize();
(gdb) n
113 int maxSize = _client->getMaxBsonObjectSize();
(gdb) n
116 uassert(0, "update command exceeds maxBsonObjectSize", opSize <= maxSize);
(gdb) n
118 return (builder->len() + opSize + kOverhead) <= maxSize;
(gdb) print opSize
$2 = 16773902
(gdb) print maxSize
$3 = 16777216
(gdb) print opSize <= maxSize
$4 = true
(gdb) print builder->len() + opSize + kOverhead <= maxSize
$5 = false
(gdb) print builder->len() + kOverhead
$6 = 8200
(gdb) print opSize <= maxSize - builder->len() - kOverhead
$7 = false
}}
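
The gap between the two checks in the session above can be reproduced with plain arithmetic. The following is only a minimal sketch, not the driver's code: the function names are hypothetical, and the constant 8200 is the combined value of `builder->len() + kOverhead` printed by gdb (the individual value of `kOverhead` is not shown in the session, so it is not assumed here).

```cpp
#include <cassert>
#include <cstdint>

// Values copied from the gdb session; the names are hypothetical.
constexpr std::int64_t kMaxBsonObjectSize = 16777216;  // (gdb) print maxSize
constexpr std::int64_t kBuilderPlusOverhead = 8200;    // (gdb) print builder->len() + kOverhead

// Mirrors the uassert() condition at command_writer.cpp:116.
// Failing it would raise a normal driver exception.
constexpr bool passesUassert(std::int64_t opSize) {
    return opSize <= kMaxBsonObjectSize;
}

// Mirrors the return expression at command_writer.cpp:118.
// Failing it for the first item of a batch trips invariant() and abort().
constexpr bool fitsInBatch(std::int64_t opSize) {
    return kBuilderPlusOverhead + opSize <= kMaxBsonObjectSize;
}

// Sizes that pass the exception check but fail the invariant check.
constexpr bool inAbortWindow(std::int64_t opSize) {
    return passesUassert(opSize) && !fitsInBatch(opSize);
}

// The opSize from the session falls inside the window.
static_assert(inAbortWindow(16773902), "size from the session should hit the window");
// The window starts just above 16777216 - 8200 = 16769016.
static_assert(!inAbortWindow(16769016), "largest size that still fits");
static_assert(inAbortWindow(16769017), "smallest size in the window");
// Above the maximum, uassert() fails first, so an exception is thrown instead.
static_assert(!inAbortWindow(16777217), "oversized operations get an exception");
```

Under these assumptions, any operation size from 16769017 through 16777216 passes the exception check and then fails the invariant, which matches the range given in the description.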

Note also that similar uassert() checks in DBClientBase::update() pass even earlier, before _write() is called:

{{
(gdb) fr 3
#3 0x00007ffff67fb57c in mongo::DBClientBase::update (this=0x61a4e0, ns="test1.test1", query=..., obj=..., flags=1, wc=0x0) at src/mongo/client/dbclient.cpp:2143
2143 _write(ns, updates.ops, true, bypassDocumentValidation, wc, &writeResult);
(gdb) l -
2133 uassert(0,
2134 "update selector exceeds maxBsonObjectSize",
2135 query.obj.objsize() <= getMaxBsonObjectSize());
2136 uassert(
2137 0, "update document exceeds maxBsonObjectSize", obj.objsize() <= getMaxBsonObjectSize());
2138 updates.enqueue(new UpdateWriteOperation(query.obj, obj, flags));
2139
2140 bool bypassDocumentValidation = flags & UpdateOption_BypassDocumentValidation;
2141
2142 WriteResult writeResult;
2143 _write(ns, updates.ops, true, bypassDocumentValidation, wc, &writeResult);
2144 }
}}



 Comments   
Comment by David Golden [ 07/Feb/17 ]

After further discussion internally about resource planning, I'm closing this ticket as "Won't Fix".

Comment by David Golden [ 14/Nov/16 ]

One of the many reasons we encourage people to migrate to the stable mongocxx release is that you have much greater control. Among other things, you can build up BSON objects with bsoncxx and check their length as you go.

Comment by Arseny Vakhrushev [ 13/Nov/16 ]

I appreciate your feedback, David. We are aware that the legacy driver is not getting anything other than critical fixes; we wouldn't bother you guys otherwise. To be honest, though, we can't see how things like this are not critical. They can cause aborts and hence lead to stability issues and data loss in the calling code. What else is critical then?

It's also not always possible to check a document's size when the legacy driver is wrapped as a module for another language and the document is built from a native associative array or table, for example. In that case the error needs to be propagated upstream.

Comment by David Golden [ 13/Nov/16 ]

Thanks for the bug report. At one time, the server required update commands to fit into the maximum BSON object size, so the assertion checks that there is room for 8 KB of overhead for the command. The server currently allows a little extra overhead above the maximum BSON object size, but the assertions in the legacy driver haven't been updated to match. There is also a related problem in SERVER-12305.

The legacy driver is getting critical fixes only, so we don't anticipate putting resources against this bug any time soon. The workaround, as you found, is to ensure that your documents are smaller than 16 MB less 8 KB.

Regards,
David
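
The workaround described in this comment could be applied in calling code with a small pre-flight check. This is a minimal sketch under stated assumptions, not part of the driver: the helper names and the `kSafetyMargin` constant are hypothetical, and the 8200-byte margin is the value observed in the gdb session in the description rather than a documented limit. Throwing an ordinary exception lets a wrapper for another language report the error instead of the process reaching abort().

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical constants: the default maximum from the description and the
// 8200-byte margin observed in the gdb session (an assumption, not a spec).
constexpr std::size_t kDefaultMaxBsonObjectSize = 16777216;
constexpr std::size_t kSafetyMargin = 8200;

// Returns true when a write of objSize bytes stays below the margin.
constexpr bool isSafeForLegacyWrite(std::size_t objSize,
                                    std::size_t maxSize = kDefaultMaxBsonObjectSize) {
    return maxSize >= kSafetyMargin && objSize <= maxSize - kSafetyMargin;
}

// Throws std::length_error for sizes that could reach the invariant() path,
// so the caller gets a catchable error before the driver is invoked.
inline void requireSafeForLegacyWrite(std::size_t objSize,
                                      std::size_t maxSize = kDefaultMaxBsonObjectSize) {
    if (!isSafeForLegacyWrite(objSize, maxSize)) {
        throw std::length_error("write of " + std::to_string(objSize) +
                                " bytes is too close to maxBsonObjectSize (" +
                                std::to_string(maxSize) + ") for the legacy driver");
    }
}
```

A caller might pass the value of `obj.objsize()` and the client's `getMaxBsonObjectSize()` before calling `update()`. The session does not show how `incrementalSize()` is computed for an update, so a conservative caller could pass the sum of the selector and document sizes instead of the document size alone.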
