[SERVER-78409] Mongo client fails with Error: attempt to copy-construct an iterator from a singular iterator. Created: 23/Jun/23  Updated: 29/Oct/23  Resolved: 07/Jul/23

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 7.1.0-rc0

Type: Bug
Priority: Major - P3
Reporter: Adi Zaimi
Assignee: Adi Zaimi
Resolution: Fixed
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
is duplicated by SERVER-77573 avoid copying invalid iterator in Str... Closed
Assigned Teams:
Sharding NYC
Backwards Compatibility: Fully Compatible
Operating System: ALL
Participants:

 Description   

While testing a patch build, the following error appeared in one of the tests of an Evergreen run:
https://parsley.mongodb.com/resmoke/bc209470685e5078abcf0d977284ff8d/test/176b6138c752b1214d5f30fc4853bf4c?bookmarks=0,39&shareLine=0

[js_test:server21632] /opt/mongodbtoolchain/revisions/69f4f0673ffcb290ce2307560a4883ecf2ad138c/stow/gcc-v4.35T/include/c++/11.3.0/debug/safe_iterator.h:195:
[js_test:server21632] In function:
[js_test:server21632]     __gnu_debug::_Safe_iterator<_Iterator, _Sequence,
[js_test:server21632]     _Category>::_Safe_iterator(__gnu_debug::_Safe_iterator<_Iterator,
[js_test:server21632]     _Sequence, _Category>&&) [with _Iterator =
[js_test:server21632]     std::__cxx1998::_List_iterator<std::shared_ptr<mongo::StreamableReplicaSetMonitor::HostQuery>
[js_test:server21632]     >; _Sequence =
[js_test:server21632]     std::__debug::list<std::shared_ptr<mongo::StreamableReplicaSetMonitor::HostQuery>
[js_test:server21632]     >; _Category = std::forward_iterator_tag]
[js_test:server21632] 
[js_test:server21632] Error: attempt to copy-construct an iterator from a singular iterator.
[js_test:server21632]
[js_test:server21632] Objects involved in the operation:
[js_test:server21632]     iterator "this" @ 0x0x7fec5cde62f8 {
[js_test:server21632]       type = std::__cxx1998::_List_iterator<std::shared_ptr<mongo::StreamableReplicaSetMonitor::HostQuery> > (mutable iterator);
[js_test:server21632]       state = singular;
[js_test:server21632]     }
[js_test:server21632]     iterator "other" @ 0x0x7fec54240ec0 {
[js_test:server21632]       type = std::__cxx1998::_List_iterator<std::shared_ptr<mongo::StreamableReplicaSetMonitor::HostQuery> > (mutable iterator);
[js_test:server21632]       state = singular;
[js_test:server21632]       references sequence with type 'std::__debug::list<std::shared_ptr<mongo::StreamableReplicaSetMonitor::HostQuery>, std::allocator<std::shared_ptr<mongo::StreamableReplicaSetMonitor::HostQuery> > >' @ 0x0x7fec5423c6d8
[js_test:server21632]     }
[js_test:server21632] | 2023-06-23T19:43:03.473Z F  CONTROL  6384300 [ReplicaSetMonitor-TaskExecutor] "Writing fatal message","attr":{"message":"\n"}
[js_test:server21632] | 2023-06-23T19:43:03.474Z F  CONTROL  6384300 [ReplicaSetMonitor-TaskExecutor] "Writing fatal message","attr":{"message":"Got signal: 6 (Aborted).\n"}



 Comments   
Comment by Githook User [ 06/Jul/23 ]

Author:

{'name': 'Adi Zaimi', 'email': 'adi.zaimi@mongodb.com', 'username': 'adizaimi'}

Message: SERVER-78409 Erase from outstanding queries list by query in mongo client

Removing from the list by iterator queryIter resulted in failure because
queryIter can be invalidated by the time the lambda function is called and
calls the copy constructor of the queryIter (since we don't hold the lock).
Here we remove the query from the list by traversing the list, which is not
expected to be very large.
Branch: master
https://github.com/mongodb/mongo/commit/a9ae809fa5992c51abc435d1e28899a6c490e948
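
The approach described in the commit message, erasing by the query itself rather than by a stored iterator, could be sketched roughly as follows. This is a minimal illustration and not the code from the commit: `HostQuery` is a stand-in struct, and `QueryList` and `eraseQuery` are hypothetical names introduced here.

```cpp
#include <algorithm>
#include <cassert>
#include <list>
#include <memory>

// Stand-in for mongo::StreamableReplicaSetMonitor::HostQuery (assumption:
// the real type has more members; only pointer identity matters here).
struct HostQuery {
    int id = 0;
};

using QueryList = std::list<std::shared_ptr<HostQuery>>;

// Hypothetical helper: erase `query` from `queries` by searching for it,
// instead of trusting an iterator that was saved at insertion time.
// Returns true if the query was found and erased, false if it was already gone.
// The caller is assumed to hold whatever lock protects `queries`.
inline bool eraseQuery(QueryList& queries, const std::shared_ptr<HostQuery>& query) {
    auto it = std::find(queries.begin(), queries.end(), query);
    if (it == queries.end()) {
        return false;  // Already removed elsewhere; nothing to do.
    }
    queries.erase(it);  // `it` was obtained just now, so it is valid.
    return true;
}
```

The traversal is linear in the number of outstanding queries, which matches the commit message's expectation that the list is not very large.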

Comment by Adi Zaimi [ 06/Jul/23 ]

Created https://jira.mongodb.org/browse/SERVER-78741 to test performance and add any future fixes.

Comment by Adi Zaimi [ 06/Jul/23 ]

The real issue here is holding a reference to an iterator that may be invalid, and the only way to avoid that is to not refer to such an iterator at all. We should simply erase the element by query (searching the list, e.g. with std::find, since std::list has no find() member). This may raise a performance concern if the list is very large. The list of connections cannot realistically grow too large, but even so, in the future we may want to convert the list into a map.
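
For illustration only, the possible future conversion to a map might look something like the sketch below. Everything in it is an assumption: the comment does not say what the key would be, so this uses a hypothetical numeric id handed out at insertion, and `OutstandingQueries`, `add`, and `remove` are invented names. `HostQuery` is again a stand-in struct.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <unordered_map>
#include <utility>

// Stand-in for mongo::StreamableReplicaSetMonitor::HostQuery.
struct HostQuery {};

// Hypothetical registry of outstanding queries keyed by an id.
// An id, unlike a container iterator, is always safe to copy and
// can be checked for presence when the callback finally runs.
class OutstandingQueries {
public:
    using Id = std::uint64_t;

    // Stores the query and returns the id to capture in the callback.
    Id add(std::shared_ptr<HostQuery> query) {
        const Id id = _nextId++;
        _queries.emplace(id, std::move(query));
        return id;
    }

    // Erases by key; returns false if the entry is already gone.
    bool remove(Id id) {
        return _queries.erase(id) > 0;
    }

    void clear() {
        _queries.clear();
    }

    std::size_t size() const {
        return _queries.size();
    }

private:
    Id _nextId = 0;
    std::unordered_map<Id, std::shared_ptr<HostQuery>> _queries;
};
```

With this shape, erasing is a keyed lookup rather than a list traversal, at the cost of storing an extra id per query.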

Comment by Adi Zaimi [ 28/Jun/23 ]

I think the issue here is that the iterator queryIter has been invalidated, and the list container is empty, by the time the lambda is invoked and the copy constructor of queryIter runs:

424     // Add the query to the list of outstanding queries.
425     auto queryIter = _outstandingQueries.insert(_outstandingQueries.end(), query);
426 
427     // After a deadline or when the input cancellation token is canceled, cancel this query. If the
428     // query completes first, the deadlineCancelSource will be used to cancel this task.
429     _executor->sleepUntil(deadline, query->deadlineCancelSource.token())
430         .getAsync([this, query, queryIter, self = shared_from_this(), cancelToken](Status status) {
 

The queryIter capture in the lambda should either be by reference (is that valid if the parent queryIter goes out of scope? This is an async call, so can queryIter go out of scope?), or we should use a move operation.
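
A small sketch of the capture question, under stated assumptions: the deferred callback below captures the `std::shared_ptr` for the query (which stays safe to copy no matter what happens to the list) and never captures an iterator. `DeferredTasks` is a hypothetical stand-in for the executor, `enqueueWithCleanup` is an invented name, and locking is omitted for brevity.

```cpp
#include <cassert>
#include <functional>
#include <list>
#include <memory>
#include <vector>

// Stand-in for mongo::StreamableReplicaSetMonitor::HostQuery.
struct HostQuery {};

using QueryList = std::list<std::shared_ptr<HostQuery>>;

// Hypothetical stand-in for an executor: callbacks are queued and run later.
using DeferredTasks = std::vector<std::function<void()>>;

// Adds `query` to `queries` and schedules its later removal.
// Assumption: `queries` outlives every queued task (the real code keeps the
// owner alive by capturing shared_from_this()).
inline void enqueueWithCleanup(QueryList& queries,
                               DeferredTasks& tasks,
                               std::shared_ptr<HostQuery> query) {
    queries.push_back(query);
    tasks.push_back([&queries, query] {
        // Removes by value; this is a no-op if the query is no longer
        // in the list, for example because the list was cleared first.
        queries.remove(query);
    });
}
```

Because the callback looks the query up when it runs, it does not matter whether the list was modified or emptied between scheduling and execution.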

Comment by Adi Zaimi [ 28/Jun/23 ]

If I understand this correctly, at line 425 of the code we insert an item:

  auto queryIter = _outstandingQueries.insert

but that iterator has since been invalidated, because (in frame 5 of the stack trace) the list is empty:
  (gdb) print (*this->__this)._outstandingQueries
  $7 = empty std::__debug::list

 

Comment by Adi Zaimi [ 28/Jun/23 ]

> gdb mongo dump_Replica.xecutor.9471.core

#0  0x00007fc6ecef193f in raise () from /lib64/libc.so.6
#1  0x00007fc6ecedbc95 in abort () from /lib64/libc.so.6
#2  0x00007fc6ef95aba3 in __gnu_debug::_Error_formatter::_M_error() const [clone .cold] () from /data/debug/lib/libfmt.so
#3  0x00007fc6eb32544a in __gnu_debug::_Safe_iterator<std::__cxx1998::_List_iterator<std::shared_ptr<mongo::StreamableReplicaSetMonitor::HostQuery> >, std::__debug::list<std::shared_ptr<mongo::StreamableReplicaSetMonitor::HostQuery>, std::allocator<std::shared_ptr<mongo::StreamableReplicaSetMonitor::HostQuery> > >, std::forward_iterator_tag>::_Safe_iterator (this=0x7fc6bade72f8, __x=...) at /opt/mongodbtoolchain/revisions/69f4f0673ffcb290ce2307560a4883ecf2ad138c/stow/gcc-v4.35T/include/c++/11.3.0/debug/safe_iterator.h:195
#4  0x00007fc6eb31e7cb in __gnu_debug::_Safe_iterator<std::__cxx1998::_List_iterator<std::shared_ptr<mongo::StreamableReplicaSetMonitor::HostQuery> >, std::__debug::list<std::shared_ptr<mongo::StreamableReplicaSetMonitor::HostQuery>, std::allocator<std::shared_ptr<mongo::StreamableReplicaSetMonitor::HostQuery> > >, std::bidirectional_iterator_tag>::_Safe_iterator (this=0x7fc6bade72f8) at /opt/mongodbtoolchain/revisions/69f4f0673ffcb290ce2307560a4883ecf2ad138c/stow/gcc-v4.35T/include/c++/11.3.0/debug/safe_iterator.h:544
#5  0x00007fc6eb309cc6 in <lambda>(struct {...} &&) (this=0x7fc6bade72e0) at src/mongo/client/streamable_replica_set_monitor.cpp:430
#6  0x00007fc6eb309db1 in operator() (__closure=0x7fc6ac23fcf8, arg=Status(CallbackCanceled, "Callback canceled")) at src/mongo/util/future.h:709
#7  0x00007fc6eb30b04e in mongo::future_details::call<mongo::ExecutorFuture<void>::getAsync<mongo::CleanupFuturePolicy<false>, mongo::StreamableReplicaSetMonitor::_enqueueOutstandingQuery(mongo::WithLock, const mongo::ReadPreferenceSetting&, const std::__debug::vector<mongo::HostAndPort>&, const mongo::CancellationToken&, const mongo::Date_t&)::<lambda(mongo::Status)> >(mongo::CleanupFuturePolicy<false>, mongo::StreamableReplicaSetMonitor::_enqueueOutstandingQuery(mongo::WithLock, const mongo::ReadPreferenceSetting&, const std::__debug::vector<mongo::HostAndPort>&, const mongo::CancellationToken&, const mongo::Date_t&)::<lambda(mongo::Status)>&&) &&::<lambda(mongo::StatusOrStatusWith<void>)>&>(struct {...} &, mongo::StatusWith<mongo::future_details::FakeVoid>) (func=..., sw=StatusWith(CallbackCanceled, "Callback canceled")) at src/mongo/util/future_impl.h:301
#8  0x00007fc6eb30b24e in operator() (__closure=0x7fc6ac23fcf8, ssb=0x7fc6ac23ef30) at src/mongo/util/future_impl.h:965
#9  0x00007fc6eb30c065 in SpecificImpl::call (this=0x7fc6ac23fcf0, args#0=@0x7fc6bade7420: 0x7fc6ac23ef30) at src/mongo/util/functional.h:258
#10 0x00007fc6ec7bd508 in mongo::unique_function<void (mongo::future_details::SharedStateBase*)>::operator()(mongo::future_details::SharedStateBase*) const (this=0x7fc6ac23ef48, args#0=0x7fc6ac23ef30) at src/mongo/util/functional.h:216
#11 0x00007fc6ec7b758b in mongo::future_details::SharedStateBase::transitionToFinished (this=0x7fc6ac23ef30) at src/mongo/util/future_impl.h:482
#12 0x00007fc6ec7b77ef in mongo::future_details::SharedStateBase::setError (this=0x7fc6ac23ef30, statusArg=Status::OK()) at src/mongo/util/future_impl.h:508

Comment by Adi Zaimi [ 23/Jun/23 ]

Another instance of the crash can be found here: https://parsley.mongodb.com/resmoke/4e834bb436eea0eeb16799dc5a24e5f3/test/176b68718792e9ad4d5f30fc4863419e?bookmarks=0,461&shareLine=0
