[SERVER-16427] replsets_prefetch_stress.js failed under ASAN slow2_WT test Created: 05/Dec/14  Updated: 11/Jul/16  Resolved: 07/Dec/14

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: 2.8.0-rc3

Type: Bug
Priority: Major - P3
Reporter: Andrew Morrow (Inactive)
Assignee: Andrew Morrow (Inactive)
Resolution: Done
Votes: 0
Labels: address-sanitizer
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Operating System: ALL
Steps To Reproduce:

Waiting to see. Will probably re-launch this task once other MCI ASAN tasks have completed.

Participants:

 Description   

This is a novel failure after migrating the ASAN build to the mongodb-master project.

https://mci.10gen.com/task/mongodb_mongo_master_ubuntu1404_debug_asan_e66eecff9487add89506add70a6bb6a23735763f_14_12_04_07_32_07_slow2_WT_ubuntu1404_debug_asan

http://buildlogs.mongodb.org/MCI_ubuntu1404-debug-asan/builds/392473/test/slow2_WT_0/replsets_prefetch_stress.js

Interestingly, both secondaries died with the same error (Ctrl-F Address).



 Comments   
Comment by Githook User [ 07/Dec/14 ]

Author: Andrew Morrow (acmorrow) <acm@mongodb.com>

Message: SERVER-16427 Return a correct empty StringData for fieldNameStringData on EOO
Branch: master
https://github.com/mongodb/mongo/commit/a6788ec72a9dabb38bd92a8ab653e32034e9e02e

Comment by Andrew Morrow (Inactive) [ 05/Dec/14 ]

And, finally, the best answer is to make BSONElement's default constructor set a sane fieldNameSize_.

Comment by Andrew Morrow (Inactive) [ 05/Dec/14 ]

After some discussion, a better fix is to make BSONElement::fieldNameStringData return a correct empty StringData so that it has the same semantics as BSONElement::fieldName() for EOO elements.
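A minimal sketch of that idea, not the actual commit: MiniElement is a hypothetical stand-in for BSONElement, std::string_view stands in for StringData, and fieldNameView is an illustrative name. It assumes the field-name size is 0 for EOO, so the unguarded "size minus one" would be -1.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string_view>

// Hypothetical miniature element: byte 0 is the type tag, followed by a
// NUL-terminated field name. Type 0 (EOO) carries no field name at all.
class MiniElement {
public:
    explicit MiniElement(const char* data) : data_(data) {
        assert(data != nullptr);
    }

    bool eoo() const { return data_[0] == 0; }

    // Returns "" for EOO, otherwise the name that follows the type byte.
    const char* fieldName() const { return eoo() ? "" : data_ + 1; }

    // Size including the terminating NUL; assumed to be 0 for EOO.
    int fieldNameSize() const {
        return eoo() ? 0 : static_cast<int>(std::strlen(data_ + 1)) + 1;
    }

    // Guarded view: EOO yields a genuinely empty view instead of a view
    // built from (fieldName(), -1), which would carry a huge length.
    std::string_view fieldNameView() const {
        const std::size_t len =
            eoo() ? 0 : static_cast<std::size_t>(fieldNameSize() - 1);
        return std::string_view(fieldName(), len);
    }

private:
    const char* data_;
};

// Length of the field-name view for an EOO element.
std::size_t eooNameLength() {
    static const char eoo[] = {0};
    return MiniElement(eoo).fieldNameView().size();
}

// Length of the field-name view for an element named "_id"
// (type byte 0x10; the value bytes are omitted in this sketch).
std::size_t idNameLength() {
    static const char elem[] = {0x10, '_', 'i', 'd', '\0'};
    return MiniElement(elem).fieldNameView().size();
}
```

With the guard in place, the view and fieldName() agree for EOO: both describe an empty name.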

Comment by Andrew Morrow (Inactive) [ 05/Dec/14 ]

This behavior is now understood.

The following code may call BSONElement::fieldNameStringData on an EOO BSONElement: https://github.com/mongodb/mongo/blob/19ad2749f3cba8c074f50b29400af897eb4df3ab/src/mongo/db/repl/sync_tail.cpp#L110

This could happen in a doc-level locking build if no _id element was detected here: https://github.com/mongodb/mongo/blob/19ad2749f3cba8c074f50b29400af897eb4df3ab/src/mongo/db/repl/sync_tail.cpp#L396-L407

We didn't notice this before we switched from lazy to eager StringData. With a lazy StringData, calling BSONElement::fieldNameStringData on an EOO BSONElement invoked the StringData constructor with (&"", -1). That -1 happens to alias std::string::npos, so even though we called the length-specifying constructor, the internal state of the StringData looked as if it had been built by StringData(const char*) (i.e. the non-length-specifying constructor). When the length of the StringData was later requested, we would invoke strlen("") and the -1 would become 0.

With eager StringData length calculations, the call to StringData(&"", -1) produces a StringData that points into the constant table but has a huge positive length.
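The difference between the two eras can be sketched as follows. LazyStringData and EagerStringData are hypothetical stand-ins written for illustration, not the real StringData classes; the only behavior they model is how a caller-supplied length of -1 is treated.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical "lazy" variant: std::string::npos doubles as the
// "length not computed yet" sentinel, resolved on first use.
class LazyStringData {
public:
    LazyStringData(const char* data, std::size_t size)
        : data_(data), size_(size) {
        assert(data != nullptr);
    }

    std::size_t size() const {
        // A caller-supplied (size_t)-1 is indistinguishable from the
        // sentinel, so it is silently replaced by strlen(data_).
        if (size_ == std::string::npos)
            size_ = std::strlen(data_);
        return size_;
    }

private:
    const char* data_;
    mutable std::size_t size_;
};

// Hypothetical "eager" variant: the supplied length is trusted as-is.
class EagerStringData {
public:
    EagerStringData(const char* data, std::size_t size)
        : data_(data), size_(size) {
        assert(data != nullptr);
    }

    std::size_t size() const { return size_; }

private:
    const char* data_;
    std::size_t size_;
};

// Both helpers mimic the (&"", -1) construction described above.
std::size_t lazySizeForEoo() {
    return LazyStringData("", static_cast<std::size_t>(-1)).size();
}

std::size_t eagerSizeForEoo() {
    return EagerStringData("", static_cast<std::size_t>(-1)).size();
}
```

In this sketch the lazy variant reports a length of 0 while the eager variant reports the maximum size_t value, so any consumer that reads that many bytes runs past the one-byte literal.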

The fix is to not attempt to hash the field name of an EOO element in the hashBSONElement function in db/repl/sync_tail.cpp.

Comment by Andrew Morrow (Inactive) [ 05/Dec/14 ]

We did change StringData recently.

Comment by Eric Milkie [ 05/Dec/14 ]

Looking through the murmurhash function, it seems like we are attempting to hash a StringData whose length value is greater than the actual underlying buffer size, and that particular length % 16 is 15. Alternatively, the StringData refers to freed memory, but this seems unlikely given that it points into the global string literal data segment. I'm not sure how that could happen or why this has started to occur only recently.
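The "length % 16 is 15" observation is consistent with a length of -1 reinterpreted as an unsigned value, since an all-ones size_t leaves remainder 15 when divided by 16. The arithmetic can be checked with a small sketch; splitForHashing and HashSpan are hypothetical names, and the 16-byte block size is an assumption about the hash variant in use.

```cpp
#include <cstddef>

// Assumed block size of the hash: full 16-byte blocks, then a tail.
constexpr std::size_t kBlockSize = 16;

struct HashSpan {
    std::size_t fullBlocks;  // number of complete 16-byte blocks
    std::size_t tailBytes;   // leftover bytes handled after the blocks
};

// Splits a byte length the way a block-oriented hash would consume it.
constexpr HashSpan splitForHashing(std::size_t len) {
    return HashSpan{len / kBlockSize, len % kBlockSize};
}

// An all-ones length (what -1 becomes as size_t) has a 15-byte tail.
static_assert(splitForHashing(static_cast<std::size_t>(-1)).tailBytes == 15,
              "(size_t)-1 leaves a 15-byte tail");

// Runtime accessor for the same value.
std::size_t tailBytesForBogusLength() {
    return splitForHashing(static_cast<std::size_t>(-1)).tailBytes;
}
```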

Comment by Andrew Morrow (Inactive) [ 05/Dec/14 ]

New instance of this alert in gle_auth_write_cmd_WT:

https://mci.10gen.com/task/mongodb_mongo_master_ubuntu1404_debug_asan_e66eecff9487add89506add70a6bb6a23735763f_14_12_04_07_32_07_gle_auth_write_cmd_WT_ubuntu1404_debug_asan

http://buildlogs.mongodb.org/MCI_ubuntu1404-debug-asan/builds/397672/test/gle_auth_write_cmd_WT_0/gle_sharded_wc.js
