[SERVER-16427] replsets_prefetch_stress.js failed under ASAN slow2_WT test Created: 05/Dec/14 Updated: 11/Jul/16 Resolved: 07/Dec/14 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | None |
| Fix Version/s: | 2.8.0-rc3 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Andrew Morrow (Inactive) | Assignee: | Andrew Morrow (Inactive) |
| Resolution: | Done | Votes: | 0 |
| Labels: | address-sanitizer | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Operating System: | ALL |
| Steps To Reproduce: | Waiting to see. Will probably re-launch this task once other MCI ASAN tasks have completed. |
| Participants: |
| Description |
|
This is a novel failure after migrating the ASAN build to the mongodb-master project. Interestingly, both secondaries died with the same error (Ctrl-F Address). |
| Comments |
| Comment by Githook User [ 07/Dec/14 ] |
|
Author: Andrew Morrow (acmorrow) <acm@mongodb.com> Message: |
| Comment by Andrew Morrow (Inactive) [ 05/Dec/14 ] |
|
And, finally, the best answer is to make BSONElement's default constructor set a sane fieldNameSize_. |
| Comment by Andrew Morrow (Inactive) [ 05/Dec/14 ] |
|
After some discussion, a better fix is to make BSONElement::fieldNameStringData return a correct empty StringData so that it has the same semantics as BSONElement::fieldName() for EOO elements. |
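A minimal sketch of the guard described above, not the actual BSONElement code. `Element`, `NameView`, and `fieldNameView` are hypothetical names invented for illustration, and the byte layout (one type byte followed by a NUL-terminated name, with type 0 meaning EOO) is a simplifying assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical pointer-plus-length view, standing in for StringData.
struct NameView {
    const char* data;
    std::size_t size;
};

// Hypothetical element model: `raw` points at a type byte followed by a
// NUL-terminated field name. A type byte of 0 marks an EOO element.
class Element {
public:
    Element() : _raw(kEooBytes) {}  // default-constructed elements are EOO
    explicit Element(const char* raw) : _raw(raw) {}

    bool eoo() const { return _raw[0] == 0; }

    // Returns "" for EOO, otherwise the NUL-terminated name.
    const char* fieldName() const { return eoo() ? "" : _raw + 1; }

    // Guarded accessor: EOO yields a well-formed empty view instead of a
    // view built from an unset or negative size.
    NameView fieldNameView() const {
        if (eoo())
            return NameView{"", 0};
        return NameView{_raw + 1, std::strlen(_raw + 1)};
    }

private:
    static const char kEooBytes[1];
    const char* _raw;
};

const char Element::kEooBytes[1] = {0};
```

With this shape, callers that hash or compare the name never see a length larger than the buffer, whether or not they check eoo() first.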
| Comment by Andrew Morrow (Inactive) [ 05/Dec/14 ] |
|
This behavior is now understood. The following code may call BSONElement::fieldNameStringData on an EOO BSONElement: https://github.com/mongodb/mongo/blob/19ad2749f3cba8c074f50b29400af897eb4df3ab/src/mongo/db/repl/sync_tail.cpp#L110

This could happen in a doc-level locking build if no _id element was detected here: https://github.com/mongodb/mongo/blob/19ad2749f3cba8c074f50b29400af897eb4df3ab/src/mongo/db/repl/sync_tail.cpp#L396-L407

We didn't notice this before we switched from lazy to eager StringData because, with a lazy StringData, calling BSONElement::fieldNameStringData on an EOO BSONElement invokes the StringData constructor with (&"", -1). It just so happens that this -1 aliases std::string::npos. In the era of lazy StringData the confusion therefore went unnoticed: even though we called the length-specifying constructor, the internal state of the StringData was indistinguishable from that produced by StringData(const char*) (i.e. the constructor that takes no length). When the length of the StringData was later requested, we would invoke strlen("") and the -1 would become 0.

With eager StringData length calculations, the call to StringData(&"", -1) produces a StringData that points into the constant table but has a huge positive length.

The fix is to not attempt to hash the field name of an EOO element in the hashBSONElement function in db/repl/sync_tail.cpp. |
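The aliasing described in this comment can be reproduced with a small model. This is a sketch under stated assumptions, not the real StringData: `LazyStringData` and `EagerStringData` are hypothetical classes that only mimic the two length strategies (npos as a "compute later" sentinel versus storing the given length as-is).

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical model of the old lazy behavior: a stored size of npos means
// "length not yet known", so size() falls back to strlen.
class LazyStringData {
public:
    LazyStringData(const char* data, std::size_t size) : _data(data), _size(size) {}
    const char* data() const { return _data; }
    std::size_t size() const {
        if (_size == std::string::npos)
            _size = std::strlen(_data);  // -1 passed as a length is silently repaired here
        return _size;
    }

private:
    const char* _data;
    mutable std::size_t _size;
};

// Hypothetical model of the eager behavior: the length is trusted as given.
class EagerStringData {
public:
    EagerStringData(const char* data, std::size_t size) : _data(data), _size(size) {}
    const char* data() const { return _data; }
    std::size_t size() const { return _size; }

private:
    const char* _data;
    std::size_t _size;
};

// Shows that -1 converted to size_t is exactly npos, and what each model
// then reports for the empty literal.
inline void demonstrateAliasing() {
    const std::size_t bogus = static_cast<std::size_t>(-1);
    assert(bogus == std::string::npos);

    LazyStringData lazy("", bogus);
    assert(lazy.size() == 0);  // strlen("") hides the bad length

    EagerStringData eager("", bogus);
    assert(eager.size() == std::string::npos);  // huge length over a 1-byte literal
}
```

Any consumer that walks `eager.size()` bytes from `eager.data()` reads far past the literal, which is the kind of access ASAN reports.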
| Comment by Andrew Morrow (Inactive) [ 05/Dec/14 ] |
|
We did change StringData recently. |
| Comment by Eric Milkie [ 05/Dec/14 ] |
|
Looking through the murmurhash function, it seems we are attempting to hash a StringData whose length is greater than the actual underlying buffer size, and that particular length % 16 is 15. Alternatively, the StringData refers to freed memory, but that seems unlikely given that it points into the global string literal data segment. I'm not sure how that could happen or why it has started to occur only recently. |
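For illustration only, the arithmetic behind the "length % 16 is 15" observation can be sketched as follows. This assumes the hash splits its input the way the reference MurmurHash3_x64_128 does (whole 16-byte blocks, then a tail of `len & 15` bytes) and that the length reaches it as a signed int. `fullBlocks` and `tailBytes` are hypothetical helper names, not functions from the server code.

```cpp
#include <cassert>

// Number of whole 16-byte blocks processed before the tail.
// Integer division truncates toward zero, so a small negative length gives 0.
inline int fullBlocks(int len) { return len / 16; }

// Number of trailing bytes handled by the tail step (low four bits of len).
inline int tailBytes(int len) { return len & 15; }

// A length of -1 yields no full blocks and a 15-byte tail, so every read
// lands in the tail step and runs past a 1-byte "" literal.
inline void checkBogusLength() {
    const int bogus = -1;
    assert(fullBlocks(bogus) == 0);
    assert(tailBytes(bogus) == 15);

    // A well-formed length for comparison: 35 bytes = 2 blocks + 3 tail bytes.
    assert(fullBlocks(35) == 2);
    assert(tailBytes(35) == 3);
}
```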
| Comment by Andrew Morrow (Inactive) [ 05/Dec/14 ] |
|
New instance of this alert in gle_auth_write_cmd_WT: |