[SERVER-9095] Windows builds failing on initial_sync3.js and repl_monitor_stress.js
Created: 22/Mar/13  Updated: 11/Jul/16  Resolved: 26/Mar/13

Status: Closed
Project: Core Server
Component/s: Testing Infrastructure
Affects Version/s: None
Fix Version/s: 2.2.4

Type: Improvement
Priority: Major - P3
Reporter: Ian Whalen (Inactive)
Assignee: Eric Milkie
Resolution: Done
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Comments
Comment by Tad Marshall [ 25/Mar/13 ]

The commit in the auto comment below resolves item 1 of my earlier comment by removing the Client creation (so there is nothing to call shutdown() on) and answers the question in item 5.

Comment by auto [ 25/Mar/13 ]

Author: Eric Milkie <milkie@10gen.com> (2013-01-25T15:33:46Z)

Message: SERVER-9095 fix reconfig.js on Windows by removing Client from heartbeat thread

The heartbeat thread has no use for a Client context, and it has trouble
at process shutdown time – it tends to hit an access violation on Windows
when it tries to destruct the Client context.
Branch: v2.2
https://github.com/mongodb/mongo/commit/568e95e3dad7ee511565056db4f03586cd5ad018

Comment by Tad Marshall [ 25/Mar/13 ]

The easiest fix is to backport the change that keeps 2.4 from hitting this problem:

2c1ca7a77b395623216ef35964383813830a2ddd

Comment by Tad Marshall [ 25/Mar/13 ]

There are multiple things going on here, and some issues that require additional research and perhaps some additional tickets.

1) The ReplSetHealthPollTask is not calling shutdown() on its client object before its thread is destroyed.
2) The client object tries to log a message in its destructor about the failure to call shutdown(), but the LogStream it uses to log messages has been destroyed already, causing an access violation (segfault).
3) The Windows unhandled exception filter tries to use the same LogStream to display information about the access violation and generates a nested exception, terminating the process with no indication of what happened.
4) The Windows 64-bit build hits the same access violations running jstests\replsets\initial_sync3.js, but incorrectly reports success anyway. 32-bit correctly reports failure.
5) It is not clear why this problem is happening in the 2.2 branch but not in the 2.4 branch.

The immediate fix addresses item 1, which makes the tests pass, but the other issues should be investigated as well.
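
To make items 1 and 2 concrete, here is a minimal C++ sketch of the hazard: an object whose destructor logs a warning when shutdown() was skipped, and a log sink that may already be gone by then. This is not the MongoDB code. LogSink, ClientSketch and runScenario are hypothetical names, and the weak_ptr guard is an illustrative technique chosen for this sketch, not the fix that was committed (the commit removed the Client from the heartbeat thread entirely).

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a log stream whose lifetime can end
// before every object that logs through it has been destroyed.
struct LogSink {
    explicit LogSink(std::vector<std::string>& o) : out(o) {}
    void write(const std::string& msg) { out.push_back(msg); }
    std::vector<std::string>& out;
};

// Hypothetical stand-in for a per-thread client context.
class ClientSketch {
public:
    explicit ClientSketch(std::weak_ptr<LogSink> sink) : _sink(std::move(sink)) {}

    // Explicit teardown; meant to be called before the owning thread ends.
    void shutdown() { _shutdown = true; }

    ~ClientSketch() {
        if (_shutdown) return;  // clean path: nothing to report
        // Log only if the sink is still alive. A raw pointer or reference
        // here would dangle once the sink is destroyed (item 2).
        if (auto sink = _sink.lock()) {
            sink->write("client destroyed without shutdown()");
        }
    }

private:
    std::weak_ptr<LogSink> _sink;
    bool _shutdown = false;
};

// Runs one teardown ordering and returns whatever the destructor logged.
std::vector<std::string> runScenario(bool callShutdown, bool sinkAlive) {
    std::vector<std::string> captured;
    auto sink = std::make_shared<LogSink>(captured);
    {
        ClientSketch client(sink);
        if (callShutdown) client.shutdown();
        if (!sinkAlive) sink.reset();  // sink destroyed before the client
    }  // client destructor runs here
    return captured;
}
```

Calling runScenario(false, true) yields the single warning line, while runScenario(false, false) yields nothing and does not touch the destroyed sink. Either calling shutdown() first (item 1) or guarding the sink's lifetime (item 2) avoids the crash in this sketch.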
