[SERVER-21752] slow2_wt fails by exhausting host machine's memory Created: 05/Nov/15  Updated: 16/Nov/16  Resolved: 03/Dec/15

Status: Closed
Project: Core Server
Component/s: Testing Infrastructure
Affects Version/s: None
Fix Version/s: 3.2.1, 3.3.0

Type: Bug Priority: Major - P3
Reporter: Spencer Brody (Inactive) Assignee: Matt Dannenberg
Resolution: Done Votes: 0
Labels: test-only

Issue Links:
Related
related to SERVER-21691 Insert stalls during dirty writeback Closed
is related to SERVER-21509 rollback4.js failed waiting for repli... Closed
is related to WT-2251 Memory leak from split code Closed
is related to SERVER-22204 Tests should lower WT cache size Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: Repl C (11/20/15), Repl D (12/11/15)
Linked BF Score: 0

 Description   

[js_test:rollback4] 2015-11-05T20:48:42.975+0000 d20011| 2015-11-05T20:48:42.974+0000 I REPL     [ReplicationExecutor] syncing from: ip-10-186-159-19:20010
[js_test:rollback4] 2015-11-05T20:48:42.975+0000 d20010| 2015-11-05T20:48:42.975+0000 I NETWORK  [initandlisten] connection accepted from 10.186.159.19:33816 #13 (5 connections now open)
[js_test:rollback4] 2015-11-05T20:48:42.976+0000 d20010| 2015-11-05T20:48:42.976+0000 I NETWORK  [conn13] end connection 10.186.159.19:33816 (4 connections now open)
[js_test:rollback4] 2015-11-05T20:48:42.976+0000 d20010| 2015-11-05T20:48:42.976+0000 I NETWORK  [initandlisten] connection accepted from 10.186.159.19:33817 #14 (5 connections now open)
[js_test:rollback4] 2015-11-05T20:48:42.976+0000 d20011| 2015-11-05T20:48:42.976+0000 I ASIO     [NetworkInterfaceASIO] Successfully connected to ip-10-186-159-19:20010
[js_test:rollback4] 2015-11-05T20:48:43.363+0000 d20010| 2015-11-05T20:48:43.363+0000 I NETWORK  [conn8] SocketException handling request, closing client connection: 9001 socket exception [SEND_ERROR] server [10.186.159.19:33733]
[js_test:rollback4] 2015-11-05T20:48:43.364+0000 d20010| 2015-11-05T20:48:43.363+0000 I NETWORK  [conn12] SocketException handling request, closing client connection: 9001 socket exception [SEND_ERROR] server [10.186.159.19:33815]
[js_test:rollback4] 2015-11-05T20:48:43.364+0000 d20010| 2015-11-05T20:48:43.363+0000 I NETWORK  [conn9] SocketException handling request, closing client connection: 9001 socket exception [SEND_ERROR] server [10.186.159.19:33734]
[js_test:rollback4] 2015-11-05T20:48:44.253+0000 2015-11-05T20:48:44.253+0000 I NETWORK  [thread1] DBClientCursor::init call() failed
[js_test:rollback4] 2015-11-05T20:48:44.366+0000 d20011| 2015-11-05T20:48:44.365+0000 I REPL     [ReplicationExecutor] Error in heartbeat request to ip-10-186-159-19:20010; HostUnreachable End of file
[js_test:rollback4] 2015-11-05T20:48:44.366+0000 d20011| 2015-11-05T20:48:44.366+0000 I REPL     [ReplicationExecutor] Error in heartbeat request to ip-10-186-159-19:20010; HostUnreachable End of file
[js_test:rollback4] 2015-11-05T20:48:44.367+0000 d20010| 2015-11-05T20:48:44.367+0000 I NETWORK  [initandlisten] connection accepted from 10.186.159.19:33818 #15 (3 connections now open)
[js_test:rollback4] 2015-11-05T20:48:44.367+0000 d20011| 2015-11-05T20:48:44.367+0000 I ASIO     [NetworkInterfaceASIO] Successfully connected to ip-10-186-159-19:20010
[js_test:rollback4] 2015-11-05T20:48:44.368+0000 d20011| 2015-11-05T20:48:44.367+0000 I REPL     [ReplicationExecutor] Member ip-10-186-159-19:20010 is now in state SECONDARY
[js_test:rollback4] 2015-11-05T20:48:44.531+0000 d20010| 2015-11-05T20:48:44.530+0000 I COMMAND  [conn1] command db.c command: insert { insert: "c", documents: 1000, ordered: false } ntoreturn:1 ntoskip:0 ninserted:320 keyUpdates:0 writeConflicts:0 exception: Not primary while writing to db.c code:10107 numYields:0 reslen:52353 locks:{ Global: { acquireCount: { r: 1005, w: 1005 }, acquireWaitCount: { w: 1 }, timeAcquiringMicros: { w: 370135 } }, MMAPV1Journal: { acquireCount: { w: 1005 }, acquireWaitCount: { w: 2 }, timeAcquiringMicros: { w: 88273994 } }, Database: { acquireCount: { w: 1005 } }, Collection: { acquireCount: { W: 685 } }, Metadata: { acquireCount: { w: 320 } }, oplog: { acquireCount: { W: 320 } } } protocol:op_command 11070ms
[js_test:rollback4] 2015-11-05T20:48:44.531+0000 d20010| 2015-11-05T20:48:44.530+0000 I NETWORK  [conn1] SocketException handling request, closing client connection: 9001 socket exception [SEND_ERROR] server [127.0.0.1:36327]
[js_test:rollback4] 2015-11-05T20:48:44.531+0000 d20011| 2015-11-05T20:48:44.531+0000 I REPL     [SyncSourceFeedback] SyncSourceFeedback error sending update: network error while attempting to run command 'replSetUpdatePosition' on host 'ip-10-186-159-19:20010'
[js_test:rollback4] 2015-11-05T20:48:44.532+0000 d20011| 2015-11-05T20:48:44.531+0000 I REPL     [SyncSourceFeedback] updateUpstream failed: HostUnreachable network error while attempting to run command 'replSetUpdatePosition' on host 'ip-10-186-159-19:20010' , will retry
[js_test:rollback4] 2015-11-05T20:48:44.867+0000 d20012| 2015-11-05T20:48:44.867+0000 I REPL     [ReplicationExecutor] Error in heartbeat request to ip-10-186-159-19:20010; HostUnreachable End of file
[js_test:rollback4] 2015-11-05T20:48:44.868+0000 d20010| 2015-11-05T20:48:44.868+0000 I NETWORK  [initandlisten] connection accepted from 10.186.159.19:33819 #16 (3 connections now open)
[js_test:rollback4] 2015-11-05T20:48:44.868+0000 d20012| 2015-11-05T20:48:44.868+0000 I ASIO     [NetworkInterfaceASIO] Successfully connected to ip-10-186-159-19:20010
[js_test:rollback4] 2015-11-05T20:48:44.868+0000 d20012| 2015-11-05T20:48:44.868+0000 I REPL     [ReplicationExecutor] Member ip-10-186-159-19:20010 is now in state SECONDARY
[js_test:rollback4] 2015-11-05T20:48:44.907+0000 2015-11-05T20:48:44.907+0000 E QUERY    [thread1] Error: error doing query: failed :
[js_test:rollback4] 2015-11-05T20:48:44.907+0000 DBQuery.prototype._exec@src/mongo/shell/query.js:108:28
[js_test:rollback4] 2015-11-05T20:48:44.908+0000 DBQuery.prototype.next@src/mongo/shell/query.js:274:5
[js_test:rollback4] 2015-11-05T20:48:44.908+0000 Bulk/executeBatch@src/mongo/shell/bulk_api.js:853:16
[js_test:rollback4] 2015-11-05T20:48:44.908+0000 Bulk/this.execute@src/mongo/shell/bulk_api.js:1139:11
[js_test:rollback4] 2015-11-05T20:48:44.908+0000 @jstests/slow2/rollback4.js:39:16
[js_test:rollback4] 2015-11-05T20:48:44.908+0000 
[js_test:rollback4] 2015-11-05T20:48:44.908+0000 failed to load: jstests/slow2/rollback4.js



 Comments   
Comment by Githook User [ 08/Dec/15 ]

Author: matt dannenberg (dannenberg) <matt.dannenberg@10gen.com>

Message: SERVER-21752 remove parallelization of jobs in slow2_WT to avoid out of memory error
Branch: v3.2
https://github.com/mongodb/mongo/commit/b339d3de4e485fd43bb7dbb7658023bb734088a3

Comment by Githook User [ 08/Dec/15 ]

Author: matt dannenberg (dannenberg) <matt.dannenberg@10gen.com>

Message: SERVER-21752 reduce number of jobs for slow2_WT to avoid out of memory error
Branch: v3.2
https://github.com/mongodb/mongo/commit/573080105a7e457001b15d9a9bda2ca486329d4b

Comment by Githook User [ 03/Dec/15 ]

Author: matt dannenberg (dannenberg) <matt.dannenberg@10gen.com>

Message: SERVER-21752 remove parallelization of jobs in slow2_WT to avoid out of memory error
Branch: master
https://github.com/mongodb/mongo/commit/febcf866edc365b6ab014664117696647fe5b002

Comment by Githook User [ 03/Dec/15 ]

Author: matt dannenberg (dannenberg) <matt.dannenberg@10gen.com>

Message: SERVER-21752 reduce number of jobs for slow2_WT to avoid out of memory error
Branch: master
https://github.com/mongodb/mongo/commit/b79f57e485b97753adc691ad7437e53710713fb9

Comment by Matt Dannenberg [ 30/Nov/15 ]

This failure may be related to fallout from the latest WiredTiger drop and to the behavior seen in SERVER-21691.

Comment by Matt Dannenberg [ 30/Nov/15 ]

The recent increase in failures of this test seems to be related to the latest WiredTiger drop. I believe the cause is general slowness in the storage engine.

Comment by Spencer Brody (Inactive) [ 05/Nov/15 ]

Looks similar to BF-1352. Possibly related?

Generated at Thu Feb 08 03:58:18 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.