[SERVER-18261] Smoke.py error running jsCore_small_oplog on Windows 64-bit 2008R2+ Created: 29/Apr/15  Updated: 26/Aug/15  Resolved: 26/Aug/15

Status: Closed
Project: Core Server
Component/s: Replication, Testing Infrastructure
Affects Version/s: 3.0.2
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Ernie Hershey Assignee: Matt Dannenberg
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: Text File smokepyrun.txt    
Issue Links:
Related
is related to SERVER-18318 Disable jsCore_small_oplog suite in W... Closed
Operating System: ALL
Participants:
Linked BF Score: 0

 Description   

See MCI failed task

[2015/04/29 10:44:56.986]   File "C:\Python27\lib\site-packages\pymongo\collection.py", line 410, in insert
 [2015/04/29 10:44:56.986]     _check_write_command_response(results)
 [2015/04/29 10:44:56.986]   File "C:\Python27\lib\site-packages\pymongo\helpers.py", line 208, in _check_write_command_response
 [2015/04/29 10:44:56.986]     error.get("code"), error)
 [2015/04/29 10:44:56.986] pymongo.errors.WTimeoutError: waiting for replication timed out



 Comments   
Comment by Ernie Hershey [ 04/May/15 ]

I just saw this in rhel55 although a subsequent run passed.

Comment by Scott Hernandez (Inactive) [ 29/Apr/15 ]

Looks like the slave never finishes replicating the "test_compact_keeps_indexes_drop" database from the "compact_keeps_indexes.js" test. Which blocks the write, and it times out.

2015-04-29T21:12:43.003+0100 I REPL     [replslave] resync: cloning database test_compact_keeps_indexes_drop to get an initial copy
2015-04-29T21:12:43.003+0100 I NETWORK  [initandlisten] connection accepted from 127.0.0.1:51188 #734 (4 connections now open)
2015-04-29T21:12:43.004+0100 I NETWORK  [conn734] end connection 127.0.0.1:51188 (3 connections now open)
2015-04-29T21:12:43.004+0100 I NETWORK  [initandlisten] connection accepted from 127.0.0.1:51189 #735 (4 connections now open)
2015-04-29T21:12:43.009+0100 I REPL     [replslave] resync: done with initial clone for db: test_compact_keeps_indexes_drop
2015-04-29T21:12:43.011+0100 I INDEX    [replslave] allocating new ns file /data/db/sconsTestsSlave\test_compact_keeps_indexes_drop.ns, filling with zeroes...
2015-04-29T21:12:43.021+0100 I NETWORK  [conn735] end connection 127.0.0.1:51189 (3 connections now open)
2015-04-29T21:12:43.263+0100 I STORAGE  [FileAllocator] allocating new datafile /data/db/sconsTestsSlave\test_compact_keeps_indexes_drop.0, filling with zeroes...
2015-04-29T21:12:44.287+0100 I STORAGE  [FileAllocator] done allocating datafile /data/db/sconsTestsSlave\test_compact_keeps_indexes_drop.0, size: 64MB,  took 1.023 secs
2015-04-29T21:12:44.396+0100 I REPL     [replslave] repl:   applied 1 operations
2015-04-29T21:12:44.396+0100 I REPL     [replslave] repl:  end sync_pullOpLog syncedTo: Apr 29 21:12:41 55413b39:2
2015-04-29T21:12:44.396+0100 I REPL     [replslave] repl: syncing from host:localhost:27999
2015-04-29T21:12:44.396+0100 I REPL     [replslave] An earlier initial clone of 'test_compact_keeps_indexes_drop' did not complete, now resyncing.
2015-04-29T21:12:44.396+0100 I REPL     [replslave] resync: dropping database test_compact_keeps_indexes_drop
2015-04-29T21:12:44.451+0100 I JOURNAL  [replslave] journalCleanup...
2015-04-29T21:12:44.452+0100 I JOURNAL  [replslave] removeJournalFiles
2015-04-29T21:12:44.500+0100 I JOURNAL  [replslave] journalCleanup...
2015-04-29T21:12:44.500+0100 I JOURNAL  [replslave] removeJournalFiles
2015-04-29T21:12:44.514+0100 I REPL     [replslave] resync: cloning database test_compact_keeps_indexes_drop to get an initial copy
2015-04-29T21:12:44.515+0100 I NETWORK  [initandlisten] connection accepted from 127.0.0.1:51190 #736 (4 connections now open)
2015-04-29T21:12:44.516+0100 I NETWORK  [conn736] end connection 127.0.0.1:51190 (3 connections now open)
2015-04-29T21:12:44.516+0100 I NETWORK  [initandlisten] connection accepted from 127.0.0.1:51191 #737 (4 connections now open)
2015-04-29T21:12:44.518+0100 I REPL     [replslave] resync: done with initial clone for db: test_compact_keeps_indexes_drop
2015-04-29T21:12:44.518+0100 I NETWORK  [conn737] end connection 127.0.0.1:51191 (3 connections now open)
2015-04-29T21:12:44.518+0100 I INDEX    [replslave] allocating new ns file /data/db/sconsTestsSlave\test_compact_keeps_indexes_drop.ns, filling with zeroes...
2015-04-29T21:12:44.759+0100 I STORAGE  [FileAllocator] allocating new datafile /data/db/sconsTestsSlave\test_compact_keeps_indexes_drop.0, filling with zeroes...
2015-04-29T21:12:45.824+0100 I STORAGE  [FileAllocator] done allocating datafile /data/db/sconsTestsSlave\test_compact_keeps_indexes_drop.0, size: 64MB,  took 1.064 secs
2015-04-29T21:12:45.832+0100 I REPL     [replslave] repl:   applied 1 operations
2015-04-29T21:12:45.833+0100 I REPL     [replslave] repl:  end sync_pullOpLog syncedTo: Apr 29 21:12:41 55413b39:3
2015-04-29T21:12:45.833+0100 I REPL     [replslave] repl: syncing from host:localhost:27999
2015-04-29T21:12:45.833+0100 I REPL     [replslave] An earlier initial clone of 'test_compact_keeps_indexes_drop' did not complete, now resyncing.
2015-04-29T21:12:45.833+0100 I REPL     [replslave] resync: dropping database test_compact_keeps_indexes_drop
....
2015-04-29T21:17:41.674+0100 I REPL     [replslave] repl:   applied 1 operations
2015-04-29T21:17:41.674+0100 I REPL     [replslave] repl:  end sync_pullOpLog syncedTo: Apr 29 21:12:41 55413b39:bb
2015-04-29T21:17:41.675+0100 I REPL     [replslave] repl: syncing from host:localhost:27999
2015-04-29T21:17:41.675+0100 I REPL     [replslave] An earlier initial clone of 'test_compact_keeps_indexes_drop' did not complete, now resyncing.
2015-04-29T21:17:41.675+0100 I REPL     [replslave] resync: dropping database test_compact_keeps_indexes_drop
2015-04-29T21:17:41.733+0100 I JOURNAL  [replslave] journalCleanup...
2015-04-29T21:17:41.735+0100 I JOURNAL  [replslave] removeJournalFiles
2015-04-29T21:17:41.755+0100 I COMMAND  [conn733] command testing.$cmd command: insert { insert: "smokeWait", ordered: true, writeConcern: { wtimeout: 300000, w: 2 }, documents: [ { _id: ObjectId('55413b39bebf3007a4343dc8') } ] } keyUpdates:0 writeConflicts:0 numYields:0 reslen:161 locks:{ Global: { acquireCount: { w: 2 } }, MMAPV1Journal: { acquireCount: { w: 2 } }, Database: { acquireCount: { w: 2 } }, Collection: { acquireCount: { W: 2 } } } 300001ms
2015-04-29T21:17:41.778+0100 I NETWORK  [conn2] end connection 127.0.0.1:50298 (2 connections now open)
[Errno 10061] No connection could be made because the target machine actively refused it
timed out
timed out
timed out
timed out
timed out
timed out
 
Awaiting replicated (w:2, wtimeout:5min) insert (port:27999)
78 tests succeeded
696 tests didn't get run
replication ok for 2571 collections
timed out
Traceback (most recent call last):
  File "buildscripts/smoke.py", line 1459, in <module>
    main()
  File "buildscripts/smoke.py", line 1443, in main
    run_tests(tests)
  File "buildscripts/smoke.py", line 792, in run_tests
    master.wait_for_repl()
  File "buildscripts/smoke.py", line 349, in wait_for_repl
    MongoClient(port=self.port).testing.smokeWait.insert({}, w=2, wtimeout=5*60*1000)
  File "C:\python27\lib\site-packages\pymongo-3.0.1-py2.7-win32.egg\pymongo\collection.py", line 1926, in insert
    check_keys, manipulate, write_concern)
  File "C:\python27\lib\site-packages\pymongo-3.0.1-py2.7-win32.egg\pymongo\collection.py", line 431, in _insert
    _check_write_command_response(results)
  File "C:\python27\lib\site-packages\pymongo-3.0.1-py2.7-win32.egg\pymongo\helpers.py", line 267, in _check_write_command_response
    error.get("errmsg"), error.get("code"), error)
pymongo.errors.WTimeoutError: waiting for replication timed out

Comment by Ernie Hershey [ 29/Apr/15 ]

Thanks Scott.

I attached the output of the smoke.py run with stdout and stderr.

/cygdrive/c/python27/python.exe buildscripts/smoke.py --with-cleanbb --mongod ./mongod --mongo ./mongo --report-file report.json --continue-on-failure --shell-write-mode commands --small-oplog jsCore 2>&1 | tee smokepyrun.txt

Comment by Scott Hernandez (Inactive) [ 29/Apr/15 ]

Please provide the logs from both the master and slave to help diagnose why the write timed out.

Generated at Thu Feb 08 03:47:06 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.