[SERVER-15762] index_retry.js fails due to an inability to execute a query Created: 14/Aug/14  Updated: 16/Apr/15  Resolved: 21/Oct/14

Status: Closed
Project: Core Server
Component/s: Testing Infrastructure
Affects Version/s: None
Fix Version/s: 2.7.8

Type: Bug Priority: Critical - P2
Reporter: Matt Dannenberg Assignee: Mark Benvenuto
Resolution: Done Votes: 0
Labels: timeout_on_slow_IO
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
depends on SERVER-15036 add logging inside failing test index... Closed
Related
related to SERVER-18087 index_retry.js and index_no_retry.js ... Closed
Operating System: ALL
Participants:

 Description   

https://mci.10gen.com/ui/task/mongodb_mongo_master_windows_64_debug_1f5428d02542074a49e054172f4dac46b06fd97b_14_08_14_14_40_27_noPassthroughWithMongod_windows_64_debug
http://buildlogs.mongodb.org/mci_0.9_windows-64-debug/builds/76361/test/noPassthroughWithMongod_0/index_retry.js

 m31000| 2014-08-14T10:56:44.258-0700 I COMMANDS [conn1] assertion 17007 Unable to execute query: error processing query: ns=test.jstests_slownightly_index_retry limit=0 skip=0
 m31000| Tree: a == 42.0
 m31000| Sort: {}
 m31000| Proj: {}
 m31000|  planner returned error: bad hint ns:test.jstests_slownightly_index_retry query:{ query: { a: 42.0 }, $hint: { a: 1.0 } }
Error: error: {
	"$err" : "Unable to execute query: error processing query: ns=test.jstests_slownightly_index_retry limit=0 skip=0\nTree: a == 42.0\nSort: {}\nProj: {}\n planner returned error: bad hint",
	"code" : 17007
}



 Comments   
Comment by Githook User [ 21/Oct/14 ]

Author:

{u'username': u'markbenvenuto', u'name': u'Mark Benvenuto', u'email': u'mark.benvenuto@mongodb.com'}

Message: SERVER-15762: index_retry.js - wait for journal flush
Branch: master
https://github.com/mongodb/mongo/commit/cadfa581eed203a2d728153945a800a204ccf3bd

Comment by Mark Benvenuto [ 21/Oct/14 ]

It is a race condition between the journal flush, and the kill to abort the index build. On Windows, the journal flush is taking a little to long to run, and so when the mongod restarts, the createIndex command does not exist in the database.

The fix is to wait until the journal is flushed.

Comment by Crystal Horn [ 15/Oct/14 ]

Windows 64-bit 2008R2+

Comment by Gabriel Russell (Inactive) [ 17/Sep/14 ]

ba726b4ee2 Windows 64-bit 2008R2+ noPassthroughWithMongod
https://mci.10gen.com/task/mongodb_mongo_master_windows_64_2k8_ba726b4ee22efc2ea9b86b52724e0a4081c16c53_14_09_17_17_16_47_noPassthroughWithMongod_windows_64_2k8

Comment by Siyuan Zhou [ 15/Sep/14 ]

2d8603ecc5 Windows 64-bit DEBUG noPassthroughWithMongod
https://mci.10gen.com/task/mongodb_mongo_master_windows_64_debug_2d8603ecc535b0ff980fe537a7fe16c06d687bb6_14_09_14_14_14_07_noPassthroughWithMongod_windows_64_debug
http://buildlogs.mongodb.org/MCI_windows-64-debug/builds/43159/test/noPassthroughWithMongod_0/index_retry.js

Comment by Siyuan Zhou [ 12/Sep/14 ]

Happened again
https://mci.10gen.com/ui/task/mongodb_mongo_master_windows_64_2k8_debug_87252bb93d3ebd5bf4b456d3fc06b4bae763a958_14_09_12_12_39_16_noPassthroughWithMongod_windows_64_2k8_debug

Comment by Adinoyi Omuya [ 05/Sep/14 ]

Another instance in f258721ea5 noPassthroughWithMongod on Windows 64-bit 2008R2+ DEBUG.

https://mci.10gen.com/ui/task/mongodb_mongo_master_windows_64_2k8_debug_f258721ea54fcd9746f7070ca1f467a8f92b8b83_14_09_05_02_26_07_noPassthroughWithMongod_windows_64_2k8_debug

http://buildlogs.mongodb.org/MCI_windows-64-2k8-debug/builds/18529/test/noPassthroughWithMongod_0/index_retry.js

Comment by Benety Goh [ 30/Aug/14 ]

79626bc249 Windows 64-bit 2008R2+ DEBUG noPassthroughWithMongod

https://mci.10gen.com/ui/task/mongodb_mongo_master_windows_64_2k8_debug_79626bc249484368fe4d9369ec52f7df70e6a726_14_08_30_13_43_07_noPassthroughWithMongod_windows_64_2k8_debug

http://buildlogs.mongodb.org/MCI_windows-64-2k8-debug/builds/5032/test/noPassthroughWithMongod_0/index_retry.js

 m31000| 2014-08-30T11:10:34.224-0700 I STORAGE  [initandlisten] _DEBUG ReadContext db wasn't open, will try to open local
 m31000| 2014-08-30T11:10:34.224-0700 I STORAGE  [initandlisten] opening db: local
 m31000| 2014-08-30T11:10:34.234-0700 I STORAGE  [initandlisten] _DEBUG ReadContext db wasn't open, will try to open test
 m31000| 2014-08-30T11:10:34.234-0700 I STORAGE  [initandlisten] opening db: test
 m31000| 2014-08-30T11:10:34.247-0700 I NETWORK  [initandlisten] waiting for connections on port 31000
 m31000| 2014-08-30T11:10:34.286-0700 I NETWORK  [initandlisten] connection accepted from 127.0.0.1:50969 #1 (1 connection now open)
 m31000| 2014-08-30T11:10:34.292-0700 I COMMANDS [conn1] assertion 17007 Unable to execute query: error processing query: ns=test.jstests_slownightly_index_retry limit=0 skip=0
 m31000| Tree: a == 42.0
 m31000| Sort: {}
 m31000| Proj: {}
 m31000|  planner returned error: bad hint ns:test.jstests_slownightly_index_retry query:{ query: { a: 42.0 }, $hint: { a: 1.0 } }
Error: error: {
	"$err" : "Unable to execute query: error processing query: ns=test.jstests_slownightly_index_retry limit=0 skip=0\nTree: a == 42.0\nSort: {}\nProj: {}\n planner returned error: bad hint",
	"code" : 17007
}
Index build not yet complete
...
[ ]
assert.soon failed, msg:index builds successfully
Error: assert.soon failed, msg:index builds successfully
    at Error (<anonymous>)
    at doassert (src/mongo/shell/assert.js:11:14)
    at Function.assert.soon (src/mongo/shell/assert.js:181:51)
    at C:\data\mci\shell\src\jstests\noPassthroughWithMongod\index_retry.js:82:8
2014-08-30T11:11:34.421-0700 I QUERY    Error: assert.soon failed, msg:index builds successfully
    at Error (<anonymous>)
    at doassert (src/mongo/shell/assert.js:11:14)
    at Function.assert.soon (src/mongo/shell/assert.js:181:51)
    at C:\data\mci\shell\src\jstests\noPassthroughWithMongod\index_retry.js:82:8 at src/mongo/shell/assert.js:13
failed to load: C:\data\mci\shell\src\jstests\noPassthroughWithMongod\index_retry.js
 m31000| 2014-08-30T11:11:34.422-0700 I -        [eventTerminate] shutdown event signaled, will terminate after current cmd ends

Comment by Benety Goh [ 28/Aug/14 ]

Successful test run contains the following log messages:

6e90ddf91f Windows 64-bit DEBUG noPassthroughWithMongod

https://mci.10gen.com/ui/task/mongodb_mongo_master_windows_64_debug_6e90ddf91fdbbc578e499fcb736365498e47f55a_14_08_28_14_29_06_noPassthroughWithMongod_windows_64_debug

http://buildlogs.mongodb.org/mci_0.9_windows-64-debug/builds/79436/test/noPassthroughWithMongod_0/index_retry.js

 m31000| 2014-08-28T10:38:53.489-0700 I JOURNAL  [initandlisten] checkHash len:557825 hash:C862849CEA5C5DD4E6FD94929A774BB6 current:C862849CEA5C5DD4E6FD94929A774BB6
 m31000| 2014-08-28T10:38:53.550-0700 I JOURNAL  [initandlisten] recover cleaning up
 m31000| 2014-08-28T10:38:53.551-0700 I JOURNAL  [initandlisten] removeJournalFiles
 m31000| 2014-08-28T10:38:53.561-0700 I JOURNAL  [initandlisten] recover done
 m31000| 2014-08-28T10:38:53.601-0700 I STORAGE  [initandlisten] opening db: local
 m31000| 2014-08-28T10:38:53.611-0700 I STORAGE  [initandlisten] opening db: test
 m31000| 2014-08-28T10:38:53.621-0700 I STORAGE  [initandlisten] _DEBUG ReadContext db wasn't open, will try to open admin.system.roles
 m31000| 2014-08-28T10:38:53.621-0700 I STORAGE  [initandlisten] opening db: admin
 m31000| 2014-08-28T10:38:53.625-0700 I STORAGE  [initandlisten] _DEBUG ReadContext db wasn't open, will try to open local
 m31000| 2014-08-28T10:38:53.625-0700 I STORAGE  [initandlisten] opening db: local
 m31000| 2014-08-28T10:38:53.630-0700 I STORAGE  [initandlisten] _DEBUG ReadContext db wasn't open, will try to open test
 m31000| 2014-08-28T10:38:53.630-0700 I STORAGE  [initandlisten] opening db: test
 m31000| 2014-08-28T10:38:53.638-0700 I INDEXING [initandlisten] found 1 index(es) that wasn't finished before shutdown
 m31000| 2014-08-28T10:38:53.639-0700 I INDEXING [initandlisten] found 1 interrupted index build(s) on test.jstests_slownightly_index_retry
 m31000| 2014-08-28T10:38:53.639-0700 I INDEXING [initandlisten] note: restart the server with --noIndexBuildRetry to skip index rebuilds
 m31000| 2014-08-28T10:38:53.640-0700 I INDEXING [initandlisten] build index on: test.jstests_slownightly_index_retry properties: { v: 1, key: { a: 1.0 }, name: "a_1", ns: "test.jstests_slownightly_index_retry", background: true }
 m31000| 2014-08-28T10:38:53.640-0700 I INDEXING [initandlisten] 	 building index using bulk method
2014-08-28T10:38:54.456-0700 W NETWORK  Failed to connect to 127.0.0.1:31000, reason: errno:10061 No connection could be made because the target machine actively refused it.
2014-08-28 13:38:55 EDT	
2014-08-28T10:38:55.677-0700 W NETWORK  Failed to connect to 127.0.0.1:31000, reason: errno:10061 No connection could be made because the target machine actively refused it.
 m31000| 2014-08-28T10:38:56.000-0700 I          [initandlisten] 		Index Build: 121400/500000	24%
2014-08-28T10:38:56.879-0700 W NETWORK  Failed to connect to 127.0.0.1:31000, reason: errno:10061 No connection could be made because the target machine actively refused it.
2014-08-28 13:38:58 EDT	
2014-08-28T10:38:58.077-0700 W NETWORK  Failed to connect to 127.0.0.1:31000, reason: errno:10061 No connection could be made because the target machine actively refused it.
 m31000| 2014-08-28T10:38:59.000-0700 I          [initandlisten] 		Index Build: 271400/500000	54%
2014-08-28T10:38:59.278-0700 W NETWORK  Failed to connect to 127.0.0.1:31000, reason: errno:10061 No connection could be made because the target machine actively refused it.
2014-08-28 13:39:00 EDT	
2014-08-28T10:39:00.481-0700 W NETWORK  Failed to connect to 127.0.0.1:31000, reason: errno:10061 No connection could be made because the target machine actively refused it.
2014-08-28 13:39:01 EDT	
2014-08-28T10:39:01.685-0700 W NETWORK  Failed to connect to 127.0.0.1:31000, reason: errno:10061 No connection could be made because the target machine actively refused it.
 m31000| 2014-08-28T10:39:02.000-0700 I          [initandlisten] 		Index Build: 424700/500000	84%
2014-08-28T10:39:02.888-0700 W NETWORK  Failed to connect to 127.0.0.1:31000, reason: errno:10061 No connection could be made because the target machine actively refused it.
2014-08-28 13:39:04 EDT	
2014-08-28T10:39:04.093-0700 W NETWORK  Failed to connect to 127.0.0.1:31000, reason: errno:10061 No connection could be made because the target machine actively refused it.
2014-08-28 13:39:05 EDT	
2014-08-28T10:39:05.297-0700 W NETWORK  Failed to connect to 127.0.0.1:31000, reason: errno:10061 No connection could be made because the target machine actively refused it.
 m31000| 2014-08-28T10:39:05.339-0700 I INDEXING [initandlisten] build index done.  scanned 500000 total records. 11 secs
 m31000| 2014-08-28T10:39:05.764-0700 I QUERY    [initandlisten] insert local.startup_log ninserted:1 keyUpdates:0 numYields:0  106ms
 m31000| 2014-08-28T10:39:05.764-0700 I NETWORK  [initandlisten] waiting for connections on port 31000

Comment by Benety Goh [ 28/Aug/14 ]

4b3e0275c3 Windows 64-bit DEBUG noPassthroughWithMongod

https://mci.10gen.com/ui/task/mongodb_mongo_master_windows_64_debug_4b3e0275c35e53505542c675bd2ef4a5e8a5a4da_14_08_28_03_09_06_noPassthroughWithMongod_windows_64_debug

http://buildlogs.mongodb.org/mci_0.9_windows-64-debug/builds/79396/test/noPassthroughWithMongod_0/index_retry.js

 m31000| 2014-08-28T00:59:11.474-0700 I JOURNAL  [initandlisten] recover cleaning up
 m31000| 2014-08-28T00:59:11.474-0700 I JOURNAL  [initandlisten] removeJournalFiles
 m31000| 2014-08-28T00:59:11.478-0700 I JOURNAL  [initandlisten] recover done
 m31000| 2014-08-28T00:59:11.518-0700 I STORAGE  [initandlisten] opening db: local
 m31000| 2014-08-28T00:59:11.528-0700 I STORAGE  [initandlisten] opening db: test
 m31000| 2014-08-28T00:59:11.537-0700 I STORAGE  [initandlisten] _DEBUG ReadContext db wasn't open, will try to open admin.system.roles
 m31000| 2014-08-28T00:59:11.537-0700 I STORAGE  [initandlisten] opening db: admin
 m31000| 2014-08-28T00:59:11.540-0700 I STORAGE  [initandlisten] _DEBUG ReadContext db wasn't open, will try to open local
 m31000| 2014-08-28T00:59:11.541-0700 I STORAGE  [initandlisten] opening db: local
 m31000| 2014-08-28T00:59:11.546-0700 I STORAGE  [initandlisten] _DEBUG ReadContext db wasn't open, will try to open test
 m31000| 2014-08-28T00:59:11.546-0700 I STORAGE  [initandlisten] opening db: test
 m31000| 2014-08-28T00:59:11.555-0700 I NETWORK  [initandlisten] waiting for connections on port 31000
...
 m31000| 2014-08-28T01:00:11.616-0700 I COMMANDS [conn1] assertion 17007 Unable to execute query: error processing query: ns=test.jstests_slownightly_index_retry limit=0 skip=0
 m31000| Tree: a == 42.0
 m31000| Sort: {}
 m31000| Proj: {}
 m31000|  planner returned error: bad hint ns:test.jstests_slownightly_index_retry query:{ query: { a: 42.0 }, $hint: { a: 1.0 } }
Error: error: {
	"$err" : "Unable to execute query: error processing query: ns=test.jstests_slownightly_index_retry limit=0 skip=0\nTree: a == 42.0\nSort: {}\nProj: {}\n planner returned error: bad hint",
	"code" : 17007
}
Index build not yet complete:
[ ]
assert.soon failed, msg:index builds successfully
Error: assert.soon failed, msg:index builds successfully
    at Error (<anonymous>)
    at doassert (src/mongo/shell/assert.js:11:14)
    at Function.assert.soon (src/mongo/shell/assert.js:181:51)
    at C:\data\mci\shell\src\jstests\noPassthroughWithMongod\index_retry.js:82:8
2014-08-28T01:00:11.626-0700 I QUERY    Error: assert.soon failed, msg:index builds successfully
    at Error (<anonymous>)
    at doassert (src/mongo/shell/assert.js:11:14)
    at Function.assert.soon (src/mongo/shell/assert.js:181:51)
    at C:\data\mci\shell\src\jstests\noPassthroughWithMongod\index_retry.js:82:8 at src/mongo/shell/assert.js:13
failed to load: C:\data\mci\shell\src\jstests\noPassthroughWithMongod\index_retry.js

Comment by David Storch [ 26/Aug/14 ]

Additional logging added to the test in the following commit:

Author:
{u'username': u'dstorch', u'name': u'David Storch', u'email': u'david.storch@10gen.com'}
Message: SERVER-15036 log progress of background index build in index_retry.js
Branch: master
https://github.com/mongodb/mongo/commit/ab45fd3e0c3ea06a5c093e826862618ae3b6205a

This ticket is now waiting for another failure in order to gather more information.

Comment by David Storch [ 26/Aug/14 ]

The test does the following:

  1. Uses the bulk write API to insert 100,000 documents of the form {a: i} for increasing values of i.
  2. In a parallel shell, starts a background index build with key pattern {a: 1}.
  3. In the original shell, uses db.currentOp() to verify that the index build has started.
  4. Shuts down the mongod.
  5. Restarts the mongod (with reuseData=true), and queries with a hint on {a: 1} inside an assert.soon. If the index build is complete, then the query should succeed. If the index build is not complete, then the query will fail with "bad hint".

The test is failing at the final step: i.e. it is taking more than 60 seconds for the index build to complete. This is very likely a timeout due to slow IO on Windows machines. However, I propose adding some logging to the test which will help to determine how fast the index build is proceeding.

Comment by sam.helman@10gen.com [ 25/Aug/14 ]

https://mci.10gen.com/ui/task/mongodb_mongo_master_windows_64_debug_2c510c5c06b4e557efb680d57694d24f1c028464_14_08_25_00_59_07_noPassthroughWithMongod_windows_64_debug

Comment by sam.helman@10gen.com [ 21/Aug/14 ]

https://mci.10gen.com/ui/task/mongodb_mongo_master_windows_64_debug_0dcc9c14f5d8f0f26c174676fa770b18f4aa122c_14_08_21_15_48_25_noPassthroughWithMongod_windows_64_debug

Comment by Michael O'Brien [ 15/Aug/14 ]

https://mci.10gen.com/ui/task_history/mongodb-mongo-master/noPassthroughWithMongod#index_retry.js=fail

Comment by Matt Dannenberg [ 14/Aug/14 ]

again:

https://mci.10gen.com/ui/task/mongodb_mongo_master_windows_64_2k8_debug_10827279fbc6380d4d7774837022b65b3204a988_14_08_14_15_45_51_noPassthroughWithMongod_windows_64_2k8_debug
http://buildlogs.mongodb.org/mci_0.9_windows-64-2k8-debug/builds/84950/test/noPassthroughWithMongod_0/index_retry.js

Generated at Thu Feb 08 03:38:54 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.