[SERVER-27400] RocksDB insertion count failure in concurrency test Created: 13/Dec/16  Updated: 06/Dec/22  Resolved: 03/Jul/18

Status: Closed
Project: Core Server
Component/s: Storage
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Trivial - P5
Reporter: Eric Milkie Assignee: Backlog - Storage Execution Team
Resolution: Won't Fix Votes: 1
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
Assigned Teams:
Storage Execution
Operating System: ALL
Participants:
Linked BF Score: 65

 Description   

concurrency_simultaneous failed on ubuntu1404-rocksdb

Project: mongodb-mongo-master

fsm_all_simultaneous.js

[js_test:fsm_all_simultaneous] 2016-12-02T19:55:12.913+0000 2016-12-02T19:55:12.913+0000 E QUERY    [main] Error: 2 threads threw
[js_test:fsm_all_simultaneous] 2016-12-02T19:55:12.913+0000 
[js_test:fsm_all_simultaneous] 2016-12-02T19:55:12.913+0000         Foreground jstests/concurrency/fsm_workloads/indexed_insert_unordered_bulk.js
[js_test:fsm_all_simultaneous] 2016-12-02T19:55:12.913+0000         Error: [0] != [15] are not equal : undefined
[js_test:fsm_all_simultaneous] 2016-12-02T19:55:12.913+0000 
[js_test:fsm_all_simultaneous] 2016-12-02T19:55:12.913+0000         quietlyDoAssert@jstests/concurrency/fsm_libs/assert.js:53:15
[js_test:fsm_all_simultaneous] 2016-12-02T19:55:12.914+0000         assert.eq@src/mongo/shell/assert.js:54:5
[js_test:fsm_all_simultaneous] 2016-12-02T19:55:12.914+0000         wrapAssertFn@jstests/concurrency/fsm_libs/assert.js:60:13
[js_test:fsm_all_simultaneous] 2016-12-02T19:55:12.914+0000         assertWithLevel/</assertWithLevel[fn]@jstests/concurrency/fsm_libs/assert.js:99:13
[js_test:fsm_all_simultaneous] 2016-12-02T19:55:12.914+0000         find@jstests/concurrency/fsm_workloads/indexed_insert_base.js:47:13
[js_test:fsm_all_simultaneous] 2016-12-02T19:55:12.914+0000         runFSM@jstests/concurrency/fsm_libs/fsm.js:37:13
[js_test:fsm_all_simultaneous] 2016-12-02T19:55:12.914+0000         @<unknown> line 6 > eval:10:9
[js_test:fsm_all_simultaneous] 2016-12-02T19:55:12.914+0000         main@jstests/concurrency/fsm_libs/worker_thread.js:104:17
[js_test:fsm_all_simultaneous] 2016-12-02T19:55:12.914+0000         @<unknown> line 6 > eval:7:1
[js_test:fsm_all_simultaneous] 2016-12-02T19:55:12.914+0000         @<unknown> line 6 > eval:5:24
[js_test:fsm_all_simultaneous] 2016-12-02T19:55:12.914+0000         _threadStartWrapper@:24:16
[js_test:fsm_all_simultaneous] 2016-12-02T19:55:12.914+0000 
[js_test:fsm_all_simultaneous] 2016-12-02T19:55:12.914+0000  :
[js_test:fsm_all_simultaneous] 2016-12-02T19:55:12.915+0000 throwError@jstests/concurrency/fsm_libs/runner.js:339:23
[js_test:fsm_all_simultaneous] 2016-12-02T19:55:12.915+0000 runWorkloads@jstests/concurrency/fsm_libs/runner.js:734:17
[js_test:fsm_all_simultaneous] 2016-12-02T19:55:12.915+0000 parallel@jstests/concurrency/fsm_libs/runner.js:756:1
[js_test:fsm_all_simultaneous] 2016-12-02T19:55:12.915+0000 @jstests/concurrency/fsm_all_simultaneous.js:24:1
[js_test:fsm_all_simultaneous] 2016-12-02T19:55:12.915+0000 failed to load: jstests/concurrency/fsm_all_simultaneous.js



 Comments   
Comment by Ian Whalen (Inactive) [ 19/Jun/17 ]

Moving to backlog to keep a clean view for myself. PMs will own pinging the Rocks team occasionally to see whether they've fixed it.

Comment by Eric Milkie [ 16/Jun/17 ]

Filed https://github.com/mongodb-partners/mongo-rocks/issues/80 for tracking

Comment by Ian Whalen (Inactive) [ 04/Apr/17 ]

igor - have you had a chance to repro this yet?

Comment by Igor Canadi [ 06/Mar/17 ]

Thanks Eric! I'll try reproducing as soon as I can build it (SERVER-28199).

Comment by Eric Milkie [ 03/Mar/17 ]

The EC2 instance type is: c3.4xlarge

Comment by Eric Milkie [ 03/Mar/17 ]

Someone pointed out that we have had success running the GCC version of ASAN (rather than the Clang version), so I may try that route now.
Also, I'll find out which EC2 instance type we're using to run the failing test and get back to you!

Comment by Igor Canadi [ 03/Mar/17 ]

Can you share the EC2 instance type on which you're running this test? Sometimes a test will only reproduce on specific hardware. I'll set up the test to run many times in a loop on the exact same EC2 instance type. Once we have a repro, it'll be much easier to debug.
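A loop like the one described above can be driven by a small harness. The sketch below is a minimal Node.js example, assuming the test can be launched as a single command whose exit status is nonzero on failure; `runUntilFailure` and any command passed to it are hypothetical, not part of the MongoDB or mongo-rocks tooling.

```javascript
const { spawnSync } = require("child_process");

// Hypothetical stress-loop harness: rerun a flaky test command until it
// fails or the run budget is used up. The command is a placeholder, not
// the actual Evergreen invocation.
function runUntilFailure(command, args, maxRuns) {
  for (let run = 1; run <= maxRuns; run++) {
    // Capture output so the logs of a failing run can be kept for debugging.
    const result = spawnSync(command, args, { encoding: "utf8" });
    // A nonzero exit code, a signal (status is null), or a spawn error
    // (status is also null) all count as a failure.
    if (result.status !== 0) {
      return {
        runs: run,
        failed: true,
        stdout: result.stdout,
        stderr: result.stderr,
      };
    }
  }
  return { runs: maxRuns, failed: false };
}
```

For example, `runUntilFailure("./run_fsm_test.sh", [], 1000)` would rerun a (hypothetical) wrapper script up to 1000 times and stop at the first failure, returning that run's output. For a rare failure, the output of the first failing run is the part worth preserving.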

Comment by Eric Milkie [ 02/Mar/17 ]

I tried running a patch build under ASAN, but I couldn't get the server to start. It was a challenge just to get Clang to build it, and after that I think there were binary activation problems that our build system isn't picking up.

Comment by Igor Canadi [ 28/Feb/17 ]

Hi Eric, I think I fixed the compilation with https://github.com/mongodb-partners/mongo-rocks/commit/f692561c99563f8bc1e3a5b086070d9bfeab7515. I'm afraid I still have nothing on the source of this bug. Would it make sense to run the MongoRocks tests under an ASAN build? ASAN sometimes pinpoints the root cause of otherwise baffling issues.

Comment by Eric Milkie [ 27/Feb/17 ]

Hi igor
This project has stopped compiling because we renamed a function in the storage API; sorry about that! It should be easy to fix, or we can submit a pull request.
Once this project starts compiling and running the full test suite again, I suspect we will still see failures in the concurrency tests. Do you have any leads on what may be going wrong?
-Eric

Comment by Eric Milkie [ 15/Feb/17 ]

Here's another failure:

concurrency_replication failed on Ubuntu 14.04 (RocksDB)

Project: MongoDB (3.4)
fsm_all_replication.js

Comment by Igor Canadi [ 24/Jan/17 ]

Ah, I thought I had gotten lucky! I guess I'll try reproducing it again. Thanks Eric!

Comment by Eric Milkie [ 18/Jan/17 ]

It turns out I was looking at Jira incorrectly and was mistaken – this type of failure continues to happen in our test suite. Here's one of the latest failures:

concurrency failed on Ubuntu 14.04 (RocksDB)

December 2 is the earliest incidence of failure that I can find.

Comment by Igor Canadi [ 18/Jan/17 ]

Hi Eric, unfortunately I haven't been able to reproduce this (I ran the test in a loop for a day). I also looked through the code in depth and haven't found anything that might have caused this.

It might be that there was an underlying bug in RocksDB, since the tests in Evergreen run on RocksDB's master branch.

Would it make sense to close it for now and reopen if we see the issue happening again?

Comment by Eric Milkie [ 18/Jan/17 ]

Hi igor,
Did you happen to commit a fix for this? I haven't seen any failures for a while.

Comment by Igor Canadi [ 14/Dec/16 ]

Thanks Eric, taking a look.

Comment by Eric Milkie [ 13/Dec/16 ]

Finally, here is a similar failure that shows up in a different way:

concurrency failed on ubuntu1404-rocksdb

Project: mongodb-mongo-master

fsm_all.js

[js_test:fsm_all] 2016-12-02T19:54:33.535+0000 2016-12-02T19:54:33.534+0000 E QUERY    [main] Error: 1 thread threw
[js_test:fsm_all] 2016-12-02T19:54:33.535+0000 
[js_test:fsm_all] 2016-12-02T19:54:33.535+0000         Foreground jstests/concurrency/fsm_workloads/touch_base.js
[js_test:fsm_all] 2016-12-02T19:54:33.535+0000         Error: [0] != [100] are not equal : collection scan should return the number of documents this thread inserted
[js_test:fsm_all] 2016-12-02T19:54:33.535+0000 
[js_test:fsm_all] 2016-12-02T19:54:33.535+0000         quietlyDoAssert@jstests/concurrency/fsm_libs/assert.js:53:15
[js_test:fsm_all] 2016-12-02T19:54:33.535+0000         assert.eq@src/mongo/shell/assert.js:54:5
[js_test:fsm_all] 2016-12-02T19:54:33.535+0000         wrapAssertFn@jstests/concurrency/fsm_libs/assert.js:60:13
[js_test:fsm_all] 2016-12-02T19:54:33.535+0000         assertWithLevel/</assertWithLevel[fn]@jstests/concurrency/fsm_libs/assert.js:99:13
[js_test:fsm_all] 2016-12-02T19:54:33.535+0000         query@jstests/concurrency/fsm_workloads/touch_base.js:36:1
[js_test:fsm_all] 2016-12-02T19:54:33.535+0000         runFSM@jstests/concurrency/fsm_libs/fsm.js:37:13
[js_test:fsm_all] 2016-12-02T19:54:33.536+0000         @<unknown> line 6 > eval:10:9
[js_test:fsm_all] 2016-12-02T19:54:33.536+0000         main@jstests/concurrency/fsm_libs/worker_thread.js:104:17
[js_test:fsm_all] 2016-12-02T19:54:33.536+0000         @<unknown> line 6 > eval:7:1
[js_test:fsm_all] 2016-12-02T19:54:33.536+0000         @<unknown> line 6 > eval:5:24
[js_test:fsm_all] 2016-12-02T19:54:33.536+0000         _threadStartWrapper@:24:16
[js_test:fsm_all] 2016-12-02T19:54:33.536+0000 
[js_test:fsm_all] 2016-12-02T19:54:33.536+0000  :
[js_test:fsm_all] 2016-12-02T19:54:33.536+0000 throwError@jstests/concurrency/fsm_libs/runner.js:339:23
[js_test:fsm_all] 2016-12-02T19:54:33.536+0000 runWorkloads@jstests/concurrency/fsm_libs/runner.js:734:17
[js_test:fsm_all] 2016-12-02T19:54:33.536+0000 serial@jstests/concurrency/fsm_libs/runner.js:747:1
[js_test:fsm_all] 2016-12-02T19:54:33.536+0000 @jstests/concurrency/fsm_all.js:16:1
[js_test:fsm_all] 2016-12-02T19:54:33.537+0000 failed to load: jstests/concurrency/fsm_all.js
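Both traces fail the same kind of check: a worker thread inserts documents it can identify as its own, then scans the collection and expects to get back exactly that many. Reading the messages, the scan apparently returned 0 documents where the thread expected 15 or 100. The sketch below models that invariant in memory, assuming documents are tagged with a thread id. `insertDocs`, `countOwnDocs`, and `checkThread` are hypothetical helpers; they are not the code in indexed_insert_base.js or touch_base.js.

```javascript
// Hypothetical in-memory model of the per-thread invariant behind the
// failing assertions. It models the check only, not the FSM workload code.

// Insert n documents tagged with this thread's id; return how many the
// thread believes it inserted.
function insertDocs(collection, tid, n) {
  for (let i = 0; i < n; i++) {
    collection.push({ tid: tid, seq: i });
  }
  return n;
}

// Stand-in for a full collection scan filtered on the thread's id.
function countOwnDocs(collection, tid) {
  return collection.filter((doc) => doc.tid === tid).length;
}

// Throw with a message shaped like the log output when the scan does not
// return exactly the documents this thread inserted.
function checkThread(collection, tid, nInserted) {
  const found = countOwnDocs(collection, tid);
  if (found !== nInserted) {
    throw new Error(`[${found}] != [${nInserted}] are not equal`);
  }
  return found;
}
```

In this model, inserts by other threads cannot change a thread's expected count, because the scan filters on the thread's own id. A result of 0 therefore means the thread's documents were not visible to the scan at all, rather than being miscounted because of interleaving with other threads.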

Comment by Eric Milkie [ 13/Dec/16 ]

Here is another instance of the failure:

concurrency_sharded failed on ubuntu1404-rocksdb

Project: mongodb-mongo-master

fsm_all_sharded_replication.js

[js_test:fsm_all_sharded_replication] 2016-12-07T21:27:21.451+0000 2016-12-07T21:27:21.450+0000 E QUERY    [main] Error: 2 threads threw
[js_test:fsm_all_sharded_replication] 2016-12-07T21:27:21.451+0000 
[js_test:fsm_all_sharded_replication] 2016-12-07T21:27:21.451+0000         Foreground jstests/concurrency/fsm_workloads/indexed_insert_ordered_bulk.js
[js_test:fsm_all_sharded_replication] 2016-12-07T21:27:21.451+0000         Error: [0] != [15] are not equal : undefined
[js_test:fsm_all_sharded_replication] 2016-12-07T21:27:21.451+0000 
[js_test:fsm_all_sharded_replication] 2016-12-07T21:27:21.451+0000         quietlyDoAssert@jstests/concurrency/fsm_libs/assert.js:53:15
[js_test:fsm_all_sharded_replication] 2016-12-07T21:27:21.451+0000         assert.eq@src/mongo/shell/assert.js:54:5
[js_test:fsm_all_sharded_replication] 2016-12-07T21:27:21.451+0000         wrapAssertFn@jstests/concurrency/fsm_libs/assert.js:60:13
[js_test:fsm_all_sharded_replication] 2016-12-07T21:27:21.451+0000         assertWithLevel/</assertWithLevel[fn]@jstests/concurrency/fsm_libs/assert.js:99:13
[js_test:fsm_all_sharded_replication] 2016-12-07T21:27:21.451+0000         find@jstests/concurrency/fsm_workloads/indexed_insert_base.js:47:13
[js_test:fsm_all_sharded_replication] 2016-12-07T21:27:21.451+0000         runFSM@jstests/concurrency/fsm_libs/fsm.js:37:13
[js_test:fsm_all_sharded_replication] 2016-12-07T21:27:21.451+0000         @<unknown> line 6 > eval:10:9
[js_test:fsm_all_sharded_replication] 2016-12-07T21:27:21.452+0000         main@jstests/concurrency/fsm_libs/worker_thread.js:104:17
[js_test:fsm_all_sharded_replication] 2016-12-07T21:27:21.452+0000         @<unknown> line 6 > eval:7:1
[js_test:fsm_all_sharded_replication] 2016-12-07T21:27:21.452+0000         @<unknown> line 6 > eval:5:24
[js_test:fsm_all_sharded_replication] 2016-12-07T21:27:21.452+0000         _threadStartWrapper@:24:16
[js_test:fsm_all_sharded_replication] 2016-12-07T21:27:21.452+0000 
[js_test:fsm_all_sharded_replication] 2016-12-07T21:27:21.452+0000  :
[js_test:fsm_all_sharded_replication] 2016-12-07T21:27:21.452+0000 throwError@jstests/concurrency/fsm_libs/runner.js:339:23
[js_test:fsm_all_sharded_replication] 2016-12-07T21:27:21.452+0000 runWorkloads@jstests/concurrency/fsm_libs/runner.js:734:17
[js_test:fsm_all_sharded_replication] 2016-12-07T21:27:21.452+0000 serial@jstests/concurrency/fsm_libs/runner.js:747:1
[js_test:fsm_all_sharded_replication] 2016-12-07T21:27:21.452+0000 @jstests/concurrency/fsm_all_sharded_replication.js:99:1
[js_test:fsm_all_sharded_replication] 2016-12-07T21:27:21.452+0000 failed to load: jstests/concurrency/fsm_all_sharded_replication.js

Comment by Eric Milkie [ 13/Dec/16 ]

Hi igor,
One of our concurrency tests encountered a failure with our RocksDB builder. Can you help diagnose?
-Eric

Generated at Thu Feb 08 04:15:04 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.