[SERVER-2787] Upgrading to 1.8.0 and enabling journaling causing cryptic "_a != -1" error Created: 17/Mar/11  Updated: 12/Jul/16  Resolved: 18/Mar/11

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 1.8.0
Fix Version/s: 1.8.1, 1.9.0

Type: Bug Priority: Major - P3
Reporter: Brian Ploetz Assignee: Aaron Staple
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Mac OSX 10.6.6
Ruby 1.9.2-p0
Mongoid 2.0.0.beta20
Mongo Ruby Driver 1.2.4


Attachments: File diaglog.4d8282ce    
Issue Links:
Depends
depends on SERVER-2791 missing write intent while initializi... Closed
Operating System: OS X
Participants:

 Description   

I'm starting the process of vetting our software with MongoDB 1.8. I had been running MongoDB 1.6.0 on my Mac (which we also currently run in production). Today I installed MongoDB 1.8.0 and pointed it at the data files that had been created with 1.6.0. All of my tests passed and everything seemed fine.

Then I enabled journaling (--journal). Once I did this, one of my tests started failing with this message:

Failures:
1) SaveWave::Model::MongoDB::Upc should create
Failure/Error: @upc = Factory.create(:upc)
: _a != -1

  1. ./spec/integration/mongodb/upc_spec.rb:13:in `block (2 levels) in <top (required)>'

2) SaveWave::Model::MongoDB::Upc should update
Failure/Error: @upc = Factory.create(:upc)
: _a != -1

  1. ./spec/integration/mongodb/upc_spec.rb:13:in `block (2 levels) in <top (required)>'

3) SaveWave::Model::MongoDB::Upc should destroy
Failure/Error: @upc = Factory.create(:upc)
: _a != -1

  1. ./spec/integration/mongodb/upc_spec.rb:13:in `block (2 levels) in <top (required)>'

When I turn off journaling, the test passes. Line 13 of the test where it keeps failing is:

before :each do
@upc = Factory.create(:upc) # <-- Line 13
end

And the relevant FactoryGirl definition is:

Factory.define :upc, :class => SaveWave::Model::MongoDB::Upc do |upc|
upc.name "Coca Cola"
upc.active true
upc.start_date DateTime.civil(2010, 10, 1, 0, 0, 0)
upc.end_date DateTime.civil(2011, 9, 30, 11, 59, 59)
upc.identifier "8395837859183"
upc.excluded_states ["MA", "OR"]
end

Nothing obvious pops out here. Let me know if there's any more info I can provide to help debug this.



 Comments   
Comment by Hongli Lai [ 23/Mar/11 ]

Disabling journaling did not fix the problem. I had to drop the entire database.

Comment by Hongli Lai [ 23/Mar/11 ]

I just ran into the same problem as well. I was on 1.6, upgraded to 1.8 and enabled journaling, did a repairDatabase yesterday, and today all of a sudden insert operations start to fail. mongodb.log was full of these messages:

Wed Mar 23 00:26:58 [conn3] webshop.process_count_10 Assertion failure _a != -1 db/pdfile.h 427
0x5512de 0x562ce1 0x772f08 0x7796f1 0x77a9ca 0x755c9e 0x75b77c 0x8a617e 0x8b92d0 0x7f826972a8ba 0x7f8268ce602d
/usr/bin/mongod(_ZN5mongo12sayDbContextEPKc+0xae) [0x5512de]
/usr/bin/mongod(_ZN5mongo8assertedEPKcS1_j+0x111) [0x562ce1]
/usr/bin/mongod() [0x772f08]
/usr/bin/mongod(_ZN5mongo11DataFileMgr6insertEPKcPKvibRKNS_11BSONElementEb+0x10d1) [0x7796f1]
/usr/bin/mongod(_ZN5mongo11DataFileMgr16insertWithObjModEPKcRNS_7BSONObjEb+0x5a) [0x77a9ca]
/usr/bin/mongod(_ZN5mongo14receivedInsertERNS_7MessageERNS_5CurOpE+0x37e) [0x755c9e]
/usr/bin/mongod(_ZN5mongo16assembleResponseERNS_7MessageERNS_10DbResponseERKNS_8SockAddrE+0x18ec) [0x75b77c]
/usr/bin/mongod(_ZN5mongo10connThreadEPNS_13MessagingPortE+0x21e) [0x8a617e]
/usr/bin/mongod(thread_proxy+0x80) [0x8b92d0]
/lib/libpthread.so.0(+0x68ba) [0x7f826972a8ba]
/lib/libc.so.6(clone+0x6d) [0x7f8268ce602d]
Wed Mar 23 00:26:58 [conn3] webshop.process_count_10 caught assertion _indexRecord webshop.process_count_10.$group_1_node_1_timestamp_1

Comment by Brian Ploetz [ 17/Mar/11 ]

The requested diaglog file

Comment by Brian Ploetz [ 17/Mar/11 ]

Most of the relevant bits of the unit test are noted above when I opened the ticket. It fails when it's trying to create the SaveWave::Model::MongoDB::Upc object in the before block. Also note the way it cleans up the database after the test suite is run by doing the following (drop each collection except the system collection):

Rspec.configure do |config|

  1. For MongoDB, since it's doesn't have transactions, we need to wipe the db ourselves
    config.after :suite do
  2. wipe the test db after the integration suite is run
    Mongoid.master.collections.select do |collection|
    collection.name !~ /system/
    end.each(&:drop)
    end
    end

As I mentioned in one of the follow ups, this error does not happen when I run this test in isolation, it only happens when I run the entire test suite. Given this, the diaglog.4d8282ce file I'm about to attach is 2.6MB, so I hope this is not like finding a needle in a haystack for you guys. I've tried to reproduce the issue running various combinations of a smaller subset of the tests with no luck. I'll keep trying though.....

The relevant collection where this is happening is named "products". Let me know if there's anything else you need.

Comment by Eliot Horowitz (Inactive) [ 17/Mar/11 ]

Ah, ok. Then no, a re-index doesn't make sense.
Anyway you can send us the unit test?
If not, can you start the server with --diaglog 3 and send us the diaglog.XXXX file it generated in <dbpath>

Comment by Brian Ploetz [ 17/Mar/11 ]

This is happening in a unit test of our Ruby code where the test database is empty at test start up, the data is created/updated/deleted as part of the test run, and then after the test each collection in the test database is dropped. So there is no pre-existing 1.6 era data in the database.

Given this, does a collection.reIndex() still make sense, and if so, at which point in the test workflow noted above should it happen?

Comment by Eliot Horowitz (Inactive) [ 17/Mar/11 ]

Can you do a .reIndex() on all the collections?
I think its likely because of a bug in 1.6 that could create invalid indexes

Comment by Kyle Banker [ 17/Mar/11 ]

Brian,

This looks like a potential core server issue, so I'm moving the issue from the Ruby driver to the core server.

Kyle

Comment by Brian Ploetz [ 17/Mar/11 ]

I see this in the MongoDB logs:

Thu Mar 17 16:12:44 [conn6] savewavemodel_test.products Assertion failure _a != -1 db/pdfile.h 427
0x10007d3c2 0x1000920be 0x1002e1af8 0x1002e2d1e 0x100187833 0x1002c300c 0x1002c7341 0x100436978 0x100449a54 0x7fff843f6536 0x7fff843f63e9
0 mongod 0x000000010007d3c2 _ZN5mongo12sayDbContextEPKc + 178
1 mongod 0x00000001000920be _ZN5mongo8assertedEPKcS1_j + 286
2 mongod 0x00000001002e1af8 _ZN5mongo14_unindexRecordERNS_12IndexDetailsERNS_7BSONObjERKNS_7DiskLocEb + 984
3 mongod 0x00000001002e2d1e _ZN5mongo11DataFileMgr12deleteRecordEPKcPNS_6RecordERKNS_7DiskLocEbb + 590
4 mongod 0x0000000100187833 _ZN5mongo13deleteObjectsEPKcNS_7BSONObjEbbbPNS_11RemoveSaverE + 1731
5 mongod 0x00000001002c300c _ZN5mongo14receivedDeleteERNS_7MessageERNS_5CurOpE + 1036
6 mongod 0x00000001002c7341 _ZN5mongo16assembleResponseERNS_7MessageERNS_10DbResponseERKNS_8SockAddrE + 5457
7 mongod 0x0000000100436978 _ZN5mongo10connThreadEPNS_13MessagingPortE + 584
8 mongod 0x0000000100449a54 thread_proxy + 132
9 libSystem.B.dylib 0x00007fff843f6536 _pthread_start + 331
10 libSystem.B.dylib 0x00007fff843f63e9 thread_start + 13
Thu Mar 17 16:12:44 [conn6] savewavemodel_test.products Assertion failure: _unindex failed savewavemodel_test.products.$_type_1
Thu Mar 17 16:12:44 [conn6] Assertion failure: _unindex failed: assertion db/pdfile.h:427
obj:

{ _id: ObjectId('4d826b3bf1bb3e111c00017b'), active: false, _type: "SaveWave::Model::MongoDB::Product", name: "Foo", start_date: new Date(1285905600000), end_date: new Date(1317398399000), identifier: "ABCDEF123456", created_at: new Date(1300392763000), updated_at: new Date(1300392763000) }

key:{ : "SaveWave::Model::MongoDB::Product" }
dl:0:62ef80
0x10007d3c2 0x1002e2109 0x1002e2d1e 0x100187833 0x1002c300c 0x1002c7341 0x100436978 0x100449a54 0x7fff843f6536 0x7fff843f63e9
0 mongod 0x000000010007d3c2 _ZN5mongo12sayDbContextEPKc + 178
1 mongod 0x00000001002e2109 _ZN5mongo14_unindexRecordERNS_12IndexDetailsERNS_7BSONObjERKNS_7DiskLocEb + 2537
2 mongod 0x00000001002e2d1e _ZN5mongo11DataFileMgr12deleteRecordEPKcPNS_6RecordERKNS_7DiskLocEbb + 590
3 mongod 0x0000000100187833 _ZN5mongo13deleteObjectsEPKcNS_7BSONObjEbbbPNS_11RemoveSaverE + 1731
4 mongod 0x00000001002c300c _ZN5mongo14receivedDeleteERNS_7MessageERNS_5CurOpE + 1036
5 mongod 0x00000001002c7341 _ZN5mongo16assembleResponseERNS_7MessageERNS_10DbResponseERKNS_8SockAddrE + 5457
6 mongod 0x0000000100436978 _ZN5mongo10connThreadEPNS_13MessagingPortE + 584
7 mongod 0x0000000100449a54 thread_proxy + 132
8 libSystem.B.dylib 0x00007fff843f6536 _pthread_start + 331
9 libSystem.B.dylib 0x00007fff843f63e9 thread_start + 13
Thu Mar 17 16:12:44 [conn6] unindex failed (key too big?) savewavemodel_test.products.$_type_1
savewavemodel_test.products Assertion failure _a != -1 db/pdfile.h 427
0x10007d3c2 0x1000920be 0x1002e1387 0x1002e238b 0x1002e755e 0x1002e932d 0x1002c51e7 0x1002c73f9 0x100436978 0x100449a54 0x7fff843f6536 0x7fff843f63e9
0 mongod 0x000000010007d3c2 _ZN5mongo12sayDbContextEPKc + 178
1 mongod 0x00000001000920be _ZN5mongo8assertedEPKcS1_j + 286
2 mongod 0x00000001002e1387 _ZN5mongo12_indexRecordEPNS_16NamespaceDetailsEiRNS_7BSONObjENS_7DiskLocEb + 1127
3 mongod 0x00000001002e238b _ZN5mongo11indexRecordEPNS_16NamespaceDetailsENS_7BSONObjENS_7DiskLocE + 139
4 mongod 0x00000001002e755e _ZN5mongo11DataFileMgr6insertEPKcPKvibRKNS_11BSONElementEb + 4062
5 mongod 0x00000001002e932d _ZN5mongo11DataFileMgr16insertWithObjModEPKcRNS_7BSONObjEb + 77
6 mongod 0x00000001002c51e7 _ZN5mongo14receivedInsertERNS_7MessageERNS_5CurOpE + 663
7 mongod 0x00000001002c73f9 _ZN5mongo16assembleResponseERNS_7MessageERNS_10DbResponseERKNS_8SockAddrE + 5641
8 mongod 0x0000000100436978 _ZN5mongo10connThreadEPNS_13MessagingPortE + 584
9 mongod 0x0000000100449a54 thread_proxy + 132
10 libSystem.B.dylib 0x00007fff843f6536 _pthread_start + 331
11 libSystem.B.dylib 0x00007fff843f63e9 thread_start + 13
Thu Mar 17 16:12:44 [conn6] savewavemodel_test.products caught assertion _indexRecord savewavemodel_test.products.$_type_1
Thu Mar 17 16:12:44 [conn6] savewavemodel_test.products Assertion failure _a != -1 db/pdfile.h 427
0x10007d3c2 0x1000920be 0x1002e1387 0x1002e238b 0x1002e755e 0x1002e932d 0x1002c51e7 0x1002c73f9 0x100436978 0x100449a54 0x7fff843f6536 0x7fff843f63e9
0 mongod 0x000000010007d3c2 _ZN5mongo12sayDbContextEPKc + 178
1 mongod 0x00000001000920be _ZN5mongo8assertedEPKcS1_j + 286
2 mongod 0x00000001002e1387 _ZN5mongo12_indexRecordEPNS_16NamespaceDetailsEiRNS_7BSONObjENS_7DiskLocEb + 1127
3 mongod 0x00000001002e238b _ZN5mongo11indexRecordEPNS_16NamespaceDetailsENS_7BSONObjENS_7DiskLocE + 139
4 mongod 0x00000001002e755e _ZN5mongo11DataFileMgr6insertEPKcPKvibRKNS_11BSONElementEb + 4062
5 mongod 0x00000001002e932d _ZN5mongo11DataFileMgr16insertWithObjModEPKcRNS_7BSONObjEb + 77
6 mongod 0x00000001002c51e7 _ZN5mongo14receivedInsertERNS_7MessageERNS_5CurOpE + 663
7 mongod 0x00000001002c73f9 _ZN5mongo16assembleResponseERNS_7MessageERNS_10DbResponseERKNS_8SockAddrE + 5641
8 mongod 0x0000000100436978 _ZN5mongo10connThreadEPNS_13MessagingPortE + 584
9 mongod 0x0000000100449a54 thread_proxy + 132
10 libSystem.B.dylib 0x00007fff843f6536 _pthread_start + 331
11 libSystem.B.dylib 0x00007fff843f63e9 thread_start + 13
Thu Mar 17 16:12:44 [conn6] savewavemodel_test.products caught assertion _indexRecord savewavemodel_test.products.$_type_1
Thu Mar 17 16:12:44 [conn6] savewavemodel_test.products Assertion failure _a != -1 db/pdfile.h 427
0x10007d3c2 0x1000920be 0x1002e1387 0x1002e238b 0x1002e755e 0x1002e932d 0x1002c51e7 0x1002c73f9 0x100436978 0x100449a54 0x7fff843f6536 0x7fff843f63e9
0 mongod 0x000000010007d3c2 _ZN5mongo12sayDbContextEPKc + 178
1 mongod 0x00000001000920be _ZN5mongo8assertedEPKcS1_j + 286
2 mongod 0x00000001002e1387 _ZN5mongo12_indexRecordEPNS_16NamespaceDetailsEiRNS_7BSONObjENS_7DiskLocEb + 1127
3 mongod 0x00000001002e238b _ZN5mongo11indexRecordEPNS_16NamespaceDetailsENS_7BSONObjENS_7DiskLocE + 139
4 mongod 0x00000001002e755e _ZN5mongo11DataFileMgr6insertEPKcPKvibRKNS_11BSONElementEb + 4062
5 mongod 0x00000001002e932d _ZN5mongo11DataFileMgr16insertWithObjModEPKcRNS_7BSONObjEb + 77
6 mongod 0x00000001002c51e7 _ZN5mongo14receivedInsertERNS_7MessageERNS_5CurOpE + 663
7 mongod 0x00000001002c73f9 _ZN5mongo16assembleResponseERNS_7MessageERNS_10DbResponseERKNS_8SockAddrE + 5641
8 mongod 0x0000000100436978 _ZN5mongo10connThreadEPNS_13MessagingPortE + 584
9 mongod 0x0000000100449a54 thread_proxy + 132
10 libSystem.B.dylib 0x00007fff843f6536 _pthread_start + 331
11 libSystem.B.dylib 0x00007fff843f63e9 thread_start + 13
Thu Mar 17 16:12:44 [conn6] savewavemodel_test.products caught assertion _indexRecord savewavemodel_test.products.$_type_1

What's interesting is that when I run this test in isolation, I don't see the error. It's only when I run the entire test suite that I see the error.

I should note that my mongoid.yml looks like this.....

defaults: &defaults
host: localhost
autocreate_indexes: true
allow_dynamic_fields: false
include_root_in_json: false
parameterize_keys: true
persist_in_safe_mode: true
raise_not_found_error: true
reconnect_time: 3

test:
<<: *defaults
database: savewavemodel_test

.....and my spec_helper.rb has this hook in it to wipe the db after each run:

  1. Load the mongoid config for the test database
    mongoid_file_name = File.expand_path('../mongoid.yml', _FILE_)
    @mongoid_settings = YAML.load(ERB.new(File.new(mongoid_file_name).read).result)
    @test_db = @mongoid_settings["test"]
    Mongoid.configure do |config|
    config.from_hash(@test_db)
    end
  1. Configure carrierwave
    CarrierWave.configure do |config|
    config.grid_fs_database = @test_db["database"]
    config.grid_fs_host = @test_db["test"]
    config.storage = :grid_fs
    end

Rspec.configure do |config|

  1. For MongoDB, since it's doesn't have transactions, we need to wipe the db ourselves
    config.after :suite do
  2. wipe the test db after the integration suite is run
    Mongoid.master.collections.select do |collection|
    collection.name !~ /system/
    end.each(&:drop)
    end
    end
Comment by Kyle Banker [ 17/Mar/11 ]

Anything fishy in the MongoDB logs? Can you extract the actual updates / deletes that these tests represent?

Generated at Thu Feb 08 03:01:11 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.