[SERVER-6702] assertion in repl5.js Created: 03/Aug/12 Updated: 11/Jul/16 Resolved: 08/Aug/12 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | None |
| Fix Version/s: | 2.2.0-rc1 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Ian Whalen (Inactive) | Assignee: | Randolph Tan |
| Resolution: | Done | Votes: | 0 |
| Labels: | buildbot |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Attachments: |
|
| Operating System: | ALL |
| Participants: |
| Description |
|
http://buildlogs.mongodb.org/build/501b14b6d2a60f48f700058c/test/501b68fdd2a60f08c2000e6e/
|
| Comments |
| Comment by auto [ 08/Aug/12 ] |
|
Author: Randolph Tan <randolph@10gen.com>, date: 2012-08-07T14:55:34-07:00
Message: Buildbot fix for Make sure to turn journaling on if we are going to kill a mongod process with SIGKILL with an expectation that it will be running fine after a restart on our tests. |
| Comment by Randolph Tan [ 07/Aug/12 ] |
|
I haven't been able to reproduce this, but here's a theory of what happened in the failed test. Before that, here's the normal case:
1. In DataFileMgr::insert(), we get a new record to write data into. Its size is initialized to 0xEEEEEEEE.
3. Then we actually write the data to the record:
As you can see, we insert the record into the index before writing the actual data into it. So my theory is that the slave server in the test got killed exactly between steps 2 and 3, right after the index update had been flushed to disk. Mathias said that this shouldn't be an issue if the server was running with journaling on, but this specific failed test run was run with dur off. |
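The ordering in this theory can be illustrated with a toy model. This is a minimal sketch in Python, not the mongod code: the `ToyStore` class, its method names, and the `crash_after_index` flag are hypothetical, and the 4-byte length prefix is a simplification. Only the 0xEEEEEEEE initial size and the index-before-data ordering are taken from the comment above.

```python
import struct

# Size a freshly allocated record carries, per the comment above.
UNINITIALIZED_SIZE = 0xEEEEEEEE


class ToyStore:
    """Hypothetical toy model of a record store plus an _id index."""

    def __init__(self):
        self.records = {}   # record location -> raw bytes "on disk"
        self.id_index = {}  # _id -> record location
        self.next_loc = 0

    def insert(self, doc_id, payload, crash_after_index=False):
        # Step 1: allocate a record whose size field holds the filler value.
        loc = self.next_loc
        self.next_loc += 1
        self.records[loc] = struct.pack("<I", UNINITIALIZED_SIZE)

        # Step 2: the index entry is added before the data is written.
        self.id_index[doc_id] = loc
        if crash_after_index:
            # Simulates a SIGKILL between steps 2 and 3 with no journal
            # to roll the index update back.
            return

        # Step 3: write the real data, prefixed by its total length.
        self.records[loc] = struct.pack("<I", len(payload) + 4) + payload

    def read_size(self, doc_id):
        # Follow the index to the record and read its declared size.
        loc = self.id_index[doc_id]
        return struct.unpack("<I", self.records[loc][:4])[0]


store = ToyStore()
store.insert("a", b"hello")                          # normal insert
store.insert("b", b"world", crash_after_index=True)  # interrupted insert
```

In this model, a lookup of `"b"` through the index reaches a record that still declares the filler size, which is the state the theory says the restarted slave found.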
| Comment by Randolph Tan [ 07/Aug/12 ] |
|
Whenever a slave applies an insert, it executes it as an upsert. This means consulting the _id index to check whether the document exists and, if so, applying the update. Based on the stack trace, it seems that the index was pointing to a record that was marked as deleted, and this caused the invalid BSONObj size assertion. The test does not do any deletes, but when mongod allocates a new record, it initializes it as a deleted record. So far, I don't see how a record could be left marked as deleted when inserting new documents, other than if the collection is capped... |
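To make the size failure concrete: a BSON document begins with a little-endian signed 32-bit total length, so reading a record that was never filled in yields a nonsensical size. The sketch below is an illustration in Python, not mongod's actual check. The function name, the 16 MB upper bound, the error text, and the assumption that the stale record still holds the 0xEE filler pattern are all assumptions made for this example.

```python
import struct

BSON_MIN_SIZE = 5                 # int32 length plus the terminating 0x00 byte
BSON_MAX_SIZE = 16 * 1024 * 1024  # assumed upper bound for this sketch


def checked_bson_size(raw: bytes) -> int:
    """Return the declared document size, or raise if it is implausible."""
    # The first four bytes are a little-endian signed int32 length.
    (size,) = struct.unpack_from("<i", raw, 0)
    if not BSON_MIN_SIZE <= size <= BSON_MAX_SIZE:
        raise ValueError(
            f"Invalid BSONObj size: {size} (0x{size & 0xFFFFFFFF:08X})"
        )
    return size


empty_doc = b"\x05\x00\x00\x00\x00"  # smallest valid BSON document
stale_record = b"\xEE" * 8           # assumed: record still holds filler bytes
```

Calling `checked_bson_size(empty_doc)` returns 5, while `checked_bson_size(stale_record)` raises, because 0xEEEEEEEE read as a signed int32 is negative.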
| Comment by Randolph Tan [ 06/Aug/12 ] |
|
Disregard the addr2line output: the addresses don't match. Additionally, the stack trace in the log is displayed in the wrong order. The right order is (based on the header 0xb8f0b7 0xe59cfc 0xe335e2 0xe334ea 0x8aede9 0x8adbec 0x8ad906 0x8cf714 0xd1102c 0xa3fbcf 0xa4007b 0xa41fee 0xe42299 0xe1ca0a 0xe1dcfd 0xe20a4f 0xe228af 0xe22a58 0xe2304a 0xe234d1):
|
| Comment by Randolph Tan [ 03/Aug/12 ] |
|
addr2line (taken at build#1538, original log was from build#1539. Fortunately, ebc14a51ca3cfc0be7831eefb446860e22360e97 was the only change that was introduced in the new build)
|
| Comment by Ian Whalen (Inactive) [ 03/Aug/12 ] |
|
The log has been attached. The git revision, so you can rebuild locally, is 4aaed4ccfb608db56a269e0422f6d48385c28445 |
| Comment by Tad Marshall [ 03/Aug/12 ] |
|
I think this may be important ... I don't remember seeing Assertion: Ian or Randolph, can you save the log and the mongod executable in the Jira |