[SERVER-4950] Assertion on BSON size (0xEEEEEEEE) Created: 13/Feb/12  Updated: 30/Mar/12  Resolved: 16/Feb/12

Status: Closed
Project: Core Server
Component/s: Internal Code
Affects Version/s: 2.0.2
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Yosef Dinerstein Assignee: Ben Becker
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Windows, Azure


Issue Links:
Depends
Operating System: Windows
Participants:

 Description   

The shard server log repeatedly shows an assertion on an invalid object size, which looks like corruption:

Mon Feb 13 07:15:55 [rsSync] replSet syncing to: mongo1:27000
Mon Feb 13 07:15:56 [rsSync] Assertion: 10334:Invalid BSONObj size: -286331154 (0xEEEEEEEE) first element: EOO
Mon Feb 13 07:15:56 [rsSync] replSet syncTail: 10334 Invalid BSONObj size: -286331154 (0xEEEEEEEE) first element: EOO, syncing: { ts: Timestamp 1329088027000|19, h: -1120214476444430649, op: "i", ns: "anodelogs.log", o: { timestamp: new Date(1329088027621), level: "info", message: "Sending keep alive request to app "perfmonitor.saary" via http://10.28.148.177:30000 RW=yosefd2/__apps/saary/perfmonitor/.shimmed.v2.index.js", meta: { time: new Date(1329088027543), app: "ka.sys", domain: "anode-yosefd2.cloudapp.net", instanceId: "anodejsrole_IN_1", requestID: null, stackTop: "c:\an\repo\yosefd2\sys\ka\index.js:31:25", args: [ "Sending keep alive request to app "perfmonitor.saary" via http://10.28.148.177:30000", "RW=yosefd2/__apps/saary/perfmonitor/.shimmed.v2.index.js" ] }, _id: ObjectId('4f38461bbf5d8f040700b119') } }

This repeats every 30 seconds and mentions the same object.
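For reference, the oplog entry being replayed above has this general shape (field meanings as commonly documented for the MongoDB oplog; values abbreviated and the nested document elided):

```python
# Shape of the oplog entry from the log above (values abbreviated).
oplog_entry = {
    "ts": (1329088027, 19),     # Timestamp: seconds since epoch, increment
    "h": -1120214476444430649,  # unique hash identifying the operation
    "op": "i",                  # operation type: "i" insert, "d" delete, "u" update
    "ns": "anodelogs.log",      # target namespace: database.collection
    "o": ...,                   # the document the operation applies
}

print(oplog_entry["op"], oplog_entry["ns"])
```

The secondary's rsSync thread replays these entries in order; the assertion fires while it tries to apply this particular entry, which is why the same object is mentioned on every retry.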



 Comments   
Comment by Ben Becker [ 16/Feb/12 ]

@Viktor, thanks for the details. It sounds like the issue is a bit different from what was initially reported, although the symptoms are certainly similar. I will keep an eye out for similar issues, but if you are ever able to reproduce this issue, please create a new issue or reopen this one.

Comment by Viktor Petersson [ 16/Feb/12 ]

@Ben, no it was installed from scratch and we added it into the replica set. Once it was up to speed as a secondary, we promoted it to primary (by bumping up its priority).

I resolved the issue for now by simply re-syncing the FreeBSD-box from the primary. The FreeBSD-box is currently in secondary mode and is working.

Comment by Ben Becker [ 16/Feb/12 ]

Apologies for the confusion.

@Viktor, when the machine was migrated from FreeBSD to Ubuntu, was the filesystem kept intact, or did it change from something like UFS to Ext3/4/XFS/etc.?

Comment by Viktor Petersson [ 16/Feb/12 ]

Ben,

> Can you check for any other assertions originating from the '[rsSync]' thread and paste them (including the backtrace)?

See log-snippet below.

> Anything from the primary's log file around the same time?

On the primary (Ubuntu):

 
Wed Feb 15 13:05:35 [rsHealthPoll] DBClientCursor::init call() failed
Wed Feb 15 13:05:35 [rsHealthPoll] replSet info freebsd:27017 is down (or slow to respond): DBClientBase::findN: transport error: freebsd:27017 query: { replSetHeartbeat: "myreplset", v: 28, pv: 1, checkEmpty: false, from: "ubuntu:27017" }

On the secondary (FreeBSD):

 
Wed Feb 15 13:04:30 [rsSync] Assertion: 10334:Invalid BSONObj size: -286331154 (0xEEEEEEEE) first element: _id: ObjectId('4f3af4e47573df1109000394')
Wed Feb 15 13:04:30 [rsSync] replSet syncTail: 10334 Invalid BSONObj size: -286331154 (0xEEEEEEEE) first element: _id: ObjectId('4f3af4e47573df1109000394'), syncing: { ts: Timestamp 1329306877000|630, h: 7695009082165314552, op: "d", ns: "my_storage.fs.chunks", b: true, o: { _id: ObjectId('4f3b465f3dc585094a000264') } }
Wed Feb 15 13:04:47 [conn1811] end connection 10.0.0.14:43748
Wed Feb 15 13:04:47 [initandlisten] connection accepted from 10.0.0.14:43750 #1813
Wed Feb 15 13:04:53 [conn1812] end connection 10.0.0.17:45444
Wed Feb 15 13:04:53 [initandlisten] connection accepted from 10.0.0.17:23135 #1814
Wed Feb 15 13:05:01 [rsSync] replSet syncing to: ubuntu:27017
Wed Feb 15 13:05:01 [rsSync] Assertion: 10334:Invalid BSONObj size: -286331154 (0xEEEEEEEE) first element: _id: ObjectId('4f3af4e47573df1109000394')
Wed Feb 15 13:05:01 [rsSync] replSet syncTail: 10334 Invalid BSONObj size: -286331154 (0xEEEEEEEE) first element: _id: ObjectId('4f3af4e47573df1109000394'), syncing: { ts: Timestamp 1329306877000|630, h: 7695009082165314552, op: "d", ns: "my_storage.fs.chunks", b: true, o: { _id: ObjectId('4f3b465f3dc585094a000264') } }

> How was the data transferred from the FreeBSD primary to the Ubuntu primary?

Not exactly sure what you mean, but the data is transferred over a private LAN, and the servers are all running in a public cloud.

On a related note, I ran a few benchmarks comparing MongoDB/GridFS performance on CentOS, FreeBSD, and Ubuntu and posted them here. This was the primary reason we switched away from FreeBSD on the primary.

Comment by Yosef Dinerstein [ 16/Feb/12 ]

There are no other assertions in the logs on the same mongo server in general, or from the rsSync thread in particular.
The assertion mentioned does indeed come from the replica (secondary) server.

The primary server for the same replica set shows the following assertion:

Thu Feb 16 11:10:40 [conn2411] local.oplog.rs warning assertion failure !e->firstRecord.isNull() db\oplog.cpp 478

The latter did not always accompany the former, but recently it has started to appear as well.

A few more details about the data:

  • the object mentioned in the assertion is something we write into a capped collection.
  • the capped collection's size is 5 GB, and it reached its maximum about a week before the assertion started to appear.
  • we only insert objects into this capped collection and never remove them.
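Note that once a capped collection reaches its size cap, every new insert reclaims space from the oldest records, even if the application never issues a remove, so the storage layer is still recycling record locations internally. A toy ring-buffer model of that behavior (a simplification for illustration, not MongoDB's actual storage code):

```python
from collections import deque

class CappedCollection:
    """Toy model: fixed byte budget; oldest documents are evicted on insert."""
    def __init__(self, max_bytes: int):
        self.max_bytes = max_bytes
        self.docs = deque()
        self.used = 0

    def insert(self, doc: bytes):
        self.docs.append(doc)
        self.used += len(doc)
        # Once full, inserting reclaims space from the head (oldest docs),
        # even though the caller never issues an explicit remove.
        while self.used > self.max_bytes:
            old = self.docs.popleft()
            self.used -= len(old)

cap = CappedCollection(max_bytes=100)
for i in range(30):
    cap.insert(b"x" * 10)
print(len(cap.docs))  # 10 -- only the newest documents survive
```

This internal recycling is consistent with the timing observed: the assertions began about a week after the collection first hit its 5 GB cap, i.e. once record locations started being reused.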

The question about FreeBSD->Ubuntu was addressed to Viktor.

Comment by Ben Becker [ 15/Feb/12 ]

The error message 'Invalid BSONObj size: -286331154 (0xEEEEEEEE)' generally indicates that a record has either been deleted, or was read from or created at an invalid location.
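The reported value itself is telling: BSON documents begin with a signed little-endian 32-bit length prefix, and a memory region filled with the 0xEE byte pattern decodes to exactly -286331154. A minimal Python sketch of such a size check (illustrative only; the constants and helper names here are assumptions, not MongoDB's actual validation code):

```python
import struct

BSON_MIN_SIZE = 5                  # 4-byte length prefix + trailing EOO byte
BSON_MAX_SIZE = 16 * 1024 * 1024   # assumed 16 MB document limit

def read_bson_size(buf: bytes) -> int:
    """Decode the signed little-endian int32 length prefix of a BSON document."""
    return struct.unpack("<i", buf[:4])[0]

def looks_valid(buf: bytes) -> bool:
    size = read_bson_size(buf)
    return BSON_MIN_SIZE <= size <= BSON_MAX_SIZE

# A record whose bytes were overwritten with the 0xEE fill pattern:
freed = b"\xee" * 16
print(read_bson_size(freed))   # -286331154, i.e. 0xEEEEEEEE as a signed int32
print(looks_valid(freed))      # False
```

So a size of -286331154 is the signature of reading a region whose contents are all 0xEE bytes rather than a real document, which fits the deleted/invalid-location explanation above.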

@Yosef: Can you check for any other assertions originating from the '[rsSync]' thread and paste them (including the backtrace)? Anything from the primary's log file around the same time? How was the data transferred from the FreeBSD primary to the Ubuntu primary?

@Viktor: Regarding your second post, is that the only assertion failure you've seen on the config server? Could you also check the primary's log around Mon Feb 13 07:15:55 and attach a bit more surrounding those entries in the log files?

Comment by Viktor Petersson [ 15/Feb/12 ]

I can confirm a similar error. My setup is an Ubuntu 10.04 server with MongoDB 2.0.2 from 10gen as primary and a FreeBSD 9.0 server with MongoDB 2.0.2 from Ports as secondary.

We just migrated from using FreeBSD on the primary to Ubuntu, and that is when this started occurring.

Wed Feb 15 13:04:30 [rsSync] Assertion: 10334:Invalid BSONObj size: -286331154 (0xEEEEEEEE) first element: _id: ObjectId('4f3af4e47573df1109000394')
Wed Feb 15 13:04:30 [rsSync] replSet syncTail: 10334 Invalid BSONObj size: -286331154 (0xEEEEEEEE) first element: _id: ObjectId('4f3af4e47573df1109000394'), syncing: { ts: Timestamp 1329306877000|630, h: 7695009082165314552, op: "d", ns: "foobar.fs.chunks", b: true, o: { _id: ObjectId('4f3b465f3dc585094a000264') } }

Restarting MongoDB on the secondary server did not do the trick either.

Comment by Yosef Dinerstein [ 13/Feb/12 ]

Restarting servers does not eliminate the assertion.

Comment by Yosef Dinerstein [ 13/Feb/12 ]

Similar assertion occurred once on configuration server:

Mon Jan 30 19:20:35 [conn1008] Assertion: 10334:Invalid BSONObj size: -286331154 (0xEEEEEEEE) first element: _id: "RD00155D328113:26997:1327514447:41"
Mon Jan 30 19:20:35 [conn1008] remove config.lockpings query: { _id: { $nin: [ "RD00155D32841E:26998:1327912673:41", "RD00155D3287B8:27000:1324403258:6334" ] }, ping: { $lt: new Date(1327605634446) } } exception: Invalid BSONObj size: -286331154 (0xEEEEEEEE) first element: _id: "RD00155D328113:26997:1327514447:41" code:10334 0ms

Generated at Thu Feb 08 03:07:26 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.