[SERVER-10764] Invalid access at address: 0x7fa087553e30 from thread: repl writer worker x Created: 13/Sep/13  Updated: 16/Nov/21  Resolved: 17/Sep/13

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: 2.4.6
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Steve Rabouin Assignee: Unassigned
Resolution: Done Votes: 0
Labels: crash, replicaset
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

CentOS 6 64bit
Dual Quad-Core Xeon
72GB Ram


Operating System: Linux
Steps To Reproduce:

Unsure why it happens.

Participants:

 Description   

Running the following setup:

  • Primary server: 72GB Ram, dual quad-core xeon.
  • Replica 1: 72GB Ram, dual quad-core xeon.
  • Replica 2: 16GB Ram, quad-core xeon.
  • Replica 3: 16GB Ram, quad-core xeon. (Remote location)
  • Arbiter

All servers are on CentOS 6 64bit. Primary, backup and one of the remote

The primary server crashed a few weeks ago and the backup became primary. I didn't really look into it at the time; since the instance had been down for a long while, I figured it would be faster to just remove the member, delete its data files, and resync it from scratch.

Today, it did the same thing again. When I restart it, it gives the same error output.

Fri Sep 13 07:03:44.555 Invalid access at address: 0x7fad73cd0e30 from thread: repl writer worker 10
Fri Sep 13 07:03:44.536 [conn74] end connection A.B.C.D:57698 (87 connections now open)
Fri Sep 13 07:03:44.669 Got signal: 7 (Bus error).
Fri Sep 13 07:03:45.000 Backtrace:
0xdddd81 0x6d0d29 0x6d12b2 0x36c860f500 0xa61281 0xa61a32 0xac44f3 0xac58df 0xa903fa 0xa924c7 0xa72449 0xc273d3 0xc26b18 0xdab721 0xe26609 0x36c8607851 0x36c82e890d
/usr/bin/mongod(_ZN5mongo15printStackTraceERSo+0x21) [0xdddd81]
/usr/bin/mongod(_ZN5mongo10abruptQuitEi+0x399) [0x6d0d29]
/usr/bin/mongod(_ZN5mongo24abruptQuitWithAddrSignalEiP7siginfoPv+0x262) [0x6d12b2]
/lib64/libpthread.so.0() [0x36c860f500]
/usr/bin/mongod(_ZN5mongo16NamespaceDetails10_stdAllocEib+0x641) [0xa61281]
/usr/bin/mongod(_ZN5mongo16NamespaceDetails13allocWillBeAtEPKci+0x32) [0xa61a32]
/usr/bin/mongod(_ZN5mongo11DataFileMgr6insertEPKcPKvibbbPb+0x1153) [0xac44f3]
/usr/bin/mongod(_ZN5mongo11DataFileMgr16insertWithObjModEPKcRNS_7BSONObjEbb+0x4f) [0xac58df]
/usr/bin/mongod(_ZN5mongo14_updateObjectsEbPKcRKNS_7BSONObjES4_bbbRNS_7OpDebugEPNS_11RemoveSaverEbRKNS_24QueryPlanSelectionPolicyEb+0x2eda) [0xa903fa]
/usr/bin/mongod(_ZN5mongo27updateObjectsForReplicationEPKcRKNS_7BSONObjES4_bbbRNS_7OpDebugEbRKNS_24QueryPlanSelectionPolicyE+0xb7) [0xa924c7]
/usr/bin/mongod(_ZN5mongo21applyOperation_inlockERKNS_7BSONObjEbb+0xb39) [0xa72449]
/usr/bin/mongod(_ZN5mongo7replset8SyncTail9syncApplyERKNS_7BSONObjEb+0x713) [0xc273d3]
/usr/bin/mongod(_ZN5mongo7replset14multiSyncApplyERKSt6vectorINS_7BSONObjESaIS2_EEPNS0_8SyncTailE+0x48) [0xc26b18]
/usr/bin/mongod(_ZN5mongo10threadpool6Worker4loopEv+0x281) [0xdab721]
/usr/bin/mongod() [0xe26609]
/lib64/libpthread.so.0() [0x36c8607851]
/lib64/libc.so.6(clone+0x6d) [0x36c82e890d]

Fri Sep 13 07:33:20.945 Invalid access at address: 0x7fa087553e30 from thread: repl writer worker 2
Fri Sep 13 07:33:20.972 Got signal: 7 (Bus error).
Fri Sep 13 07:33:20.979 [initandlisten] connection accepted from 192.168.1.5:57728 #169 (109 connections now open)
Fri Sep 13 07:33:21.023 [rsHealthPoll] replset info A.B.C.D:27017 thinks that we are down
Fri Sep 13 07:33:20.980 Backtrace:
0xdddd81 0x6d0d29 0x6d12b2 0x36c860f500 0xa61281 0xa61a32 0xac44f3 0xac58df 0xa903fa 0xa924c7 0xa72449 0xc273d3 0xc26b18 0xdab721 0xe26609 0x36c8607851 0x36c82e890d
/usr/bin/mongod(_ZN5mongo15printStackTraceERSo+0x21) [0xdddd81]
/usr/bin/mongod(_ZN5mongo10abruptQuitEi+0x399) [0x6d0d29]
/usr/bin/mongod(_ZN5mongo24abruptQuitWithAddrSignalEiP7siginfoPv+0x262) [0x6d12b2]
/lib64/libpthread.so.0() [0x36c860f500]
/usr/bin/mongod(_ZN5mongo16NamespaceDetails10_stdAllocEib+0x641) [0xa61281]
/usr/bin/mongod(_ZN5mongo16NamespaceDetails13allocWillBeAtEPKci+0x32) [0xa61a32]
/usr/bin/mongod(_ZN5mongo11DataFileMgr6insertEPKcPKvibbbPb+0x1153) [0xac44f3]
/usr/bin/mongod(_ZN5mongo11DataFileMgr16insertWithObjModEPKcRNS_7BSONObjEbb+0x4f) [0xac58df]
/usr/bin/mongod(_ZN5mongo14_updateObjectsEbPKcRKNS_7BSONObjES4_bbbRNS_7OpDebugEPNS_11RemoveSaverEbRKNS_24QueryPlanSelectionPolicyEb+0x2eda) [0xa903fa]
/usr/bin/mongod(_ZN5mongo27updateObjectsForReplicationEPKcRKNS_7BSONObjES4_bbbRNS_7OpDebugEbRKNS_24QueryPlanSelectionPolicyE+0xb7) [0xa924c7]
/usr/bin/mongod(_ZN5mongo21applyOperation_inlockERKNS_7BSONObjEbb+0xb39) [0xa72449]
/usr/bin/mongod(_ZN5mongo7replset8SyncTail9syncApplyERKNS_7BSONObjEb+0x713) [0xc273d3]
/usr/bin/mongod(_ZN5mongo7replset14multiSyncApplyERKSt6vectorINS_7BSONObjESaIS2_EEPNS0_8SyncTailE+0x48) [0xc26b18]
/usr/bin/mongod(_ZN5mongo10threadpool6Worker4loopEv+0x281) [0xdab721]
/usr/bin/mongod() [0xe26609]
/lib64/libpthread.so.0() [0x36c8607851]
/lib64/libc.so.6(clone+0x6d) [0x36c82e890d]
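For context, "signal 7 (Bus error)" is SIGBUS, which on Linux is typically delivered when a memory-mapped page can no longer be served by its backing storage — consistent with failing disks underneath mongod's mmap-ed data files. A minimal sketch of the mechanism (hypothetical demo, not code from this ticket): map a file, make its backing page disappear, and touch the mapping.

```python
import mmap
import os
import tempfile

def sigbus_from_truncated_mapping():
    """Map a file, shrink its backing storage, then touch the page.

    The kernel delivers SIGBUS on the access — the same signal 7
    mongod reported when its mapped data files became unreadable.
    Returns the signal number that killed the child process.
    """
    fd, path = tempfile.mkstemp()
    os.write(fd, b"\x00" * 4096)
    os.close(fd)

    pid = os.fork()
    if pid == 0:  # child: trigger the fault
        with open(path, "r+b") as f:
            m = mmap.mmap(f.fileno(), 4096)
        os.truncate(path, 0)  # backing page is gone now
        _ = m[0]              # access faults with SIGBUS
        os._exit(0)           # never reached
    _, status = os.waitpid(pid, 0)
    os.unlink(path)
    return os.WTERMSIG(status)
```

On Linux, the returned value equals `signal.SIGBUS` (7), matching the signal number in the log above.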

DB stats:
{
"collections" : 72,
"objects" : 44619073,
"avgObjSize" : 2480.755786297936,
"dataSize" : 110689023524,
"storageSize" : 124300971392,
"numExtents" : 494,
"indexes" : 106,
"indexSize" : 5254527152,
"fileSize" : 135159349248,
"nsSizeMB" : 16,
"dataFileVersion" :

{ "major" : 4, "minor" : 5 }

,
"ok" : 1
}



 Comments   
Comment by Daniel Pasette (Inactive) [ 17/Sep/13 ]

Happy to hear you've worked around the issue.

Comment by Steve Rabouin [ 16/Sep/13 ]

Hi Dan, thanks for responding.

I didn't think it was a hard drive issue, since the second crash occurred over a week after the first one. But you're right – after looking at the syslog there are a bunch of errors. I should have checked this myself before opening a ticket. :O I'm quite happy I chose mongo; these replica servers are so quick and easy to set up!

Thanks again.

Comment by Daniel Pasette (Inactive) [ 16/Sep/13 ]

To clarify the timeline:

  • the primary instance crashed a few weeks ago
  • you cleared the data directory and let it resync as a secondary
  • the same instance crashed again with the same error
  • when you try to restart it, it fails with the same message

Have you checked the syslog to see if there are any hardware messages indicating storage failures?
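One way to act on this advice is to scan the syslog for kernel I/O errors. A minimal sketch — the patterns below are common Linux disk-failure messages chosen for illustration, not strings taken from this ticket's logs:

```python
import re

# Typical kernel messages that indicate failing storage
# (assumed patterns, not from this ticket).
DISK_ERROR = re.compile(
    r"end_request: I/O error|Buffer I/O error|Medium Error|"
    r"UncorrectableError|ata\d+(\.\d+)?: .*(error|failed)",
    re.IGNORECASE,
)

def find_disk_errors(lines):
    """Return syslog lines that look like hardware storage failures."""
    return [line for line in lines if DISK_ERROR.search(line)]
```

On a CentOS 6 box this could be pointed at `/var/log/messages`, e.g. `find_disk_errors(open("/var/log/messages"))`.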

Generated at Thu Feb 08 03:24:00 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.