[SERVER-9613] Fatal Assertion on secondary after creating a new index on primary
Created: 07/May/13  Updated: 16/Nov/21  Resolved: 14/Aug/13

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: 2.4.3
Fix Version/s: None

Type: Bug
Priority: Major - P3
Reporter: Samuel Clay
Assignee: Andy Schwerin
Resolution: Done
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Ubuntu 12.10 64-bit


Operating System: ALL
Steps To Reproduce:

Add an index on primary, have it replicate to secondary.

 Description   

I created a new index on the primary and let it replicate to the secondaries. After the secondaries finished, two of the three came up just fine. One machine did not, and it gave the stack trace below. The broken machine is db22.newsblur.com (on MMS), and all machines are running 2.4.3.

Sun May 5 03:47:43.376 [rsSync] replSet still syncing, not yet to minValid optime 5185cf69:1
Sun May 5 03:47:43.524 [rsSync] replSet SECONDARY
Sun May 5 03:47:43.525 [repl prefetch worker] newsblur.stories Assertion failure _a != -1 src/mongo/db/ops/../pdfile.h 607
Sun May 5 03:47:43.525 [repl prefetch worker] newsblur.stories Assertion failure _a != -1 src/mongo/db/ops/../pdfile.h 607
0xdcf361 0xd902bd 0x9cdf38 0x9d6dbf 0xb2c0b4 0xb2cf55 0xc1ca02 0xd9cc01 0xe17cb9 0x7f22bb81be9a 0x7f22bab2bcbd
Sun May 5 03:47:43.525 [repl prefetch worker] newsblur.stories Assertion failure _a != -1 src/mongo/db/ops/../pdfile.h 607
0xdcf3610xdcf361 0xd902bd 0xd902bd0x9cdf38 0x9cdf38 103172470x9d6dbf 0xb2c0b40xb2c0b4 0xb2cf55 0xb2cf550xc1ca02 0xd9cc01 0xe17cb90xc1ca02 0xd9cc010x7f22bb81be9a 0xe17cb90x7f22bab2bcbd 0x7f22bb81be9a
0x7f22bab2bcbd
Sun May 5 03:47:43.528 [repl prefetch worker] newsblur.stories Assertion failure _a != -1 src/mongo/db/ops/../pdfile.h 607
0xdcf361 0xd902bd 0x9cdf38 0x9d6dbf 0xb2c0b4 0xb2cf55 0xc1ca02 0xd9cc01 0xe17cb9 0x7f22bb81be9a 0x7f22bab2bcbd
/usr/bin/mongod(_ZN5mongo15printStackTraceERSo+0x21) [0xdcf361]
/usr/bin/mongod(_ZN5mongo12verifyFailedEPKcS1_j+0xfd) [0xd902bd]
/usr/bin/mongod(_ZN5mongo18IndexInterfaceImplINS_12BtreeData_V1EE20beginInsertIntoIndexEiRNS_12IndexDetailsENS_7DiskLocERKNS_7BSONObjERKNS_8OrderingEb+0x118) [0x9cdf38]
/usr/bin/mongod(_ZN5mongo19fetchIndexInsertersERSt3setINS_7BSONObjENS_10BSONObjCmpESaIS1_EERNS_14IndexInterface13IndexInserterEPNS_16NamespaceDetailsEiRKS1_NS_7DiskLocEb+0x2cf) [0x9d6dbf]
/usr/bin/mongod(_ZN5mongo18prefetchIndexPagesEPNS_16NamespaceDetailsERKNS_7BSONObjE+0x724) [0xb2c0b4]
/usr/bin/mongod(_ZN5mongo28prefetchPagesForReplicatedOpERKNS_7BSONObjE+0x605) [0xb2cf55]
/usr/bin/mongod(_ZN5mongo7replset8SyncTail10prefetchOpERKNS_7BSONObjE+0x202) [0xc1ca02]
/usr/bin/mongod(_ZN5mongo10threadpool6Worker4loopEv+0x281) [0xd9cc01]
/usr/bin/mongod() [0xe17cb9]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x7e9a) [0x7f22bb81be9a]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f22bab2bcbd]
/usr/bin/mongod(_ZN5mongo15printStackTraceERSo+0x21) [0xdcf361]
/usr/bin/mongod(_ZN5mongo12verifyFailedEPKcS1_j+0xfd) [0xd902bd]
/usr/bin/mongod(_ZN5mongo18IndexInterfaceImplINS_12BtreeData_V1EE20beginInsertIntoIndexEiRNS_12IndexDetailsENS_7DiskLocERKNS_7BSONObjERKNS_8OrderingEb+0x118) [0x9cdf38]
/usr/bin/mongod(_ZN5mongo19fetchIndexInsertersERSt3setINS_7BSONObjENS_10BSONObjCmpESaIS1_EERNS_14IndexInterface13IndexInserterEPNS_16NamespaceDetailsEiRKS1_NS_7DiskLocEb+0x2cf) [0x9d6dbf]
/usr/bin/mongod(_ZN5mongo18prefetchIndexPagesEPNS_16NamespaceDetailsERKNS_7BSONObjE+0x724) [0xb2c0b4]
/usr/bin/mongod(_ZN5mongo28prefetchPagesForReplicatedOpERKNS_7BSONObjE+0x605) [0xb2cf55]
/usr/bin/mongod(_ZN5mongo7replset8SyncTail10prefetchOpERKNS_7BSONObjE+0x202) [0xc1ca02]
/usr/bin/mongod(_ZN5mongo10threadpool6Worker4loopEv+0x281) [0xd9cc01]
/usr/bin/mongod() [0xe17cb9]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x7e9a) [0x7f22bb81be9a]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f22bab2bcbd]
/usr/bin/mongod(_ZN5mongo15printStackTraceERSo+0x21) [0xdcf361]
/usr/bin/mongod(_ZN5mongo12verifyFailedEPKcS1_j+0xfd) [0xd902bd]
/usr/bin/mongod(_ZN5mongo18IndexInterfaceImplINS_12BtreeData_V1EE20beginInsertIntoIndexEiRNS_12IndexDetailsENS_7DiskLocERKNS_7BSONObjERKNS_8OrderingEb+0x118) [0x9cdf38]
/usr/bin/mongod(_ZN5mongo19fetchIndexInsertersERSt3setINS_7BSONObjENS_10BSONObjCmpESaIS1_EERNS_14IndexInterface13IndexInserterEPNS_16NamespaceDetailsEiRKS1_NS_7DiskLocEb+0x2cf) [0x9d6dbf]
/usr/bin/mongod(_ZN5mongo18prefetchIndexPagesEPNS_16NamespaceDetailsERKNS_7BSONObjE+0x724) [0xb2c0b4]
/usr/bin/mongod(_ZN5mongo28prefetchPagesForReplicatedOpERKNS_7BSONObjE+0x605) [0xb2cf55]
/usr/bin/mongod(_ZN5mongo7replset8SyncTail10prefetchOpERKNS_7BSONObjE+0x202) [0xc1ca02]
/usr/bin/mongod(_ZN5mongo10threadpool6Worker4loopEv+0x281) [0xd9cc01]
/usr/bin/mongod() [0xe17cb9]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x7e9a) [0x7f22bb81be9a]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f22bab2bcbd]
/usr/bin/mongod(_ZN5mongo15printStackTraceERSo+0x21) [0xdcf361]
/usr/bin/mongod(_ZN5mongo12verifyFailedEPKcS1_j+0xfd) [0xd902bd]
/usr/bin/mongod(_ZN5mongo18IndexInterfaceImplINS_12BtreeData_V1EE20beginInsertIntoIndexEiRNS_12IndexDetailsENS_7DiskLocERKNS_7BSONObjERKNS_8OrderingEb+0x118) [0x9cdf38]
/usr/bin/mongod(_ZN5mongo19fetchIndexInsertersERSt3setINS_7BSONObjENS_10BSONObjCmpESaIS1_EERNS_14IndexInterface13IndexInserterEPNS_16NamespaceDetailsEiRKS1_NS_7DiskLocEb+0x2cf) [0x9d6dbf]
/usr/bin/mongod(_ZN5mongo18prefetchIndexPagesEPNS_16NamespaceDetailsERKNS_7BSONObjE+0x724) [0xb2c0b4]
/usr/bin/mongod(_ZN5mongo28prefetchPagesForReplicatedOpERKNS_7BSONObjE+0x605) [0xb2cf55]
/usr/bin/mongod(_ZN5mongo7replset8SyncTail10prefetchOpERKNS_7BSONObjE+0x202) [0xc1ca02]
/usr/bin/mongod(_ZN5mongo10threadpool6Worker4loopEv+0x281) [0xd9cc01]
/usr/bin/mongod() [0xe17cb9]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x7e9a) [0x7f22bb81be9a]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f22bab2bcbd]
Sun May 5 03:47:43.552 [repl writer worker 1] newsblur.stories Assertion failure _a != -1 src/mongo/db/ops/../pdfile.h 607
0xdcf361 0xd902bd 0x9cdf38 0x9d6dbf 0x9d8c22 0xac0009 0xac130f 0xa8b72a 0xa8d787 0xa6956a 0xc1fb13 0xc1f258 0xd9cc01 0xe17cb9 0x7f22bb81be9a 0x7f22bab2bcbd
/usr/bin/mongod(_ZN5mongo15printStackTraceERSo+0x21) [0xdcf361]
/usr/bin/mongod(_ZN5mongo12verifyFailedEPKcS1_j+0xfd) [0xd902bd]
/usr/bin/mongod(_ZN5mongo18IndexInterfaceImplINS_12BtreeData_V1EE20beginInsertIntoIndexEiRNS_12IndexDetailsENS_7DiskLocERKNS_7BSONObjERKNS_8OrderingEb+0x118) [0x9cdf38]
/usr/bin/mongod(_ZN5mongo19fetchIndexInsertersERSt3setINS_7BSONObjENS_10BSONObjCmpESaIS1_EERNS_14IndexInterface13IndexInserterEPNS_16NamespaceDetailsEiRKS1_NS_7DiskLocEb+0x2cf) [0x9d6dbf]
/usr/bin/mongod(_ZN5mongo24indexRecordUsingTwoStepsEPKcPNS_16NamespaceDetailsENS_7BSONObjENS_7DiskLocEb+0x2d2) [0x9d8c22]
/usr/bin/mongod(_ZN5mongo11DataFileMgr6insertEPKcPKvibbbPb+0x1239) [0xac0009]
/usr/bin/mongod(_ZN5mongo11DataFileMgr16insertWithObjModEPKcRNS_7BSONObjEbb+0x4f) [0xac130f]
/usr/bin/mongod(_ZN5mongo14_updateObjectsEbPKcRKNS_7BSONObjES4_bbbRNS_7OpDebugEPNS_11RemoveSaverEbRKNS_24QueryPlanSelectionPolicyEb+0x2efa) [0xa8b72a]
/usr/bin/mongod(_ZN5mongo27updateObjectsForReplicationEPKcRKNS_7BSONObjES4_bbbRNS_7OpDebugEbRKNS_24QueryPlanSelectionPolicyE+0xb7) [0xa8d787]
/usr/bin/mongod(_ZN5mongo21applyOperation_inlockERKNS_7BSONObjEbb+0x65a) [0xa6956a]
/usr/bin/mongod(_ZN5mongo7replset8SyncTail9syncApplyERKNS_7BSONObjEb+0x713) [0xc1fb13]
/usr/bin/mongod(_ZN5mongo7replset14multiSyncApplyERKSt6vectorINS_7BSONObjESaIS2_EEPNS0_8SyncTailE+0x48) [0xc1f258]
/usr/bin/mongod(_ZN5mongo10threadpool6Worker4loopEv+0x281) [0xd9cc01]
/usr/bin/mongod() [0xe17cb9]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x7e9a) [0x7f22bb81be9a]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f22bab2bcbd]
Sun May 5 03:47:43.562 [repl writer worker 1] ERROR: writer worker caught exception: assertion src/mongo/db/ops/../pdfile.h:607 on: { ts: Timestamp 1367723893000|1, h: 256075113332974832, v: 2, op: "i", ns: "newsblur.stories", o: { _id: ObjectId('5185cf75409b612f1631dce1'), share_user_ids: {}, story_author_name: "", story_date: new Date(1367723893192), story_tags: {}, story_hash: "6:388392", story_feed_id: 6, story_permalink: "http://lcamtuf.blogspot.com/2013/05/some-harmless-old-fashioned-fun-with-css.html", story_title: "Some harmless, old-fashioned fun with CSS", comment_user_ids: {}, story_content_z: BinData, story_guid: "http://lcamtuf.blogspot.com/2013/05/some-harmless-old-fashioned-fun-with-css.html" } }
Sun May 5 03:47:43.562 [repl writer worker 1] Fatal Assertion 16360
0xdcf361 0xd8f0d3 0xc1f33c 0xd9cc01 0xe17cb9 0x7f22bb81be9a 0x7f22bab2bcbd
/usr/bin/mongod(_ZN5mongo15printStackTraceERSo+0x21) [0xdcf361]
/usr/bin/mongod(_ZN5mongo13fassertFailedEi+0xa3) [0xd8f0d3]
/usr/bin/mongod(_ZN5mongo7replset14multiSyncApplyERKSt6vectorINS_7BSONObjESaIS2_EEPNS0_8SyncTailE+0x12c) [0xc1f33c]
/usr/bin/mongod(_ZN5mongo10threadpool6Worker4loopEv+0x281) [0xd9cc01]
/usr/bin/mongod() [0xe17cb9]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x7e9a) [0x7f22bb81be9a]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f22bab2bcbd]
Sun May 5 03:47:43.566 [repl writer worker 1]
***aborting after fassert() failure
Sun May 5 03:47:43.566 Got signal: 6 (Aborted).
Sun May 5 03:47:43.568 Backtrace:
0xdcf361 0x6cf729 0x7f22baa6e4a0 0x7f22baa6e425 0x7f22baa71b8b 0xd8f10e 0xc1f33c 0xd9cc01 0xe17cb9 0x7f22bb81be9a 0x7f22bab2bcbd
/usr/bin/mongod(_ZN5mongo15printStackTraceERSo+0x21) [0xdcf361]
/usr/bin/mongod(_ZN5mongo10abruptQuitEi+0x399) [0x6cf729]
/lib/x86_64-linux-gnu/libc.so.6(+0x364a0) [0x7f22baa6e4a0]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0x35) [0x7f22baa6e425]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x17b) [0x7f22baa71b8b]
/usr/bin/mongod(_ZN5mongo13fassertFailedEi+0xde) [0xd8f10e]
/usr/bin/mongod(_ZN5mongo7replset14multiSyncApplyERKSt6vectorINS_7BSONObjESaIS2_EEPNS0_8SyncTailE+0x12c) [0xc1f33c]
/usr/bin/mongod(_ZN5mongo10threadpool6Worker4loopEv+0x281) [0xd9cc01]
/usr/bin/mongod() [0xe17cb9]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x7e9a) [0x7f22bb81be9a]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f22bab2bcbd]



 Comments   
Comment by Daniel Pasette (Inactive) [ 14/Aug/13 ]

Closing this as no further information was received.

Comment by Andy Schwerin [ 09/May/13 ]

This looks a lot like corruption of the catalog data file on the secondary. The catalog data structure for the collection being prefetched (called the NamespaceDetails object) contains what is conceptually an array of index description objects, and a count of how many are valid. All of the valid ones occupy index positions smaller than that count, and no invalid ones should have an index position smaller than the count. However, in this case, one of the entries that should be valid is not.
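
To make the invariant above concrete, here is a minimal Python sketch. It is not MongoDB's actual code: `IndexSlot`, `CollectionCatalog`, `head_file`, `n_indexes`, and the index names are hypothetical stand-ins, and the `-1` sentinel is only assumed to correspond to the `_a != -1` check seen in the log.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Assumed sentinel for "no location", chosen to mirror the `_a != -1` check in the log.
NULL_FILE = -1


@dataclass
class IndexSlot:
    """Hypothetical stand-in for one index description entry."""
    name: str
    head_file: int = NULL_FILE  # file number of the index root; NULL_FILE means unset

    def is_valid(self) -> bool:
        return self.head_file != NULL_FILE


@dataclass
class CollectionCatalog:
    """Hypothetical stand-in for the per-collection catalog entry."""
    slots: List[IndexSlot] = field(default_factory=list)
    n_indexes: int = 0  # how many leading slots are supposed to be valid

    def first_violation(self) -> Optional[int]:
        """Return the first position below n_indexes holding an invalid slot, else None."""
        for pos in range(self.n_indexes):
            if pos >= len(self.slots) or not self.slots[pos].is_valid():
                return pos
        return None


# Healthy catalog: both counted slots are valid, the spare slot is unused.
healthy = CollectionCatalog(
    slots=[IndexSlot("index_0", 0), IndexSlot("index_1", 3), IndexSlot("spare")],
    n_indexes=2,
)

# Corrupt catalog: the count says two indexes, but slot 1 was never filled in.
corrupt = CollectionCatalog(
    slots=[IndexSlot("index_0", 0), IndexSlot("index_1")],
    n_indexes=2,
)

print(healthy.first_violation())
print(corrupt.first_violation())
```

In this model, any code that walks the first `n_indexes` slots and assumes each one is usable will trip over the invalid slot in `corrupt`, which is roughly the situation described for the prefetch and writer threads.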

I would recommend wiping this secondary and resyncing it, but first check dmesg or the system logs for reports of disk or file system problems.
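
As a rough aid for the log check suggested above, the following Python sketch filters saved kernel log text (for example, the output of `dmesg` redirected to a file) for lines that often accompany disk or file system trouble. The keyword list is an assumption and is not exhaustive, and the sample lines are made up for illustration; they are not taken from the affected machine.

```python
import re
from typing import Iterable, List

# Assumed, non-exhaustive patterns; adjust for the file system and drivers in use.
SUSPECT = re.compile(
    r"i/o error|ext[234]-fs error|bad sector|medium error",
    re.IGNORECASE,
)


def suspicious_lines(lines: Iterable[str]) -> List[str]:
    """Return the lines that match any of the assumed disk/file-system error patterns."""
    return [line.rstrip("\n") for line in lines if SUSPECT.search(line)]


# Made-up sample input, only to show the shape of the data.
sample = [
    "[   12.345] EXT4-fs error (device sda1): illustrative example line",
    "[   12.346] eth0: link up",
    "[   99.000] end_request: I/O error, dev sda, sector 0",
]

for hit in suspicious_lines(sample):
    print(hit)
```

An empty result does not rule out hardware problems; it only means none of the assumed patterns appeared in the text that was scanned.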
