[SERVER-4055] Assertion error on compacting a non existent collection Created: 11/Oct/11  Updated: 11/Jul/16  Resolved: 27/Oct/11

Status: Closed
Project: Core Server
Component/s: Index Maintenance, Storage
Affects Version/s: 2.0.0
Fix Version/s: None

Type: Bug
Priority: Major - P3
Reporter: Marc Gràcia
Assignee: Brandon Diamond
Resolution: Done
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

mongo 2.0.0 / 64bits on CentOS 5.5. 2 shard 3 nodes per shard. 1 delayed node per shard. 1 additional arbiter per shard


Issue Links:
Duplicate
is duplicated by SERVER-3792 some compact masserts should be uasserts Closed
Operating System: Linux
Participants:

 Description   

While upgrading to v2.0.0, I was compacting collections on one of the secondaries of one shard.
I accidentally issued 'db.calendar.runCommand("compact")' against the default test database, where that collection does not exist.
The shell prompt changed from SECONDARY to RECOVERING and stayed there.

Looking at the logs, I saw this:

Tue Oct 11 17:35:19 [conn44] replSet going into maintenance mode (0 other tasks)
Tue Oct 11 17:35:19 [conn44] replSet RECOVERING
Tue Oct 11 17:35:19 [conn44] Assertion: 13660:namespace test.calendar does not exist
0x587512 0xa9be43 0xa9c647 0x973b49 0x97512f 0x95d725 0x9607b4 0x87e037 0x88485c 0xa96a46 0x635dd7 0x30e920673d 0x30e8ad3f6d
/opt/mongo/bin/mongod(_ZN5mongo11msgassertedEiPKc+0x112) [0x587512]
/opt/mongo/bin/mongod(_ZN5mongo7compactERKSsRSsbRNS_14BSONObjBuilderE+0x603) [0xa9be43]
/opt/mongo/bin/mongod(_ZN5mongo10CompactCmd3runERKSsRNS_7BSONObjEiRSsRNS_14BSONObjBuilderEb+0x267) [0xa9c647]
/opt/mongo/bin/mongod(_ZN5mongo11execCommandEPNS_7CommandERNS_6ClientEiPKcRNS_7BSONObjERNS_14BSONObjBuilderEb+0x6a9) [0x973b49]
/opt/mongo/bin/mongod(_ZN5mongo12_runCommandsEPKcRNS_7BSONObjERNS_11_BufBuilderINS_16TrivialAllocatorEEERNS_14BSONObjBuilderEbi+0x6ff) [0x97512f]
/opt/mongo/bin/mongod(_ZN5mongo11runCommandsEPKcRNS_7BSONObjERNS_5CurOpERNS_11_BufBuilderINS_16TrivialAllocatorEEERNS_14BSONObjBuilderEbi+0x35) [0x95d725]
/opt/mongo/bin/mongod(_ZN5mongo8runQueryERNS_7MessageERNS_12QueryMessageERNS_5CurOpES1_+0xee4) [0x9607b4]
/opt/mongo/bin/mongod [0x87e037]
/opt/mongo/bin/mongod(_ZN5mongo16assembleResponseERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortE+0x55c) [0x88485c]
/opt/mongo/bin/mongod(_ZN5mongo16MyMessageHandler7processERNS_7MessageEPNS_21AbstractMessagingPortEPNS_9LastErrorE+0x76) [0xa96a46]
/opt/mongo/bin/mongod(_ZN5mongo3pms9threadRunEPNS_13MessagingPortE+0x287) [0x635dd7]
/lib64/libpthread.so.0 [0x30e920673d]
/lib64/libc.so.6(clone+0x6d) [0x30e8ad3f6d]
Tue Oct 11 17:35:19 [conn35] assertion 13436 not master or secondary, can't read ns:crawler4.ad query:{ SITE_ID: 403064, URL_UNIQUE: "11565" }
Tue Oct 11 17:35:19 [conn35] ntoskip:0 ntoreturn:-1
Tue Oct 11 17:35:19 [conn35] end connection 83.149.71.144:46588
Tue Oct 11 17:35:19 [initandlisten] connection accepted from 83.149.71.144:57778 #45
Tue Oct 11 17:35:19 [conn45] assertion 13436 not master or secondary, can't read ns:crawler4.ad query:{ SITE_ID: 403064, URL_UNIQUE: "11821" }
Tue Oct 11 17:35:19 [conn45] ntoskip:0 ntoreturn:-1
Tue Oct 11 17:35:19 [conn45] end connection 83.149.71.144:57778
Tue Oct 11 17:35:19 [conn28] end connection 83.149.71.144:46581
Tue Oct 11 17:35:19 [conn30] assertion 13436 not master or secondary, can't read ns:crawler4.ad query:{ SITE_ID: 403064, URL_UNIQUE: "11566" }
Tue Oct 11 17:35:19 [conn30] ntoskip:0 ntoreturn:-1
Tue Oct 11 17:35:19 [conn30] end connection 83.149.71.144:46583
Tue Oct 11 17:35:19 [initandlisten] connection accepted from 83.149.71.144:57783 #46
Tue Oct 11 17:35:19 [conn29] assertion 13436 not master or secondary, can't read ns:crawler4.ad query:{ SITE_ID: 553, URL_UNIQUE: "ref/2463-052A/pr/15" }
Tue Oct 11 17:35:19 [conn29] ntoskip:0 ntoreturn:-1
Tue Oct 11 17:35:19 [conn34] assertion 13436 not master or secondary, can't read ns:crawler4.ad query:{ SITE_ID: 595, URL_UNIQUE: "VP0000003247136" }
Tue Oct 11 17:35:19 [conn34] ntoskip:0 ntoreturn:-1
Tue Oct 11 17:35:19 [conn31] assertion 13436 not master or secondary, can't read ns:crawler4.ad query:{ SITE_ID: 1861, URL_UNIQUE: "10435611" }
Tue Oct 11 17:35:19 [conn31] ntoskip:0 ntoreturn:-1
Tue Oct 11 17:35:19 [conn46] assertion 13436 not master or secondary, can't read ns:crawler4.ad query:{ SITE_ID: 588, URL_UNIQUE: "2369189" }
Tue Oct 11 17:35:19 [conn46] ntoskip:0 ntoreturn:-1
Tue Oct 11 17:35:19 [conn46] end connection 83.149.71.144:57783
Tue Oct 11 17:35:19 [conn31] end connection 83.149.71.144:46584
Tue Oct 11 17:35:19 [conn29] end connection 83.149.71.144:46582
Tue Oct 11 17:35:19 [conn34] end connection 83.149.71.144:46587
Tue Oct 11 17:35:19 [conn36] end connection 83.149.71.144:46589
Tue Oct 11 17:35:19 [conn27] end connection 83.149.71.144:46580
Tue Oct 11 17:35:19 [conn33] end connection 83.149.71.144:46586
Tue Oct 11 17:35:19 [conn32] assertion 13436 not master or secondary, can't read ns:crawler4.ad query:{ SIT...

This rendered the node unusable.
I re-issued the command on the correct database and everything appears to be going fine (the collection is being compacted).
I hope the node will go back to secondary afterwards; I'll keep you posted.

It doesn't look like an ugly bug, but since it leaves the node in the RECOVERING state, I think it is bad enough.
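
Until the server validates the namespace itself, the mistake described above could be caught on the shell side. The following is a minimal sketch, not anything shipped with the shell: compactIfExists is a hypothetical helper name, and it assumes the standard shell methods db.getCollectionNames(), db.getName() and db.runCommand() are available on the db object passed in.

```javascript
// Hypothetical helper (not part of the mongo shell): send compact only when
// the collection exists in the database that `db` currently points at.
function compactIfExists(db, collName) {
  // List the collections of the current database.
  var names = db.getCollectionNames();
  if (names.indexOf(collName) === -1) {
    // Refuse locally; nothing is sent to the server.
    return {
      ok: 0,
      errmsg: "namespace " + db.getName() + "." + collName +
              " does not exist; compact not sent"
    };
  }
  // Same command as db.<collName>.runCommand("compact"), in explicit form.
  return db.runCommand({ compact: collName });
}
```

Called as compactIfExists(db, "calendar") while the shell is still on the test database, this would return the error document instead of issuing the command. On a secondary, the shell may need slaveOk enabled before it can list collections.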



 Comments   
Comment by auto [ 27/Oct/11 ]

Author: Brandon Diamond <brandon@10gen.com>

Message: SERVER-4055: Added validation to compact code
Branch: master
https://github.com/mongodb/mongo/commit/1dc8aa72b68fb858989b893b03388d4312d854a3

Comment by Brandon Diamond [ 26/Oct/11 ]

Good to hear. This shouldn't have a lasting impact on your node. I'm reworking the code to avoid the ugly error and the switch to "recovering" state.

Comment by Marc Gràcia [ 26/Oct/11 ]

FYI:
After compacting the correct collection, the node went back to secondary and everything worked fine.
