[SERVER-9416] ns name too long leads to Fatal Assertion 16360 and replica set failure Created: 22/Apr/13  Updated: 10/Dec/14  Resolved: 29/Aug/13

Status: Closed
Project: Core Server
Component/s: Index Maintenance
Affects Version/s: 2.4.2
Fix Version/s: None

Type: Bug Priority: Critical - P2
Reporter: Oskar Liljeblad Assignee: Unassigned
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Debian GNU/Linux unstable, mongodb-server 1:2.4.2-1


Issue Links:
Duplicate
is duplicated by SERVER-7282 Logic for checking namespace name len... Closed
Operating System: ALL

 Description   

Originally we had three servers in the replica set, but now we have only one. As soon as we add another member, it crashes with the following error, and the PRIMARY server becomes SECONDARY.

Mon Apr 22 07:16:00.617 [repl writer worker 2] ERROR: writer worker caught exception: ns name too long, max size is 128 on: { ts: Timestamp 1366543066000|162, h: -5305590734582994850, v: 2, op: "i", ns: "metadata_0bff5fe8-8f84-4122-a618-460b63bac265.system.indexes", o: { ns: "metadata_0bff5fe8-8f84-4122-a618-460b63bac265.tmp.mr.terminateConnection_34304294-340c-4fa2-bab1-062604390d90_1000083_inc", key: { 0: 1 }, name: "0_1" } }
Mon Apr 22 07:16:00.617 [repl writer worker 2]   Fatal Assertion 16360
0xa26ad3 0x9f2bec 0x9297c3 0x9ff4f1 0x7f2c4fb5e629 0x7f2c4eb5ab50 0x7f2c4dc8ba7d
 /usr/bin/mongod(_ZN5mongo15printStackTraceERSo+0x23) [0xa26ad3]
 /usr/bin/mongod(_ZN5mongo13fassertFailedEi+0x4c) [0x9f2bec]
 /usr/bin/mongod(_ZN5mongo7replset14multiSyncApplyERKSt6vectorINS_7BSONObjESaIS2_EEPNS0_8SyncTailE+0x113) [0x9297c3]
 /usr/bin/mongod(_ZN5mongo10threadpool6Worker4loopEv+0x41) [0x9ff4f1]
 /usr/lib/libboost_thread.so.1.49.0(+0x10629) [0x7f2c4fb5e629]
 /lib/x86_64-linux-gnu/libpthread.so.0(+0x6b50) [0x7f2c4eb5ab50]
 /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f2c4dc8ba7d]
Mon Apr 22 07:16:00.619 [repl writer worker 2]
 
***aborting after fassert() failure
 
 
Mon Apr 22 07:16:00.619 Got signal: 6 (Aborted).
 
Mon Apr 22 07:16:00.622 Backtrace:
0xa26ad3 0x61af30 0x7f2c4dbe34f0 0x7f2c4dbe3475 0x7f2c4dbe66f0 0x9f2c27 0x9297c3 0x9ff4f1 0x7f2c4fb5e629 0x7f2c4eb5ab50 0x7f2c4dc8ba7d
 /usr/bin/mongod(_ZN5mongo15printStackTraceERSo+0x23) [0xa26ad3]
 /usr/bin/mongod(_ZN5mongo10abruptQuitEi+0x100) [0x61af30]
 /lib/x86_64-linux-gnu/libc.so.6(+0x324f0) [0x7f2c4dbe34f0]
 /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x35) [0x7f2c4dbe3475]
 /lib/x86_64-linux-gnu/libc.so.6(abort+0x180) [0x7f2c4dbe66f0]
 /usr/bin/mongod(_ZN5mongo13fassertFailedEi+0x87) [0x9f2c27]
 /usr/bin/mongod(_ZN5mongo7replset14multiSyncApplyERKSt6vectorINS_7BSONObjESaIS2_EEPNS0_8SyncTailE+0x113) [0x9297c3]
 /usr/bin/mongod(_ZN5mongo10threadpool6Worker4loopEv+0x41) [0x9ff4f1]
 /usr/lib/libboost_thread.so.1.49.0(+0x10629) [0x7f2c4fb5e629]
 /lib/x86_64-linux-gnu/libpthread.so.0(+0x6b50) [0x7f2c4eb5ab50]
 /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f2c4dc8ba7d]

It made no difference that we dropped the collection; the operation still seemed to be replicated from the journal on some server. To fix the problem we had to remove all the data files on one server and restart replication, so the issue is not reproducible right now.
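A possible diagnostic is to scan oplog entries for namespaces that are close to the limit before adding a new member. The sketch below assumes the entries have already been read from `local.oplog.rs` into plain dicts (how they are fetched is left out). The `padding` parameter is hypothetical: the report only says the server reserves "some extra padding", so its real size is unknown. Note that index builds in this log are inserts into `<db>.system.indexes`, so the long namespace sits in `o.ns`, not in the top-level `ns`.

```python
from typing import Any, Dict, Iterable, Iterator, List, Tuple

# Limit quoted in the log: "ns name too long, max size is 128".
MAX_NS_LEN = 128


def namespaces_in(entry: Dict[str, Any]) -> List[str]:
    """Collect the namespaces an oplog entry refers to.

    Index builds are logged as inserts into <db>.system.indexes, so the
    collection being indexed appears in o.ns, not in the top-level ns.
    """
    names: List[str] = []
    top = entry.get("ns")
    if isinstance(top, str):
        names.append(top)
    inner = entry.get("o")
    if isinstance(inner, dict) and isinstance(inner.get("ns"), str):
        names.append(inner["ns"])
    return names


def find_risky(entries: Iterable[Dict[str, Any]],
               limit: int = MAX_NS_LEN,
               padding: int = 0) -> Iterator[Tuple[Any, str]]:
    """Yield (ts, namespace) for each namespace whose length plus padding exceeds limit.

    `padding` is a hypothetical stand-in for the server's internal reserve;
    its real size is not given in the report, so choose it conservatively.
    """
    for entry in entries:
        for ns in namespaces_in(entry):
            if len(ns) + padding > limit:
                yield entry.get("ts"), ns


if __name__ == "__main__":
    # The entry from the log above, with the timestamp kept as a plain string.
    db = "metadata_0bff5fe8-8f84-4122-a618-460b63bac265"
    entry = {
        "ts": "1366543066000|162",
        "op": "i",
        "ns": db + ".system.indexes",
        "o": {
            "ns": db + ".tmp.mr.terminateConnection_"
                       "34304294-340c-4fa2-bab1-062604390d90_1000083_inc",
            "key": {"0": 1},
            "name": "0_1",
        },
    }
    print(list(find_risky([entry])))             # nothing exceeds 128 on its own
    print(list(find_risky([entry], padding=8)))  # flagged once padding is assumed
```

With `padding=0` the 121-character namespace is not flagged, which matches the confusing part of this bug: the name is shorter than the limit printed in the error.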



 Comments   
Comment by Daniel Pasette (Inactive) [ 29/Aug/13 ]

duplicate of SERVER-7282

Comment by Stennie Steneker (Inactive) [ 03/May/13 ]

Hi Oskar,

Thank you for reporting this bug; I have been able to reproduce the issue using MongoDB 2.4.2.

From the error message provided, it appears that the namespace which is too long is a temporary collection for a MapReduce job:

metadata_0bff5fe8-8f84-4122-a618-460b63bac265.tmp.mr.terminateConnection_34304294-340c-4fa2-bab1-062604390d90_1000083_inc

Some extra padding is reserved for the namespace, so even though your namespace is only 121 characters long, it still triggers the 128-character error on the first attempt to insert into the collection. However, subsequent insertions succeed and write entries with the long namespace into the oplog. These oplog entries cannot be applied on the secondary and trigger the Fatal Assertion 16360 that you encountered.
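The failure sequence described above can be illustrated with a toy model. This is not MongoDB's code: `ToyPrimary`, `check_ns`, and `PADDING = 8` are hypothetical, and the toy simply reproduces the observed behaviour (first insert rejected, later inserts logged) without claiming to know why the server behaves that way. The secondary applies the same length rule to every entry and, like the real applier, cannot skip a failing one.

```python
from typing import Any, Dict, List, Set

MAX_NS_LEN = 128
PADDING = 8  # hypothetical: the comment only says "some extra padding"


class NamespaceTooLong(Exception):
    """Raised when a namespace does not fit in the length budget."""


def check_ns(ns: str) -> None:
    """Length rule used by both sides: len(ns) + PADDING must not exceed MAX_NS_LEN."""
    if len(ns) + PADDING > MAX_NS_LEN:
        raise NamespaceTooLong(f"ns name too long, max size is {MAX_NS_LEN}: {ns}")


class ToyPrimary:
    """Toy primary that only validates the first insert into a namespace."""

    def __init__(self) -> None:
        self.seen: Set[str] = set()
        self.oplog: List[Dict[str, Any]] = []

    def insert(self, ns: str, doc: Dict[str, Any]) -> None:
        if ns not in self.seen:
            # The namespace is remembered before the check, so only the
            # first insert is rejected (mimics the reported behaviour).
            self.seen.add(ns)
            check_ns(ns)
        # Later inserts skip the check and are written to the oplog.
        self.oplog.append({"op": "i", "ns": ns, "o": doc})


def apply_on_secondary(oplog: List[Dict[str, Any]]) -> int:
    """Apply entries in order and return how many were applied.

    A failing entry cannot be skipped. In the report this ends in
    Fatal Assertion 16360; here the exception simply propagates.
    """
    applied = 0
    for entry in oplog:
        check_ns(entry["ns"])
        applied += 1
    return applied
```

In this model, calling `check_ns` on every insert before appending to the oplog would keep the secondary from ever receiving an entry it rejects.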

We will investigate a code fix for this issue.

Regards,
Stephen

Comment by Oskar Liljeblad [ 22/Apr/13 ]

I should also note that the collection causing the problem is

db: metadata_0bff5fe8-8f84-4122-a618-460b63bac265
collection: terminateConnection_34304294-340c-4fa2-bab1-062604390d90

(102 characters including '.')
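The arithmetic behind the two lengths in this thread can be checked with a small pre-flight sketch. The collection namespace is `<db>.<collection>` (102 characters here), while the temporary mapReduce namespace adds `.tmp.mr.`, a counter, and `_inc`, reaching 121 characters. The `<db>.tmp.mr.<coll>_<counter>_inc` shape is an assumption inferred from the single log line in this report, and `padding` is hypothetical; `collection_ns`, `mr_temp_ns`, and `preflight` are illustrative helpers, not MongoDB functions. Because the counter value is not known in advance, passing a generously large one gives a worst-case estimate.

```python
from typing import Dict, Tuple

MAX_NS_LEN = 128  # limit quoted in the error message


def collection_ns(db: str, coll: str) -> str:
    """Full collection namespace: <db>.<collection>."""
    return f"{db}.{coll}"


def mr_temp_ns(db: str, coll: str, counter: int) -> str:
    """Temporary mapReduce namespace (hypothetical shape).

    The pattern <db>.tmp.mr.<coll>_<counter>_inc is inferred from one log
    line in this report and may not cover every case.
    """
    return f"{db}.tmp.mr.{coll}_{counter}_inc"


def preflight(db: str, coll: str, counter: int,
              padding: int) -> Dict[str, Tuple[int, bool]]:
    """Return {label: (length, fits)}, where fits means length <= MAX_NS_LEN - padding."""
    budget = MAX_NS_LEN - padding
    candidates = {
        "collection": collection_ns(db, coll),
        "mr_temp": mr_temp_ns(db, coll, counter),
    }
    return {label: (len(ns), len(ns) <= budget) for label, ns in candidates.items()}


if __name__ == "__main__":
    db = "metadata_0bff5fe8-8f84-4122-a618-460b63bac265"
    coll = "terminateConnection_34304294-340c-4fa2-bab1-062604390d90"
    # padding=8 is a hypothetical value for illustration only.
    print(preflight(db, coll, counter=1000083, padding=8))
    # → {'collection': (102, True), 'mr_temp': (121, False)}
```

The collection itself fits comfortably; only the derived temporary namespace runs into the limit, which is why a 102-character collection name could still break replication.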

Generated at Thu Feb 08 03:20:21 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.