[SERVER-10335] mmapV1 namespace hashtable max chain limit should be programmatically optional Created: 25/Jul/13  Updated: 06/Dec/22  Resolved: 14/Sep/18

Status: Closed
Project: Core Server
Component/s: MMAPv1, Replication
Affects Version/s: 2.4.5
Fix Version/s: None

Type: Bug Priority: Minor - P4
Reporter: Daniel Kador Assignee: Backlog - Storage Execution Team
Resolution: Won't Fix Votes: 2
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Ubuntu 12.04 LTS


Issue Links:
Depends
depends on SERVER-10441 allow namespace hashtable to grow dyn... Closed
Assigned Teams:
Storage Execution
Operating System: ALL
Participants:

 Description   

In our workload, users can dynamically create collections and indexes on those collections. A user apparently sent in a large number of collections (around 9,000). The primary handled this okay (it eventually errored on insert, which is fine). But when replication picked up the last few, all three of my secondaries crashed with the same error. To recover, I had to manually reconfigure the old primary, drop the offending database, and then delete the database's files off the secondaries. After restarting all the secondaries, things came up normally.

Here's the log:

  Thu Jul 25 09:57:21.896 [repl writer worker 3] build index keen_service__51c94dc9897a2c2255000008._metadata_PageView- 2013-07-25 14:57:03 +0000 { _id: 1 }
  Thu Jul 25 09:57:21.896 [repl writer worker 3] error: hashtable namespace index max chain reached:1335
  Thu Jul 25 09:57:21.896 [repl writer worker 3] error: hashtable namespace index max chain reached:1335
  Thu Jul 25 09:57:21.896 [repl writer worker 3] error: hashtable namespace index max chain reached:1335
  Thu Jul 25 09:57:21.920 [repl writer worker 3] error: hashtable namespace index max chain reached:1335
  Thu Jul 25 09:57:21.939 [repl writer worker 3] ERROR: writer worker caught exception: too many namespaces/collections on: { ts: Timestamp 1374764241000|103, h: -2418803875898564014, v: 2, op: "i", ns: "keen_service__51c94dc9897a2c2255000008._metadata_PageView- 2013-07-25 14:57:03 +0000", o: { _id: "singleton_constant_id", keen_properties: { header:-=-:created_at: { num_appearances: 1, type_appearances: { datetime: 1 } }, header:-=-:timestamp: { num_appearances: 1, type_appearances: { datetime: 1 } } }, property_names: [ "header:-=-:timestamp", "body:-=-:Event", "body:-=-:userId", "header:-=-:created_at", "body:-=-:KiddomLessonTwo" ], user_properties: { body:-=-:Event: { num_appearances: 1, type_appearances: { string: 1 } }, body:-=-:KiddomLessonTwo: { num_appearances: 1, type_appearances: { string: 1 } }, body:-=-:userId: { num_appearances: 1, type_appearances: { string: 1 } } } } }
  Thu Jul 25 09:57:21.939 [repl writer worker 3]   Fatal Assertion 16360
  0xdd2331 0xd92323 0xc231db 0xd9fe71 0xe1aad9 0x7f2dbacb2e9a 0x7f2db9fc5ccd
   /usr/bin/mongod(_ZN5mongo15printStackTraceERSo+0x21) [0xdd2331]
   /usr/bin/mongod(_ZN5mongo13fassertFailedEi+0xa3) [0xd92323]
   /usr/bin/mongod(_ZN5mongo7replset14multiSyncApplyERKSt6vectorINS_7BSONObjESaIS2_EEPNS0_8SyncTailE+0x13b) [0xc231db]
   /usr/bin/mongod(_ZN5mongo10threadpool6Worker4loopEv+0x281) [0xd9fe71]
   /usr/bin/mongod() [0xe1aad9]
   /lib/x86_64-linux-gnu/libpthread.so.0(+0x7e9a) [0x7f2dbacb2e9a]
   /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f2db9fc5ccd]
  Thu Jul 25 09:57:21.992 [repl writer worker 3]

  ***aborting after fassert() failure

  Thu Jul 25 09:57:22.004 Got signal: 6 (Aborted).

  Thu Jul 25 09:57:22.022 Backtrace:
  0xdd2331 0x6cfb19 0x7f2db9f084a0 0x7f2db9f08425 0x7f2db9f0bb8b 0xd9235e 0xc231db 0xd9fe71 0xe1aad9 0x7f2dbacb2e9a 0x7f2db9fc5ccd
   /usr/bin/mongod(_ZN5mongo15printStackTraceERSo+0x21) [0xdd2331]
   /usr/bin/mongod(_ZN5mongo10abruptQuitEi+0x399) [0x6cfb19]
   /lib/x86_64-linux-gnu/libc.so.6(+0x364a0) [0x7f2db9f084a0]
   /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x35) [0x7f2db9f08425]
   /lib/x86_64-linux-gnu/libc.so.6(abort+0x17b) [0x7f2db9f0bb8b]
   /usr/bin/mongod(_ZN5mongo13fassertFailedEi+0xde) [0xd9235e]
   /usr/bin/mongod(_ZN5mongo7replset14multiSyncApplyERKSt6vectorINS_7BSONObjESaIS2_EEPNS0_8SyncTailE+0x13b) [0xc231db]
   /usr/bin/mongod(_ZN5mongo10threadpool6Worker4loopEv+0x281) [0xd9fe71]
   /usr/bin/mongod() [0xe1aad9]
   /lib/x86_64-linux-gnu/libpthread.so.0(+0x7e9a) [0x7f2dbacb2e9a]
   /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f2db9fc5ccd]
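
The stack traces show the failure mode: multiSyncApply treats any exception from an oplog op as fatal via fassertFailed(16360), so every secondary that reaches the offending op aborts rather than skipping it. A simplified sketch of that control flow (illustrative only, not the actual mongod source):

  // Simplified sketch of the control flow in the stack traces above
  // (illustrative, not actual mongod source): an exception while applying
  // an oplog entry escalates to a fatal assertion, which aborts the whole
  // process. A secondary never skips an op the primary accepted.
  #include <cstdio>
  #include <cstdlib>
  #include <stdexcept>

  void fassertFailed(int msgid) {
      std::printf("Fatal Assertion %d\n***aborting after fassert() failure\n", msgid);
      std::abort();  // SIGABRT, logged as "Got signal: 6 (Aborted)"
  }

  void applyOp() {
      // stand-in for the namespace allocation that failed in the log
      throw std::runtime_error("too many namespaces/collections");
  }

  void multiSyncApply() {
      try {
          applyOp();
      } catch (const std::exception& e) {
          std::printf("ERROR: writer worker caught exception: %s\n", e.what());
          fassertFailed(16360);
      }
  }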



 Comments   
Comment by Eric Milkie [ 20/Apr/16 ]

We could fix this by turning off the limit checking while in non-PRIMARY state.
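
Sketched out, the idea looks something like this (hypothetical names throughout, not the actual mongod source): enforce the chain limit only for user writes on a primary, and let a node that is applying replicated ops go past it.

  // Hypothetical sketch of the proposed fix -- ReplState, kNsMaxChain and
  // allocateNamespaceSlot are illustrative names, not mongod's.
  enum class ReplState { Primary, Secondary, Recovering };

  const int kNsMaxChain = 1335;  // the limit reported in the log above

  bool chainLimitApplies(ReplState state) {
      // Only a primary accepting user writes enforces the cap; a node
      // replaying the oplog may "cheat" past it so it can always apply
      // whatever the primary already accepted.
      return state == ReplState::Primary;
  }

  bool allocateNamespaceSlot(int chainLength, ReplState state) {
      if (chainLimitApplies(state) && chainLength >= kNsMaxChain)
          return false;  // reject: too many namespaces/collections
      return true;
  }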

Comment by Eric Milkie [ 19/Feb/15 ]

Note: this affects MMAPv1 only; WiredTiger storage engine is not affected by this issue.

Comment by Daniel Kador [ 25/Jul/13 ]

I've already patched my app to limit the number of namespaces, so we're out of the woods on this issue. But yeah, it seems pretty bad to have an abnormal but not crazy use case completely crash all secondaries at the same time and require manual file deletion to recover. Would welcome a fix here.
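
For reference, such a guard can be as simple as refusing collection creation once a budget is reached (a sketch with hypothetical names; currentNamespaceCount stands in for however the application tracks the collections and indexes it has created):

  // Hypothetical application-side cap. Each new collection costs at
  // least two namespaces (the collection plus its _id index), so the
  // budget is kept well below the point where the server's hashtable
  // started refusing inserts in the log above.
  const int kNamespaceBudget = 8000;

  bool mayCreateCollection(int currentNamespaceCount) {
      // Fail in the application, where it is recoverable, rather than on
      // a secondary, where it is fatal.
      return currentNamespaceCount + 2 <= kNamespaceBudget;
  }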

Comment by Eric Milkie [ 25/Jul/13 ]

The maxChain limit you are hitting is a function of the size of the .ns file. Unfortunately, the order in which items are inserted into the hashtable can produce differently sized chains, so a primary and its secondaries will not necessarily hit the namespace limit at the same point.
As a workaround, can you modify your application to cap the number of namespaces a bit before you hit this server limit?
The real fix will involve allowing replication to "cheat" a little bit and go past the maxChain limit in the namespace hashtable.
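
For intuition on why insertion order matters, here is a small self-contained demo (not mongod's hashtable, just a fixed-size open-addressing table in the same spirit): the same set of keys, inserted in two different orders, produces different maximum probe-chain lengths.

  // Demonstrates that insertion order changes chain lengths in an
  // open-addressing hashtable, even for an identical final key set.
  // This is an illustration, not the MMAPv1 namespace index itself.
  #include <cstdio>
  #include <vector>

  // Longest probe chain seen while inserting `keys` with linear probing
  // into a table of `capacity` slots (hash = key % capacity).
  int maxChainFor(const std::vector<int>& keys, int capacity) {
      std::vector<bool> used(capacity, false);
      int maxChain = 0;
      for (int k : keys) {
          int chain = 1;
          int slot = k % capacity;
          while (used[slot]) {  // walk the probe chain
              slot = (slot + 1) % capacity;
              ++chain;
          }
          used[slot] = true;
          if (chain > maxChain) maxChain = chain;
      }
      return maxChain;
  }

  int main() {
      // Same four keys, two insertion orders, all colliding around slot 0.
      std::printf("order A max chain: %d\n", maxChainFor({0, 8, 16, 1}, 8));  // 3
      std::printf("order B max chain: %d\n", maxChainFor({1, 0, 8, 16}, 8));  // 4
      return 0;
  }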
