Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-21383

Surprisingly slow insertion to mongodb capped collections

    • Type: Icon: Bug Bug
    • Resolution: Cannot Reproduce
    • Priority: Icon: Minor - P4 Minor - P4
    • None
    • Affects Version/s: 3.2.0-rc0
    • Component/s: Performance
    • None
    • ALL
    • Hide

      Run the following script twice in parallel:

      import time, pymongo, pandas as pd
      DBNAME = 'test5'
      COLLNAME = 'abc'
      mongo = pymongo.MongoClient()
      #mongo[DBNAME].drop_collection(COLLNAME)
      if DBNAME not in mongo.database_names():
          coll = mongo[DBNAME].create_collection(COLLNAME,
                                          capped = True, size = 50 * 1024 * 1024, max = 1024 * 1024,
                                          fsync = False,  # no delays please (FYI, journaling is disabled)
                                          #autoIndexId = False,
                                          )
      
      tarr = []
      for _ in range(1000):
          msg = dict.fromkeys([ 'f%d' % j for j in range(30) ], 'foo')
          t1 = time.time()
          coll.insert(msg)  # <-- INSERT() HERE
          tarr.append(time.time() - t1)
          time.sleep(1./100)  # 100 docs/sec. surely not too much load...
      
      print pd.Series(tarr).describe()
      

      OUTPUT:

      RUN1:
      count    1000.000000
      mean        0.001555
      std         0.002712
      min         0.000654
      25%         0.001021
      50%         0.001084
      75%         0.001151
      max         0.065006  <-- 65ms!! 
      dtype: float64
      
      RUN2:
      count    1000.000000
      mean        0.001733
      std         0.004013
      min         0.000717
      25%         0.001043
      50%         0.001069
      75%         0.001120
      max         0.084803  <-- 84ms!! 
      dtype: float64
      
      Show
      Run the following script twice in parallel: import time, pymongo, pandas as pd DBNAME = 'test5' COLLNAME = 'abc' mongo = pymongo.MongoClient() #mongo[DBNAME].drop_collection(COLLNAME) if DBNAME not in mongo.database_names(): coll = mongo[DBNAME].create_collection(COLLNAME, capped = True, size = 50 * 1024 * 1024, max = 1024 * 1024, fsync = False, # no delays please (FYI, journaling is disabled) #autoIndexId = False, ) tarr = [] for _ in range(1000): msg = dict.fromkeys([ 'f%d' % j for j in range(30) ], 'foo') t1 = time.time() coll.insert(msg) # <-- INSERT() HERE tarr.append(time.time() - t1) time.sleep(1./100) # 100 docs/sec. surely not too much load... print pd.Series(tarr).describe() OUTPUT: RUN1: count 1000.000000 mean 0.001555 std 0.002712 min 0.000654 25% 0.001021 50% 0.001084 75% 0.001151 max 0.065006 <-- 65ms!! dtype: float64 RUN2: count 1000.000000 mean 0.001733 std 0.004013 min 0.000717 25% 0.001043 50% 0.001069 75% 0.001120 max 0.084803 <-- 84ms!! dtype: float64
    • None
    • 0
    • None
    • None
    • None
    • None
    • None
    • None

      I'm using mongodb's capped collections + tailable cursors as a pubsub engine, using pymongo. (This approach is described here: http://blog.pythonisito.com/2013/04/mongodb-pubsub-with-capped-collections.html). My environment includes several topics, several publishers, and a few dozen subscribers.

      This works great, except that every once in a while I encounter insert-operations which take surprisingly long time to complete.

      My understanding is that mongodb can support insertion of >10K docs/sec, which is why I am surprised that a single insert operation can take 85ms. (I also observed >1sec a few times in my production environment, but couldn't reproduce it using this SSCCE).

      To keep delays to a minimum, journaling is disabled, and I am using the lowest write-concern, which helped, but didn't solve the problem.

      What causes this and how can I fix it?

      I tried both v3.2.0-rc0 and v3.0.7.

      More info:

      db version v3.2.0-rc0
      git version: bf28bd20fa507c4d8cc5919dfbbe87b7750ae8b0
      OpenSSL version: OpenSSL 1.0.1f 6 Jan 2014
      allocator: tcmalloc
      modules: none
      build environment:
          distmod: ubuntu1404
          distarch: x86_64
          target_arch: x86_64
      
      "storageEngine" : { "name" : "mmapv1", "supportsCommittedReads" : false }
      
      journaling DISABLED
      pymongo version = 3.0.3
      

      (I also posted this in StackOverflow: http://stackoverflow.com/q/33521301/2096752 . No answers yet)

            Assignee:
            Unassigned Unassigned
            Reporter:
            shx2 S S
            Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

              Created:
              Updated:
              Resolved: