  1. WiredTiger
  2. WT-4795

Possible Memory Leak

    • Type: Bug
    • Resolution: Works as Designed
    • Priority: Major - P3
    • Affects Version/s: WT3.2.0
    • Component/s: None
    • Labels:
      None
    • Environment:
      Ubuntu 18.04
    • Storage Engines 2019-06-03

      This is a follow-up to ticket https://jira.mongodb.org/browse/WT-4770

      I made the necessary changes to allow WT to ingest the data.

      Here is the configuration string I use to open the database:

      create,log=(enabled=true,file_max=512MB),cache_size=1024MB

      I store data only in keys, but that should not matter since most keys are smaller than 1MB and a whole transaction doesn't exceed 10MB (I think).

      Note that this time the keys are not random.

      I store 3-tuples together with two of their permutations. For a given tuple (subject, predicate, object), I store it under prefix 1, then (predicate, object, subject) under prefix 2, and finally (object, subject, predicate) under prefix 3.
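
To make the key layout concrete, here is a minimal sketch of the three rotations. The byte encoding is an assumption made only for illustration (one prefix byte followed by the UTF-8 fields joined with NUL); the ticket does not say how the tuples are actually packed, and the names `rotations`, `pack` and `keys_for` are hypothetical.

```python
def rotations(subject, predicate, obj):
    """Return the three (prefix, tuple) pairs described above."""
    return [
        (1, (subject, predicate, obj)),
        (2, (predicate, obj, subject)),
        (3, (obj, subject, predicate)),
    ]


def pack(prefix, items):
    # Assumed encoding, not taken from the ticket: a single prefix byte,
    # then the UTF-8 encoded fields separated by a NUL byte.
    return bytes([prefix]) + b"\x00".join(item.encode("utf-8") for item in items)


def keys_for(subject, predicate, obj):
    """Build the three keys stored for one logical tuple."""
    return [pack(prefix, items) for prefix, items in rotations(subject, predicate, obj)]
```

Each logical tuple therefore turns into three keys, so the number of inserts per transaction is three times the number of tuples.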

      The data that I am storing is the following:

      • Given a DOCUMENT with FILENAME, store it as (FILENAME, "text", DOCUMENT) (with its permutations)
      • Remove punctuation from DOCUMENT and split it on spaces, which gives WORDS
      • For each WORD in WORDS, store (FILENAME, "word", WORD)

      This is similar to an inverted index and its original document.
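
The steps above could be sketched roughly as follows. This is an illustration, not the code used to produce the data: it assumes "punctuation" means the ASCII set in `string.punctuation`, it splits on any whitespace rather than only on the space character, and `tuples_for` is a hypothetical name.

```python
import string

# Translation table that deletes ASCII punctuation characters.
_STRIP_PUNCTUATION = str.maketrans("", "", string.punctuation)


def tuples_for(filename, document):
    """Yield the tuples stored for one document, before permutation."""
    # The original document, stored whole.
    yield (filename, "text", document)
    # One tuple per word; split() treats any run of whitespace as a separator.
    for word in document.translate(_STRIP_PUNCTUATION).split():
        yield (filename, "word", word)
```

For example, `list(tuples_for("a.txt", "Hello, world!"))` yields the "text" tuple followed by one "word" tuple each for `Hello` and `world`.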

      Loading that data into WiredTiger leads to a memory leak, even if I reset the session between transactions.

      You can find the data at http://hyper.dev/out.log.gz

      The program that shows the behavior is the following:

      import base64
      import wiredtiger
      config = "create,log=(enabled=true,file_max=512MB),cache_size=1024MB"
      wt = wiredtiger.wiredtiger_open("wt", config)
      session = wt.open_session()
      session.create("table:test", "key_format=u,value_format=u")
      cursor = session.open_cursor("table:test")
      index = 0
      with open("out.log") as f:
          session.begin_transaction()
          for line in f:
              if line == "# BEGIN TRANSACTION\n":
                  session.commit_transaction()
                  session.reset()  # XXX: try to release some memory.
                  index += 1
                  print("transaction", index)
                  session.begin_transaction()
                  continue  # the marker line is a separator, not a key
              line = line.strip().encode("ascii")
              key = base64.b64decode(line)
              cursor.set_key(key)
              cursor.set_value(b"\x01")
              cursor.insert()
      session.commit_transaction()
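
For anyone who wants to inspect out.log without WiredTiger, the file format implied by the program (one base64-encoded key per line, with `# BEGIN TRANSACTION` lines separating transactions) can be read with a small generator. This is a sketch under that assumption; `read_transactions` is a hypothetical helper, and it treats marker lines purely as separators.

```python
import base64

MARKER = "# BEGIN TRANSACTION"


def read_transactions(lines):
    """Yield one list of decoded keys per transaction."""
    batch = []
    for line in lines:
        line = line.strip()
        if line == MARKER:
            # A marker closes the current batch (if any) and starts a new one.
            if batch:
                yield batch
            batch = []
            continue
        if line:
            batch.append(base64.b64decode(line))
    # Flush the final transaction, which has no trailing marker.
    if batch:
        yield batch
```

Passing an open file object, as in `read_transactions(open("out.log"))`, makes it easy to check how many keys each transaction holds and how large they are.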
      

      I extracted this use case from a project (named hoply) where I use three tables instead of one table with several prefixes. With three tables, the leak is much more obvious. I can guide you through setting up hoply, if you want.

      Let me know if I am doing something wrong.

        1. dump.py (0.9 kB)
        2. hoply-memory-graph-without-session-close.png (16 kB)
        3. hoply-memory-graph-with-session-close.png (15 kB)
        4. load.py (0.8 kB)
        5. wt-memory-graph.png (15 kB)

            Assignee:
            alexander.gorrod@mongodb.com Alexander Gorrod
            Reporter:
            amz3 Amirouche
            Votes:
            0
            Watchers:
            3
