Core Server / SERVER-19304

Reduce Document Validation Overhead for mmapv1 Storage Engine

    • Type: Bug
    • Resolution: Won't Fix
    • Priority: Major - P3
    • Fix Version/s: None
    • Affects Version/s: 3.1.5
    • Component/s: Storage
    • Labels: None
    • Assigned Teams: Storage Execution
    • Operating System: ALL
    • Steps To Reproduce:

      The following are code excerpts from the mongo-perf JavaScript tests that were running when the difference in overhead between the storage engines was observed. The baseline case measures the throughput of inserting documents that have 20 integer fields. The comparison case does the same, but the collection has a validator that requires all 20 fields to be present and to be integers before a document is inserted.

      // Baseline: insert documents with 20 integer fields into a collection with no validator.
      tests.push( {   name: "Insert.DocValidation.TwentyInt.Baseline",
                      tags: ['insert', 'baseline'],
                      pre: function (collection) {
                          collection.drop();
                      },
                      ops: [ {
                          op: "insert",
                          doc: {
                              a: {"#RAND_INT": [0, 10000]},
                              b: {"#RAND_INT": [0, 10000]},
                              c: {"#RAND_INT": [0, 10000]},
                              d: {"#RAND_INT": [0, 10000]},
                              e: {"#RAND_INT": [0, 10000]},
                              f: {"#RAND_INT": [0, 10000]},
                              g: {"#RAND_INT": [0, 10000]},
                              h: {"#RAND_INT": [0, 10000]},
                              i: {"#RAND_INT": [0, 10000]},
                              j: {"#RAND_INT": [0, 10000]},
                              k: {"#RAND_INT": [0, 10000]},
                              l: {"#RAND_INT": [0, 10000]},
                              m: {"#RAND_INT": [0, 10000]},
                              n: {"#RAND_INT": [0, 10000]},
                              o: {"#RAND_INT": [0, 10000]},
                              p: {"#RAND_INT": [0, 10000]},
                              q: {"#RAND_INT": [0, 10000]},
                              r: {"#RAND_INT": [0, 10000]},
                              s: {"#RAND_INT": [0, 10000]},
                              t: {"#RAND_INT": [0, 10000]}
                          } }
      ]});
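Both tests insert the same 20-field document template. A minimal sketch, assuming the template is built in plain JavaScript instead of being written out by hand, of how that `doc` object could be generated. `FIELDS` and `makeTwentyIntDoc` are hypothetical names, not part of mongo-perf; the `"#RAND_INT"` syntax is copied from the excerpt above.

```javascript
// Hypothetical helper (not part of mongo-perf): builds the insert template used by
// both tests, with one {"#RAND_INT": [min, max]} spec per field "a" through "t".
const FIELDS = "abcdefghijklmnopqrst".split("");

function makeTwentyIntDoc(fields = FIELDS, min = 0, max = 10000) {
  const doc = {};
  for (const name of fields) {
    // The benchmark harness expands each "#RAND_INT" spec into a random integer
    // per insert, as in the excerpt; here we only build the template object.
    doc[name] = { "#RAND_INT": [min, max] };
  }
  return doc;
}

// Usage: the result could serve as the `doc` of an insert op, e.g.
//   ops: [ { op: "insert", doc: makeTwentyIntDoc() } ]
console.log(JSON.stringify(makeTwentyIntDoc(["a", "b"])));
// prints {"a":{"#RAND_INT":[0,10000]},"b":{"#RAND_INT":[0,10000]}}
```

Generating the template from one field list keeps the baseline and comparison documents identical by construction.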
      
      
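      // Comparison case: same inserts, but the collection is created with a validator
      // requiring every field "a" through "t" to exist and to be a 32-bit integer.
      tests.push( {   name: "Insert.DocValidation.TwentyInt",
                      tags: ['insert', 'baseline'],
                      pre: function (collection) {
                          collection.drop();
                          collection.runCommand("create", {"validator": {
                              $and: [
                                  {a: {$exists: true}},
                                  {a: {$type: 16}},  // BSON type 16 = 32-bit integer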
                                  {b: {$exists: true}},
                                  {b: {$type: 16}},
                                  {c: {$exists: true}},
                                  {c: {$type: 16}},
                                  {d: {$exists: true}},
                                  {d: {$type: 16}},
                                  {e: {$exists: true}},
                                  {e: {$type: 16}},
                                  {f: {$exists: true}},
                                  {f: {$type: 16}},
                                  {g: {$exists: true}},
                                  {g: {$type: 16}},
                                  {h: {$exists: true}},
                                  {h: {$type: 16}},
                                  {i: {$exists: true}},
                                  {i: {$type: 16}},
                                  {j: {$exists: true}},
                                  {j: {$type: 16}},
                                  {k: {$exists: true}},
                                  {k: {$type: 16}},
                                  {l: {$exists: true}},
                                  {l: {$type: 16}},
                                  {m: {$exists: true}},
                                  {m: {$type: 16}},
                                  {n: {$exists: true}},
                                  {n: {$type: 16}},
                                  {o: {$exists: true}},
                                  {o: {$type: 16}},
                                  {p: {$exists: true}},
                                  {p: {$type: 16}},
                                  {q: {$exists: true}},
                                  {q: {$type: 16}},
                                  {r: {$exists: true}},
                                  {r: {$type: 16}},
                                  {s: {$exists: true}},
                                  {s: {$type: 16}},
                                  {t: {$exists: true}},
                                  {t: {$type: 16}},
                              ] }});
                      },
                      ops: [ {
                          op: "insert",
                          doc: {
                              a: {"#RAND_INT": [0, 10000]},
                              b: {"#RAND_INT": [0, 10000]},
                              c: {"#RAND_INT": [0, 10000]},
                              d: {"#RAND_INT": [0, 10000]},
                              e: {"#RAND_INT": [0, 10000]},
                              f: {"#RAND_INT": [0, 10000]},
                              g: {"#RAND_INT": [0, 10000]},
                              h: {"#RAND_INT": [0, 10000]},
                              i: {"#RAND_INT": [0, 10000]},
                              j: {"#RAND_INT": [0, 10000]},
                              k: {"#RAND_INT": [0, 10000]},
                              l: {"#RAND_INT": [0, 10000]},
                              m: {"#RAND_INT": [0, 10000]},
                              n: {"#RAND_INT": [0, 10000]},
                              o: {"#RAND_INT": [0, 10000]},
                              p: {"#RAND_INT": [0, 10000]},
                              q: {"#RAND_INT": [0, 10000]},
                              r: {"#RAND_INT": [0, 10000]},
                              s: {"#RAND_INT": [0, 10000]},
                              t: {"#RAND_INT": [0, 10000]}
                          } }
      ]});
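The validator can be generated from the same field list. The sketch below is hypothetical (`makeTwentyIntValidator` is not a mongo-perf or server API): it emits one `$exists` clause and one `$type: 16` clause per field, so each of the 20 fields is checked exactly once (40 clauses), and the result could be passed as the `validator` option of the `create` command issued in `pre()`. Because a `$type` predicate already fails when the field is missing, the `$exists` clauses are redundant; the `includeExists` switch drops them, though whether that measurably lowers the overhead reported in this ticket is untested.

```javascript
// Hypothetical helper (not a mongo-perf or server API): builds the $and validator
// from a field list, emitting each field's clauses exactly once.
const FIELDS = "abcdefghijklmnopqrst".split("");
const BSON_INT32 = 16; // $type code for a 32-bit integer

function makeTwentyIntValidator(fields = FIELDS, { includeExists = true } = {}) {
  const clauses = [];
  for (const name of fields) {
    if (includeExists) {
      clauses.push({ [name]: { $exists: true } });
    }
    // A $type clause only matches when the field is present, so on its own it
    // also rejects documents that are missing the field.
    clauses.push({ [name]: { $type: BSON_INT32 } });
  }
  return { $and: clauses };
}

// Usage inside pre(), mirroring the excerpt:
//   collection.runCommand("create", { validator: makeTwentyIntValidator() });
console.log(makeTwentyIntValidator().$and.length); // 40
console.log(makeTwentyIntValidator(FIELDS, { includeExists: false }).$and.length); // 20
```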
      
      

      While measuring the overhead of document validation, we noticed that it is higher with the mmapv1 storage engine than with WiredTiger. When inserting documents that have 20 integer fields, adding a validator on all 20 fields reduced throughput by ~10% on WiredTiger but by ~25% on mmapv1. We should look at ways to reduce this overhead for mmapv1. The higher overhead on mmapv1 was observed in both 3.1.4 and 3.1.5.
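A throughput drop corresponds to a larger relative increase in time per insert: if throughput falls by a fraction d, each insert takes 1/(1 - d) times as long. The sketch below applies that arithmetic to the approximate figures above. It assumes each drop is measured against that engine's own baseline; since the ticket gives no absolute throughput, it says nothing about which engine is faster overall. `perInsertSlowdown` is a hypothetical name.

```javascript
// Converts a fractional throughput drop into the relative increase in time per insert:
// if throughput falls from T to T * (1 - drop), each insert takes 1 / (1 - drop) as long.
function perInsertSlowdown(drop) {
  if (!(drop >= 0 && drop < 1)) {
    throw new RangeError("drop must be in [0, 1)");
  }
  return 1 / (1 - drop) - 1;
}

// Approximate drops from this ticket, each relative to that engine's own baseline.
const reported = { WiredTiger: 0.10, mmapv1: 0.25 };

for (const [engine, drop] of Object.entries(reported)) {
  const pct = (perInsertSlowdown(drop) * 100).toFixed(1);
  console.log(`${engine}: ${Math.round(drop * 100)}% lower throughput -> ~${pct}% more time per insert`);
}

// Ratio of the relative per-insert cost added by validation on the two engines.
const costRatio = perInsertSlowdown(reported.mmapv1) / perInsertSlowdown(reported.WiredTiger);
console.log(`mmapv1 / WiredTiger relative cost: ~${costRatio.toFixed(1)}x`);
```

With these inputs, validation adds about 11% to each insert on WiredTiger and about 33% on mmapv1, so its relative per-insert cost on mmapv1 is roughly three times that on WiredTiger.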

            Assignee:
            [DO NOT USE] Backlog - Storage Execution Team
            Reporter:
            Chung-yen Chang
            Votes:
            0
            Watchers:
            8

              Created:
              Updated:
              Resolved: