[SERVER-8192] Optimize btree key generation Created: 16/Jan/13  Updated: 08/Dec/15  Resolved: 31/Mar/15

Status: Closed
Project: Core Server
Component/s: Index Maintenance, Write Ops
Affects Version/s: None
Fix Version/s: 3.1.1

Type: Improvement Priority: Major - P3
Reporter: Aaron Staple Assignee: David Storch
Resolution: Done Votes: 32
Labels: graphquery
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Duplicate
is duplicated by SERVER-6904 Inner Collection (w/ > 10000 entries)... Closed
is duplicated by SERVER-9128 In place updates on documents contain... Closed
is duplicated by SERVER-8954 Index Key Extraction Much Slower for ... Closed
is duplicated by SERVER-11699 Update command: unexpected performance Closed
Related
related to SERVER-14037 Parallel arrays in distinct sub-docum... Closed
related to SERVER-8193 Optimize in place updates that modify... Closed
Backwards Compatibility: Fully Compatible
Sprint: Quint 3.1.0, Quint Iteration 3.1.1
Participants:

 Description   

IndexSpec::getKeys() finds the set of index keys for a given document and index key spec. It's used when inserting / updating / deleting a document to update the index entries, and also for performing in memory sorts, deduping $or clauses and for other purposes.

Right now extracting 10k elements from a nested object field within an array takes on the order of seconds on a decently fast machine. We could see how much we can optimize the implementation.



 Comments   
Comment by Githook User [ 31/Mar/15 ]

Author:

{u'username': u'dstorch', u'name': u'David Storch', u'email': u'david.storch@10gen.com'}

Message: SERVER-8192 BtreeKeyGenerator cleanup
Branch: master
https://github.com/mongodb/mongo/commit/81b5ee3d389e67b4460417a8c701436fe49e4950

Comment by Githook User [ 31/Mar/15 ]

Author:

{u'username': u'dstorch', u'name': u'David Storch', u'email': u'david.storch@10gen.com'}

Message: SERVER-8192 optimize V1 index key generation for indexed arrays
Branch: master
https://github.com/mongodb/mongo/commit/bbd8e48529b71d78563a796b8c8e9d029c248d67

Comment by Githook User [ 17/Mar/15 ]

Author:

{u'username': u'dstorch', u'name': u'David Storch', u'email': u'david.storch@10gen.com'}

Message: SERVER-8192 add tests for V1 index key generation
Branch: master
https://github.com/mongodb/mongo/commit/51508504b737064060ac70a016eeef7ade8699d2

Comment by Githook User [ 27/May/14 ]

Author:

{u'username': u'dstorch', u'name': u'David Storch', u'email': u'david.storch@10gen.com'}

Message: SERVER-8192 additional btree key generation unit tests
Branch: master
https://github.com/mongodb/mongo/commit/77d6ea60245080552132d942930180a61897d5de

Comment by Githook User [ 24/May/14 ]

Author:

{u'username': u'dstorch', u'name': u'David Storch', u'email': u'david.storch@10gen.com'}

Message: Revert "SERVER-8192 rewritten V1 btree key generation"

This reverts commit 441b3c183c76399f248205989dec757708601394.
Branch: master
https://github.com/mongodb/mongo/commit/63f6ed53137372c8c056d730702278a36a636cd1

Comment by Githook User [ 13/May/14 ]

Author:

{u'username': u'dstorch', u'name': u'David Storch', u'email': u'david.storch@10gen.com'}

Message: SERVER-8192 rewritten V1 btree key generation
Branch: master
https://github.com/mongodb/mongo/commit/441b3c183c76399f248205989dec757708601394

Comment by Githook User [ 21/Apr/14 ]

Author:

{u'username': u'dstorch', u'name': u'David Storch', u'email': u'david.storch@10gen.com'}

Message: SERVER-8192 port btree key generation tests to a new unit test
Branch: master
https://github.com/mongodb/mongo/commit/e87b42c4f13e48078f5c4aefba3caf18dcfba072

Comment by A. Jesse Jiryu Davis [ 02/Apr/13 ]

The above experiment in 2.4.0-rc0, on my Mac:

elements	ms	predicted
1000	90		82
2000	350		328
3000	770		738
4000	1340	1312
5000	2080	2050
6000	3000	2952
7000	3950	4018
8000	5325	5248
9000	6530	6642
10000	8200	8200

Where "ms" is the time, and "predicted" is just 0.000082*(number of elements squared). I'm claiming that since "ms" is close to "predicted", there's a quadratic algorithm in the code path. In 2.0.9, the performance seems slightly better but still quadratic:

elements	ms	predicted
1000	71		64
2000	268		257
3000	581		577
4000	1026	1027
5000	1683	1604
6000	2320	2310
7000	3161	3144
8000	4070	4106
9000	5216	5197
10000	6416	6416

In this case the constant factor is .00006416 (about 20% better), but the shape of the curve is the same.

Comment by A. Jesse Jiryu Davis [ 02/Apr/13 ]

Some code to test the duration of an update with two indexes and N subdocuments:

db.test.drop()
db.test.createIndex({'x.a': 1});
db.test.createIndex({'x.b': 1});
 
function bigdoc(n) {
  var x = [];
  var doc = {x: x};
  for (var i = 0; i < n; i++) {
    x.push({a:i, b:i});
  }
  
  return doc;
}
 
function timeit(n) {
  var doc = bigdoc(n);
  db.test.remove();
  db.test.insert(doc);
  var gle = db.getLastError();
  var start = ISODate();
  db.test.update({}, {$addToSet:{x:{a:1, b:1}}});
  gle = db.getLastError();
  return ISODate() - start;
}

Comment by Asya Kamsky [ 20/Mar/13 ]

Related tests and comments in SERVER-8193

On my Mac update of a different indexed field (in place) takes 45 seconds when there is an indexed array with 20,000 entries that getKeys() traverses.

Generated at Thu Feb 08 03:16:48 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.