The results using BatchSize look poor. Each value below is the with-index measurement divided by the no-index measurement. With the index we lose roughly 10-20% of throughput, and average latency rises by roughly 12-22%.
| Configuration | Num Docs | Avg. Latency | Op Throughput | Latency 99th Percentile | Latency 95th Percentile |
|---|---|---|---|---|---|
| 1 Thread, index/no index | 10,000 | 1.118 | 0.82 | 0.72 | 1.24 |
| 1 Thread, index/no index | 1,000,000 | 1.22 | 0.86 | 1.22 | 1.22 |
| 5 Threads, index/no index | 1,000,000 | 1.20 | 0.87 | 1.22 | 1.21 |
| 10 Threads, index/no index | 1,000,000 | 1.148 | 0.90 | 1.446 | 1.25 |
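To make the ratios easier to read as gains and losses, here is a minimal Python sketch that converts the average latency and throughput ratios from the table above into signed percent changes. The helper name `percent_change` is illustrative only and is not part of genny.

```python
# Ratios copied from the table above: (label, latency ratio, throughput ratio).
ROWS = [
    ("1 thread, 10,000 docs", 1.118, 0.82),
    ("1 thread, 1,000,000 docs", 1.22, 0.86),
    ("5 threads, 1,000,000 docs", 1.20, 0.87),
    ("10 threads, 1,000,000 docs", 1.148, 0.90),
]


def percent_change(ratio: float) -> float:
    """Convert an index/no-index ratio into a signed percent change."""
    return round((ratio - 1.0) * 100.0, 1)


for label, latency_ratio, throughput_ratio in ROWS:
    print(
        f"{label}: latency {percent_change(latency_ratio):+.1f}%, "
        f"throughput {percent_change(throughput_ratio):+.1f}%"
    )
```

A ratio above 1 is an increase with the index and a ratio below 1 is a decrease, so higher is worse for latency and better for throughput.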
I used variations on the following genny test code:
Actors:
- Name: BuildAscendingIndex
  Type: RunCommand
  Threads: 1
  Phases:
  - Repeat: 1
    Database: FooTestDB
    Collection: Collection0
    Operations:
    - OperationName: RunCommand
      OperationCommand:
        createIndexes: Collection0
        indexes:
        - key: {monotonicField: 1}
          name: monotonicField_1
  - {Nop: true}

- Name: Loader
  Type: Loader
  Threads: 10
  Phases:
  - {Nop: true}
  - Repeat: 1
    Database: FooTestDB
    MultipleThreadsPerCollection: true
    CollectionCount: 1
    DocumentCount: 1000000
    BatchSize: 300
    Document:
      monotonicField: 5
      randomValueField: {^RandomInt: {min: 0, max: 50}}

Next avenues of exploration would be:
- Use a large BatchSize, because it minimizes noise from the rest of the system in the results.
- Use a duration rather than a document count to bound the perf workloads.
CrudActor.insertOne
| Configuration | Num Docs | Avg. Latency | Op Throughput | Latency 99th Percentile | Duration Total | Overhead Total |
|---|---|---|---|---|---|---|
| 1 Thread, No Index | 2,000 | 469,183 | 2,049 | 486,928 | 976,000,000 | 81,589,408 |
| 1 Thread, Index | 2,000 | 458,038 | 2,074 | 468,870 | 964,000,000 | 124,641,440 |
| Index / No Index | | 0.97 | 1.01 | 0.96 | 0.98 | 1.53 |
| 1 Thread, No Index | 20,000 | 453,982 | 2,036 | 482,656 | 9,820,000,000 | 784,037,740 |
| 1 Thread, Index | 20,000 | 461,363 | 2,002 | 489,156 | 9,989,000,000 | 839,553,939 |
| Index / No Index | | 1.01 | 0.98 | 1.01 | 1.017 | 1.07 |
| 5 Threads, No Index | 20,000 | 520,487 | 8,869 | 628,686 | 2,255,000,000 | 844,520,586 |
| 5 Threads, Index | 20,000 | 487,372 | 9,601 | 559,442 | 2,083,000,000 | 1,058,689,191 |
| Index / No Index | | 0.93 | 1.08 | 0.89 | 0.92 | 1.25 |
| 10 Threads, No Index | 20,000 | 617,355 | 14,825 | 1,015,251 | 1,349,000,000 | 916,319,400 |
| 10 Threads, Index | 20,000 | 573,565 | 15,503 | 1,027,878 | 1,290,000,000 | 1,352,263,676 |
| Index / No Index | | 0.93 | 1.045 | 1.012 | 0.96 | 1.48 |
This looks maybe promising:
- The multi-thread results are consistently promising, implausibly so: the indexed runs come out slightly ahead of the unindexed ones.
- The single-thread results don't appear to suffer appreciably.
- I really have no idea what is going on with OverheadTotal. My first guess is that it has something to do with the test infrastructure, as I don't really understand what it encompasses.
In summary, the results don't conclusively show that either configuration, with or without the index, is always better, so I think it is fair to say that the performance is not appreciably different and the approach is OK.
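For reference, each ratio row in the CrudActor.insertOne table is the indexed measurement divided by its unindexed counterpart. A minimal Python sketch of that arithmetic, using the raw 10-thread rows; the helper name `index_ratio` is illustrative and not part of genny:

```python
# (metric, no-index value, index value), copied from the 10-thread rows above.
MEASUREMENTS = [
    ("Ave. Latency", 617_355, 573_565),
    ("Op Throughput", 14_825, 15_503),
    ("Latency 99th Percentile", 1_015_251, 1_027_878),
    ("Duration Total", 1_349_000_000, 1_290_000_000),
    ("Overhead Total", 916_319_400, 1_352_263_676),
]


def index_ratio(no_index: float, with_index: float) -> float:
    """Return the with-index value divided by the no-index value."""
    return round(with_index / no_index, 2)


for metric, no_index, with_index in MEASUREMENTS:
    print(f"{metric}: {index_ratio(no_index, with_index):.2f}")
```

Only the Overhead Total ratio departs substantially from 1, which matches the observation that the other metrics are not appreciably different.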
Next I'm going to try to get a perf comparison with and without the index, using something like this:
Actors:
- Name: BuildAscendingIndex
  Type: RunCommand
  Threads: 1
  Phases:
  - Repeat: 1
    Database: FooTestDB
    Collection: FooTestCollection
    Operations:
    - OperationName: RunCommand
      OperationCommand:
        createIndexes: FooTestCollection
        indexes:
        - key: {monotonicField: 1}
          name: monotonicField_1
  - {Nop: true}

- Name: InsertDocumentsLoad
  Type: CrudActor
  Database: FooTestDB
  Threads: 1
  Phases:
  - {Nop: true}
  - Repeat: 20000
    Threads: 1
    Collection: FooTestCollection
    Operations:
    - OperationName: insertOne
      OperationCommand:
        Document:
          monotonicField: 5
          randomValueField: {^RandomInt: {min: 0, max: 50}}

The idea is that using a constant value for the indexed field across all documents causes index ordering to fall back onto the collection RecordID:
- The index key is created from the key value and collection RecordID, to ensure uniqueness.
- If the key value is always the same, then the RecordID is the deciding factor for ordering.
- If the document being inserted does not define a _id field, then one will be auto-generated, and _id is the RecordID.
- So the index will be created in ascending order, which is most performant.
The approach avoids the issue of concurrent genny threads generating 'monotonicField' values that either aren't sequential across threads or are duplicates.
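The ordering argument above can be illustrated with a small Python sketch that models the index as a sorted list of (field value, record ID) tuples. This is a toy model, not MongoDB's actual index implementation: record IDs are assumed to be integers that grow with each insert, and the "drifted" sequence is a hypothetical interleaving of two threads whose counters are out of step.

```python
import bisect


def distances_from_end(keys):
    """Insert keys one by one into a sorted list that stands in for the index.

    Returns, for each key, how many existing entries sort after it at insert
    time. Zero means the key was appended at the right-most position.
    """
    index = []
    distances = []
    for key in keys:
        position = bisect.bisect_right(index, key)
        distances.append(len(index) - position)
        index.insert(position, key)
    return distances


# Assumption of this sketch: record IDs are integers that grow with each insert.
record_ids = list(range(1, 9))

# Constant field value: keys are (5, record_id), so record_id decides the order.
constant_keys = [(5, rid) for rid in record_ids]

# Hypothetical interleaving of two threads whose counters have drifted apart.
drifted_values = [0, 1, 2, 0, 3, 1, 2, 3]
drifted_keys = list(zip(drifted_values, record_ids))

print("constant field:  ", distances_from_end(constant_keys))
print("drifted counters:", distances_from_end(drifted_keys))
```

With the constant field every key lands at the end of the sorted list, while the drifted counters force some keys into the middle, which is the pattern the constant-value approach is meant to avoid.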
SchemaVersion: 2018-07-01
Owner: "@mongodb/replication"
Description: |
  TODO: TIG-3321

Actors:
- Name: Loader
  Type: Loader
  Threads: 10
  Phases:
  - Repeat: 1
    Database: &DB test
    MultipleThreadsPerCollection: true
    CollectionCount: 1
    DocumentCount: 20000
    BatchSize: 1
    Document:
      monotonicField: {^Inc: {start: 0}}
      randomValueField: {^RandomInt: {min: 0, max: 50}}
    Indexes:
    - keys: {monotonicField: 1}
      options: {name: "monotonicField_1"}

AutoRun:
- When:
    mongodb_setup:
      $eq:
      - atlas
      - replica
      - replica-all-feature-flags
      - single-replica

I ran the above with different configurations (with or without the index, and varying the document count and the number of threads).
| Configuration | Num Docs | Avg. Latency | Op Throughput | Latency 99th Percentile | Duration Total | Total Overhead |
|---|---|---|---|---|---|---|
| 1 Thread, No Index | 5,000 | 451,508 | 2,125 | 470,631 | 2,352,000,000 | 146,553,505 |
| 1 Thread, Index | 5,000 | 437,551 | 2,193 | 452,645 | 2,279,000,000 | 141,445,008 |
| Index / No Index | | 0.96 | 1.032 | 0.96 | 0.97 | 0.97 |
| 1 Thread, No Index | 20,000 | 445,491 | 2,115 | 471,213 | 9,453,000,000 | 594,571,235 |
| 1 Thread, Index | 20,000 | 448,309 | 2,116 | 473,196 | 945,000,000 | 551,885,149 |
| Index / No Index | | 1.006 | 1.00 | 1.004 | 1.00 | 0.92 |
| 5 Threads, No Index | 20,000 | 501,206 | 9,606 | 548,557 | 2,082,000,000 | 596,304,932 |
| 5 Threads, Index | 20,000 | 503,747 | 9,492 | 558,126 | 2,107,000,000 | 598,560,765 |
| Index / No Index | | 1.005 | 0.988 | 1.02 | 1.01 | |
| 10 Threads, No Index | 20,000 | 566,244 | 17,256 | 721,253 | 1,159,000,000 | 697,396,606 |
| 10 Threads, Index | 20,000 | 635,821 | 14,652 | 1,128,086 | 1,365,000,000 | 735,095,274 |
| Index / No Index | | 1.12 | 0.86 | 1.56 | 1.17 | 1.05 |
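As a sanity check on the table above, the reported Op Throughput values are numerically consistent with the document count divided by Duration Total, if Duration Total is read as nanoseconds. That unit is an assumption inferred from the numbers, not something confirmed by genny's documentation. A minimal Python sketch using the No Index rows:

```python
# Assumption: "Duration Total" is expressed in nanoseconds.
NANOS_PER_SECOND = 1_000_000_000

# (label, documents inserted, Duration Total, reported Op Throughput),
# copied from the "No Index" rows of the table above.
ROWS = [
    ("1 thread, 5,000 docs", 5_000, 2_352_000_000, 2_125),
    ("1 thread, 20,000 docs", 20_000, 9_453_000_000, 2_115),
    ("5 threads, 20,000 docs", 20_000, 2_082_000_000, 9_606),
    ("10 threads, 20,000 docs", 20_000, 1_159_000_000, 17_256),
]


def derived_throughput(docs: int, duration_ns: int) -> float:
    """Documents per second, assuming the duration is in nanoseconds."""
    return docs / (duration_ns / NANOS_PER_SECOND)


for label, docs, duration_ns, reported in ROWS:
    derived = derived_throughput(docs, duration_ns)
    print(f"{label}: derived {derived:.1f} ops/s, reported {reported}")
```

Under this reading, throughput is operations per second of wall-clock duration rather than a count of documents, which would explain why it does not add up to the number of documents inserted.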
Some issues to note:
- I should separate the index build from the Loader, for better perf isolation.
- The DocumentGenerator in genny is using {^Inc: {start: 0}}, which gives each thread its own generator starting at 0. Therefore, duplicate values are inserted into the index.
- I'm not sure how genny gathers these statistics, what AverageLatency specifically encompasses, or what OperationThroughput means when it doesn't add up to the number of documents inserted.
Lastly, it doesn't make a lot of sense to me that the 99th-percentile latency would increase by more than 50% for 10 threads just because of an index. I think something else is affecting that number, but I'm not sure what; it could be the test infrastructure or how latency is calculated.
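The duplicate-value issue with {^Inc: {start: 0}} can be illustrated with a small Python simulation. It models the behavior described above (one counter per thread) and, for contrast, a single shared counter. The function names are hypothetical, and the even split of 20,000 documents across 10 threads is an assumption of the sketch, not a statement about how genny divides the work.

```python
from collections import Counter
from itertools import count, islice


def per_thread_values(threads: int, docs_per_thread: int, start: int = 0):
    """Each thread owns its own counter, so every thread emits the same values."""
    values = []
    for _ in range(threads):
        generator = count(start)  # fresh counter per thread
        values.extend(islice(generator, docs_per_thread))
    return values


def shared_values(threads: int, docs_per_thread: int, start: int = 0):
    """A single counter shared by all threads emits each value exactly once."""
    generator = count(start)
    return list(islice(generator, threads * docs_per_thread))


THREADS = 10
DOCS_PER_THREAD = 2_000  # assumed even split of DocumentCount: 20000

per_thread = per_thread_values(THREADS, DOCS_PER_THREAD)
shared = shared_values(THREADS, DOCS_PER_THREAD)

print("per-thread counters: distinct values =", len(set(per_thread)))
print("per-thread counters: copies of value 0 =", Counter(per_thread)[0])
print("shared counter:      distinct values =", len(set(shared)))
```

In this model every value appears once per thread, so the index receives many duplicate keys instead of a strictly increasing sequence.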
Generated at Thu Feb 08 06:01:55 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.