[SERVER-65121] Investigate performance difference between collection w/o index and w/ index, where the index is a similar format to that of a derived metadata change diff log Created: 31/Mar/22  Updated: 25/Apr/22  Resolved: 25/Apr/22

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Task Priority: Major - P3
Reporter: Dianna Hohensee (Inactive) Assignee: Dianna Hohensee (Inactive)
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File Screen Shot 2022-04-11 at 12.16.40 PM.png    
Sprint: Execution Team 2022-04-18, Execution Team 2022-05-02
Participants:

 Description   

Goal: determine whether persisting a change diff log for derived metadata has a significant performance cost, to inform whether to choose this implementation plan. A carefully shaped index should be a reasonable approximation of a change diff log.

Hypothetical Change Diff Log, clustered by timestamp
{
	timestamp: <>                               <- clustered index
	nss/UUID: <>                                 <- collection identifier
	change: {
		count: <1,-1>                         <- absence of DM type field == no change
		dataSize: <int>
	}
}

Collection
{
	_id: <>					<- leave blank
	monotonicField: 1			<- shall create index on this, always increases like a timestamp
	randomValueField: <>		<- shall represent derived metadata diff values
}

Index on monotonicField
{
	monotonicFieldValue: <collection_docID>
}

Workload
{
	1 thread running inserts on the collection
	manually compare performance results with and without the proposed index
}

1) I'm choosing an insert workload to prompt adding entries to the index, as if it were a log where an entry is written on every write.

2) I may experiment with multiple threads, say 3-4, depending on the 1 thread results. The monotonicField would then be mostly monotonically increasing, arguably simulating out-of-order writes.



 Comments   
Comment by Dianna Hohensee (Inactive) [ 19/Apr/22 ]

The results using BatchSize appear really bad. Each value is a ratio: the result with the index divided by the result without the index. We're losing 10-20% throughput, and average latency increases by ~20%.

Configuration                 Num Docs     Ave. Latency   Op Throughput   Latency 99th Percentile   Latency 95th Percentile
1 Thread, index/no index      10,000       1.118          0.82            0.72                      1.24
1 Thread, index/no index      1,000,000    1.22           0.86            1.22                      1.22
5 Threads, index/no index     1,000,000    1.20           0.87            1.22                      1.21
10 Threads, index/no index    1,000,000    1.148          0.90            1.446                     1.25
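
To make the "10-20% throughput" and "~20% latency" reading of the ratios explicit, here is a minimal Python sketch that converts the Op Throughput and Ave. Latency ratios from the table into signed percentage changes. The helper name percent_change and the dictionary labels are illustrative assumptions, not part of genny or of the original analysis.

```python
def percent_change(ratio):
    """Convert an index/no-index ratio into a signed percentage change."""
    return round((ratio - 1.0) * 100, 1)


# Ratios copied from the BatchSize table (index result / no-index result).
throughput_ratios = {
    "1 thread, 10,000 docs": 0.82,
    "1 thread, 1,000,000 docs": 0.86,
    "5 threads, 1,000,000 docs": 0.87,
    "10 threads, 1,000,000 docs": 0.90,
}
avg_latency_ratios = {
    "1 thread, 10,000 docs": 1.118,
    "1 thread, 1,000,000 docs": 1.22,
    "5 threads, 1,000,000 docs": 1.20,
    "10 threads, 1,000,000 docs": 1.148,
}

throughput_change = {k: percent_change(v) for k, v in throughput_ratios.items()}
latency_change = {k: percent_change(v) for k, v in avg_latency_ratios.items()}

for label in throughput_ratios:
    print(f"{label}: throughput {throughput_change[label]:+.1f}%, "
          f"average latency {latency_change[label]:+.1f}%")
```

A negative throughput change and a positive latency change both mean the indexed run was slower.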
           

I used variations on the following genny test code:

Actors:
 
- Name: BuildAscendingIndex
  Type: RunCommand
  Threads: 1
  Phases:
  - Repeat: 1
    Database: FooTestDB
    Collection: Collection0
    Operations:
    - OperationName: RunCommand
      OperationCommand:
        createIndexes: Collection0
        indexes:
        - key: {monotonicField: 1}
          name: monotonicField_1
  - {Nop: true}
 
- Name: Loader
  Type: Loader
  Threads: 10
  Phases:
  - {Nop: true}
  - Repeat: 1
    Database: FooTestDB
    MultipleThreadsPerCollection: true
    CollectionCount: 1
    DocumentCount: 1000000
    BatchSize: 300
    Document:
      monotonicField: 5
      randomValueField: {^RandomInt: {min: 0, max: 50}}

Comment by Dianna Hohensee (Inactive) [ 12/Apr/22 ]

Next avenues of exploration would be:

  • Use BatchSize with a high number, because it minimizes noise from the rest of the system in the results.
  • Use duration rather than document count to run the perf workloads.
Comment by Dianna Hohensee (Inactive) [ 12/Apr/22 ]

CrudActor.insertOne

Configuration            Num Docs   Ave. Latency   Op Throughput   Latency 99th Percentile   Duration Total    Overhead Total
1 Thread, No Index       2,000      469,183        2,049           486,928                   976,000,000       81,589,408
1 Thread, Index          2,000      458,038        2,074           468,870                   964,000,000       124,641,440
  Index / No Index                  0.97           1.01            0.96                      0.98              1.53
1 Thread, No Index       20,000     453,982        2,036           482,656                   9,820,000,000     784,037,740
1 Thread, Index          20,000     461,363        2,002           489,156                   9,989,000,000     839,553,939
  Index / No Index                  1.01           0.98            1.01                      1.017             1.07
5 Threads, No Index      20,000     520,487        8,869           628,686                   2,255,000,000     844,520,586
5 Threads, Index         20,000     487,372        9,601           559,442                   2,083,000,000     1,058,689,191
  Index / No Index                  0.93           1.08            0.89                      0.92              1.25
10 Threads, No Index     20,000     617,355        14,825          1,015,251                 1,349,000,000     916,319,400
10 Threads, Index        20,000     573,565        15,503          1,027,878                 1,290,000,000     1,352,263,676
  Index / No Index                  0.93           1.045           1.012                     0.96              1.48

This looks potentially promising:

  • The multi-thread results are consistently promising, rather nonsensically so.
  • The single thread results don't appear to suffer appreciably.
  • I really have no idea what's going on with the OverheadTotal. Something to do with the test infrastructure would be my first guess, as I don't really understand what it encompasses.

In summary, the results don't conclusively show that either configuration, with or without the index, is always better, so I think it's fair to say that the performance is not appreciably different and the approach is OK.
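
The ratio rows in the CrudActor.insertOne table are the Index value divided by the No Index value. As a minimal sketch, the Python below reproduces that arithmetic for the 1 thread, 2,000 document pair; the dictionary keys and the function name index_ratios are illustrative assumptions, and the raw numbers are copied from the table.

```python
def index_ratios(with_index, no_index):
    """Return with-index / no-index for every metric present in both runs."""
    return {metric: with_index[metric] / no_index[metric] for metric in no_index}


# 1 thread, 2,000 documents, values copied from the table in this comment.
no_index = {
    "avg_latency": 469183,
    "op_throughput": 2049,
    "p99_latency": 486928,
    "overhead_total": 81589408,
}
with_index = {
    "avg_latency": 458038,
    "op_throughput": 2074,
    "p99_latency": 468870,
    "overhead_total": 124641440,
}

ratios = index_ratios(with_index, no_index)
for metric, value in ratios.items():
    print(f"{metric}: {value:.3f}")
```

A ratio below 1 for latency or above 1 for throughput means the indexed run came out ahead on that metric.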

Comment by Dianna Hohensee (Inactive) [ 11/Apr/22 ]

Next I'm going to try to get a perf comparison with and without the index using something like this:

Actors:
 
- Name: BuildAscendingIndex
  Type: RunCommand
  Threads: 1
  Phases:
  - Repeat: 1
    Database: FooTestDB
    Collection: FooTestCollection
    Operations:
    - OperationName: RunCommand
      OperationCommand:
        createIndexes: FooTestCollection
        indexes:
        - key: {monotonicField: 1}
          name: monotonicField_1
  - {Nop: true}
 
- Name: InsertDocumentsLoad
  Type: CrudActor
  Database: FooTestDB
  Threads: 1
  Phases:
  - {Nop: true}
  - Repeat: 20000
    Threads: 1
    Collection: FooTestCollection
    Operations:
    - OperationName: insertOne
      OperationCommand:
        Document:
          monotonicField: 5
          randomValueField: {^RandomInt: {min: 0, max: 50}}

The idea is that a constant value for the index field across documents will cause ordering to fall back onto the collection RecordID:

  • The index key is created from the key value and collection RecordID, to ensure uniqueness.
  • If the key value is always the same, then the RecordID is the deciding factor for ordering.
  • If the document being inserted does not define a _id field, then one will be auto-generated, and _id is the RecordID.
  • So the index will be created in ascending order, which is most performant.

The approach avoids the issue of concurrent genny threads generating 'monotonicField' values that either aren't sequential across threads or are duplicates.
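
The ordering argument above can be illustrated with a small Python model. This is only a sketch under stated assumptions: the index is modeled as a sorted list of (key value, RecordID) tuples, RecordIDs are assumed to be handed out in increasing order, and the function name insert_entry is hypothetical. It says nothing about how the storage engine actually lays out index entries.

```python
import bisect


def insert_entry(index_entries, key_value, record_id):
    """Insert (key_value, record_id) into the sorted list and return its position."""
    entry = (key_value, record_id)
    position = bisect.bisect_left(index_entries, entry)
    index_entries.insert(position, entry)
    return position


# Every document carries the same key value (monotonicField: 5), so the
# RecordID alone decides where each new entry sorts.
entries = []
positions = [insert_entry(entries, 5, record_id) for record_id in range(1, 6)]

# Each insert lands at the current end of the list, i.e. a pure append.
always_appended = all(pos == i for i, pos in enumerate(positions))
print(positions, always_appended)
```

In this model each new entry is placed at the tail, which is the ascending insert pattern the workload is meant to produce.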

Comment by Dianna Hohensee (Inactive) [ 11/Apr/22 ]

SchemaVersion: 2018-07-01
Owner: "@mongodb/replication"
Description: |
  TODO: TIG-3321
 
Actors:
 
- Name: Loader
  Type: Loader
  Threads: 10
  Phases:
  - Repeat: 1
    Database: &DB test
    MultipleThreadsPerCollection: true
    CollectionCount: 1
    DocumentCount: 20000
    BatchSize: 1
    Document:
      monotonicField: {^Inc: {start: 0}}
      randomValueField: {^RandomInt: {min: 0, max: 50}}
    Indexes:
    - keys: {monotonicField: 1}
      options: {name: "monotonicField_1"}
 
AutoRun:
- When:
    mongodb_setup:
      $eq:
      - atlas
      - replica
      - replica-all-feature-flags
      - single-replica

I ran the above with different configurations (w/ or w/o index, document count, thread number).

Configuration            Num Docs   Ave. Latency   Op Throughput   Latency 99th Percentile   Duration Total    Total Overhead
1 Thread, No Index       5,000      451508         2125            470631                    2352000000        146553505
1 Thread, Index          5,000      437551         2193            452645                    2279000000        141445008
  Index / No Index                  .96            1.032           .96                       .97               .97
1 Thread, No Index       20,000     445491         2115            471213                    9453000000        594571235
1 Thread, Index          20,000     448309         2116            473196                    945000000         551885149
  Index / No Index                  1.006          1.00            1.004                     1.00              .92
5 Threads, No Index      20,000     501206         9606            548557                    2082000000        596304932
5 Threads, Index         20,000     503747         9492            558126                    2107000000        598560765
  Index / No Index                  1.005          .988            1.02                      1.01
10 Threads, No Index     20,000     566244         17256           721253                    1159000000        697396606
10 Threads, Index        20,000     635821         14652           1128086                   1365000000        735095274
  Index / No Index                  1.12           0.86            1.56                      1.17              1.05

Some issues to note:

  • I should separate out the index building from the Loader, for better perf isolation.
  • The DocumentGenerator in genny is using {^Inc: {start: 0}}, which gives each thread a generator starting at value 0. Therefore, duplicate values are inserted into the index.
  • I'm not sure how genny gathers these statistics, what the AverageLatency specifically encompasses, or why OperationThroughput doesn't add up to the number of documents inserted.
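
The duplicate-value problem with {^Inc: {start: 0}} can be sketched in a few lines of Python. This is a simulation of the behavior described above (one independent counter per thread, each starting at 0), not genny code; the function name per_thread_values and the thread and document counts are illustrative assumptions.

```python
from collections import Counter
from itertools import count, islice


def per_thread_values(num_threads, docs_per_thread, start=0):
    """Collect the values produced when every thread owns a counter starting at `start`."""
    values = []
    for _ in range(num_threads):
        generator = count(start)  # a fresh counter per thread, as described above
        values.extend(islice(generator, docs_per_thread))
    return values


values = per_thread_values(num_threads=3, docs_per_thread=4)
duplicates = {value: n for value, n in Counter(values).items() if n > 1}

print(sorted(values))
print(duplicates)
```

With 3 threads, every generated monotonicField value appears 3 times, so the field is neither unique nor strictly increasing across the collection.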

Lastly, it doesn't make a lot of sense to me that 99th percentile latency would increase by more than 50% for 10 threads just because of an index. I think something else is going on in that number, but I'm not sure what. It could be the test infrastructure, or how latency is calculated.

Generated at Thu Feb 08 06:01:55 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.