Goal: determine whether the performance cost of persisting a change diff log for derived metadata is significant, to inform whether we choose this implementation plan. A carefully shaped index should be a reasonable proxy for a change diff log.

Hypothetical Change Diff Log, clustered by timestamp

{
    timestamp: <>          <- clustered index
    nss/UUID: <>           <- collection identifier
    change: {
        count: <1,-1>      <- absence of DM type field == no change
        dataSize: <int>
    }
}
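
To make the "absence of a field == no change" rule concrete, here is a minimal sketch of how one log entry might be built. The function name `make_diff_entry`, the key name `nss` (standing in for nss/UUID), and the choice to treat a delta of 0 as "no change" are my assumptions for illustration, not part of the plan.

```python
from typing import Any, Dict


def make_diff_entry(
    timestamp: int,
    nss: str,
    count_delta: int = 0,
    data_size_delta: int = 0,
) -> Dict[str, Any]:
    """Build one hypothetical change diff log entry.

    A derived-metadata field is only written when it changed, so an
    absent field means "no change" (assumption: a delta of 0 is omitted).
    """
    if count_delta not in (-1, 0, 1):
        raise ValueError("count delta must be 1 or -1 (or 0 for no change)")

    change: Dict[str, int] = {}
    if count_delta != 0:
        change["count"] = count_delta
    if data_size_delta != 0:
        change["dataSize"] = data_size_delta

    return {"timestamp": timestamp, "nss": nss, "change": change}


# An insert of a 128-byte document: both fields change.
insert_entry = make_diff_entry(1, "test.coll", count_delta=1, data_size_delta=128)
# An update that shrinks a document by 16 bytes: only dataSize is written.
update_entry = make_diff_entry(2, "test.coll", data_size_delta=-16)
```

With this shape, an update that leaves the document count alone carries only `dataSize`, which keeps entries small.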

Collection

{
    _id: <>                <- leave blank
    monotonicField: 1      <- shall create index on this, always increases like a timestamp
    randomValueField: <>   <- shall represent derived metadata diff values
}
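
A possible generator for the test documents is sketched below. The name `generate_documents`, the seed parameter, and the value range for randomValueField are assumptions of mine; I read "leave blank" as "do not set _id, let the driver or server fill it in".

```python
import random
from itertools import count
from typing import Any, Dict, Iterator


def generate_documents(n: int, seed: int = 0) -> Iterator[Dict[str, Any]]:
    """Yield n documents shaped like the test collection.

    _id is deliberately not set. monotonicField increases by one per
    document, like a timestamp. randomValueField is a random integer
    standing in for a derived metadata diff value (range is arbitrary).
    """
    rng = random.Random(seed)  # seeded so repeated runs insert the same data
    ticker = count(1)
    for _ in range(n):
        yield {
            "monotonicField": next(ticker),
            "randomValueField": rng.randint(-1024, 1024),
        }


sample = list(generate_documents(3))
```

Seeding the generator means the runs with and without the index can insert identical documents.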

Index on monotonicField

{
    monotonicFieldValue: <collection_docID>
}
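
The reason this index should behave like a log is that keys which only ever increase are always added at the tail of the sorted order. The sketch below illustrates that with a plain sorted Python list as a stand-in; a real index is a B-tree-like structure and is not implemented this way, and `insert_positions` is a hypothetical helper name.

```python
import bisect
from typing import Iterable, List, Tuple


def insert_positions(keys: Iterable[int]) -> Tuple[List[Tuple[int, int]], List[bool]]:
    """Insert (key, doc_id) entries into a sorted list, one per key.

    Returns the final sorted entries and, for each insert, whether the
    entry landed at the tail (i.e. behaved like a log append).
    """
    index: List[Tuple[int, int]] = []
    at_tail: List[bool] = []
    for doc_id, key in enumerate(keys):
        entry = (key, doc_id)
        pos = bisect.bisect_right(index, entry)
        at_tail.append(pos == len(index))
        index.insert(pos, entry)
    return index, at_tail


# Strictly increasing keys: every insert is an append.
_, in_order = insert_positions([1, 2, 3, 4])
# One late key (2 arrives after 3): that insert lands in the middle.
_, late_key = insert_positions([1, 3, 2, 4])
```

Here `in_order` is all `True`, while `late_key` is `[True, True, False, True]`.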

Workload

{
    1 thread running inserts on the collection
    manually compare performance results with and without the proposed index
}
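
The comparison itself could be driven by a small timing harness along these lines. It is a sketch only: `time_inserts` and `relative_overhead` are hypothetical helpers, and the insert function is passed in so the harness does not depend on any particular driver. With PyMongo, for example, `insert_fn` could be a collection's `insert_one`, with `create_index("monotonicField")` called before the indexed run.

```python
import time
from typing import Any, Callable, Dict, Iterable


def time_inserts(
    insert_fn: Callable[[Dict[str, Any]], Any],
    docs: Iterable[Dict[str, Any]],
) -> Dict[str, float]:
    """Insert docs one at a time on the calling thread and time the run."""
    inserted = 0
    start = time.perf_counter()
    for doc in docs:
        insert_fn(doc)
        inserted += 1
    seconds = time.perf_counter() - start
    rate = inserted / seconds if seconds > 0 else float("inf")
    return {"inserted": inserted, "seconds": seconds, "docs_per_second": rate}


def relative_overhead(baseline_seconds: float, indexed_seconds: float) -> float:
    """Fractional slowdown of the indexed run relative to the baseline run."""
    return (indexed_seconds - baseline_seconds) / baseline_seconds


# Stand-in usage: a Python list plays the role of the collection.
sink: list = []
result = time_inserts(sink.append, ({"monotonicField": i} for i in range(1000)))
```

Running it once against the collection without the index and once with it gives two `seconds` values to feed into `relative_overhead`; repeating each run a few times would help separate the index cost from noise.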
1) I'm choosing an insert workload so that every write adds an entry to the index, as if the index were a log that records an entry on every write.
2) I may experiment with multiple threads, say 3-4, depending on the single-thread results. The monotonicField would then be only mostly monotonically increasing, which arguably simulates out-of-order writes.
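
A rough way to see what "mostly monotonically increasing" means with several writers is sketched below. Threads draw values from a shared counter and then "write" them to a shared list; because the write happens after the value is drawn, arrival order can differ from value order. All names here are hypothetical, the list stands in for the collection, and how many writes end up out of order depends on thread scheduling, so it varies between runs.

```python
import threading
from itertools import count
from typing import List


def run_threaded_writers(num_threads: int, writes_per_thread: int) -> List[int]:
    """Return monotonicField values in the order the writes arrived."""
    ticker = count()
    ticker_lock = threading.Lock()
    arrivals: List[int] = []

    def writer() -> None:
        for _ in range(writes_per_thread):
            with ticker_lock:
                value = next(ticker)
            # The "write" is outside the lock, so another thread that drew
            # a later value may get its write in first.
            arrivals.append(value)

    threads = [threading.Thread(target=writer) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return arrivals


def count_out_of_order(values: List[int]) -> int:
    """Count values that arrived after a larger value had already arrived."""
    out_of_order = 0
    highest = float("-inf")
    for v in values:
        if v < highest:
            out_of_order += 1
        else:
            highest = v
    return out_of_order


arrivals = run_threaded_writers(num_threads=4, writes_per_thread=250)
late_writes = count_out_of_order(arrivals)
```

Every value still arrives exactly once; `late_writes` only reports how many of them landed behind a larger value.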