[SERVER-58064] Investigate the increase in database validation time between MongoDB versions Created: 24/Jun/21  Updated: 18/Nov/21  Resolved: 18/Nov/21

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Question Priority: Major - P3
Reporter: James O'Leary Assignee: Gregory Noma
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Sprint: Execution Team 2021-08-23, Execution Team 2021-09-06, Execution Team 2021-09-20, Execution Team 2021-11-15, Execution Team 2021-11-29
Participants:

 Description   

While running the workloads for the Benchmark Long Lived Transaction project, it was noted that the validation commands take longer in new database versions.

In general, workloads with more write operations took longer to validate (the % change in the total amount of time ranged from about 10 to 35%).

Is this increase expected? Is there anything that can be done to bring the validation times of the later versions down to match the earlier version.



 Comments   
Comment by Gregory Noma [ 17/Nov/21 ]

I realize that my analysis in my previous comment may not be entirely correct, since in 4.2 we also need to account for the time spent in KeyString::resetToKey as we still need to convert to KeyString even after generating the key in BSON. However, generating the key in BSON and then converting to KeyString may still be faster than building the key in KeyString to begin with.

I should also note that I ran some tests with more indexes on the same dataset, the exact configuration of which caused the results to quite different. In some cases 4.2 and 4.4 performed very close to one another, and in some cases 4.4 was actually faster than 4.2. This makes sense since the more indexes there are, the more keys with have to additionally convert from BSON to KeyString.

Unfortunately the refactor in PM-822 makes it a bit difficult to directly compare the perf results side-by-side. My best guess at the moment is that depending on the exact dataset and indexes, key generation is slower. There may also be some very small regressions in various places due to the refactor, but those are a bit harder to tell. I think we would need to test on a wider variety of datasets to know for sure that there is a general regression, versus just with certain types of data (potentially with speedups on different datasets). Assuming key generation is the bottleneck here, I don't think there's really anything we can easily do about it. We would probably instead have to look to make optimizations in other areas.

In conclusion, there doesn't appear to be anything immediately obvious here to improve on. If we think it's important enough to do so, I think we can open a separate ticket to look into making small improvements to various parts of the validate logic. It would also be useful to have more performance workloads for validate with a wider variety of datasets and index types.

Comment by James O'Leary [ 16/Nov/21 ]

gregory.noma There are 3 other indexes per collection (there are 4 collections).

Comment by Gregory Noma [ 16/Nov/21 ]

jim.oleary were there any secondary indexes on the collections from your validate numbers, or only the _id index?

Comment by Gregory Noma [ 11/Nov/21 ]

I ran another test, this time with a slightly larger dataset (10 million documents rather than 1 million), and this time the main difference seems to be from BtreeKeyGenerator::getKeys.
4.2:

-   94.84%     0.40%  conn1            mongod              [.] mongo::CollectionImpl::validate                                                                                   ▒
   - 94.44% mongo::CollectionImpl::validate                                                                                                                                      ▒
      + 43.98% mongo::RecordStoreValidateAdaptor::traverseIndex                                                                                                                  ▒
      - 36.46% mongo::RecordStoreValidateAdaptor::validate                                                                                                                       ▒
         - 10.28% mongo::AbstractIndexAccessMethod::getKeys                                                                                                                      ▒
            - 9.75% mongo::BtreeKeyGenerator::getKeys                                                                                                                            ▒
               + 1.51% std::vector<std::set<unsigned long, std::less<unsigned long>, std::allocator<unsigned long> >, std::allocator<std::set<unsigned long, std::less<unsigned l▒
               + 1.37% mongo::BSONObj::getField                                                                                                                                  ▒
                 0.90% std::_Rb_tree<mongo::BSONObj, mongo::BSONObj, std::_Identity<mongo::BSONObj>, mongo::BSONComparatorInterfaceBase<mongo::BSONObj>::LessThan, std::allocator▒
               + 0.77% mongo::mongoMalloc                                                                                                                                        ▒
         + 8.40% mongo::KeyString::resetToKey                                                                                                                                    ▒
         + 5.41% mongo::validateBSON                                                                                                                                             ▒
         + 3.57% mongo::IndexConsistency::addDocKey                                                                                                                              ▒
           1.22% std::_Rb_tree<mongo::BSONObj, mongo::BSONObj, std::_Identity<mongo::BSONObj>, mongo::BSONComparatorInterfaceBase<mongo::BSONObj>::LessThan, std::allocator<mongo▒
           0.52% mongo::IndexDescriptor::isMultikey                                                                                                                              ▒
           0.51% operator new[]                                                                                                                                                  ▒
      + 13.87% mongo::WiredTigerRecordStoreCursorBase::next                                                                                                                      ▒

4.4:

-   90.03%     0.00%  conn1            mongod              [.] mongo::CollectionValidation::validate                                                                              ▒
   - mongo::CollectionValidation::validate                                                                                                                                        ▒
      - 55.05% mongo::ValidateAdaptor::traverseRecordStore                                                                                                                        ▒
         - 35.60% mongo::ValidateAdaptor::validateRecord                                                                                                                          ▒
            - 20.50% mongo::AbstractIndexAccessMethod::getKeys                                                                                                                    ▒
               - mongo::BtreeAccessMethod::doGetKeys                                                                                                                              ▒
                  - 19.80% mongo::BtreeKeyGenerator::getKeys                                                                                                                      ▒
                     + 6.29% mongo::KeyString::BuilderBase<mongo::StackBufBuilderBase<512ul> >::appendBSONElement                                                                 ▒
                     + 2.96% mongo::KeyString::BuilderBase<mongo::StackBufBuilderBase<512ul> >::appendRecordId                                                                    ▒
                     + 2.54% mongo::BSONObj::getField                                                                                                                             ▒
                     + 2.49% std::_Rb_tree<mongo::KeyString::Value, mongo::KeyString::Value, std::_Identity<mongo::KeyString::Value>, std::less<mongo::KeyString::Value>, std::all▒
                     + 1.52% std::vector<std::set<unsigned long, std::less<unsigned long>, std::allocator<unsigned long> >, std::allocator<std::set<unsigned long, std::less<unsig▒
                     + 1.19% mongo::mongoMalloc                                                                                                                                   ▒
            + 5.23% mongo::validateBSON                                                                                                                                           ▒
            + 3.17% mongo::IndexConsistency::addDocKey                                                                                                                            ▒
              1.08% std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, mongo::IndexInfo, std::less<std::__cxx11::basic_string<char, std::ch▒
              1.05% std::_Rb_tree<mongo::KeyString::Value, mongo::KeyString::Value, std::_Identity<mongo::KeyString::Value>, std::less<mongo::KeyString::Value>, std::allocator<mo▒
              0.73% mongo::IndexDescriptor::isMultikey                                                                                                                            ▒
         + 17.67% mongo::SeekableRecordThrottleCursor::next                                                                                                                       ▒
           1.08% mongo::ProgressMeter::hit                                                                                                                                        ▒
      + 34.94% mongo::ValidateAdaptor::traverseIndex                                                                                                                              ▒

This possibly points to PM-1345, which changed index key generation to use KeyString rather than BSON. This seems like a reasonable explanation, since key generation using KeyString is expected to be slower.

Comment by Gregory Noma [ 09/Nov/21 ]

I ran a relatively small test, running validate on a collection containing 1 million documents with only an integer _id field. Going from 4.2 to 4.4, I saw a 16.4% speed reduction. A cursory look a the perf profiling seems to show that the main difference is in WiredTigerRecordStoreCursorBase::next.

4.2:

-   93.01%     0.36%  conn1            mongod              [.] mongo::CollectionImpl::validate                                                                                   ▒
   - 92.65% mongo::CollectionImpl::validate                                                                                                                                      ▒
      + 41.62% mongo::RecordStoreValidateAdaptor::validate                                                                                                                       ▒
      + 38.49% mongo::RecordStoreValidateAdaptor::traverseIndex                                                                                                                  ▒
      - 12.41% mongo::WiredTigerRecordStoreCursorBase::next                                                                                                                      ▒
         - 5.16% __curfile_next                                                                                                                                                  ▒
            - 3.75% __wt_btcur_next                                                                                                                                              ▒
                 __cursor_row_slot_return                                                                                                                                        ▒
         + 3.20% mongo::WiredTigerRecordStoreStandardCursor::hasWrongPrefix                                                                                                      ▒
         + 1.11% __wt_cursor_get_value                                                                                                                                           ▒
           0.94% mongo::PrepareConflictTracker::endPrepareConflict                                                                                                               ▒

4.4:

-   92.51%     0.00%  conn2            mongod              [.] mongo::CollectionValidation::validate                                                                              ▒
   - mongo::CollectionValidation::validate                                                                                                                                        ▒
      - 57.17% mongo::ValidateAdaptor::traverseRecordStore                                                                                                                        ▒
         + 36.31% mongo::ValidateAdaptor::validateRecord                                                                                                                          ▒
         - 19.20% mongo::SeekableRecordThrottleCursor::next                                                                                                                       ▒
            - 17.64% mongo::WiredTigerRecordStoreCursorBase::next                                                                                                                 ▒
               - 10.02% __curfile_next                                                                                                                                            ▒
                  - 8.47% __wt_btcur_next_prefix                                                                                                                                  ▒
                     - 2.72% __wt_txn_read                                                                                                                                        ▒
                          __wt_value_return_buf                                                                                                                                   ▒
               + 3.92% mongo::WiredTigerRecordStoreStandardCursor::hasWrongPrefix                                                                                                 ▒
               + 2.31% __wt_cursor_get_value                                                                                                                                      ▒
              1.01% mongo::DataThrottle::awaitIfNeeded                                                                                                                            ▒
           1.00% mongo::ProgressMeter::hit                                                                                                                                        ▒
      + 35.24% mongo::ValidateAdaptor::traverseIndex                                                                                                                              ▒

Comment by James O'Leary [ 24/Jun/21 ]

connie.chen
Nominally, the motivation was partly due to some of the genny tests failing because validation phase of the task was taking longer. These particular tests were not deployed in the end so this is not a huge issue.

The LLT workloads run for a fixed amount of time,  but the overall duration of the task was taking longer on the later versions.  So I was interested to see if this increase in the validation times is something that was expected.

The number above are the sum of the millisecond durations of the validate commands scraped from the mongod logs.

So for example, in the following case:
 

llt_insert_high_ltc linux-3-node-replSet
    756437 v4.2 total
   1016256 v4.4 total       0.34
   1030633 v5.0 total       0.36

On a V4.2 3 node replset, validating 4 collections (with 20M docs each) and a DB size of ~ 12GB took 756437 milliseconds.
On a V4.4 3 node replset, validating 4 collections (with 20M docs each) and a DB size of ~ 12GB took 1016256 milliseconds.
On a V5.0 3 node replset (albeit an older git revision), validating 4 collections (with 20M docs each) and a DB size of ~ 12GB took 1030633 milliseconds.
V4.4 and V5.0 are 34% and 36% slower respectively.

Comment by Connie Chen [ 24/Jun/21 ]

jim.oleary - could you provide additional context behind the motivation for this? 
Please also detail what the numbers mean in the chart you provided in the comment above. 

Generated at Thu Feb 08 05:43:29 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.