[SERVER-36697] Use virtual clock for transactions metrics unit tests Created: 16/Aug/18  Updated: 29/Oct/23  Resolved: 03/Oct/18

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: 4.1.4

Type: Bug Priority: Major - P3
Reporter: William Schultz (Inactive) Assignee: William Schultz (Inactive)
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Related
related to SERVER-37317 Make sleepmillis more conservative in... Closed
is related to SERVER-36846 TransactionsMetrics unit test should ... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v4.0
Sprint: Repl 2018-09-10, Repl 2018-09-24, Repl 2018-10-08
Participants:
Linked BF Score: 34

 Description   

Currently, many of the unit tests for transactions metrics in transaction_participant_test.cpp use real time sleeps in order to measure the passage of time and to check assertions.This can make the tests flaky and also slow. We should try to virtualize the clock source used for these tests to circumvent these issues.



 Comments   
Comment by Githook User [ 03/Oct/18 ]

Author:

{'name': 'William Schultz', 'email': 'william.schultz@mongodb.com', 'username': 'will62794'}

Message: SERVER-36697 Utilize TickSource for transactions timing metrics

This patch converts the existing transactions diagnostics timing related
metrics to use a TickSource to record transaction durations. The
TickSource is a high precision, mock-able time source for measuring
the passage of time. This patch also converts the existing unit tests
to use a mock TickSource, which allows the tests to fully virtualize
time, making them much faster and less flaky.
Branch: master
https://github.com/mongodb/mongo/commit/982ba21e0ffbdaaf766dc8fa060728ba9b5f4914

Comment by William Schultz (Inactive) [ 26/Sep/18 ]

After working on a patch for this, I unfortunately think it may be sensible to forgo the clock virtualization for these tests. For one, I realized that the slowness of these unit tests doesn't necessarily come from the fact they use real time sleeps. For example, the regular TxnParticipantTest unit tests, which include almost no metrics tests, already report the following runtime:

TxnParticipantTest             | tests:   62 | fails:    0 | assert calls:          0 | time secs:  4.093

so it seems like in general the unit tests in TxnParticipantTest are just a bit slow. Additionally, the changes necessary to virtualize the clock source for all of these tests leads to a fair amount of code churn, and I doubt that the changes would apply cleanly in a 4.0 backport, which would be necessary if we wanted to fix the test issues on 4.0 as well. Originally this ticket grew out of an attempt to address BFs that arose as a result of these tests depending on real time. At this point, I think it may just be better to fix those issues in an ad hoc manner. Also, the virtualization changes would lead to some changes about how we track metrics internally, and this could lead to divergence between 4.0 and 4.2 on how these metrics are tracked, which I think might be nice to avoid.

Comment by William Schultz (Inactive) [ 20/Sep/18 ]

After investigation, it looks like TickSource may actually be the right thing to use, since it is high precision and mock-able.

Comment by Andy Schwerin [ 17/Aug/18 ]

I think we could pull this off pretty easily. With a small amount of extra work, we could introduce a high-resolution time point type for the return type of the curTimeMicros64 alternative. That could certainly be done separately, though.

Comment by William Schultz (Inactive) [ 16/Aug/18 ]

One solution may be to just add a curTimeMicros method to ClockSource so that we can get the current time in microsecond resolution from a ClockSource. Then we can use a MockClockSource in the unit tests. We would need to change calls to the free function curTimeMicros64 in transaction_participant.cpp to get the time from the clock source. May be a bit tricky though since ClockSource tracks time in millisecond resolution.

Comment by William Schultz (Inactive) [ 16/Aug/18 ]

This may be the best fix for the linked build failures.

Generated at Thu Feb 08 04:43:50 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.