[SERVER-36294] Sum of currentActive and currentInactive not equal to currentOpen transactions Created: 26/Jul/18 Updated: 29/Oct/23 Resolved: 11/Aug/18 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Diagnostics |
| Affects Version/s: | None |
| Fix Version/s: | 4.0.2, 4.1.2 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Bruce Lucas (Inactive) | Assignee: | Nathan Louie |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | SWDI, global-txn-metrics |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Attachments: | |
| Issue Links: | |
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Backport Requested: | v4.0 |
| Sprint: | Repl 2018-08-13 |
| Participants: | |
| Description |
|
Small inaccuracies due to skew might be expected because the counter updates aren't atomic, but this looks to me to be much larger than that, as if the transaction is spending significant time between corresponding updates to the counters. This was a simple test with 10 threads doing transactional updates to single independent documents:
|
| Comments |
| Comment by Githook User [ 16/Aug/18 ] | |
|
Author: {'username': 'will62794', 'email': 'william.schultz@mongodb.com', 'name': 'William Schultz'}
Message: (cherry picked from commit 95bcc2148cc7ec3ca4f54e16c4a75bbc0f806ace) |
| Comment by William Schultz (Inactive) [ 11/Aug/18 ] | |
|
Closing this ticket since the functional change was pushed. We will add additional testing for this fix as part of |
| Comment by Githook User [ 11/Aug/18 ] | |
|
Author: {'username': 'will62794', 'name': 'William Schultz', 'email': 'william.schultz@mongodb.com'}
Message: |
| Comment by William Schultz (Inactive) [ 26/Jul/18 ] | |
|
As per offline discussion, we will start by moving the initial call to "open" the transaction into the first unstash event, instead of the initial call to beginOrContinueTxn. This will make it impossible for a transaction to appear "open" while being neither active nor inactive. |
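A minimal standalone sketch of that accounting, using hypothetical names and plain atomics rather than the server's actual Session/metrics code: because "open" is only incremented at the first unstash, right next to the transition to active, a point-in-time read of the counters can no longer show a transaction that is open but neither active nor inactive.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical counter set -- a sketch of the proposed accounting, not the
// server's real metrics class.
struct TxnCounters {
    std::atomic<int64_t> open{0};
    std::atomic<int64_t> active{0};
    std::atomic<int64_t> inactive{0};

    // First unstash: the transaction becomes "open" and "active" together,
    // instead of bumping "open" earlier in beginOrContinueTxn. This removes
    // the window where a transaction is open but neither active nor inactive.
    void onFirstUnstash() {
        open.fetch_add(1);
        active.fetch_add(1);
    }
    // Stash between statements: active -> inactive.
    void onStash() {
        active.fetch_sub(1);
        inactive.fetch_add(1);
    }
    // Unstash for a later statement: inactive -> active.
    void onUnstash() {
        inactive.fetch_sub(1);
        active.fetch_add(1);
    }
    // Commit or abort while active: the transaction leaves every state.
    void onEndWhileActive() {
        active.fetch_sub(1);
        open.fetch_sub(1);
    }
};

int main() {
    TxnCounters c;
    c.onFirstUnstash();
    assert(c.open == c.active + c.inactive);  // 1 == 1 + 0
    c.onStash();
    assert(c.open == c.active + c.inactive);  // 1 == 0 + 1
    c.onUnstash();
    c.onEndWhileActive();
    assert(c.open == c.active + c.inactive);  // 0 == 0 + 0
}
```

With this grouping, any sample of the three counters satisfies open == active + inactive, up to the small non-atomic skew already noted in the description.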
| Comment by Bruce Lucas (Inactive) [ 26/Jul/18 ] | |
|
Using pmp on the repro above I find a lot of threads here:
If at this point in the code the transaction is "open" but neither "active" nor "inactive", that (rather than the time between bumping open and bumping active) could account for the discrepancy. |
| Comment by Bruce Lucas (Inactive) [ 26/Jul/18 ] | |
|
So, based on the fact that active+inactive is a relatively small fraction of open (see chart above), the time spent in the "active" or "inactive" states accounts for only a small part of the open period (the time between incrementing open and decrementing it). If most of the remaining time falls between bumping open and bumping active (an assumption, maybe not well founded), then the gap between incrementing open and incrementing active is a relatively large fraction of the time between incrementing open and decrementing it. The latter includes a round trip to the client (assuming we don't decrement open until we receive the commit from the client - correct?), so that seems like a rather long time indeed. To put some numbers to it, from the chart: (avg active + avg inactive) / (avg open) = (0.637 + 0.976) / 7.488 = 0.215, i.e. about 22%, so for roughly 78% of the "open" time we are neither "active" nor "inactive", by the current metrics. |
| Comment by William Schultz (Inactive) [ 26/Jul/18 ] | |
|
Yes, as I understand it, the snapshot allocation could be considered one of the earliest "resource acquisition" events of a transaction. This may or may not, however, eagerly start a storage engine transaction; milkie would be the better person to answer that question. Regarding the delay between the "opening" of the transaction and the first "unstashing" (snapshot allocation, etc.), I guess it depends on what you consider a "large" delay. The call to beginOrContinueTxn, which is executed inside here, basically just sets up some metadata related to the transaction; no actual "work" of the transaction has been done yet. It is not until this point that we actually execute the first "unstash" event, and then subsequently execute the first command of the transaction. I haven't measured how long it generally takes between these two events (the "begin" and the "unstash"), but the time seems to be mostly occupied with various command argument parsing logic. It's probably a little weird that we set up the transaction metadata indicating the start of a transaction before we parse all the necessary command arguments, but that's the way things have been structured. |
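To make the ordering concrete, here is a rough outline of the first statement of a transaction as described in this thread. The function names other than beginOrContinueTxn and unstashTransactionResources are illustrative stand-ins, not the actual server code path.

```cpp
// Illustrative stubs: each stands in for a real step described in the comments.
void beginOrContinueTxn() {}          // sets up transaction metadata only;
                                      // counted as "open" under the old scheme
void parseAndValidateCommandArgs() {} // no transaction work yet; this parsing
                                      // occupies the gap between "open" and "active"
void unstashTransactionResources() {} // allocates the snapshot and creates the
                                      // WriteUnitOfWork; "active" starts here
void executeCommandBody() {}          // collection locks are taken inside the
                                      // command itself, not at unstash time
void stashTransactionResources() {}   // "active" -> "inactive" between statements

// Shape of the first statement of a multi-document transaction (sketch only).
void runFirstTransactionStatement() {
    beginOrContinueTxn();
    parseAndValidateCommandArgs();
    unstashTransactionResources();
    executeCommandBody();
    stashTransactionResources();
}

int main() { runFirstTransactionStatement(); }
```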
| Comment by Bruce Lucas (Inactive) [ 26/Jul/18 ] | |
|
I think it would be reasonable to consider it started when the first resource is taken, and I think your comment implies that that is the snapshot? If so, that seems like a reasonable place. However, if I'm interpreting this all correctly, it seems like there is a large delay between when we first get word of the new transaction and currently bump open, and when we actually start to do some work related to the transaction (create the snapshot and wuow) and bump active. If so, do we know why? Is there a performance problem here? If this is expected, maybe it is something that needs some kind of metric? |
| Comment by William Schultz (Inactive) [ 26/Jul/18 ] | |
|
Yes, the initial "unstash" operation is where, for example, we create the transaction's WriteUnitOfWork and pre-allocate a snapshot for the transaction, so I think this is a reasonable marker for the transaction start. It would be a bit more difficult to "start" the transaction when it first takes collection locks, since the taking of locks is deferred to the execution of the first command run on that transaction. So, in order to track that event we would probably have to inject some metrics tracking into specific command codepaths, which I think we would rather not do. | |
| Comment by Bruce Lucas (Inactive) [ 26/Jul/18 ] | |
|
Maybe the point at which the transaction first begins to hold resources, e.g. locks and cache content, is a reasonable marker for the start of the transaction? This will be the point at which the transaction begins to have a potentially significant impact on other operations, right? | |
| Comment by William Schultz (Inactive) [ 26/Jul/18 ] | |
|
bruce.lucas Yes, you are right. We currently treat Session::beginOrContinueTxn as the "opening" of the transaction, even though we don't transition it to active until the first time it calls Session::unstashTransactionResources. I think we intuitively thought to "open" the transaction inside beginOrContinueTxn, but your point is definitely valid, in that it makes the active/inactive counters inconsistent with the open counter. I think we should consider moving the initial call to "open" the transaction into "unstash", so that it aligns with the initial call to make the transaction "active". There is arguably some ambiguity about what constitutes the true "beginning" of a multi-document transaction internally, so I think it is reasonable to consider the first "unstash" event as the beginning of the transaction, especially since I don't see any reason why there should be a significant gap of time between when we call beginOrContinueTxn and when we first call unstashTransactionResources. |
| Comment by Bruce Lucas (Inactive) [ 26/Jul/18 ] | |
|
I think the problem here is skew between the open counter and the inactive or active counters, not between the active and inactive. For example when we increment open, at that point in the code we don't also immediately increment active or inactive (whichever is appropriate). Maybe a significant amount of time elapses before we finally increment active or inactive? | |
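A toy demonstration of that hypothesis, in plain C++ with made-up delays rather than server code: "open" is bumped well before "active", so a reader that samples the counters during that gap sees open greater than active + inactive.

```cpp
#include <atomic>
#include <chrono>
#include <cstdint>
#include <iostream>
#include <thread>

// Toy model of the suspected skew: "open" is incremented at
// beginOrContinueTxn time, "active" only at the first unstash.
std::atomic<int64_t> openTxns{0}, activeTxns{0}, inactiveTxns{0};

void runTransaction() {
    openTxns.fetch_add(1);                                       // beginOrContinueTxn
    std::this_thread::sleep_for(std::chrono::milliseconds(50));  // parsing, etc.
    activeTxns.fetch_add(1);                                     // first unstash
    std::this_thread::sleep_for(std::chrono::milliseconds(10));  // statement work
    activeTxns.fetch_sub(1);
    openTxns.fetch_sub(1);                                       // commit
}

int main() {
    std::thread txn(runTransaction);
    // Sample the counters inside the begin-to-unstash gap, like a metrics
    // reader would.
    std::this_thread::sleep_for(std::chrono::milliseconds(25));
    std::cout << "open=" << openTxns.load()
              << " active=" << activeTxns.load()
              << " inactive=" << inactiveTxns.load() << '\n';
    // Prints open=1 active=0 inactive=0: open but neither active nor inactive.
    txn.join();
}
```

The delays here are artificial; the open question in the thread is whether the real begin-to-unstash gap on the server is genuinely this large relative to the transaction's lifetime.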
| Comment by Eric Milkie [ 26/Jul/18 ] | |
|
Currently the active and inactive counters are always updated in concert in the code, one immediately after the other, so the only benefit of the check is confirming that we haven't forgotten to update both together somewhere. It doesn't catch, however, a place where we missed updating the counters entirely. |
| Comment by Bruce Lucas (Inactive) [ 26/Jul/18 ] | |
|
I think it's not a bad thing that there are three separate counters. If there were only two, one of them could be inaccurate and we wouldn't know it. The redundancy provides a check on the accuracy. | |
| Comment by Eric Milkie [ 26/Jul/18 ] | |
|
I'm not sure why the code keeps three separate counters for active, inactive, and open, and updates them all independently. We could reduce the number of counters and calculate one from the other two. |
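A small sketch of that suggestion, with hypothetical names rather than the server's real metrics class: keep only two counters and derive the third at read time, so the three reported values can never disagree structurally.

```cpp
#include <atomic>
#include <cstdint>

// Sketch of the "calculate one from the other two" idea (illustrative names).
struct TxnMetrics {
    std::atomic<int64_t> currentOpen{0};
    std::atomic<int64_t> currentActive{0};

    // Derived at report time: every open transaction that is not active is
    // inactive, so the reported value is always consistent with the other two.
    int64_t currentInactive() const {
        return currentOpen.load() - currentActive.load();
    }
};
```

The two loads are still not taken atomically together, so a brief momentary skew would remain, but only on the scale of the "small inaccuracies" already accepted in the description.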