[SERVER-53623] Use a static variable for CancelationToken::Uncancellable() Created: 06/Jan/21  Updated: 10/Mar/21  Resolved: 10/Mar/21

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Task Priority: Minor - P4
Reporter: Benjamin Caimano (Inactive) Assignee: Tyler Seip (Inactive)
Resolution: Won't Do Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Sprint: Service Arch 2021-03-22
Participants:

 Description   

We create a new CancelationToken for each call to CancelationToken::uncancelable() here. Unfortunately, this makes each invocation take more time than is necessary. We should shift the function to construct an uncancelable state once and reuse it for each invocation.

Benchmark without static variable

Show all

2021-01-06 21:27:03
Running install/bin/cancelation_bm
Run on (16 X 2299.97 MHz CPU s)
CPU Caches:
  L1 Data 32K (x8)
  L1 Instruction 32K (x8)
  L2 Unified 256K (x8)
  L3 Unified 46080K (x1)
Load Average: 0.29, 0.42, 0.34
--------------------------------------------------------------------------------------
Benchmark                                            Time             CPU   Iterations
--------------------------------------------------------------------------------------
BM_create_single_token_from_source                14.1 ns         14.1 ns     49630229
BM_uncancelable_token_ctor                         172 ns          172 ns      4077472
BM_cancel_tokens_from_single_source/1             1507 ns         1509 ns       464189
BM_cancel_tokens_from_single_source/10            1998 ns         2000 ns       350055
BM_cancel_tokens_from_single_source/100           6985 ns         6982 ns       100347
BM_cancel_tokens_from_single_source/1000         59314 ns        59304 ns        11813
BM_cancel_tokens_from_single_source/10000       864175 ns       863888 ns          915
BM_cancel_tokens_from_single_source/100000     9274757 ns      9270859 ns           73
BM_cancel_tokens_from_single_source/1000000  127268434 ns    127259349 ns            6
BM_check_if_token_from_source_canceled           0.371 ns        0.371 ns   1000000000
BM_cancelation_source_from_token_ctor              934 ns          934 ns      1161130
BM_cancelation_source_default_ctor                 187 ns          187 ns      3745716
BM_ranged_depth_cancelation_hierarchy/1           1788 ns         1789 ns       391206
BM_ranged_depth_cancelation_hierarchy/10          4735 ns         4736 ns       148036
BM_ranged_depth_cancelation_hierarchy/100        36921 ns        36914 ns        18951
BM_ranged_depth_cancelation_hierarchy/1000      497926 ns       497745 ns         1405

Benchmark with static variable

Show all

2021-01-06 21:23:52
Running install/bin/cancelation_bm
Run on (16 X 2299.97 MHz CPU s)
CPU Caches:
  L1 Data 32K (x8)
  L1 Instruction 32K (x8)
  L2 Unified 256K (x8)
  L3 Unified 46080K (x1)
Load Average: 0.88, 0.43, 0.32
--------------------------------------------------------------------------------------
Benchmark                                            Time             CPU   Iterations
--------------------------------------------------------------------------------------
BM_create_single_token_from_source                14.1 ns         14.1 ns     49627405
BM_uncancelable_token_ctor                        14.1 ns         14.1 ns     49631845
BM_cancel_tokens_from_single_source/1             1494 ns         1496 ns       468025
BM_cancel_tokens_from_single_source/10            1985 ns         1988 ns       352574
BM_cancel_tokens_from_single_source/100           6950 ns         6957 ns       100612
BM_cancel_tokens_from_single_source/1000         59051 ns        59100 ns        11844
BM_cancel_tokens_from_single_source/10000       843504 ns       843279 ns          928
BM_cancel_tokens_from_single_source/100000     9178104 ns      9178246 ns           78
BM_cancel_tokens_from_single_source/1000000  126632492 ns    126614691 ns            6
BM_check_if_token_from_source_canceled           0.371 ns        0.371 ns   1000000000
BM_cancelation_source_from_token_ctor              928 ns          928 ns      1161740
BM_cancelation_source_default_ctor                 190 ns          190 ns      3683515
BM_ranged_depth_cancelation_hierarchy/1           1771 ns         1773 ns       394449
BM_ranged_depth_cancelation_hierarchy/10          4707 ns         4709 ns       148645
BM_ranged_depth_cancelation_hierarchy/100        38334 ns        38387 ns        18246
BM_ranged_depth_cancelation_hierarchy/1000      478853 ns       478903 ns         1462



 Comments   
Comment by Matthew Saltz (Inactive) [ 10/Mar/21 ]

We can revisit this if it's ever a problem

Comment by Billy Donahue [ 10/Mar/21 ]

Yes.

It would be interesting if the BM_uncancelable_token_ctor benchmark showed the cost as the # of threads rises. Just to put some handwaving numbers on this, experiments benchmarking Status Ref/Unref (a similar refcounted type) showed that the cost of the intrusive_ptr copy goes up by about 40ns just by adding a second thread. Adding more threads it gets worse from there in a rougly linear way. Speeding up from 172ns to 14ns in a single threaded case is nice, but those gains would be quickly wiped out by adding a handful of competing threads. If contention adds 40ns/thread (which is seems to in the Status case), then just 5 competing threads would make the proposed static variable a pessimization.

https://docs.google.com/spreadsheets/d/1BLHZhlWmt-DPNvM6KDd9c3DyGrPqWRVbrep8LAZg7KM/edit#gid=1897039357

 

Comment by Matthew Saltz (Inactive) [ 10/Mar/21 ]

Yeah I just meant nonstandard as in uncommon, sorry for the confusion there. Okay I say we "Won't do" this and come back to it later if it ever ends up being a bottleneck for something

Comment by Billy Donahue [ 10/Mar/21 ]

I just want to clarify that thread_local is standard C++11. If you mean nonstandard more colloquially I agree. It can be unexpected.

If it's thread_local, all references to that variable are invalid when the thread that created it ends. If it's going to be passed between threads we'd have problems.

I agree with leaving it alone and speeding up the "uncancelable" value's initializer if it's too slow, without trying to share instances of it.

Comment by Matthew Saltz (Inactive) [ 10/Mar/21 ]

This is probably a terrible idea but making it a thread_local would mean a given thread would only have to create it once, and then there would also be no contention across threads. So in an executor thread for example it'd allow for a lot of savings in constructing the object, but a one-off thread might have to go ahead and make it for itself. But given that using thread_local is kind of confusing and not standard I'd almost rather just leave it in its current less performant state - it's not like we're creating these in a tight loop.

My first implementation by the way allowed for a nullptr state instead of the way it is now - it's doable but just makes the CancelationToken implementation a little messier.

Comment by Billy Donahue [ 04/Mar/21 ]

Something to consider. maybe the intrusive_ptr refcount contention involved in using a single sentinel will be expensive, maybe more expensive than the construction costs because the contention affects all threads using any uncancelable token.

Maybe a nullptr state intrusive_ptr can be used for the "uncancelable" so there's no contention.

Generated at Thu Feb 08 05:31:25 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.