[SERVER-76441] Include pps_allowance_exceeded in serverstatus Created: 22/Apr/23  Updated: 24/Jan/24

Status: Blocked
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Kelsey Schubert Assignee: Backlog - Service Architecture
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Related
Assigned Teams:
Service Arch
Backport Requested:
v7.3
Sprint: Service Arch 2023-08-07, Service Arch 2023-08-21, Service Arch 2023-09-04, Service Arch 2023-09-18, Service Arch 2023-10-02
Participants:
Case:

 Description   

On environments where the output of the following command includes pps_allowance_exceeded

ethtool -S eth0

capture it in serverStatus and ensure that it is included in FTDC.

We should also review the output of this command and include any other metrics that we believe would help diagnose network issues.
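
For illustration only, here is a minimal Python sketch of the parsing step, assuming the `name: value` layout that `ethtool -S` prints. The function names (`parse_ethtool_stats`, `read_ethtool_stats`, `allowance_counters`) are hypothetical and are not part of the server or of any existing tool; how the server itself would collect these counters is not decided in this ticket.

```python
import subprocess


def parse_ethtool_stats(text):
    """Parse `ethtool -S` style output into a dict of counter name -> int.

    Assumes one `name: value` pair per line; the "NIC statistics:" header
    and any line without an integer value are skipped.
    """
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.strip().partition(":")
        if not sep:
            continue
        try:
            stats[name.strip()] = int(value.strip())
        except ValueError:
            # Header line or a non-numeric value: ignore it.
            continue
    return stats


def read_ethtool_stats(interface="eth0"):
    """Run `ethtool -S <interface>` and parse its output (hypothetical helper)."""
    result = subprocess.run(
        ["ethtool", "-S", interface],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_ethtool_stats(result.stdout)


def allowance_counters(stats):
    """Keep only the *_allowance_exceeded counters, e.g. pps_allowance_exceeded."""
    return {k: v for k, v in stats.items() if k.endswith("_allowance_exceeded")}
```

On a host where the counters are absent, `allowance_counters` simply returns an empty dict, so the caller can omit the section rather than report zeros.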

cc cailin.nelson@mongodb.com, nikki.sloan@mongodb.com



 Comments   
Comment by Cailin Nelson [ 22/Apr/23 ]

Note: Azure also exposes some similar statistics; see the "Application Usage" section of https://learn.microsoft.com/en-us/azure/virtual-network/accelerated-networking-how-it-works. At least vf_tx_dropped might be interesting?

Comment by Kelsey Schubert [ 22/Apr/23 ]

Requesting backports for consideration, but we should be careful to stage them (e.g. make sure it's released to 7.0 before backporting to 6.0).

Comment by Kelsey Schubert [ 22/Apr/23 ]

This stat exists on AWS. On Azure you can run the tool, but these stats are not available. On GCP, ethtool does not return any stats.

Sample output on AWS (run from an Evergreen spawnhost that I spun up for this purpose, so it's pretty boring):

$ ethtool -S eth0
NIC statistics:
     tx_timeout: 0
     suspend: 0
     resume: 0
     wd_expired: 0
     interface_up: 1
     interface_down: 0
     admin_q_pause: 0
     bw_in_allowance_exceeded: 0
     bw_out_allowance_exceeded: 0
     pps_allowance_exceeded: 0
     conntrack_allowance_exceeded: 0
     linklocal_allowance_exceeded: 0
     queue_0_tx_cnt: 409
     queue_0_tx_bytes: 41568
     queue_0_tx_queue_stop: 0
     queue_0_tx_queue_wakeup: 0
     queue_0_tx_dma_mapping_err: 0
     queue_0_tx_linearize: 0
     queue_0_tx_linearize_failed: 0
     queue_0_tx_napi_comp: 2573
     queue_0_tx_tx_poll: 2580
     queue_0_tx_doorbells: 408
     queue_0_tx_prepare_ctx_err: 0
     queue_0_tx_bad_req_id: 0
     queue_0_tx_llq_buffer_copy: 1
     queue_0_tx_missed_tx: 0
     queue_0_tx_unmask_interrupt: 2573
     queue_0_rx_cnt: 9671
     queue_0_rx_bytes: 11811608
     queue_0_rx_rx_copybreak_pkt: 1472
     queue_0_rx_csum_good: 8501
     queue_0_rx_refil_partial: 0
     queue_0_rx_bad_csum: 0
     queue_0_rx_page_alloc_fail: 0
     queue_0_rx_skb_alloc_fail: 0
     queue_0_rx_dma_mapping_err: 0
     queue_0_rx_bad_desc_num: 0
     queue_0_rx_bad_req_id: 0
     queue_0_rx_empty_rx_ring: 0
     queue_0_rx_csum_unchecked: 963
     queue_0_rx_lpc_warm_up: 0
     queue_0_rx_lpc_full: 0
     queue_0_rx_lpc_wrong_numa: 0
     queue_1_tx_cnt: 6124
     queue_1_tx_bytes: 398222
     queue_1_tx_queue_stop: 0
     queue_1_tx_queue_wakeup: 0
     queue_1_tx_dma_mapping_err: 0
     queue_1_tx_linearize: 0
     queue_1_tx_linearize_failed: 0
     queue_1_tx_napi_comp: 5224
     queue_1_tx_tx_poll: 5224
     queue_1_tx_doorbells: 6124
     queue_1_tx_prepare_ctx_err: 0
     queue_1_tx_bad_req_id: 0
     queue_1_tx_llq_buffer_copy: 0
     queue_1_tx_missed_tx: 0
     queue_1_tx_unmask_interrupt: 5224
     queue_1_rx_cnt: 75
     queue_1_rx_bytes: 36821
     queue_1_rx_rx_copybreak_pkt: 47
     queue_1_rx_csum_good: 75
     queue_1_rx_refil_partial: 0
     queue_1_rx_bad_csum: 0
     queue_1_rx_page_alloc_fail: 0
     queue_1_rx_skb_alloc_fail: 0
     queue_1_rx_dma_mapping_err: 0
     queue_1_rx_bad_desc_num: 0
     queue_1_rx_bad_req_id: 0
     queue_1_rx_empty_rx_ring: 0
     queue_1_rx_csum_unchecked: 0
     queue_1_rx_lpc_warm_up: 0
     queue_1_rx_lpc_full: 0
     queue_1_rx_lpc_wrong_numa: 0
     queue_2_tx_cnt: 345
     queue_2_tx_bytes: 42216
     queue_2_tx_queue_stop: 0
     queue_2_tx_queue_wakeup: 0
     queue_2_tx_dma_mapping_err: 0
     queue_2_tx_linearize: 0
     queue_2_tx_linearize_failed: 0
     queue_2_tx_napi_comp: 4283
     queue_2_tx_tx_poll: 4466
     queue_2_tx_doorbells: 344
     queue_2_tx_prepare_ctx_err: 0
     queue_2_tx_bad_req_id: 0
     queue_2_tx_llq_buffer_copy: 9
     queue_2_tx_missed_tx: 0
     queue_2_tx_unmask_interrupt: 4283
     queue_2_rx_cnt: 66364
     queue_2_rx_bytes: 94783931
     queue_2_rx_rx_copybreak_pkt: 184
     queue_2_rx_csum_good: 66364
     queue_2_rx_refil_partial: 0
     queue_2_rx_bad_csum: 0
     queue_2_rx_page_alloc_fail: 0
     queue_2_rx_skb_alloc_fail: 0
     queue_2_rx_dma_mapping_err: 0
     queue_2_rx_bad_desc_num: 0
     queue_2_rx_bad_req_id: 0
     queue_2_rx_empty_rx_ring: 0
     queue_2_rx_csum_unchecked: 0
     queue_2_rx_lpc_warm_up: 0
     queue_2_rx_lpc_full: 0
     queue_2_rx_lpc_wrong_numa: 0
     queue_3_tx_cnt: 1532
     queue_3_tx_bytes: 209990
     queue_3_tx_queue_stop: 0
     queue_3_tx_queue_wakeup: 0
     queue_3_tx_dma_mapping_err: 0
     queue_3_tx_linearize: 0
     queue_3_tx_linearize_failed: 0
     queue_3_tx_napi_comp: 3045
     queue_3_tx_tx_poll: 3252
     queue_3_tx_doorbells: 1493
     queue_3_tx_prepare_ctx_err: 0
     queue_3_tx_bad_req_id: 0
     queue_3_tx_llq_buffer_copy: 194
     queue_3_tx_missed_tx: 0
     queue_3_tx_unmask_interrupt: 3045
     queue_3_rx_cnt: 42406
     queue_3_rx_bytes: 60719172
     queue_3_rx_rx_copybreak_pkt: 239
     queue_3_rx_csum_good: 42406
     queue_3_rx_refil_partial: 0
     queue_3_rx_bad_csum: 0
     queue_3_rx_page_alloc_fail: 0
     queue_3_rx_skb_alloc_fail: 0
     queue_3_rx_dma_mapping_err: 0
     queue_3_rx_bad_desc_num: 0
     queue_3_rx_bad_req_id: 0
     queue_3_rx_empty_rx_ring: 0
     queue_3_rx_csum_unchecked: 0
     queue_3_rx_lpc_warm_up: 0
     queue_3_rx_lpc_full: 0
     queue_3_rx_lpc_wrong_numa: 0
     ena_admin_q_aborted_cmd: 0
     ena_admin_q_submitted_cmd: 34
     ena_admin_q_completed_cmd: 34
     ena_admin_q_out_of_space: 0
     ena_admin_q_no_completion: 0
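
Most of the sample above is the same set of counters repeated for queues 0 to 3. If any of them were considered for inclusion, one possible way to keep the metric count small is to sum each counter across queues. The sketch below is only an illustration of that idea: `sum_queue_counters` and the `queue_total_` prefix are hypothetical names, and the regular expression assumes the `queue_<n>_<counter>` naming seen in this sample.

```python
import re
from collections import defaultdict

# Matches per-queue counters such as "queue_2_rx_cnt" and captures "rx_cnt".
QUEUE_COUNTER = re.compile(r"^queue_\d+_(.+)$")


def sum_queue_counters(stats):
    """Sum per-queue counters across all queues.

    `stats` maps counter names to integers. Counters that are not
    per-queue (e.g. pps_allowance_exceeded) are left out of the result.
    """
    totals = defaultdict(int)
    for name, value in stats.items():
        match = QUEUE_COUNTER.match(name)
        if match:
            totals["queue_total_" + match.group(1)] += value
    return dict(totals)


# Values copied from the sample output above.
sample = {
    "pps_allowance_exceeded": 0,
    "queue_0_rx_cnt": 9671,
    "queue_1_rx_cnt": 75,
    "queue_2_rx_cnt": 66364,
    "queue_3_rx_cnt": 42406,
}
totals = sum_queue_counters(sample)
print(totals)  # {'queue_total_rx_cnt': 118516}
```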

Generated at Thu Feb 08 06:32:42 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.