[SERVER-30283] PingStats::hit() should not set _numFailuresSinceLastStart to integer max Created: 24/Jul/17  Updated: 30/Oct/23  Resolved: 09/Jan/18

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: 3.6.4, 3.7.1

Type: Bug Priority: Major - P3
Reporter: William Schultz (Inactive) Assignee: William Schultz (Inactive)
Resolution: Fixed Votes: 0
Labels: neweng, pull-request
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File Screen Shot 2017-12-21 at 12.48.09 AM.png     PNG File Screen Shot 2017-12-21 at 12.48.57 AM.png     PNG File Screen Shot 2017-12-21 at 12.49.58 AM.png     PNG File Screen Shot 2017-12-21 at 12.51.25 AM.png    
Issue Links:
Backports
Depends
Related
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v3.6, v3.4
Participants:
Linked BF Score: 46

 Description   

PingStats, defined in ToplogyCoordinatorImpl, is a data structure used to track statistics about replication heartbeat messages. It has a variable called _numFailuresSinceLastStart, which keeps track of failed heartbeat attempts. When a good heartbeat response comes back, this variable is set to std::numeric_limits<int>::max(), presumably as an easy way for this value to always be larger than others in certain comparisons. In the case that we actually get back a good heartbeat response (PingStats::hit) and then a failed heartbeat from the same node (PingStats::miss), we will actually overflow this _numFailuresSinceLastStart variable. We should not use the integer max as a way to determine that we received a good heartbeat response. Creating a separate field in PingStats to explicitly track this would probably be a simple approach.



 Comments   
Comment by Githook User [ 26/Feb/18 ]

Author:

{'email': 'william.schultz@mongodb.com', 'name': 'William Schultz', 'username': 'will62794'}

Message: SERVER-30283 Clean up TopologyCoordinator::PingStats interface

(cherry picked from commit a76bbc3d1b426954312acb4d03775350cda83733)
Branch: v3.6
https://github.com/mongodb/mongo/commit/2ab75de9aae35d4d6bbb8adadf21fc596c116d5b

Comment by Githook User [ 26/Feb/18 ]

Author:

{'email': 'cramaechi@me.com', 'name': 'Chibuikem Amaechi', 'username': 'cramaechi'}

Message: SERVER-30283 Fix integer overflow in PingStats::hit()

Closes #1196

Signed-off-by: William Schultz <william.schultz@mongodb.com>
(cherry picked from commit 6e3b0deb789ec1e9bbdb78f42547278fb7b6b8f0)
Branch: v3.6
https://github.com/mongodb/mongo/commit/ffe3e6270c64cf0b0edcb84983af6d0b604a37f6

Comment by Githook User [ 25/Jan/18 ]

Author:

{'name': 'William Schultz', 'email': 'william.schultz@mongodb.com', 'username': 'will62794'}

Message: SERVER-30283 Fix typo in PingStats comment
Branch: master
https://github.com/mongodb/mongo/commit/d6e53ee6a3f08a949e42c69e781c817bb941f92d

Comment by Githook User [ 25/Jan/18 ]

Author:

{'name': 'William Schultz', 'email': 'william.schultz@mongodb.com', 'username': 'will62794'}

Message: SERVER-30283 Clean up TopologyCoordinator::PingStats interface
Branch: master
https://github.com/mongodb/mongo/commit/a76bbc3d1b426954312acb4d03775350cda83733

Comment by Githook User [ 09/Jan/18 ]

Author:

{'email': 'cramaechi@me.com', 'name': 'Chibuikem Amaechi', 'username': 'cramaechi'}

Message: SERVER-30283 Fix integer overflow in PingStats::hit()

Closes #1196

Signed-off-by: William Schultz <william.schultz@mongodb.com>
Branch: master
https://github.com/mongodb/mongo/commit/6e3b0deb789ec1e9bbdb78f42547278fb7b6b8f0

Comment by Chibuikem Amaechi [ 23/Dec/17 ]

Hi Kelsey,

No prob. Thank you!

Comment by Kelsey Schubert [ 23/Dec/17 ]

Thanks for the pull request, cramaechi! Many developers are on vacation this coming week for the holidays, but we'll take a look soon after.

Comment by Chibuikem Amaechi [ 22/Dec/17 ]

Hi Spencer,

I have opened a pull request and looking forward to your feedback!

Best Regards,
Chibuikem Amaechi

Comment by Chibuikem Amaechi [ 21/Dec/17 ]

Hi Spencer,

I ran into a some problems trying to make certain comparisons work the same with the hasFailed() method.

Here are the comparisons:

mongo/src/mongo/db/repl/topology_coordinator.cpp

To fix this problem, I first changed the method name hasFailed() to noMoreRetries(), and then made it return true if _numFailuresSinceLastStart was greater than kMaxHeartBeatRetries or there was at least one good heartbeat received since the last call to start(). I adjusted all the comparisons to use noMoreRetries() except for the comparison in lines 1028 - 1029, to which I added the following: hbStats.getGoodHeartbeatReceived() == false.

If my explanation is not clear enough, I can post screenshots of my work to give you a better idea of what I am proposing.

Looking forward to hearing your thoughts.

Best Regards,
Chibuikem. A

Comment by Chibuikem Amaechi [ 19/Dec/17 ]

Great. I will open a pull request by the end of today!

Comment by Spencer Brody (Inactive) [ 19/Dec/17 ]

cramaechi, please feel free to put together a pull request and we'd be happy to take a look!

Comment by Spencer Brody (Inactive) [ 18/Dec/17 ]

Seems like a reasonable approach, although instead of adding an accessor for _goodHeartbeatReceived, I would actually just add a method "hasFailed()" that returns true if the _numFailuresSinceLastStart is greater than kMaxHeartbeatRetries and there's been no good heartbeat received since the last start. Then you could also remove the accessor for _numFailuresSinceLastStart and tighten up the interface for PingStats.

EDIT: Actually this log message might make it hard to get rid of the getNumFailuresSinceLastStart() accessor method, but you could probably still remove all the other callers in favor of using hasFailed()

Comment by Kelsey Schubert [ 18/Dec/17 ]

spencer, would you please review cramaechi's proposed change?

Thanks,
Kelsey

Comment by Chibuikem Amaechi [ 17/Dec/17 ]

Hi William,

I am a new server contributor and would like to be assigned this ticket.

My proposed change would involve creating a separate field called _goodHeartbeatReceived in PingStats to flag when a good heartbeat was received, and then adding an accessor function to go along with this new field. The aforementioned changes would allow us to remove the assignment statement (_numFailuresSinceLastStart = std::numeric_limits<int>::max()) causing the potential overflow. Modifications would then be made to certain comparisons that referenced the _numFailuresSinceLastStart field to accommodate my new changes.

Please share your thoughts.

Best Regards,
Chibuikem Amaechi

Generated at Thu Feb 08 04:23:16 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.