[SERVER-22428] Log read-after-optime timeouts Created: 01/Feb/16 Updated: 25/Jan/17 Resolved: 11/Feb/16 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | None |
| Fix Version/s: | 3.2.4, 3.3.2 |
| Type: | Task | Priority: | Major - P3 |
| Reporter: | Spencer Brody (Inactive) | Assignee: | Benety Goh |
| Resolution: | Done | Votes: | 0 |
| Labels: | code-and-test | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||
| Backwards Compatibility: | Fully Compatible | ||||||||
| Backport Completed: | |||||||||
| Sprint: | Repl 10 (02/19/16) | ||||||||
| Participants: | |||||||||
| Description |
|
To debug issues with CSRS and read-after-optime, it would be useful to see in the logs when an operation times out while waiting for a given optime to become visible. Sample log message:
|
| Comments |
| Comment by Githook User [ 18/Feb/16 ] | |||||||
|
Author: Benety Goh (benety) <benety@mongodb.com> Message: (cherry picked from commit 4bbd6f51d9dc2e4de0a7d0824cc76bc8a514e156) | |||||||
| Comment by Githook User [ 11/Feb/16 ] | |||||||
|
Author: Benety Goh (benety) <benety@mongodb.com> Message: | |||||||
| Comment by Eric Milkie [ 02/Feb/16 ] | |||||||
|
After discussing, the best option seems to be to add logging in Command::run() to log when read after optime returns a Time Exceeded error status. The verbosity will be 0 for config servers and 2 for all other cases. | |||||||
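The verbosity rule in the comment above can be sketched roughly as follows. This is an illustration only, not the server's C++ code: `CommandResult`, `timeout_log_verbosity`, and `should_log_timeout` are hypothetical names, and the only facts taken from the ticket are the two levels (0 for config servers, 2 otherwise).

```python
from dataclasses import dataclass

# Levels taken from the comment above: 0 on config servers, 2 everywhere else.
CONFIG_SERVER_VERBOSITY = 0
DEFAULT_VERBOSITY = 2


@dataclass
class CommandResult:
    """Hypothetical stand-in for the status returned by a command."""
    timed_out_waiting_for_optime: bool


def timeout_log_verbosity(is_config_server: bool) -> int:
    """Return the debug level at which the timeout message would be emitted."""
    return CONFIG_SERVER_VERBOSITY if is_config_server else DEFAULT_VERBOSITY


def should_log_timeout(result: CommandResult, is_config_server: bool,
                       configured_verbosity: int) -> bool:
    """Log only on a read-after-optime timeout, and only when the node's
    configured verbosity is at least the level assigned to the message."""
    if not result.timed_out_waiting_for_optime:
        return False
    return configured_verbosity >= timeout_log_verbosity(is_config_server)
```

Under these assumptions, a node running at the default verbosity of 0 would log the timeout only if it is a config server; other nodes would need verbosity raised to 2 or higher.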
| Comment by Eric Milkie [ 02/Feb/16 ] | |||||||
|
Since every query returns a specific timeout response if maxTimeMS expires, shouldn't the entity issuing the query be the one to log something? | |||||||
| Comment by Scott Hernandez (Inactive) [ 01/Feb/16 ] | |||||||
|
FYI: There is a metrics section for getLastError related to stuff like this:
Might be good to add some metrics for this too. | |||||||
| Comment by Spencer Brody (Inactive) [ 01/Feb/16 ] | |||||||
|
Assigning to Eric to triage. Would be nice to get this backported to 3.2.3, if not 3.2.2 |