[SERVER-22428] Log read-after-optime timeouts Created: 01/Feb/16  Updated: 25/Jan/17  Resolved: 11/Feb/16

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: 3.2.4, 3.3.2

Type: Task Priority: Major - P3
Reporter: Spencer Brody (Inactive) Assignee: Benety Goh
Resolution: Done Votes: 0
Labels: code-and-test
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-22392 "Operation timed out" when enableShar... Closed
Backwards Compatibility: Fully Compatible
Backport Completed:
Sprint: Repl 10 (02/19/16)
Participants:

 Description   

To debug issues with CSRS and read-after-optime it would be nice to have a way to see in the logs when an operation is timing out on waiting for a given optime to become visible.

Sample log message that will be logged:

2016-02-11T14:20:29.647-0500 c20010| 2016-02-11T14:20:29.646-0500 I COMMAND  [conn1] Command on database 
local timed out waiting for read concern to be satisfied. Command: { find: "oplog.rs", readConcern: { afterOpTime: { ts: Timestamp 14
55218425000|0, t: -1.0 } }, maxTimeMS: 5000.0 }



 Comments   
Comment by Githook User [ 18/Feb/16 ]

Author:

{u'username': u'benety', u'name': u'Benety Goh', u'email': u'benety@mongodb.com'}

Message: SERVER-22428 added logging for read-after-optime timeouts

(cherry picked from commit 4bbd6f51d9dc2e4de0a7d0824cc76bc8a514e156)
Branch: v3.2
https://github.com/mongodb/mongo/commit/27022caada3ea1f4f4633696f7b99d28a171bbf2

Comment by Githook User [ 11/Feb/16 ]

Author:

{u'username': u'benety', u'name': u'Benety Goh', u'email': u'benety@mongodb.com'}

Message: SERVER-22428 added logging for read-after-optime timeouts
Branch: master
https://github.com/mongodb/mongo/commit/4bbd6f51d9dc2e4de0a7d0824cc76bc8a514e156

Comment by Eric Milkie [ 02/Feb/16 ]

After discussing, the best option seems to be to add logging in Command::run() to log when read after optime returns a Time Exceeded error status. The verbosity will be 0 for config servers and 2 for all other cases.

Comment by Eric Milkie [ 02/Feb/16 ]

Since every query returns a specific timeout response if maxTimeMs expires, shouldn't the entity doing the query log something?
Also, I am guessing the query maxTimeMs is greater than the default slowms time, so the queries ought to be getting logged already. If they are not, that would indicate a problem with how query time in curop is accrued (the query timer apparently doesn't start until after the wait for optime); this would be something for the query team to investigate.

Comment by Scott Hernandez (Inactive) [ 01/Feb/16 ]

FYI: There is a metrics section for getLastError related to stuff like this:

"getLastError" : {
			"wtime" : {
				"num" : 0,
				"totalMillis" : 0
			},
			"wtimeouts" : NumberLong(0)
		}

Might be good to add some metrics for this too.

Comment by Spencer Brody (Inactive) [ 01/Feb/16 ]

Assigning to Eric to triage. Would be nice to get this backported to 3.2.3, if not 3.2.2

Generated at Thu Feb 08 04:00:23 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.