[SERVER-47767] read_operations_during_rollback.js should make sure that 'GetMoreHangBeforeReadLock' failpoint is hit on correct collection Created: 24/Apr/20  Updated: 29/Oct/23  Resolved: 27/Apr/20

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: 4.7.0

Type: Bug Priority: Major - P3
Reporter: William Schultz (Inactive) Assignee: William Schultz (Inactive)
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-44068 Investigate the slowest sections of R... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: Repl 2020-05-04
Participants:

 Description   

The read_operations_during_rollback.js test waits for us to hit a GetMoreHangBeforeReadLock failpoint here. It is possible this failpoint is triggered on a collection we don't expect (e.g. local.oplog.rs), causing us to continue erroneously, before this getMore command is hung. This causes this getMore to succeed, because we ended up starting the rollback before the getMore had started. We should make sure that the getMore command is hung on the right collection before proceeding.



 Comments   
Comment by William Schultz (Inactive) [ 27/Apr/20 ]

Omitting backport for now. If we see this bug appear on older branches then we can backport at that point.

Comment by Githook User [ 27/Apr/20 ]

Author:

{'name': 'William Schultz', 'email': 'william.schultz@mongodb.com', 'username': 'will62794'}

Message: SERVER-47767 Make sure read_operations_during_rollback.js waits for failpoint on the correct collection
Branch: master
https://github.com/mongodb/mongo/commit/e55c04b738ee1851bf27d0eb770a473b0e0e7867

Comment by William Schultz (Inactive) [ 24/Apr/20 ]

This turned up in a patch build while working on SERVER-44068. In that failure, we can see that it appears plausible that the getMore that triggered the checkLog condition (see 'conn35') was on the oplog, not on the test collection:

[js_test:read_operations_during_rollback] 2020-04-22T15:53:25.356+0000 Starting JSTest jstests/replsets/read_operations_during_rollback.js...
[js_test:read_operations_during_rollback] 2020-04-22T15:53:27.322+0000 d21030| 2020-04-22T15:53:27.322+00:00 I  NETWORK  51800   [conn35] "client metadata","attr":{"remote":"10.122.23.22:60442","client":"conn35","doc":{"application":{"name":"OplogFetcher"},"driver":{"name":"MongoDB Internal Client","version":"4.5.0-846-ge7165a3-patch-5ea063c0e3c33138e956024b"},"os":{"type":"Linux","name":"Red Hat Enterprise Linux Server release 6.2 (Santiago)","architecture":"x86_64","version":"Kernel 2.6.32-220.el6.x86_64"}}}
[js_test:read_operations_during_rollback] 2020-04-22T15:53:27.988+0000 [jsTest] Setting fail point GetMoreHangBeforeReadLock
[js_test:read_operations_during_rollback] 2020-04-22T15:53:27.988+0000 d21030| 2020-04-22T15:53:27.988+00:00 W  CONTROL  23829   [conn2] "Set failpoint","attr":{"failPointName":"GetMoreHangBeforeReadLock","failPoint":{"mode":1,"data":{},"timesEntered":0}}
[js_test:read_operations_during_rollback] 2020-04-22T15:53:27.992+0000 d21030| 2020-04-22T15:53:27.990+00:00 I  QUERY    20477   [conn35] "GetMoreHangBeforeReadLock fail point enabled. Blocking until fail point is disabled."
[js_test:read_operations_during_rollback] 2020-04-22T15:53:28.027+0000 d21030| 2020-04-22T15:53:28.027+00:00 I  QUERY    20477   [conn26] "GetMoreHangBeforeReadLock fail point enabled. Blocking until fail point is disabled."
[js_test:read_operations_during_rollback] 2020-04-22T15:53:28.131+0000 d21030| 2020-04-22T15:53:28.131+00:00 I  COMMAND  51803   [conn35] "Slow query","attr":{"type":"command","ns":"local.$cmd","appName":"OplogFetcher","command":{"getMore":1043447821946114879,"collection":"oplog.rs","batchSize":13981010,"maxTimeMS":50,"term":1,"lastKnownCommittedOpTime":{"ts":{"$timestamp":{"t":1587570807,"i":2}},"t":1},"$db":"local","$replData":1,"$oplogQueryData":1,"$readPreference":{"mode":"secondaryPreferred"},"$clusterTime":{"clusterTime":{"$timestamp":{"t":1587570807,"i":2}},"signature":{"hash":{"$binary":{"base64":"AAAAAAAAAAAAAAAAAAAAAAAAAAA=","subType":"0"}},"keyId":0}}},"cursorid":1043447821946114879,"numYields":0,"ok":0,"errMsg":"operation was interrupted","errName":11602,"errCode":11602,"reslen":752,"locks":{},"protocol":"op_msg","durationMillis":142}
[js_test:read_operations_during_rollback] 2020-04-22T15:53:28.407+0000 checkLog ignoring failure: Error: error doing query: failed: network error while attempting to run command 'getLog' on host 'ip-10-122-23-22:21020'
[js_test:read_operations_during_rollback] 2020-04-22T15:53:28.715+0000 [jsTest] Clearing fail point GetMoreHangBeforeReadLock
[js_test:read_operations_during_rollback] 2020-04-22T15:53:28.715+0000 d21030| 2020-04-22T15:53:28.715+00:00 W  CONTROL  23829   [conn71] "Set failpoint","attr":{"failPointName":"GetMoreHangBeforeReadLock","failPoint":{"mode":0,"data":{},"timesEntered":4}}
[js_test:read_operations_during_rollback] 2020-04-22T15:53:28.716+0000 [jsTest] ----
[js_test:read_operations_during_rollback] 2020-04-22T15:53:28.716+0000 [jsTest] Wait for 'getMore' thread to join.
[js_test:read_operations_during_rollback] 2020-04-22T15:53:28.716+0000 [jsTest] ----
[js_test:read_operations_during_rollback] 2020-04-22T15:53:28.716+0000 
[js_test:read_operations_during_rollback] 2020-04-22T15:53:28.716+0000 assert: [0] != [252] are not equal : encountered an error in the parallel shell
[js_test:read_operations_during_rollback] 2020-04-22T15:53:28.716+0000 doassert@src/mongo/shell/assert.js:20:14
[js_test:read_operations_during_rollback] 2020-04-22T15:53:28.716+0000 assert.eq@src/mongo/shell/assert.js:179:9
[js_test:read_operations_during_rollback] 2020-04-22T15:53:28.716+0000 startParallelShell/<@src/mongo/shell/servers_misc.js:183:13
[js_test:read_operations_during_rollback] 2020-04-22T15:53:28.717+0000 @jstests/replsets/read_operations_during_rollback.js:62:1
[js_test:read_operations_during_rollback] 2020-04-22T15:53:28.717+0000 @jstests/replsets/read_operations_during_rollback.js:5:2
[js_test:read_operations_during_rollback] 2020-04-22T15:53:28.717+0000 uncaught exception: Error: [0] != [252] are not equal : encountered an error in the parallel shell :
[js_test:read_operations_during_rollback] 2020-04-22T15:53:28.717+0000 doassert@src/mongo/shell/assert.js:20:14
[js_test:read_operations_during_rollback] 2020-04-22T15:53:28.717+0000 assert.eq@src/mongo/shell/assert.js:179:9
[js_test:read_operations_during_rollback] 2020-04-22T15:53:28.717+0000 startParallelShell/<@src/mongo/shell/servers_misc.js:183:13
[js_test:read_operations_during_rollback] 2020-04-22T15:53:28.717+0000 @jstests/replsets/read_operations_during_rollback.js:62:1
[js_test:read_operations_during_rollback] 2020-04-22T15:53:28.717+0000 @jstests/replsets/read_operations_during_rollback.js:5:2
[js_test:read_operations_during_rollback] 2020-04-22T15:53:28.717+0000 failed to load: jstests/replsets/read_operations_during_rollback.js

Generated at Thu Feb 08 05:15:10 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.