[SERVER-39154] Create a fail point to make it so if a LockManager lock cannot be acquired immediately, then the operation fails Created: 23/Jan/19 Updated: 29/Oct/23 Resolved: 05/Mar/19 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Concurrency, Testing Infrastructure |
| Affects Version/s: | None |
| Fix Version/s: | 4.1.9 |
| Type: | Task | Priority: | Major - P3 |
| Reporter: | Judah Schvimer | Assignee: | Vesselina Ratcheva (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: |
|
| Backwards Compatibility: | Fully Compatible |
| Sprint: | Repl 2019-02-25, Repl 2019-03-11 |
| Participants: |
| Description |
|
This is necessary for prepare transaction testing so that DDL ops do not block forever. |
| Comments |
| Comment by Githook User [ 05/Mar/19 ] |
|
Author: {'name': 'Vesselina Ratcheva', 'email': 'vesselina.ratcheva@10gen.com', 'username': 'vessy-mongodb'} Message: |
| Comment by Eric Milkie [ 25/Jan/19 ] |
|
If you limit the failpoint to apply only to MODE_S and MODE_X acquisitions for all resource types, that limits the number of unintended failures you might see. I say, try it and see if it works. |
| Comment by Max Hirschhorn [ 25/Jan/19 ] |
To clarify one point that may be getting missed here, the index build wouldn't "hang for 5 seconds" in the kind of test the initial sync fuzzer would generate. It would hang forever because there's a single thread of execution on the client and it would have run
This failpoint only needs to apply for MODE_S and MODE_X lock acquisitions because MODE_IS and MODE_IX would be compatible with the locks held by a transaction. |
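The compatibility argument above can be sketched with the standard multi-granularity lock matrix. This is an illustrative Python model, not the server's LockManager code: `COMPATIBLE`, `conflicts`, and `failpoint_applies` are hypothetical names, and the example assumes the prepared transaction holds a MODE_IX lock on the resource.

```python
from enum import Enum


class LockMode(Enum):
    IS = "MODE_IS"
    IX = "MODE_IX"
    S = "MODE_S"
    X = "MODE_X"


# Standard multi-granularity compatibility matrix:
# for each requested mode, the set of held modes it can coexist with.
COMPATIBLE = {
    LockMode.IS: {LockMode.IS, LockMode.IX, LockMode.S},
    LockMode.IX: {LockMode.IS, LockMode.IX},
    LockMode.S: {LockMode.IS, LockMode.S},
    LockMode.X: set(),
}


def conflicts(requested, held):
    """Return True if `requested` would have to wait behind `held`."""
    return held not in COMPATIBLE[requested]


def failpoint_applies(requested):
    """Hypothetical filter: only non-intent modes are subject to the failpoint."""
    return requested in (LockMode.S, LockMode.X)


# Assumption: the prepared transaction holds MODE_IX.
held_by_txn = LockMode.IX
for mode in LockMode:
    print(mode.value, "conflicts:", conflicts(mode, held_by_txn),
          "failpoint applies:", failpoint_applies(mode))
```

Under that assumption, the modes that conflict with the transaction's lock are exactly the modes the failpoint would cover, so intent-lock acquisitions are never affected.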
| Comment by Judah Schvimer [ 25/Jan/19 ] |
|
max.hirschhorn suggested this instead of using maxTimeMS. I believe that was to allow the operation to take as long as it wants for everything but the lock acquisition. If an index build with no lock contention would take 45 seconds, we want that to succeed, but if an index build hangs for 5 seconds waiting on a lock, we'd want that to fail. |
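A rough sketch of the distinction drawn above, using Python's `threading.Lock` as a stand-in for a LockManager lock. `run_ddl`, `LockWouldBlock`, and the `fail_if_wait_needed` flag are hypothetical names for illustration only. The point is that only the acquisition is fail-fast; the work done under the lock has no deadline.

```python
import threading


class LockWouldBlock(Exception):
    """Raised when the lock is not immediately available (hypothetical)."""


def run_ddl(lock, work, fail_if_wait_needed):
    """Run `work` under `lock`.

    With fail_if_wait_needed set, a contended lock fails the operation
    at once instead of waiting. The work itself is never timed.
    """
    if fail_if_wait_needed:
        if not lock.acquire(blocking=False):
            raise LockWouldBlock("lock not immediately available")
    else:
        lock.acquire()  # may wait indefinitely
    try:
        return work()
    finally:
        lock.release()


# Uncontended: the operation succeeds however long `work` takes.
print(run_ddl(threading.Lock(), lambda: "index built", True))

# Contended: the operation fails right away rather than waiting.
contended = threading.Lock()
contended.acquire()
try:
    run_ddl(contended, lambda: "index built", True)
except LockWouldBlock as exc:
    print("failed:", exc)
```

A whole-operation timeout such as maxTimeMS would instead bound the lock wait and the work together, which is why it cannot tell a slow but uncontended index build from one stuck behind a lock.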
| Comment by Eric Milkie [ 25/Jan/19 ] |
|
Wouldn't it be easier to add a timeout to all DDL operations (either by modifying the fuzzer, or by editing the timeout on the server when a failpoint is activated)? I am concerned that you might find lots of things would break if you make all lock acquisitions fail immediately if the resource is contended. |
| Comment by Judah Schvimer [ 25/Jan/19 ] |
|
Once a transaction is in prepare it no longer has a time limit. Since the fuzzer is single-threaded, if it does a DDL op before committing or aborting the prepared transaction, it will block on the DDL op and never try to commit or abort the transaction. |
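The hang described above can be modeled with a toy single-threaded script runner. This is a simplified simulation, not fuzzer or server code: `run_script`, the op names, and the log strings are all hypothetical, and "blocked forever" is represented by the script simply stopping.

```python
def run_script(ops, fail_if_wait_needed):
    """Simulate a single-threaded client running `ops` in order.

    While a transaction is prepared, a DDL op cannot get its lock.
    Without the failpoint the client stops there for good; with it,
    the DDL op fails and the client moves on to the next op.
    """
    prepared = False
    log = []
    for op in ops:
        if op == "prepare":
            prepared = True
            log.append("prepared")
        elif op in ("commit", "abort"):
            prepared = False
            log.append(op)
        elif op == "ddl":
            if not prepared:
                log.append("ddl ok")
            elif fail_if_wait_needed:
                log.append("ddl failed: lock unavailable")
            else:
                # The only thread is now stuck, so later ops never run.
                log.append("ddl blocked forever")
                return log
    return log


script = ["prepare", "ddl", "commit", "ddl"]
print(run_script(script, fail_if_wait_needed=False))
print(run_script(script, fail_if_wait_needed=True))
```

In the first run the commit is never reached, because the only thread that could issue it is waiting on the DDL op. In the second, the DDL op fails and the script goes on to commit.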
| Comment by Eric Milkie [ 25/Jan/19 ] |
|
Why would DDL ops ever block forever? I thought all transactions had a time limit to eventually commit or abort. |