[SERVER-21366] Long-running transactions in MigrateStatus::apply Created: 09/Nov/15 Updated: 06/Dec/22 Resolved: 15/Dec/17 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | 3.0.7 |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Bruce Lucas (Inactive) | Assignee: | [DO NOT USE] Backlog - Sharding Team |
| Resolution: | Won't Fix | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Attachments: | |
| Issue Links: | |
| Assigned Teams: | Sharding |
| Sprint: | Sharding D (12/11/15) |
| Participants: | |
| Description |
|
In a sharded scenario involving chunk moves, inserts, and deletes (via TTL), we observe the WT "range of pinned ids" statistic growing, indicating long-running transactions. We added this patch to flag any transaction taking longer than 100 ms.
Here was the longest, a transaction that ran for about 23 s:
addr2line:
This transaction was held open for the duration of applying deletes during the catchup phase of a chunk migration. |
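As an illustration of the kind of instrumentation described above (this is a sketch, not the attached patch; the class name and logging are assumptions), a scoped timer can flag any transaction scope held open for more than 100 ms:

```cpp
// Minimal sketch, not the patch referenced above: warn when a scope that owns
// a storage-engine transaction stays open for more than 100 ms.
#include <chrono>
#include <iostream>

class ScopedLongTxnWarning {
public:
    explicit ScopedLongTxnWarning(const char* label)
        : _label(label), _start(std::chrono::steady_clock::now()) {}

    ~ScopedLongTxnWarning() {
        const auto elapsedMs = std::chrono::duration_cast<std::chrono::milliseconds>(
                                   std::chrono::steady_clock::now() - _start)
                                   .count();
        if (elapsedMs > 100) {  // threshold from the description: >100 ms
            std::cerr << "long-running transaction in " << _label << ": "
                      << elapsedMs << " ms" << std::endl;
        }
    }

private:
    const char* _label;
    std::chrono::steady_clock::time_point _start;
};

// Usage: instantiate at the top of the scope that owns the transaction, e.g.
//   ScopedLongTxnWarning warn("MigrateStatus::apply");
```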
| Comments |
| Comment by Kaloian Manassiev [ 15/Dec/17 ] |
|
Starting with version 3.2, this code no longer exists, and I believe all the places performing long-running WT transactions without yielding the snapshot have been cleaned up. Since 3.0 will soon no longer be supported, I am closing this as Won't Fix. |
| Comment by Daniel Pasette (Inactive) [ 23/Dec/15 ] |
|
Re-opening while we decide whether to go with this fix. |
| Comment by Githook User [ 18/Dec/15 ] |
|
Author: Dan Pasette (monkey101) <dan@mongodb.com>
Message: Revert "
This reverts commit 934c5a5241edd01df270065831646d78ed5a80c1. |
| Comment by Githook User [ 18/Dec/15 ] |
|
Author: Dan Pasette (monkey101) <dan@mongodb.com>
Message: Revert "
This reverts commit 7f3d0f2cfac80f49d8c7d8ec6aeaad7cae6d6cb0. |
| Comment by Githook User [ 18/Dec/15 ] |
|
Author: Dan Pasette (monkey101) <dan@mongodb.com>
Message: Revert "
This reverts commit b306a90872fcf190462daaad1c3154d48c324ca9. |
| Comment by Githook User [ 15/Dec/15 ] |
|
Author: Kaloian Manassiev (kaloianm) <kaloian.manassiev@mongodb.com>
Message: |
| Comment by Githook User [ 08/Dec/15 ] |
|
Author: Kaloian Manassiev (kaloianm) <kaloian.manassiev@mongodb.com>
Message: |
| Comment by Githook User [ 08/Dec/15 ] |
|
Author: Kaloian Manassiev (kaloianm) <kaloian.manassiev@mongodb.com>
Message: |
| Comment by Bruce Lucas (Inactive) [ 10/Nov/15 ] |
|
Testing verifies that this patch eliminates the long-running transactions. However, a problem remains: a chunk move can still get stuck, apparently indefinitely. The following experiment sheds some light:
I think this comment explains why: we can't determine whether a delete relates to the chunk in flight, so we transfer all deletes from the donor shard to the recipient shard. As long as there is a steady stream of deletes, whether they relate to the chunk in flight or not, the migration will not complete. The migration can complete, however, if the deletes stop (for example, during the 60-second interval between TTL passes) and if the recipient shard is able to process the incoming delete ops quickly and determine that the documents are not present on the recipient (because they relate to chunks on the donor shard, not to chunks on this shard or to the chunk in flight). |
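To make the limitation above concrete, the determination that cannot be made amounts to a shard-key range check against the in-flight chunk's bounds. The sketch below is hypothetical and does not correspond to real server code; it only shows the check whose absence forces every delete to be transferred:

```cpp
// Hypothetical sketch only: the filter that would let the donor skip deletes
// unrelated to the migrating chunk. The code path discussed above cannot make
// this determination, so it transfers all deletes.
struct ChunkRange {
    long long min;  // inclusive shard-key lower bound (simplified to an integer key)
    long long max;  // exclusive shard-key upper bound
};

// True if the deleted document's shard-key value falls inside the chunk that
// is currently being migrated, i.e. a delete the recipient actually needs.
bool deleteAffectsChunkInFlight(long long shardKeyValue, const ChunkRange& inFlight) {
    return shardKeyValue >= inFlight.min && shardKeyValue < inFlight.max;
}
```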
| Comment by Bruce Lucas (Inactive) [ 09/Nov/15 ] |
|
Thanks for the quick response, Kal. Patch applied and running. I'll get back to you in the morning with the results. One question though: does this have the same effect as your initial suggestion of committing the transaction after every document? Would it make sense to do it every n documents (e.g. 100) instead? If there is a high rate of deletes, I think this portion of the moveChunk may struggle to keep up, and in the TTL case it may not actually be doing anything for each document, because the documents may already have been deleted on the to-shard by the TTL monitor, so anything we can do to keep that loop fast should help. Same question about reacquiring the lock for each document. |
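A rough sketch of the "every n documents" idea (helper names are placeholders, not server code): commit and restart the storage transaction once per batch, so the pinned snapshot stays short without paying the commit cost for every single document.

```cpp
// Sketch only, assuming hypothetical deleteOne/commitAndRestart callables:
// apply deletes in batches of kCommitBatchSize, committing between batches.
#include <cstddef>
#include <vector>

constexpr std::size_t kCommitBatchSize = 100;  // "every n documents (e.g. 100)"

template <typename DocId, typename DeleteFn, typename CommitFn>
void applyDeletesInBatches(const std::vector<DocId>& ids,
                           DeleteFn deleteOne,
                           CommitFn commitAndRestart) {
    std::size_t sinceCommit = 0;
    for (const auto& id : ids) {
        deleteOne(id);  // may be a no-op if the TTL monitor already removed the doc
        if (++sinceCommit >= kCommitBatchSize) {
            commitAndRestart();  // release the pinned snapshot
            sinceCommit = 0;
        }
    }
    commitAndRestart();  // flush the trailing partial batch
}
```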
| Comment by Kaloian Manassiev [ 09/Nov/15 ] |
|
bruce.lucas@mongodb.com, would it be possible to apply the attached patch to 3.0 and run it with your repro? |
| Comment by Randolph Tan [ 09/Nov/15 ] |
|
kaloian.manassiev, I talked with jason.rassi about this. Deleting a single document by _id should be fast (unless there's no _id index), and not yielding makes the logic easier to understand. In addition, the loop already re-acquires the collection lock for every iteration, so it's basically yielding already. |
| Comment by Kaloian Manassiev [ 09/Nov/15 ] |
|
renctan, do you know why we need to do manual yielding for deleteObjects when those documents should be logically independent? bruce.lucas, in your repro, would it be possible to add a call to txn->recoveryUnit()->commitAndRestart() at the end of each iteration of the loop and see if this helps? This should keep the pinned transaction range smaller. |
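A simplified sketch of the suggestion above (placeholder types stand in for the real OperationContext/RecoveryUnit; this is not the actual MigrateStatus::apply loop): call commitAndRestart() after each deleted document so the WiredTiger snapshot is released every iteration and the "range of pinned ids" stays small.

```cpp
// Simplified sketch, not real server code: end the storage snapshot after
// every single-document delete inside the migration catch-up loop.
#include <vector>

struct RecoveryUnit {
    void commitAndRestart() { /* ends the current snapshot, starts a new one */ }
};

struct OperationContext {
    RecoveryUnit* recoveryUnit() { return &_ru; }
    RecoveryUnit _ru;
};

void applyDeletes(OperationContext* txn, const std::vector<long long>& ids) {
    for (long long id : ids) {
        // ... re-acquire the collection lock and delete the one document by _id ...
        (void)id;
        txn->recoveryUnit()->commitAndRestart();  // release the pinned snapshot each iteration
    }
}
```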