  1. Core Server
  2. SERVER-88517

Characterize load that would trigger server failure due to many txns waiting for timeout

    • Type: Task
    • Resolution: Unresolved
    • Priority: Major - P3
    • None
    • Affects Version/s: None
    • Component/s: None
    • Labels:
      None
    • Cluster Scalability

      A race condition in an aggregation that runs inside a transaction and whose cursor is killed can cause the transaction to commit without including all participants that were added by other participants (PM-2844). Specifically, when AsyncResultsMerger (ARM) sends a getMore request to a shard and the cursor is subsequently killed (for example, because a $limit on the results is reached) and the transaction is committed, ARM does not wait for any response from the shards that would indicate that participants have been added. The transaction can therefore commit before the response listing the added participants is propagated to the transaction coordinator. The added participants then hold their transaction resources until the transaction is aborted, either by the start of a subsequent transaction with a higher txn number or by the transaction timeout.
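The ordering problem described above can be illustrated with a toy model. This is only a sketch under simplifying assumptions, not MongoDB code: `ToyRouter`, `run`, and the shard names `shardA`/`shardB` are hypothetical, and the model reduces the race to whether the getMore response is processed before or after the cursor is killed.

```python
from dataclasses import dataclass, field


@dataclass
class ToyRouter:
    """Hypothetical stand-in for the router-side participant list."""
    participants: set = field(default_factory=set)
    cursor_killed: bool = False

    def on_get_more_response(self, added):
        # In this model, a response that arrives after the cursor is killed
        # is never processed, so the participants it reports are not recorded.
        if not self.cursor_killed:
            self.participants |= set(added)

    def kill_cursor(self):
        self.cursor_killed = True

    def commit(self):
        # Commit is sent only to the participants known at this moment.
        return frozenset(self.participants)


def run(response_before_kill):
    """Return (participants included in commit, participants left holding resources)."""
    router = ToyRouter(participants={"shardA"})
    # Assumption: while serving the getMore, shardA adds shardB as a participant
    # and reports it in the getMore response.
    if response_before_kill:
        router.on_get_more_response(["shardB"])
        router.kill_cursor()
    else:
        router.kill_cursor()
        router.on_get_more_response(["shardB"])
    committed = router.commit()
    orphaned = {"shardA", "shardB"} - committed
    return committed, orphaned
```

Calling `run(True)` models the benign ordering, where both shards are included in the commit. Calling `run(False)` models the race, where `shardB` is left out of the commit and keeps its transaction resources until it is aborted.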

      Triggering this race condition does not cause a correctness issue as long as the getMore neither involves nor triggers any writes (this restriction is currently in place). This is because the added participant that is erroneously omitted from the transaction commit only returns read results, and the client would never see these results because the cursor has been killed. However, the added participant holds its transaction resources as described above, and a specific load could make this additional resource overhead large enough to trigger a server failure.

      We need to characterize a workload that causes a shard to hold enough transaction resources to cause performance degradation or failure in the server.
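As a starting point for sizing such a workload, a rough steady-state estimate follows from Little's law (items in the system = arrival rate x time in the system). The sketch below is an assumption-laden back-of-envelope calculation, not a measurement: `estimate_lingering_txns` is a hypothetical helper, the rate at which the race is hit must be measured separately, and the 60-second default assumes the transaction timeout is left at the default value of `transactionLifetimeLimitSeconds`.

```python
def estimate_lingering_txns(orphan_rate_per_sec, lifetime_sec=60.0):
    """Upper-bound estimate of transactions lingering on one shard.

    orphan_rate_per_sec: how often the race leaves a participant on this
        shard out of commit (hypothetical input, to be measured).
    lifetime_sec: how long an omitted participant holds its resources;
        assumed here to equal the transaction timeout.
    """
    if orphan_rate_per_sec < 0 or lifetime_sec < 0:
        raise ValueError("rate and lifetime must be non-negative")
    # Little's law: steady-state count = arrival rate * time in system.
    return orphan_rate_per_sec * lifetime_sec
```

For example, `estimate_lingering_txns(50)` gives 3000.0 lingering transactions under these assumptions. The figure is an upper bound, because a subsequent transaction with a higher txn number on the same session aborts the lingering one before the timeout.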

      More information: https://docs.google.com/document/d/1Czt5q5VrTx3mB7rHRVKvZmMjr8hZeH8T2as5DLVOdGQ/edit?usp=sharing

            Assignee:
            Unassigned
            Reporter:
            israel.hsu@mongodb.com Israel Hsu
            Votes:
            0
            Watchers:
            8
