  1. Core Server
  2. SERVER-44092

Orphan documents impact IXSCAN performance

    Details

    • Type: Bug
    • Status: Waiting For User Input
    • Priority: Major - P3
    • Resolution: Unresolved
    • Affects Version/s: 4.0.10
    • Fix Version/s: None
    • Component/s: Sharding
    • Labels: None
    • Operating System: ALL

      Description

      This happens once in a while. So far we have found no clear steps to reproduce it, but the symptoms are clear enough on their own.

      The issue looks like this: some mongod instances start to show abnormally high CPU usage. They hold the same amount of data as other shards and see the same count and shape of operations. This may get as bad as nearly 100% CPU usage, but the mongod instance still works, albeit with increased latency.

      In all cases we found the same offending query pattern: there are frequent reads on a collection with, let's say, a "foo == 1" query. The returned documents are modified to "foo = 2" and written back to the db, then the read query is repeated. Normally this query often returns zero documents, and this doesn't upset mongod very much. When the issue takes place, the query still returns zero documents but takes much longer to do so.
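      One way to make the "zero documents, but slow" symptom measurable might be to compare the counters in the `executionStats` section of an explain run for the "foo == 1" query. The sketch below is only an illustration: it assumes the explain was run with "executionStats" verbosity, that the standard `nReturned`, `totalKeysExamined` and `totalDocsExamined` fields are present, and that orphans show up as index keys examined without being returned. The report does not confirm that last assumption. The function names and the threshold are hypothetical.

```python
def scan_overhead(execution_stats):
    """Summarise index work done relative to documents returned.

    `execution_stats` is assumed to be the `executionStats` sub-document
    of an explain result; missing counters are treated as zero.
    """
    keys = execution_stats.get("totalKeysExamined", 0)
    docs = execution_stats.get("totalDocsExamined", 0)
    returned = execution_stats.get("nReturned", 0)
    return {
        "keys_examined": keys,
        "docs_examined": docs,
        "returned": returned,
        # Keys that were scanned but did not lead to a returned document.
        "wasted_keys": max(keys - returned, 0),
    }


def looks_suspicious(execution_stats, min_wasted_keys=1000):
    """Flag a query that scans many keys it never returns.

    The default threshold is arbitrary (an assumption, not a recommendation).
    """
    return scan_overhead(execution_stats)["wasted_keys"] >= min_wasted_keys
```

      Comparing this summary on an affected shard and on a healthy one would show whether the extra time goes into index scanning at all.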

      Some prior experience with orphaned documents gave us a degree of insight, so we tried cleaning up orphans in the problematic collection. All performance effects ceased at once. No other side effects were observed.
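      For reference, an orphan cleanup of this kind on 4.0 can be sketched as a loop over the `cleanupOrphaned` admin command, which is issued against the primary mongod of the shard directly (not through mongos) and repeated from the returned `stoppedAtKey` until no key is returned. This is a minimal sketch, not the exact procedure we ran: `cleanup_orphans`, `run_admin_command` and `max_rounds` are hypothetical names, and `run_admin_command` is assumed to be any callable that sends a command document to the admin database and returns the reply as a dict (for example `client.admin.command` from PyMongo).

```python
def cleanup_orphans(run_admin_command, namespace, max_rounds=10000):
    """Run cleanupOrphaned repeatedly until the whole key space is covered.

    Returns the number of command rounds that were executed.
    `max_rounds` is a safety cap chosen for this sketch (an assumption).
    """
    next_key = {}  # an empty document means "start from the lowest key"
    rounds = 0
    while next_key is not None and rounds < max_rounds:
        result = run_admin_command(
            {"cleanupOrphaned": namespace, "startingFromKey": next_key}
        )
        if result.get("ok") != 1:
            raise RuntimeError("cleanupOrphaned failed or timed out: %r" % (result,))
        rounds += 1
        # The reply carries stoppedAtKey while more ranges remain to be checked.
        next_key = result.get("stoppedAtKey")
    return rounds
```

      The callable is injected so the loop can be exercised without a live cluster; the namespace is the full "db.collection" string.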

      The effect presented itself several times, and each time this workaround was just as successful. On one such occasion the degraded state lasted some 12 hours, so this is not something that resolves itself, and therefore it doesn't look like a healthy last stage of chunk balancing, for example. Orphans are left behind for good. Or rather, for the bad.

      The issue seems random and damaging enough to warrant some sort of user-side workaround, like cron-based orphan cleanup on all collections. Clearly, such measures must not be the only way to fix the issue. 

              People

              • Votes: 0
              • Watchers: 9
