Core Server / SERVER-12146

writeback listener may not get correct code back from ClientInfo::getLastError

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.2.7, 2.4.9
    • Component/s: Sharding
    • Labels:
      None
    • Operating System:
      ALL

      Description

      Issue Status as of January 2nd, 2014

      ISSUE SUMMARY
      Under very rare circumstances mongos may incorrectly report a write as successful. The bug can manifest in the unlikely event that the mongos reuses a previously-used connection from the shared pool which contains a stale writeback field. In this situation, mongos cannot guarantee the correct post-migration location of writes and thus may incorrectly report the write as successful. Since mongos outgoing connections are tied to incoming client connections, this can only occur in cases of high connection turnover and low latency. The bug is difficult to trigger, but has caused a lost write in one known case.

      This race condition can only occur on the first occurrence of a writeback being queued for a shard. Once a writeback is queued, the connection is cached.

      USER IMPACT

      Affected Version: All versions of MongoDB prior to the fixed releases, up to and including v2.4.8.
      Conditions Required: Sharded cluster with balancing enabled and active.
      Frequency: Extremely rare.
      Root Cause: In certain cases, it is possible for the getLastError aggregation in mongos ClientInfo to not return the correct code to the writeback listener. We ignore any previous writebacks when reprocessing a write in the writeback listener, but incorrectly do not append the other getLastError fields contained in "res" (the getLastError result from the shard).

      In short, when retrying a write via the writeback listener, it is possible for the writeback listener to miss the special stale config code it needs to continue retrying.

      SOLUTION
      Always aggregate results from getLastError even in the presence of previous writebacks.

      WORKAROUNDS
      Temporarily disable the balancer until all mongos are updated to ensure your sharded cluster is not susceptible to this bug.

      PATCHES
      Production releases v2.4.9 and v2.2.7 contain the fix for this issue, and production release v2.6.0 will contain the fix as well. Upgrading all mongos processes to MongoDB v2.4.9 or MongoDB v2.2.7 is required to avoid this issue.

      Original Description

      In certain cases, it seems possible for the getLastError aggregation in mongos ClientInfo to not return the correct code to the writeback listener.

      The core issue is here:

                  if ( writebacks.size() ){
                      vector<BSONObj> v = _handleWriteBacks( writebacks , fromWriteBackListener );
                      if ( v.size() == 0 && fromWriteBackListener ) {
                          // ok
                      }
                      ...
                  }
                  else {
                      result.append( "singleShard" , theShard );
                      result.appendElements( res );
                  }

      We ignore any writebacks when reprocessing a write in the writeback listener (WBL), but incorrectly do not append the other getLastError fields contained in "res" (the getLastError result from the shard).

      In short, when retrying a command in the WBL, it's possible for the WBL to not get the special stale config code it needs to continue retrying.

      1. writeback_retry.js (4 kB, Greg Studer)

        Activity

        greg_10gen Greg Studer (Inactive) added a comment -

        Attached test case reproduces with two fail points in the WBL - difficult to trigger deterministically.

        xgen-internal-githook Githook User added a comment -

        Author: Dan Pasette (monkey101) <dan@10gen.com>

        Message: SERVER-12146 do not check writebacks if calling gle from wbl
        Branch: v2.4
        https://github.com/mongodb/mongo/commit/bd3553dff93786447130c242c274678f969cd513

        dan@10gen.com Dan Pasette added a comment -

        This patch will be backported to the 2.2 branch.

        xgen-internal-githook Githook User added a comment -

        Author: Dan Pasette (monkey101) <dan@10gen.com>

        Message: SERVER-12146 do not check writebacks if calling gle from wbl
        Branch: v2.2
        https://github.com/mongodb/mongo/commit/307fb42c66350981525d64ca8f6a2dbfe6a3d8f4

