Core Server / SERVER-22618

RangeDeleter asserts and causes primary to be unusable

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Incomplete
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Sharding
    • Labels: None
    • Operating System: ALL
    • Sprint: Sharding 11 (03/11/16), Sharding 12 (04/01/16), Sharding 13 (04/22/16)

      Description

      Note: this looks like SERVER-22390 but is totally different - please read carefully.

      Our setup: a primary, a secondary, an arbiter, and another secondary that is more than 1 hour behind and still syncing. The lagging node is in SECONDARY state. We added it with votes:0 and priority:0 so that it will not crash (as a workaround for SERVER-22390).
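
      For reference, the topology above can be sketched as a replica set config document. This is only an illustration: the set name and host names are hypothetical, and only the `arbiterOnly`, `votes` and `priority` settings come from our description. The two helper functions are likewise made up for this sketch; they just show which members vote and which hold data.

```python
# Sketch of the replica set layout described above, written in the shape of a
# replica set config document. The set name and hosts are hypothetical.
config = {
    "_id": "rs0",  # hypothetical replica set name
    "members": [
        {"_id": 0, "host": "node-a.example:27017"},                       # primary
        {"_id": 1, "host": "node-b.example:27017"},                       # in-sync secondary
        {"_id": 2, "host": "node-c.example:27017", "arbiterOnly": True},  # arbiter
        # Lagging secondary, added as non-voting and never electable.
        {"_id": 3, "host": "node-d.example:27017", "votes": 0, "priority": 0},
    ],
}


def voting_members(cfg):
    """Hosts of members that vote (votes defaults to 1 when omitted)."""
    return [m["host"] for m in cfg["members"] if m.get("votes", 1) > 0]


def data_bearing_members(cfg):
    """Hosts of members that hold data (everything except arbiters)."""
    return [m["host"] for m in cfg["members"] if not m.get("arbiterOnly", False)]


print("voting:", voting_members(config))
print("data-bearing:", data_bearing_members(config))
```

      The lagging node does not vote, but it is still a data-bearing member, which is the case question 2 below asks about.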

      The other secondary is perfectly sync'd (0 sec lag).

      Range deleter shows this:

      2016-02-15T13:04:40.161+0000 I -        [RangeDeleter] Assertion: 64:waiting for replication timed out
      2016-02-15T13:04:40.506+0000 I CONTROL  [RangeDeleter]
       0x12d5772 0x12713d4 0x125cfa8 0x125d05c 0xb59fab 0xdf5026 0xdf2829 0x1a99100 0x7ff1831d7df3 0x7ff182f051bd
      ----- BEGIN BACKTRACE -----
      {"backtrace":[{"b":"400000","o":"ED5772"},{"b":"400000","o":"E713D4"},{"b":"400000","o":"E5CFA8"},{"b":"400000","o":"E5D05C"},{"b":"400000","o":"759FAB"},{"b":"400000","o":"9F5026"},{"b":"400000","o":"9F2829"},{"b":"400000","o":"1699100"},{"b":"7FF1831D0000","o":"7DF3"},{"b":"7FF182E0F000","o":"F61BD"}],"processInfo":{ "mongodbVersion" : "3.2.1", "gitVersion" : "a14d55980c2cdc565d4704a7e3ad37e4e535c1b2", "compiledModules" : [], "uname" : { "sysname" : "Linux", "release" : "3.14.19-17.43.amzn1.x86_64", "version" : "#1 SMP Wed Sep 17 22:14:52 UTC 2014", "machine" : "x86_64" }, "somap" : [ { "elfType" : 2, "b" : "400000", "buildId" : "A931E563057BCC34ED2D1971AAC77D44F47033A9" }, { "b" : "7FFFC40FE000", "elfType" : 3, "buildId" : "8E3D893F8991DFE6C5D9AB55196714E9AF81DC88" }, { "b" : "7FF1843FD000", "path" : "/usr/lib64/libssl.so.10", "elfType" : 3, "buildId" : "22480480235F3B1C6C2E5E5953949728676D3796" }, { "b" : "7FF184018000", "path" : "/lib64/libcrypto.so.10", "elfType" : 3, "buildId" : "ADD80D7DBE8B04C3BA8E3242D96F39FF870A862A" }, { "b" : "7FF183E10000", "path" : "/lib64/librt.so.1", "elfType" : 3, "buildId" : "E81013CBFA409053D58A65A0653271AB665A4619" }, { "b" : "7FF183C0C000", "path" : "/lib64/libdl.so.2", "elfType" : 3, "buildId" : "62A8842157C62F95C3069CBF779AFCC26577A99A" }, { "b" : "7FF183903000", "path" : "/usr/lib64/libstdc++.so.6", "elfType" : 3, "buildId" : "66F1CF311C61879639BD3DC0034DEE0D6D042261" }, { "b" : "7FF183601000", "path" : "/lib64/libm.so.6", "elfType" : 3, "buildId" : "5F97F8F8E5024E29717CF35998681F84D4A22D45" }, { "b" : "7FF1833EC000", "path" : "/lib64/libgcc_s.so.1", "elfType" : 3, "buildId" : "E77BA674F63D5C56373C03316B5E74C5C781A0BC" }, { "b" : "7FF1831D0000", "path" : "/lib64/libpthread.so.0", "elfType" : 3, "buildId" : "D48D3E6672A77B603B402F661BABF75E90AD570B" }, { "b" : "7FF182E0F000", "path" : "/lib64/libc.so.6", "elfType" : 3, "buildId" : "DF6DA145A649EA093507A635AF383F608E7CE3F2" }, { "b" : "7FF18466A000", "path" : 
"/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "6F90843B9087FE91955FEB0355EB0858EF9E97B2" }, { "b" : "7FF182BCC000", "path" : "/lib64/libgssapi_krb5.so.2", "elfType" : 3, "buildId" : "DE5A9F7A11A0881CB64E375F4DDCA58028F0FAF8" }, { "b" : "7FF1828E7000", "path" : "/lib64/libkrb5.so.3", "elfType" : 3, "buildId" : "A3E43FC66908AC8B00773707FECA3B1677AFF311" }, { "b" : "7FF1826E4000", "path" : "/usr/lib64/libcom_err.so.2", "elfType" : 3, "buildId" : "622F315EB5CB2F791E9B64020692EBA98195D06D" }, { "b" : "7FF1824B9000", "path" : "/lib64/libk5crypto.so.3", "elfType" : 3, "buildId" : "B10FBFEC246C4EAD1719D16090D0BE54904BBFC9" }, { "b" : "7FF1822A2000", "path" : "/lib64/libz.so.1", "elfType" : 3, "buildId" : "E492542502DF88A2F752AD77D1905D13FF1AC6FF" }, { "b" : "7FF182097000", "path" : "/lib64/libkrb5support.so.0", "elfType" : 3, "buildId" : "7292C0673D7C116E3389D3FFA67087A6B9287A71" }, { "b" : "7FF181E94000", "path" : "/lib64/libkeyutils.so.1", "elfType" : 3, "buildId" : "BF48CD5658DE95CE058C4B828E81C97E2AE19643" }, { "b" : "7FF181C7A000", "path" : "/lib64/libresolv.so.2", "elfType" : 3, "buildId" : "6A7DA1CED90F65F27CB7B5BACDBB1C386C05F592" }, { "b" : "7FF181A59000", "path" : "/usr/lib64/libselinux.so.1", "elfType" : 3, "buildId" : "803D7EF21A989677D056E52BAEB9AB5B154FB9D9" } ] }}
       mongod(_ZN5mongo15printStackTraceERSo+0x32) [0x12d5772]
       mongod(_ZN5mongo10logContextEPKc+0x134) [0x12713d4]
       mongod(_ZN5mongo11msgassertedEiPKc+0x88) [0x125cfa8]
       mongod(+0xE5D05C) [0x125d05c]
       mongod(_ZN5mongo7Helpers11removeRangeEPNS_16OperationContextERKNS_8KeyRangeEbRKNS_19WriteConcernOptionsEPNS0_11RemoveSaverEbb+0x144B) [0xb59fab]
       mongod(_ZN5mongo17RangeDeleterDBEnv11deleteRangeEPNS_16OperationContextERKNS_16RangeDeleteEntryEPxPSs+0x426) [0xdf5026]
       mongod(_ZN5mongo12RangeDeleter6doWorkEv+0x209) [0xdf2829]
       mongod(+0x1699100) [0x1a99100]
       libpthread.so.0(+0x7DF3) [0x7ff1831d7df3]
       libc.so.6(clone+0x6D) [0x7ff182f051bd]
      -----  END BACKTRACE  -----
      2016-02-15T13:04:40.513+0000 W SHARDING [RangeDeleter] Error encountered while trying to delete range: Error encountered while deleting range: nsmydomain.col from { p: 0, as: ObjectId('5660b4295e27753f5c614e0a'), d: new Date(1455408000000) } -> { p: 2, as: ObjectId('5660b4295e27753f5c614e0a'), d: new Date(1455408000000) }, cause by: :: caused by :: 64 waiting for replication timed out
      2016-02-15T13:04:40.513+0000 I SHARDING [RangeDeleter] Deleter starting delete for: mydomain.col from { as: ObjectId('561ceef4bf86ce4ffd5032c2'), d: new Date(1451685631505), c: ObjectId('56c118ebeec1e72d766a28be'), u: ObjectId('54b538278071936bb168304e') } -> { as: ObjectId('565eb8575e27753f5c3b8c36'), d: new Date(1451713553551), c: ObjectId('56c0deffeec1e72d7269fa72'), u: ObjectId('568764114953771315eb1e4a') }, with opId: 2282185777
      

      Note: this does NOT crash the node; it keeps running. However, after exactly 1 hour, the primary becomes very slow and effectively unusable.

      1. This is a bug.

      2. Why does the range deleter wait for the secondary if it has votes:0 and priority:0?

      3. Is there any workaround for this? This will happen again.
