Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-34029

mongod get stuck after upgrade

    XMLWordPrintable

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Duplicate
    • Affects Version/s: 3.4.5, 3.4.14, 3.6.3
    • Fix Version/s: None
    • Component/s: Querying, WiredTiger
    • Labels:
      None
    • Operating System:
      ALL
    • Steps To Reproduce:
      Hide

      I don't have exact steps, but I can test different versions, usually it happens in few hours, e.g I can enable more logging etc.

      Show
      I don't have exact steps, but I can test different versions, usually it happens in few hours, e.g I can enable more logging etc.

      Description

      One of our mongodb cluster has very weird issue.
      I tried to upgrade our cluster from 3.4.1 to 3.6.3, after upgrade I noticed that one slave server gets stuck after a while, I can connect to that slave, some commands are working, but not all, e.g rs.printSlaveReplicationInfo() is not responding, also that slave starts lagging, there is no cpu usage or disk IO, server is idling.
      And later even primary server got stuck (I suspect it is same issue, but not sure).
      After primary got stuck I decided to rollback cluster to 3.4.13, same thing happened.
      Rolled back whole cluster to 3.4.1, this version works.

      I started playing with one slave server, tried different versions, turned out that 3.4.4 is OK, but 3.4.5 is not. Using git bisect I found that following commit is causing issues for us: https://github.com/mongodb/mongo/commit/c3fda2d13c2f60eb3a68564403985cb1c7e8aa43

      Slave that gets stuck is also used for read-only queries, another slave that is not getting queries works fine.

      Every time slave gets stuck there is are few huge queries like following in log:

       
      2018-03-21T11:16:02.888+0000 I COMMAND  [conn124] warning: log line attempted (644kB) over max size (10kB), printing beginning and end ... command Baz.foocollection command: getMore { getMore: 120605126995, collection: "foocollection" } originatingCommand: { find: "foocollection", filter: { $or: [ { fooObjectId: ObjectId('58931cae526107078bbf71f7'), stamp: 1494514165.463685 }, { fooObjectId: ObjectId('58b412d65e9206070a1dd5f6'), stamp: 1494422652.740239 }, { fooObjectId: ObjectId('58b412d65e9206070a1dd5f6'), stamp: 1494422716.137933 }, { fooObjectId: ObjectId('58b412d65e9206070a1dd5f6'), stamp: 1494422723.481655 }, { fooObjectId: ObjectId('58b412d65e9206070a1dd5f6'), stamp: 1495113378.631109 }, { fooObjectId: ObjectId('58b412d65e9206070a1dd5f6'), stamp: 1494422718.830563 }, { fooObjectId: ObjectId('58b412d65e9206070a1dd5f6'), stamp: 1495113381.569376 }, { fooObjectId: ObjectId('58b412d65e9206070a1dd5f6'), stamp: 1495113388.42224 }, { fooObjectId: ObjectId('58b412d65e9206070a1dd5f6'), stamp: 1495113426.608498 }, { fooObjectId: ObjectId('58931cae526107078bbf71f7'), stamp: 1493802053.254245 }, { fooObjectId: ObjectId('58931cae526107078bbf71f7'), stamp: 1493802049.093342 }, { fooObjectId: ObjectId('58931cae526107078bbf71f7'), stamp: 1493802054.233313 }, { fooObjectId: ObjectId('58931cae526107078bbf71f7'), stamp: 1494329972.558202 }, { fooObjectId: ObjectId('58931cae526107078bbf71f7'), stamp: 1493802055.457206 }, { stamp: 1487772816.953898, fooObjectId: ObjectId('5805bf6e74712800bb012f05') }, { fooObjectId: ObjectId('5805bf6e74712800bb012f05'), stamp: 1487772831.640239 }, { fooObjectId: ObjectId('589c35934e89350783564909'), stamp: 1496404387.08405 }, { fooObjectId: ObjectId('589c35934e89350783564909'), stamp: 1496404383.584682 }, { fooObjectId: ObjectId('589c35934e89350783564909'), stamp: 1496404357.338354 }, { fooObjectId: ObjectId('58931cae526107078bbf71f7'), stamp: 1494513722.518675 }, { fooObjectId: ObjectId('58b412645e9206070a1dd06a'), stamp: 1494423044.121071 }, { fooObjectId: ObjectId('58b412645e9206070a1dd06a'), stamp: 1494423041.918467 }, { fooObjectId: ObjectId('58b412645e9206070a1dd06a'), stamp: 1494423039.961413 }, { fooObjectId: ObjectId('58b412645e9206070a1dd06a'), stamp: 1494423034.819859 }, { stamp: 1501573001.144575, fooObjectId: ObjectId('5812fc35b2a83a00b8c9061f') }, { fooObjectId: ObjectId('5812fc35b2a83a00b8c9061f'), stamp: 1501573004.145434 }, { fooObjectId: ObjectId('5812fc35b2a83a00b8c9061f'), stamp: 1501572997.895948 }, { fooObjectId: ObjectId('5812fc35b2a83a00b8c9061f'), stamp: 1501573008.895141 }, { fooObjectId: ObjectId('5812fc35b2a83a00b8c9061f'), stamp: 1501573005.394891 }, { fooObjectId: ObjectId('5812fc35b2a83a00b8c9061f'), stamp: 1501090931.443829 }, { fooObjectId: ObjectId('58931cae526107078bbf71f7'), stamp: 1497892396.056092 }, { stamp: 1493890957.166266, fooObjectId: ObjectId('58b412d65e9206070a1dd5f6') }, { fooObjectId: ObjectId('58b412d65e9206070a1dd5f6'), stamp: 1493890819.475622 }, { fooObjectId: ObjectId('58b412d65e9206070a1dd5f6'), stamp: 1493890822.902208 }, { fooObjectId: ObjectId('57beea4dd93194b60029ee9a'), stamp: 1490703301.703321 }, { stamp: 1487949706.666049, fooObjectId: ObjectId('5805c00f74712800bb01320a') }, { fooObjectId: ObjectId('58b412d65e9206070a1dd5f6'), stamp: 1497550576.221023 }, { fooObjectId: ObjectId('58b412d65e9206070a1dd5f6'), stamp: 1497550576.95499 }, { fooObjectId: ObjectId('58b412645e9206070a1dd06a'), stamp: 1511271224.294596 }, { fooObjectId: ObjectId('58b412645e9206070a1dd06a'), stamp: 15112712 .......... on.y: 1 }, IXSCAN { stamp: 1, position.x: 1, position.y: 1 }, IXSCAN { stamp: 1, position.x: 1, position.y: 1 }, IXSCAN { stamp: 1, position.x: 1, position.y: 1 }, IXSCAN { stamp: 1, position.x: 1, position.y: 1 }, IXSCAN { stamp: 1, position.x: 1, position.y: 1 }, IXSCAN { stamp: 1, position.x: 1, position.y: 1 }, IXSCAN { stamp: 1, position.x: 1, position.y: 1 }, IXSCAN { stamp: 1, position.x: 1, position.y: 1 }, IXSCAN { stamp: 1, position.x: 1, position.y: 1 }, IXSCAN { stamp: 1, position.x: 1, position.y: 1 }, IXSCAN { stamp: 1, position.x: 1, position.y: 1 }, IXSCAN { stamp: 1, position.x: 1, position.y: 1 }, IXSCAN { stamp: 1, position.x: 1, position.y: 1 }, IXSCAN { stamp: 1, position.x: 1, position.y: 1 }, IXSCAN { stamp: 1, position.x: 1, position.y: 1 }, IXSCAN { stamp: 1, position.x: 1, position.y: 1 }, IXSCAN { stamp: 1, position.x: 1, position.y: 1 }, IXSCAN { stamp: 1, position.x: 1, position.y: 1 }, IXSCAN { stamp: 1, position.x: 1, position.y: 1 }, IXSCAN { stamp: 1, position.x: 1, position.y: 1 }, IXSCAN { stamp: 1, position.x: 1, position.y: 1 }, IXSCAN { stamp: 1, position.x: 1, position.y: 1 }, IXSCAN { stamp: 1, position.x: 1, position.y: 1 }, IXSCAN { stamp: 1, position.x: 1, position.y: 1 }, IXSCAN { stamp: 1, position.x: 1, position.y: 1 }, IXSCAN { stamp: 1, position.x: 1, position.y: 1 }, IXSCAN { stamp: 1, position.x: 1, position.y: 1 }, IXSCAN { stamp: 1, position.x: 1, position.y: 1 }, IXSCAN { stamp: 1, position.x: 1, position.y: 1 }, IXSCAN { stamp: 1, position.x: 1, position.y: 1 }, IXSCAN { stamp: 1, position.x: 1, position.y: 1 }, IXSCAN { stamp: 1, position.x: 1, position.y: 1 }, IXSCAN { stamp: 1, position.x: 1, position.y: 1 }, IXSCAN { stamp: 1, position.x: 1, position.y: 1 }, IXSCAN { stamp: 1, position.x: 1, position.y: 1 }, IXSCAN { stamp: 1, position.x: 1, position.y: 1 }, IXSCAN { stamp: 1, position.x: 1, position.y: 1 }, IXSCAN { stamp: 1, position.x: 1, position.y: 1 }, IXSCAN { stamp: 1, position.x: 1, position.y: 1 }, IXSCAN { stamp: 1, position.x: 1, position.y: 1 }, IXSCAN { stamp: 1, position.x: 1, position.y: 1 }, IXSCAN { stamp: 1, position.x: 1, position.y: 1 }, IXSCAN { stamp: 1, position.x: 1, position.y: 1 }, IXSCAN { stamp: 1, position.x: 1, position.y: 1 }, IXSCAN { stamp: 1, position.x: 1, position.y: 1 }, IXSCAN { stamp: 1, position.x: 1, position.y: 1 }, IXSCAN { stamp: 1, position.x: 1, position.y: 1 }, IXSCAN { stamp: 1, position.x: 1, position.y: 1 }, IXSCAN { stamp: 1, position.x: 1, position.y: 1 }, IXSCAN { stamp: 1, position.x: 1, position.y: 1 }, IXSCAN { stamp: 1, position.x: 1, position.y: 1 }, IXSCAN { stamp: 1, position.x: 1, position.y: 1 }, IXSCAN { stamp: 1, position.x: 1, position.y: 1 }, IXSCAN { stamp: 1, position.x: 1, position.y: 1 }, IXSCAN { stamp: 1, position.x: 1, position.y: 1 }, IXSCAN { stamp: 1, position.x: 1, position.y: 1 }, IXSCAN { stamp: 1, position.x: 1, position.y: 1 }, IXSCAN { stamp: 1, position.x: 1, position.y: 1 }, IXSCAN { stamp: 1, position.x: 1, position.y: 1 }, IXSCAN { stamp: 1, position.x: 1, position.y: 1 } cursorid:120605126995 keysExamined:34140 docsExamined:34140 cursorExhausted:1 numYields:307 nreturned:34140 reslen:11101775 locks:{ Global: { acquireCount: { r: 616 }, acquireWaitCount: { r: 146 }, timeAcquiringMicros: { r: 469372 } }, Database: { acquireCount: { r: 308 } }, Collection: { acquireCount: { r: 308 } } } protocol:op_query 1110ms
      
      

      It doesn't happen each time, most of the time these queries are successful.

      I can provide logs, but not sure how to do that privately as our logs can contain sensitive information.

        Attachments

          Issue Links

            Activity

              People

              • Votes:
                0 Vote for this issue
                Watchers:
                7 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: