[SERVER-38357] mongod crashed when building index and updating data at the same time Created: 03/Dec/18  Updated: 28/Feb/19  Resolved: 28/Feb/19

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Jack Assignee: Danny Hatcher (Inactive)
Resolution: Incomplete Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Operating System: ALL
Participants:

 Description   

I build a replicaset of mongodb 3.4.14 with one primary and two secondary in three VMs. I run the build index command with {backgroup:true} and was updating data on primary at the same time. then the both secondary mongod process crashed.

the backtrace are as follows:



 Comments   
Comment by Danny Hatcher (Inactive) [ 28/Feb/19 ]

Hello Jack,

We haven’t heard back from you for some time, so I’m going to mark this ticket as resolved. If this is still an issue for you, please provide additional information and we will reopen the ticket.

Thanks,

Danny

Comment by Danny Hatcher (Inactive) [ 01/Feb/19 ]

Hello Jack,

We still need additional information to diagnose the problem. If this is still an issue for you, would you please respond to the questions posed in my last comment?

Thanks,

Danny

Comment by Danny Hatcher (Inactive) [ 07/Dec/18 ]

Hello Jack,

I have spun up a replica set on 3.4.14 and run a series of applyOps commands against one namespace while building a background index on the other. However, I have not been able to reproduce your error.

1. Has this happened multiple times or was it a one-time occurrence?
2. Could you please provide your mongod log covering the time of a crash?
3. Could you please provide your the "diagnostic.data" folder (located under the $dbpath)?

Thank you,

Danny

Comment by Jack [ 04/Dec/18 ]

Hello Daniel, thanks for your attention.

Actually I made a mistake, I thought  "there are many update op in this oplog" when I saw the backtrace info involved in so many 'update' message.

While after analysed the local.oplog.rs collection in Secondary today, I found out the operations are composed all of the insert operations. 

So here is the scene:

Firstly, I run a build index command on db :

sun_test.streamLine.createIndex( { skuId: 1 }, { background: true } )

and the corresponding log in mongod.log is:

[initandlisten] build index on: sun_test.streamLine properties: { v: 2, key: { r: 1.0 }, name: "skuId_1", background: true, ns: "sun_test.streamLine" }

Then I use applyOps import in some data continuously, the data sample is as follows:

 {
        "ts" : Timestamp(1543661171, 136),
        "t" : NumberLong(2),
        "h" : NumberLong("8182692319765150402"),
        "v" : 2,
        "op" : "c",
        "ns" : "admin.$cmd",
        "o" : {
                "applyOps" : [
                        {
                                "ts" : Timestamp(1543209770, 1),
                                "h" : NumberLong("3453029526283301851"),
                                "v" : 2,
                                "op" : "i",
                                "ns" : "sun_test.order_serial",
                                "o" : {
                                        "_id" : ObjectId("5bfb832ae4b06e4edec075bb"),
                                        "shop_name_id" : NumberLong(2192),
                                        "shop_iden : NumberLong(800000001),
                                        "business_type" : 1,
                                        "start_time" : ISODate("2018-11-25T16:00:01Z"),
                                        "end_time" : ISODate("2018-11-26T15:59:59Z"),
                                        "order_num" : 1,
                                        "create_time" : ISODate("2018-11-26T05:22:50.029Z")
                                },
                                "o2" : {                                }
                        }
                ]
        }
}

As you can see, those two actions are taken on the same database but the different  collections.

After a while, the Secondaries crashed and  start rebuild index on sun_test.streamLine collection when I restart the process.

After confirming the index build action is done, I run the applyOps again with nothing wrong happened and all things turn back to normal.

Comment by Danny Hatcher (Inactive) [ 03/Dec/18 ]

Hello Jack,

The accurate scene is "I run the build index command with

Unknown macro: {backgroup}

and was using applyOps command importing in some data(a bson format oplog) , and there are many update op in this oplog "

Can you please provide example commands for both the index build and your import?

Thank you,

Danny

Comment by Jack [ 03/Dec/18 ]

 "I run the build index command with {backgroup:true} and was updating data on primary at the same time. " sorry for that the description is not that accurate.

The accurate scene is "I run the build index command with {backgroup:true} and was using applyOps command importing in some data(a bson format oplog) , and there are many update op in this oplog "

[repl index builder 165] Index Build (background): 29344400/277038147 10%
2018-12-01T14:56:34.884+0000 I - [repl writer worker 0] Invariant failure !_inUnitOfWork src/mongo/db/storage/wiredtiger/wiredtiger_recovery_unit.cpp 185
2018-12-01T14:56:34.893+0000 I - [repl writer worker 0]
2018-12-01T14:56:35.084+0000 F - [repl writer worker 0] Got signal: 6 (Aborted).
0x564224eafaf1 0x564224eaed29 0x564224eaf20d 0x7f471d47a100 0x7f471d0df597 0x7f471d0e0c88 0x5642241219cc 0x564224b9e3a8 0x5642244f657b 0x5642244f7b04 0x5642244c48a3 0x5642247d27fa 0x5642247d311b 0x5642247d324d 0x56422462616a 0x5642248a3eb7 0x5642242e366c 0x5642242e54c9 0x56422489fb01 0x5642248a2a17 0x564224985f30 0x56422497f0d1 0x56422497fa1d 0x56422497fb2d 0x5642249846f0 0x564224985b33 0x56422497b0ff 0x56422497c1ea 0x564224e1bc5c 0x564224e1c70c 0x564224e1d0f6 0x56422596ae50 0x7f471d472dc5 0x7f471d1a081d
----- BEGIN BACKTRACE -----
{"backtrace":[{"b":"5642238E2000","o":"15CDAF1","s":"_ZN5mongo15printStackTraceERSo"},{"b":"5642238E2000","o":"15CCD29"},{"b":"5642238E2000","o":"15CD20D"},{"b":"7F471D46B000","o":"F100"},{"b":"7F471D0AA000","o":"35597","s":"gsignal"},{"b":"7F471D0AA000","o":"36C88","s":"abort"},{"b":"5642238E2000","o":"83F9CC","s":"_ZN5mongo17invariantOKFailedEPKcRKNS_6StatusES1_j"},{"b":"5642238E2000","o":"12BC3A8"},{"b":"5642238E2000","o":"C1457B","s":"_ZN5mongo11UpdateStage8doInsertEv"},{"b":"5642238E2000","o":"C15B04","s":"_ZN5mongo11UpdateStage6doWorkEPm"},{"b":"5642238E2000","o":"BE28A3","s":"_ZN5mongo9PlanStage4workEPm"},{"b":"5642238E2000","o":"EF07FA","s":"_ZN5mongo12PlanExecutor11getNextImplEPNS_11SnapshottedINS_7BSONObjEEEPNS_8RecordIdE"},{"b":"5642238E2000","o":"EF111B","s":"_ZN5mongo12PlanExecutor7getNextEPNS_7BSONObjEPNS_8RecordIdE"},{"b":"5642238E2000","o":"EF124D","s":"_ZN5mongo12PlanExecutor11executePlanEv"},{"b":"5642238E2000","o":"D4416A","s":"_ZN5mongo6updateEPNS_16OperationContextEPNS_8DatabaseERKNS_13UpdateRequestE"},{"b":"5642238E2000","o":"FC1EB7","s":"_ZN5mongo4repl21applyOperation_inlockEPNS_16OperationContextEPNS_8DatabaseERKNS_7BSONObjEbSt8functionIFvvEE"},{"b":"5642238E2000","o":"A0166C"},{"b":"5642238E2000","o":"A034C9","s":"_ZN5mongo8applyOpsEPNS_16OperationContextERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERKNS_7BSONObjEPNS_14BSONObjBuilderE"},{"b":"5642238E2000","o":"FBDB01"},{"b":"5642238E2000","o":"FC0A17","s":"_ZN5mongo4repl19applyCommand_inlockEPNS_16OperationContextERKNS_7BSONObjEb"},{"b":"5642238E2000","o":"10A3F30","s":"_ZNSt17_Function_handlerIFN5mongo6StatusEPNS0_16OperationContextERKNS0_7BSONObjEbEPS7_E9_M_invokeERKSt9_Any_dataOS3_S6_Ob"},{"b":"5642238E2000","o":"109D0D1","s":"_ZN5mongo4repl8SyncTail9syncApplyEPNS_16OperationContextERKNS_7BSONObjEbSt8functionIFNS_6StatusES3_PNS_8DatabaseES6_bS7_IFvvEEEES7_IFS8_S3_S6_bEESC_"},{"b":"5642238E2000","o":"109DA1D","s":"_ZN5mongo4repl8SyncTail9syncApplyEPNS_16OperationContextERKNS_7BSONObjEb"},{"b":"5642238E2000","o":"109DB2D"},{"b":"5642238E2000","o":"10A26F0","s":"_ZN5mongo4repl22multiSyncApply_noAbortEPNS_16OperationContextEPSt6vectorIPKNS0_10OplogEntryESaIS6_EESt8functionIFNS_6StatusES2_RKNS_7BSONObjEbEE"},{"b":"5642238E2000","o":"10A3B33","s":"_ZN5mongo4repl14multiSyncApplyEPSt6vectorIPKNS0_10OplogEntryESaIS4_EEPNS0_8SyncTailE"},{"b":"5642238E2000","o":"10990FF"},{"b":"5642238E2000","o":"109A1EA"},{"b":"5642238E2000","o":"1539C5C","s":"_ZN5mongo10ThreadPool10_doOneTaskEPSt11unique_lockISt5mutexE"},{"b":"5642238E2000","o":"153A70C","s":"_ZN5mongo10ThreadPool13_consumeTasksEv"},{"b":"5642238E2000","o":"153B0F6","s":"_ZN5mongo10ThreadPool17
mongod(_ZN5mongo15printStackTraceERSo+0x41) [0x564224eafaf1]
mongod(+0x15CCD29) [0x564224eaed29]
mongod(+0x15CD20D) [0x564224eaf20d]
libpthread.so.0(+0xF100) [0x7f471d47a100]
libc.so.6(gsignal+0x37) [0x7f471d0df597]
libc.so.6(abort+0x148) [0x7f471d0e0c88]
mongod(_ZN5mongo17invariantOKFailedEPKcRKNS_6StatusES1_j+0x0) [0x5642241219cc]
mongod(+0x12BC3A8) [0x564224b9e3a8]
mongod(_ZN5mongo11UpdateStage8doInsertEv+0x3DB) [0x5642244f657b]
mongod(_ZN5mongo11UpdateStage6doWorkEPm+0xF4) [0x5642244f7b04]
mongod(_ZN5mongo9PlanStage4workEPm+0x63) [0x5642244c48a3]
mongod(_ZN5mongo12PlanExecutor11getNextImplEPNS_11SnapshottedINS_7BSONObjEEEPNS_8RecordIdE+0x19A) [0x5642247d27fa]
mongod(_ZN5mongo12PlanExecutor7getNextEPNS_7BSONObjEPNS_8RecordIdE+0x4B) [0x5642247d311b]
mongod(_ZN5mongo12PlanExecutor11executePlanEv+0x6D) [0x5642247d324d]
mongod(_ZN5mongo6updateEPNS_16OperationContextEPNS_8DatabaseERKNS_13UpdateRequestE+0xEA) [0x56422462616a]
mongod(_ZN5mongo4repl21applyOperation_inlockEPNS_16OperationContextEPNS_8DatabaseERKNS_7BSONObjEbSt8functionIFvvEE+0x7F7) [0x5642248a3eb7]
mongod(+0xA0166C) [0x5642242e366c]
mongod(_ZN5mongo8applyOpsEPNS_16OperationContextERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERKNS_7BSONObjEPNS_14BSONObjBuilderE+0x499) [0x5642242e54c9]
mongod(+0xFBDB01) [0x56422489fb01]
mongod(_ZN5mongo4repl19applyCommand_inlockEPNS_16OperationContextERKNS_7BSONObjEb+0x4C7) [0x5642248a2a17]
mongod(_ZNSt17_Function_handlerIFN5mongo6StatusEPNS0_16OperationContextERKNS0_7BSONObjEbEPS7_E9_M_invokeERKSt9_Any_dataOS3_S6_Ob+0x30) [0x564224985f30]
mongod(_ZN5mongo4repl8SyncTail9syncApplyEPNS_16OperationContextERKNS_7BSONObjEbSt8functionIFNS_6StatusES3_PNS_8DatabaseES6_bS7_IFvvEEEES7_IFS8_S3_S6_bEESC_+0x661) [0x56422497f0d1]
mongod(_ZN5mongo4repl8SyncTail9syncApplyEPNS_16OperationContextERKNS_7BSONObjEb+0xFD) [0x56422497fa1d]
mongod(+0x109DB2D) [0x56422497fb2d]
mongod(_ZN5mongo4repl22multiSyncApply_noAbortEPNS_16OperationContextEPSt6vectorIPKNS0_10OplogEntryESaIS6_EESt8functionIFNS_6StatusES2_RKNS_7BSONObjEbEE+0x1A0) [0x5642249846f0]
mongod(_ZN5mongo4repl14multiSyncApplyEPSt6vectorIPKNS0_10OplogEntryESaIS4_EEPNS0_8SyncTailE+0x73) [0x564224985b33]
mongod(+0x10990FF) [0x56422497b0ff]
mongod(+0x109A1EA) [0x56422497c1ea]
mongod(_ZN5mongo10ThreadPool10_doOneTaskEPSt11unique_lockISt5mutexE+0x14C) [0x564224e1bc5c]
mongod(_ZN5mongo10ThreadPool13_consumeTasksEv+0xBC) [0x564224e1c70c]
mongod(_ZN5mongo10ThreadPool17_workerThreadBodyEPS0_RKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE+0x96) [0x564224e1d0f6]
mongod(+0x2088E50) [0x56422596ae50]
libpthread.so.0(+0x7DC5) [0x7f471d472dc5]
libc.so.6(clone+0x6D) [0x7f471d1a081d]
----- END BACKTRACE -----
2018-12-01T14:57:48.772+0000 I CONTROL [main] ***** SERVER RESTARTED *****
 
 
.......
 
 
2018-12-01T14:58:45.286+0000 I INDEX [initandlisten] found 9 index(es) that wasn't finished before shutdown
2018-12-01T14:58:45.348+0000 W FTDC [initandlisten] Error checking directory '/sys/block': No such file or directory
2018-12-01T14:58:45.348+0000 I FTDC [initandlisten] Initializing full-time diagnostic data capture with directory '/opt/mongodb/data/mongod/diagnostic.data'
2018-12-01T14:58:45.349+0000 I INDEX [initandlisten] found 1 interrupted index build(s) on calm_mind_loadtest.WmInBoundFlowLog
2018-12-01T14:58:45.349+0000 I INDEX [initandlisten] note: restart the server with --noIndexBuildRetry to skip index rebuilds
2018-12-01T14:58:45.357+0000 I INDEX [initandlisten] build index on: *******
2018-12-01T14:58:45.357+0000 I INDEX [initandlisten] building index using bulk method; build may temporarily use up to 500 megabytes of RAM
2018-12-01T14:58:48.001+0000 I - [initandlisten] Index Build: 679300/192373792 0%
2018-12-01T14:58:51.001+0000 I - [initandlisten] Index Build: 1435100/192373792 0%
2018-12-01T14:58:54.011+0000 I - [initandlisten] Index Build: 2197100/192373792 1%
2018-12-01T14:58:57.215+0000 I - [initandlisten] Index Build: 2969100/192373792 1%
2018-12-01T14:59:00.190+0000 I - [initandlisten] Index Build: 3956900/192373792 2%
2018-12-01T14:59:03.006+0000 I - [initandlisten] Index Build: 4806700/192373792

Generated at Thu Feb 08 04:48:44 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.