[SERVER-14494] Dropping collection during active background index build on secondary triggers segfault Created: 08/Jul/14  Updated: 08/Apr/15  Resolved: 14/Jul/14

Status: Closed
Project: Core Server
Component/s: Index Maintenance, Replication
Affects Version/s: 2.6.2
Fix Version/s: 2.6.4, 2.7.4

Type: Bug Priority: Major - P3
Reporter: Bernardo Cabezas Serra Assignee: Eric Milkie
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Related
related to SERVER-17927 Dropping collection during active bac... Closed
is related to SERVER-15871 Segmentation fault in secondary rebui... Closed
is related to SERVER-14505 cannot dropAllIndexes when index buil... Closed
is related to SERVER-15038 Multiple background index builds may ... Closed
Tested
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Completed:
Participants:

 Description   
Issue Status as of Jul 18, 2014

ISSUE SUMMARY
If during a background index build in a collection on a secondary node this collection gets dropped on the primary, the secondary may crash with a segmentation fault once it replicates the collection drop.

USER IMPACT
Only background index builds are affected. Secondary failures affect replica set quorum and can potentially lead to unavailability of the replica set.

WORKAROUNDS
Use foreground index builds, or ensure that the secondaries have finished building the index before dropping the specific collection in which the index is being built.

AFFECTED VERSIONS
Production release versions from 2.6.0 to 2.6.3 are affected by this bug.

FIX VERSION
The fix is included in the 2.6.4 production release.

RESOLUTION DETAILS
The patch now ensures that IndexBuildBlock::fail() can never throw an exception.

Original description

We have a repeated server crash after two days, while performing same operation (drop).

This instance was running as a secondary in a replicaset.

2014-07-08T08:54:19.676+0200 [repl index builder 17755] build index on: 233-340-6496.keyword_adgroup properties: { v: 1, key: { campaign: 1 }, name: "campaign_1", ns: "233-340-6496.keyword_adgroup", background: true }
2014-07-08T08:54:19.676+0200 [repl index builder 17755]          building index in background
2014-07-08T08:54:19.678+0200 [repl writer worker 8] CMD: drop 233-340-6496.keyword_adgroup
2014-07-08T08:54:19.678+0200 [repl writer worker 8] halting index build: { adgroup: 1 }
2014-07-08T08:54:19.678+0200 [repl writer worker 8] SEVERE: Invalid access at address: 0x1c00736e3a
2014-07-08T08:54:19.684+0200 [repl writer worker 8] SEVERE: Got signal: 11 (Segmentation fault).
Backtrace:0x11dae01 0x11da1de 0x11da2cf 0x7f7f08bb7340 0x8d5ec7 0xb856b2 0xa29ba9 0xa2a8d0 0xa2296a 0xa24958 0xa26279 0xe48ab5 0xeb27b9 0xeb2d00 0x117089e 0x121f819 0x7f7f08baf182 0x7f7f07eb430d 
 /usr/bin/mongod(_ZN5mongo15printStackTraceERSo+0x21) [0x11dae01]
 /usr/bin/mongod() [0x11da1de]
 /usr/bin/mongod() [0x11da2cf]
 /lib/x86_64-linux-gnu/libpthread.so.0(+0x10340) [0x7f7f08bb7340]
 /usr/bin/mongod(_ZN5mongo12IndexCatalog23killMatchingIndexBuildsERKNS0_17IndexKillCriteriaE+0x217) [0x8d5ec7]
 /usr/bin/mongod(_ZN5mongo12IndexBuilder23killMatchingIndexBuildsEPNS_10CollectionERKNS_12IndexCatalog17IndexKillCriteriaE+0x12) [0xb856b2]
 /usr/bin/mongod(_ZN5mongo7CmdDrop15stopIndexBuildsEPNS_8DatabaseERKNS_7BSONObjE+0xe9) [0xa29ba9]
 /usr/bin/mongod(_ZN5mongo7CmdDrop3runERKSsRNS_7BSONObjEiRSsRNS_14BSONObjBuilderEb+0x1c0) [0xa2a8d0]
 /usr/bin/mongod(_ZN5mongo12_execCommandEPNS_7CommandERKSsRNS_7BSONObjEiRSsRNS_14BSONObjBuilderEb+0x3a) [0xa2296a]
 /usr/bin/mongod(_ZN5mongo7Command11execCommandEPS0_RNS_6ClientEiPKcRNS_7BSONObjERNS_14BSONObjBuilderEb+0x1618) [0xa24958]
 /usr/bin/mongod(_ZN5mongo12_runCommandsEPKcRNS_7BSONObjERNS_11_BufBuilderINS_16TrivialAllocatorEEERNS_14BSONObjBuilderEbi+0x6c9) [0xa26279]
 /usr/bin/mongod(_ZN5mongo21applyOperation_inlockERKNS_7BSONObjEbb+0x8d5) [0xe48ab5]
 /usr/bin/mongod(_ZN5mongo7replset8SyncTail9syncApplyERKNS_7BSONObjEb+0x749) [0xeb27b9]
 /usr/bin/mongod(_ZN5mongo7replset14multiSyncApplyERKSt6vectorINS_7BSONObjESaIS2_EEPNS0_8SyncTailE+0x50) [0xeb2d00]
 /usr/bin/mongod(_ZN5mongo10threadpool6Worker4loopEv+0x19e) [0x117089e]
 /usr/bin/mongod() [0x121f819]
 /lib/x86_64-linux-gnu/libpthread.so.0(+0x8182) [0x7f7f08baf182]
 /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f7f07eb430d]

Another crash, day before, same operation.

2014-07-07T09:59:36.387+0200 [conn405111] end connection 192.168.0.1:51685 (32 connections now open)
2014-07-07T09:59:36.388+0200 [initandlisten] connection accepted from 192.168.0.1:51968 #405172 (33 connections now open)
2014-07-07T09:59:37.589+0200 [conn405168] end connection 172.17.0.13:37430 (32 connections now open)
2014-07-07T09:59:38.533+0200 [repl index builder 11740] build index on: 558-613-4859.keyword_adgroup properties: { v: 1, key: { campaign: 1 }, name: "campaign_1", ns: "558-613-4859.keyword_adgroup", background: true }
2014-07-07T09:59:38.533+0200 [repl index builder 11740] building index in background
2014-07-07T09:59:38.535+0200 [repl writer worker 8] CMD: drop 558-613-4859.keyword_adgroup
2014-07-07T09:59:38.535+0200 [repl writer worker 8] halting index build: { campaign: 1 }
2014-07-07T09:59:38.535+0200 [repl writer worker 8] halting index build: { adgroup: 1 }
2014-07-07T09:59:38.535+0200 [repl writer worker 8] SEVERE: Invalid access at address: 0x1c00736e3a
2014-07-07T09:59:38.541+0200 [repl writer worker 8] SEVERE: Got signal: 11 (Segmentation fault).
Backtrace:0x11dae01 0x11da1de 0x11da2cf 0x7fbb88e7c340 0x8d5ec7 0xb856b2 0xa29ba9 0xa2a8d0 0xa2296a 0xa24958 0xa26279 0xe48ab5 0xeb27b9 0xeb2d00 0x117089e 0x121f819 0x7fbb88e74182 0x7fbb8817930d 
/usr/bin/mongod(_ZN5mongo15printStackTraceERSo+0x21) [0x11dae01]
/usr/bin/mongod() [0x11da1de]
/usr/bin/mongod() [0x11da2cf]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x10340) [0x7fbb88e7c340]
/usr/bin/mongod(_ZN5mongo12IndexCatalog23killMatchingIndexBuildsERKNS0_17IndexKillCriteriaE+0x217) [0x8d5ec7]
/usr/bin/mongod(_ZN5mongo12IndexBuilder23killMatchingIndexBuildsEPNS_10CollectionERKNS_12IndexCatalog17IndexKillCriteriaE+0x12) [0xb856b2]
/usr/bin/mongod(_ZN5mongo7CmdDrop15stopIndexBuildsEPNS_8DatabaseERKNS_7BSONObjE+0xe9) [0xa29ba9]
/usr/bin/mongod(_ZN5mongo7CmdDrop3runERKSsRNS_7BSONObjEiRSsRNS_14BSONObjBuilderEb+0x1c0) [0xa2a8d0]
/usr/bin/mongod(_ZN5mongo12_execCommandEPNS_7CommandERKSsRNS_7BSONObjEiRSsRNS_14BSONObjBuilderEb+0x3a) [0xa2296a]
/usr/bin/mongod(_ZN5mongo7Command11execCommandEPS0_RNS_6ClientEiPKcRNS_7BSONObjERNS_14BSONObjBuilderEb+0x1618) [0xa24958]
/usr/bin/mongod(_ZN5mongo12_runCommandsEPKcRNS_7BSONObjERNS_11_BufBuilderINS_16TrivialAllocatorEEERNS_14BSONObjBuilderEbi+0x6c9) [0xa26279]
/usr/bin/mongod(_ZN5mongo21applyOperation_inlockERKNS_7BSONObjEbb+0x8d5) [0xe48ab5]
/usr/bin/mongod(_ZN5mongo7replset8SyncTail9syncApplyERKNS_7BSONObjEb+0x749) [0xeb27b9]
/usr/bin/mongod(_ZN5mongo7replset14multiSyncApplyERKSt6vectorINS_7BSONObjESaIS2_EEPNS0_8SyncTailE+0x50) [0xeb2d00]
/usr/bin/mongod(_ZN5mongo10threadpool6Worker4loopEv+0x19e) [0x117089e]
/usr/bin/mongod() [0x121f819]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x8182) [0x7fbb88e74182]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7fbb8817930d]

Mongo 2.6.2 is runnig on top of Ubuntu 14.04 x86_64 (10gen mongo packages)

mongod.conf:

dbpath=/var/lib/mongodb
logpath=/var/log/mongodb/mongod.log
logappend=true
smallfiles = true
oplogSize=20480
replSet=rs0



 Comments   
Comment by Giorgio Sironi [ 08/Apr/15 ]

Done at https://jira.mongodb.org/browse/SERVER-17927

Comment by Ramon Fernandez Marina [ 04/Apr/15 ]

giorgiosironi, a quick look at the gists you provided makes me think the issue you're seeing is different. Can you please open a separate SERVER ticket and upload full logs from your primary and the affected secondary?

Note that I see a "verifyFailed" string in the stack trace, which may indicate data corruption. The most common culprit are unhealthy disks, so please also check that your disks are ok. We can discuss further in a new ticket.

Regards,
Ramón.

Comment by Giorgio Sironi [ 04/Apr/15 ]

We have run into this behavior on 2.6.8, although the crash does not seem to be a segfault but a failed assertion instead:
https://gist.github.com/giorgiosironi/fc6f267bf048b6c3ea28
This is what was happening on the primary:
https://gist.github.com/giorgiosironi/ba5d2a125606cae16c78
I am available for further information such as config files.

Comment by Githook User [ 14/Jul/14 ]

Author:

{u'username': u'milkie', u'name': u'Eric Milkie', u'email': u'milkie@10gen.com'}

Message: SERVER-14494 SERVER-14505 ensure that IndexBuildBlock::fail() can never throw

(backport of 5e06c1745105c8e2b50ba875e4433cd93f471ea4)
Branch: v2.6
https://github.com/mongodb/mongo/commit/8be2bd74167f69fbca94a02c069fde12fb803722

Comment by Githook User [ 14/Jul/14 ]

Author:

{u'username': u'milkie', u'name': u'Eric Milkie', u'email': u'milkie@10gen.com'}

Message: SERVER-14494 SERVER-14505 ensure that IndexBuildBlock::fail() can never throw
Branch: master
https://github.com/mongodb/mongo/commit/a1b737a2146ba380ae9a51acfb1138982745d996

Comment by Ramon Fernandez Marina [ 08/Jul/14 ]

eduherraiz, we're able to reproduce this behavior and we're investigating.

Edit: sorry, I meant to send this to bercab.

Comment by Edu Herraiz [ 08/Jul/14 ]

More info about the problem, the collection (keyword_adgroup) stats:

{
 "ns" : "233-340-6496.keyword_adgroup",
 "count" : 3588,
 "size" : 12727904,
 "avgObjSize" : 3547.353400222965,
 "storageSize" : 25313280,
 "numExtents" : 6,
 "nindexes" : 1,
 "lastExtentSize" : 12742656,
 "paddingFactor" : 1,
 "systemFlags" : 1,
 "userFlags" : 0,
 "totalIndexSize" : 130816,
 "indexSizes" : {
 "_id_" : 130816
 },
 "ok" : 1
}

Generated at Thu Feb 08 03:35:02 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.