[SERVER-18351] 'Invalid epoch' fatal assertion when concurrently dropping and sharding collection
Created: 06/May/15  Updated: 19/Sep/15  Resolved: 31/Jul/15

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 3.1.2
Fix Version/s: 3.1.7

Type: Bug
Priority: Major - P3
Reporter: Kamran K.
Assignee: Kaloian Manassiev
Resolution: Done
Votes: 0
Labels: 32qa
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Backwards Compatibility: Fully Compatible
Operating System: ALL
Steps To Reproduce:

'use strict';
 
load('jstests/libs/parallelTester.js');
 
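// Workload run by each thread: repeatedly drop and re-shard the same
// collection through the same mongos.
function dropWorkload(host) {
    var mongos = new Mongo(host);
    var mongosAdmin = mongos.getDB('admin');
    var coll = mongos.getCollection('test.drop');
 
    // Loop indefinitely so the drop and shardCollection calls from the
    // different threads keep racing on the same namespace.
    while (true) {
        try {
            coll.runCommand({drop: coll.getName()});
            mongosAdmin.runCommand({shardCollection: coll.getFullName(), key: {_id: 1}});
        } catch (e) {
            // ignore any thrown error and keep looping
        }
    }
}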
 
// set up the sharded cluster and shard the collection
var st = new ShardingTest({});
var mongosAdmin = st.s0.getDB('admin');
 
mongosAdmin.runCommand({enablesharding: 'test'});
mongosAdmin.runCommand({shardCollection: 'test.drop', key: {_id: 1}});
 
var threads = [];
for (var i = 0; i < 5; i++) {
    var t = new ScopedThread(dropWorkload, st.s0.host);
 
    threads.push(t);
    t.start();
}
 
threads.forEach(function(t) {
    t.join();
});

Sprint: Sharding 7 08/10/15
Participants:

 Description   

This bug was introduced in 3.1.2. It does not affect 3.0.2.

Fatal assertion 28634 BadValue invalid epoch
 
Program received signal SIGTRAP, Trace/breakpoint trap.
[Switching to Thread 0x7f6c04134700 (LWP 11908)]
0x00007f6c09dbe20b in raise (sig=5) at ../nptl/sysdeps/unix/sysv/linux/pt-raise.c:37
37	../nptl/sysdeps/unix/sysv/linux/pt-raise.c: No such file or directory.
(gdb) bt
#0  0x00007f6c09dbe20b in raise (sig=5) at ../nptl/sysdeps/unix/sysv/linux/pt-raise.c:37
#1  0x0000000000ff0238 in mongo::breakpoint () at src/mongo/util/debugger.cpp:57
#2  0x0000000000fe25f3 in mongo::fassertFailedWithStatus (msgid=28634, status=...) at src/mongo/util/assert_util.cpp:186
#3  0x0000000000c30d4b in mongo::fassert (msgid=28634, status=...) at src/mongo/util/assert_util.h:218
#4  0x0000000000eb0b92 in mongo::CatalogManagerLegacy::updateCollection (this=0x3365cd0, collNs=..., coll=...) at src/mongo/s/catalog/legacy/catalog_manager_legacy.cpp:847
#5  0x0000000000f79eaf in mongo::CollectionInfo::save (this=0x7f6c04132830, ns=...) at src/mongo/s/config.cpp:149
#6  0x0000000000eac1ca in mongo::CatalogManagerLegacy::shardCollection (this=0x3365cd0, ns=..., fieldsAndOrder=..., unique=false, initPoints=0x7f6c04132fd0, initShards=0x0) at src/mongo/s/catalog/legacy/catalog_manager_legacy.cpp:498
#7  0x0000000000f61112 in mongo::(anonymous namespace)::ShardCollectionCmd::run (this=0x17679e0 <mongo::(anonymous namespace)::shardCollectionCmd>, txn=0x0, dbname=..., cmdObj=..., options=0, errmsg=..., result=...)
    at src/mongo/s/commands/cluster_shard_collection_cmd.cpp:415
#8  0x0000000000f9115c in mongo::Command::execCommandClientBasic (txn=0x0, c=0x17679e0 <mongo::(anonymous namespace)::shardCollectionCmd>, client=..., queryOptions=0, ns=0x7f6bd4000ec4 "admin.$cmd", cmdObj=..., result=...)
    at src/mongo/s/s_only.cpp:102
#9  0x0000000000f915d6 in mongo::Command::runAgainstRegistered (ns=0x7f6bd4000ec4 "admin.$cmd", jsobj=..., anObjBuilder=..., queryOptions=0) at src/mongo/s/s_only.cpp:146
#10 0x0000000000f998aa in mongo::Strategy::clientCommandOp (this=0x334e990, r=...) at src/mongo/s/strategy.cpp:308
#11 0x0000000000f90612 in mongo::Request::process (this=0x7f6c04133b90, attempt=0) at src/mongo/s/request.cpp:121
#12 0x0000000000b473f4 in mongo::ShardedMessageHandler::process (this=0x7fff09cf3b80, m=..., p=0x336a0f0) at src/mongo/s/server.cpp:149
#13 0x0000000001007c7d in mongo::PortMessageServer::handleIncomingMsg (arg=0x336a0f0) at src/mongo/util/net/message_server_port.cpp:227
#14 0x00007f6c09db6182 in start_thread (arg=0x7f6c04134700) at pthread_create.c:312
#15 0x00007f6c09ae347d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111



 Comments   
Comment by Githook User [ 31/Jul/15 ]

Author: Kaloian Manassiev (kaloianm) <kaloian.manassiev@mongodb.com>

Message: SERVER-18351 Invalidate collection after drop

This change adds code to completely remove a collection from the cache
after it has been dropped and also makes all collection drop variants
(sharded or non-sharded) go through the catalog manager.

In order to simplify the drop code, we also no longer do shard version
checking on collection drop. This operation is done under a distributed
lock on mongos, so there is no need to do shard version checking.
Branch: master
https://github.com/mongodb/mongo/commit/e9bd9667785273a1122d1668b5fce1ffed142cea
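
The idea in the commit message above (remove the cached collection entry once the drop completes, and route every drop through one code path) can be illustrated with a toy model. This is only a sketch under stated assumptions: `CatalogSketch`, its methods, the `invalidateCache` option, and the integer epochs are hypothetical names invented for illustration, not the server's C++ classes or any MongoDB API, and the model does not claim to reproduce the exact failure path in the backtrace.

```javascript
// Toy model only: CatalogSketch and everything in it is hypothetical,
// not MongoDB server code or a MongoDB API.
class CatalogSketch {
    constructor() {
        this.config = new Map();  // authoritative metadata: ns -> {epoch}
        this.cache = new Map();   // router-side cached copy: ns -> {epoch}
        this.nextEpoch = 1;       // stand-in for a freshly generated epoch
    }

    // Record a newly sharded collection and cache its epoch.
    shardCollection(ns) {
        if (this.config.has(ns)) {
            throw new Error('already sharded: ' + ns);
        }
        const entry = {epoch: this.nextEpoch++};
        this.config.set(ns, entry);
        this.cache.set(ns, {epoch: entry.epoch});
        return entry.epoch;
    }

    // Single drop path for all collections. The invalidateCache switch
    // exists only so the sketch can show behaviour with and without
    // the cache removal step.
    dropCollection(ns, {invalidateCache = true} = {}) {
        this.config.delete(ns);
        if (invalidateCache) {
            this.cache.delete(ns);
        }
    }

    // Return the cached epoch, reloading from config on a cache miss.
    // Returns null when the collection is unknown.
    getEpoch(ns) {
        if (!this.cache.has(ns) && this.config.has(ns)) {
            this.cache.set(ns, {epoch: this.config.get(ns).epoch});
        }
        const cached = this.cache.get(ns);
        return cached ? cached.epoch : null;
    }
}

// With invalidation: the dropped collection disappears from the cache,
// and re-sharding produces a new epoch.
const withFix = new CatalogSketch();
withFix.shardCollection('test.drop');
withFix.dropCollection('test.drop');
const epochAfterDrop = withFix.getEpoch('test.drop');
const epochAfterReshard = withFix.shardCollection('test.drop');

// Without invalidation: the cache still reports the dropped
// collection's old epoch.
const withoutFix = new CatalogSketch();
withoutFix.shardCollection('test.drop');
withoutFix.dropCollection('test.drop', {invalidateCache: false});
const staleEpoch = withoutFix.getEpoch('test.drop');
```

In this model, skipping the cache removal leaves a stale entry behind after the drop, which is the kind of state the commit message says the change eliminates.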

Generated at Thu Feb 08 03:47:24 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.