[SERVER-16795] Recovery Unit invariant tripped when calling getMore through DBDirectClient Created: 09/Jan/15  Updated: 23/Jan/15  Resolved: 15/Jan/15

Status: Closed
Project: Core Server
Component/s: Internal Code, Querying, Storage
Affects Version/s: 2.8.0-rc4
Fix Version/s: 3.0.0-rc6

Type: Bug Priority: Major - P3
Reporter: Tyler Brock Assignee: J Rassi
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
is related to SERVER-16658 C++ driver (server fork) methods getI... Closed
is related to SERVER-16659 Cleanup pass for CursorManager/getMor... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Steps To Reproduce:

    class MultiBatch : public Base {
    public:
        MultiBatch() : Base( "multibatch" ) {}
        void run() {
            OperationContextImpl txn;
            DBDirectClient db(&txn);
 
            size_t n_collections = 150;
            for (size_t i = 0; i < n_collections; i++) {
                std::stringstream ss;
                ss << ns();
                ss << i;
                db.createCollection(ss.str());
            }
 
            std::cout << "created collections" <<std::endl;
 
            list<BSONObj> colls = db.getCollectionInfos("test");
            ASSERT_EQUALS(n_collections, colls.size());
        }
    };

Note: client.cpp was modified to request a batchSize of 10 on the initial listCollections command instead of the standard 4 MB, so that the 150 collections come back in multiple batches and getMore is exercised.

Participants:

 Description   

2015-01-09T11:29:20.429-0500 I -        _DEBUG build
2015-01-09T11:29:20.429-0500 I -        random seed: 0
2015-01-09T11:29:20.429-0500 I -        _DEBUG: automatically enabling mmapv1GlobalOptions.journalOptions=8 (JournalParanoid)
2015-01-09T11:29:20.468-0500 I CONTROL  [testsuite] git version: cb175f23188f62ba7be0cd9cbce425fdac68bef3
2015-01-09T11:29:20.469-0500 I CONTROL  [testsuite] build info: Linux bigbrock 3.17.6-1-ARCH #1 SMP PREEMPT Sun Dec 7 23:43:32 UTC 2014 x86_64 BOOST_LIB_VERSION=1_49
2015-01-09T11:29:20.481-0500 I JOURNAL  [testsuite] journal dir=/tmp/unittest/journal
2015-01-09T11:29:20.481-0500 I JOURNAL  [testsuite] recover : no journal files present, no recovery needed
2015-01-09T11:29:20.486-0500 I -        [testsuite] going to run suite: client
2015-01-09T11:29:20.486-0500 I -        [testsuite] 	 going to run test: ClientTests::MultiBatch
2015-01-09T11:29:20.487-0500 I INDEX    [testsuite] allocating new ns file /tmp/unittest/test.ns, filling with zeroes...
2015-01-09T11:29:20.499-0500 I STORAGE  [FileAllocator] allocating new datafile /tmp/unittest/test.0, filling with zeroes...
2015-01-09T11:29:20.500-0500 I STORAGE  [FileAllocator] creating directory /tmp/unittest/_tmp
2015-01-09T11:29:20.500-0500 I STORAGE  [FileAllocator] done allocating datafile /tmp/unittest/test.0, size: 16MB,  took 0 secs
2015-01-09T11:29:20.500-0500 I STORAGE  [testsuite] datafileheader::init initializing /tmp/unittest/test.0 n:0
2015-01-09T11:29:20.626-0500 I STORAGE  [FileAllocator] allocating new datafile /tmp/unittest/test.1, filling with zeroes...
2015-01-09T11:29:20.627-0500 I STORAGE  [FileAllocator] done allocating datafile /tmp/unittest/test.1, size: 32MB,  took 0 secs
2015-01-09T11:29:20.627-0500 I STORAGE  [testsuite] datafileheader::init initializing /tmp/unittest/test.1 n:1
created collections
2015-01-09T11:29:20.739-0500 I -        [testsuite] Invariant failure txn->recoveryUnit() == cc->getUnownedRecoveryUnit() src/mongo/db/query/find.cpp 270
2015-01-09T11:29:20.772-0500 I CONTROL  [testsuite] 
 0x1fe45c2 0x1f96a98 0x1f8295f 0x1bde58e 0x1b00404 0x1afd2a0 0x1a09505 0x1878e97 0x1863ee4 0x18613d6 0x14665f9 0x146e615 0x144158f 0x1cae3c8 0x1f7de88 0x1f7c138 0x1f7cad9 0x1df9141 0x1481a86 0x1481b29 0x7f0ebea99040 0x142b429
----- BEGIN BACKTRACE -----
{"backtrace":[{"b":"400000","o":"1BE45C2"},{"b":"400000","o":"1B96A98"},{"b":"400000","o":"1B8295F"},{"b":"400000","o":"17DE58E"},{"b":"400000","o":"1700404"},{"b":"400000","o":"16FD2A0"},{"b":"400000","o":"1609505"},{"b":"400000","o":"1478E97"},{"b":"400000","o":"1463EE4"},{"b":"400000","o":"14613D6"},{"b":"400000","o":"10665F9"},{"b":"400000","o":"106E615"},{"b":"400000","o":"104158F"},{"b":"400000","o":"18AE3C8"},{"b":"400000","o":"1B7DE88"},{"b":"400000","o":"1B7C138"},{"b":"400000","o":"1B7CAD9"},{"b":"400000","o":"19F9141"},{"b":"400000","o":"1081A86"},{"b":"400000","o":"1081B29"},{"b":"7F0EBEA79000","o":"20040"},{"b":"400000","o":"102B429"}],"processInfo":{ "mongodbVersion" : "2.8.0-rc5-pre-", "gitVersion" : "cb175f23188f62ba7be0cd9cbce425fdac68bef3", "uname" : { "sysname" : "Linux", "release" : "3.17.6-1-ARCH", "version" : "#1 SMP PREEMPT Sun Dec 7 23:43:32 UTC 2014", "machine" : "x86_64" }, "somap" : [ { "elfType" : 2, "b" : "400000", "buildId" : "E7C2FF24B36F99EBA2ACFE9EF8DD1EA718908710" }, { "b" : "7FFFA1FFE000", "path" : "linux-vdso.so.1", "elfType" : 3, "buildId" : "4D43D06EE97DD5E666C76D8E5977ABF8AA35D800" }, { "b" : "7F0EBFA52000", "path" : "/usr/lib/libpthread.so.0", "elfType" : 3, "buildId" : "FC55D81870B38ADB7DBDCC2E41E2819EE5265D2D" }, { "b" : "7F0EBF84A000", "path" : "/usr/lib/librt.so.1", "elfType" : 3, "buildId" : "BA2B76DDEF6333D758B5B2776079BD77E664A608" }, { "b" : "7F0EBF646000", "path" : "/usr/lib/libdl.so.2", "elfType" : 3, "buildId" : "BDC708C88030FC7EB211A0ED22DC23B3C48AF65E" }, { "b" : "7F0EBF337000", "path" : "/usr/lib/libstdc++.so.6", "elfType" : 3, "buildId" : "3DECABE54205531E6CD4DDFF5C8A801C89375A1A" }, { "b" : "7F0EBF032000", "path" : "/usr/lib/libm.so.6", "elfType" : 3, "buildId" : "8A7F63AB878F00D5E528111D84CD6B298592ABA6" }, { "b" : "7F0EBEE1C000", "path" : "/usr/lib/libgcc_s.so.1", "elfType" : 3, "buildId" : "43D85A6FA21B090B3C494BC56EF677B13ECCF5EA" }, { "b" : "7F0EBEA79000", "path" : "/usr/lib/libc.so.6", "elfType" : 3, 
"buildId" : "F53B8AD377A1988DCF6329BBDFA7B1201431656E" }, { "b" : "7F0EBFC6E000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "C50DE0E139140A77A886D53046B76869279C7F71" } ] }}
 dbtest(_ZN5mongo15printStackTraceERSo+0x27) [0x1fe45c2]
 dbtest(_ZN5mongo10logContextEPKc+0x74) [0x1f96a98]
 dbtest(_ZN5mongo15invariantFailedEPKcS1_j+0xC1) [0x1f8295f]
 dbtest(_ZN5mongo7getMoreEPNS_16OperationContextEPKcixRNS_5CurOpEiRbPbb+0x452) [0x1bde58e]
 dbtest(_ZN5mongo15receivedGetMoreEPNS_16OperationContextERNS_10DbResponseERNS_7MessageERNS_5CurOpEb+0x3F5) [0x1b00404]
 dbtest(_ZN5mongo16assembleResponseEPNS_16OperationContextERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortEb+0x4B2) [0x1afd2a0]
 dbtest(_ZN5mongo14DBDirectClient4callERNS_7MessageES2_bPSs+0xB7) [0x1a09505]
 dbtest(_ZN5mongo14DBClientCursor4initEv+0xDD) [0x1878e97]
 dbtest(_ZN5mongo12DBClientBase7getMoreERKSsxii+0x76) [0x1863ee4]
 dbtest(_ZN5mongo20DBClientWithCommands18getCollectionInfosERKSsRKNS_7BSONObjE+0x45A) [0x18613d6]
 dbtest(_ZN11ClientTests10MultiBatch3runEv+0x1D5) [0x14665f9]
 dbtest(_ZN5mongo8unittest5Suite13runTestObjectIN11ClientTests10MultiBatchEEEvv+0x21) [0x146e615]
 dbtest(_ZNSt17_Function_handlerIFvvEPS0_E9_M_invokeERKSt9_Any_data+0x1D) [0x144158f]
 dbtest(_ZNKSt8functionIFvvEEclEv+0x32) [0x1cae3c8]
 dbtest(_ZNK5mongo8unittest10TestHolder3runEv+0x1C) [0x1f7de88]
 dbtest(_ZN5mongo8unittest5Suite3runERKSsi+0x52A) [0x1f7c138]
 dbtest(_ZN5mongo8unittest5Suite3runERKSt6vectorISsSaISsEERKSsi+0x341) [0x1f7cad9]
 dbtest(_ZN5mongo7dbtests10runDbTestsEiPPc+0xD2) [0x1df9141]
 dbtest(_Z11dbtestsMainiPPcS0_+0x138) [0x1481a86]
 dbtest(main+0x28) [0x1481b29]
 libc.so.6(__libc_start_main+0xF0) [0x7f0ebea99040]
 dbtest(+0x102B429) [0x142b429]
-----  END BACKTRACE  -----
2015-01-09T11:29:20.772-0500 I -        [testsuite] 
 
***aborting after invariant() failure

At the end of listCollections/listIndexes, the following code stashes the txn's current recovery unit on the cursor and gives the txn a new one:

cursor->setOwnedRecoveryUnit(txn->releaseRecoveryUnit());
StorageEngine* storageEngine = getGlobalEnvironment()->getGlobalStorageEngine();
txn->setRecoveryUnit(storageEngine->newRecoveryUnit());

Then, in getMore (find.cpp), we check that the txn's recovery unit is the same as the client cursor's (cc) unowned recovery unit, which no longer holds:

// Restore the RecoveryUnit if we need to.
if (fromDBDirectClient) {
    if (cc->hasRecoveryUnit())
        invariant(txn->recoveryUnit() == cc->getUnownedRecoveryUnit());

Rassi and I found that a direct client should reuse the txn's recovery unit instead:

if (fromDBDirectClient) {
    cc->setUnownedRecoveryUnit(txn->recoveryUnit());
    ...
}
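
To make the mismatch concrete, here is a minimal standalone sketch. It does not use the real server classes: `RecoveryUnit`, `OperationContext`, and `Cursor` below are hypothetical stand-ins that only mimic the ownership calls quoted above, and `stashAndReplace`, `shareWithDirectClient`, and `getMoreInvariantHolds` are illustrative names, not server functions. It assumes a cursor that owns its recovery unit reports no unowned one.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical stand-in for the storage engine's recovery unit.
struct RecoveryUnit {};

// Hypothetical stand-in for the operation context ("txn").
class OperationContext {
public:
    OperationContext() : _ru(new RecoveryUnit()) {}
    RecoveryUnit* recoveryUnit() const { return _ru.get(); }
    std::unique_ptr<RecoveryUnit> releaseRecoveryUnit() { return std::move(_ru); }
    void setRecoveryUnit(std::unique_ptr<RecoveryUnit> ru) { _ru = std::move(ru); }

private:
    std::unique_ptr<RecoveryUnit> _ru;
};

// Hypothetical stand-in for the client cursor ("cc").
class Cursor {
public:
    void setOwnedRecoveryUnit(std::unique_ptr<RecoveryUnit> ru) {
        _owned = std::move(ru);
        _unowned = nullptr;
    }
    void setUnownedRecoveryUnit(RecoveryUnit* ru) {
        _owned.reset();
        _unowned = ru;
    }
    bool hasRecoveryUnit() const {
        return static_cast<bool>(_owned) || _unowned != nullptr;
    }
    RecoveryUnit* getUnownedRecoveryUnit() const { return _unowned; }

private:
    std::unique_ptr<RecoveryUnit> _owned;
    RecoveryUnit* _unowned = nullptr;
};

// Mimics the end of listCollections/listIndexes as quoted above:
// the cursor takes ownership of the old unit and the txn gets a new one.
void stashAndReplace(OperationContext& txn, Cursor& cursor) {
    cursor.setOwnedRecoveryUnit(txn.releaseRecoveryUnit());
    txn.setRecoveryUnit(std::unique_ptr<RecoveryUnit>(new RecoveryUnit()));
}

// Mimics the direct-client alternative: the cursor only borrows the
// txn's unit, so both sides keep pointing at the same object.
void shareWithDirectClient(OperationContext& txn, Cursor& cursor) {
    cursor.setUnownedRecoveryUnit(txn.recoveryUnit());
}

// Same comparison as the invariant in getMore, returned as a bool
// instead of aborting the process.
bool getMoreInvariantHolds(const OperationContext& txn, const Cursor& cc) {
    if (!cc.hasRecoveryUnit())
        return true;
    return txn.recoveryUnit() == cc.getUnownedRecoveryUnit();
}
```

In this model, calling `stashAndReplace` and then `getMoreInvariantHolds` returns false, because the txn holds a fresh unit while the cursor holds the old one as owned. After `shareWithDirectClient` the check returns true.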



 Comments   
Comment by Githook User [ 15/Jan/15 ]

Author: Jason Rassi (jrassi) <rassi@10gen.com>

Message: SERVER-16795 listCollections/listIndexes don't stash recovery unit
Branch: master
https://github.com/mongodb/mongo/commit/550f8d64315d9b1dc84ddfef44cc1735d680b60c
